13 Commits

Author SHA1 Message Date
Zhen Ye
02e2170601
enhance: cherry pick patch of new DDL framework and CDC 2 (#45241)
issue: #43897, #44123
pr: #45224
also pick pr: #45216,#45154,#45033,#45145,#45092,#45058,#45029

enhance: Close channel replicator more gracefully (#45029)

issue: https://github.com/milvus-io/milvus/issues/44123

enhance: Show create time for import job (#45058)

issue: https://github.com/milvus-io/milvus/issues/45056

fix: wal state may be inconsistent after recovering from crash (#45092)

issue: #45088, #45086

- Message on control channel should trigger the checkpoint update.
- LastConfirmedMessageID should be recovered from the minimum of the
checkpoint and the LastConfirmedMessageID of any uncommitted txn.
- Add more log info for wal debugging.
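
The recovery rule in the second bullet can be sketched as follows. This is a minimal illustration only: `MessageID` and `recoverLastConfirmed` are hypothetical names, and message IDs are assumed to be totally ordered integers, which may not match the real WAL implementation.

```go
package main

import "fmt"

// MessageID is a hypothetical stand-in for a WAL message ID. Real IDs are
// implementation-specific; they are only assumed here to be totally ordered.
type MessageID int64

// recoverLastConfirmed returns the smallest of the checkpoint and the
// last-confirmed IDs recorded by transactions that were still uncommitted
// when the node crashed, so no uncommitted txn message is skipped on replay.
func recoverLastConfirmed(checkpoint MessageID, uncommittedTxns []MessageID) MessageID {
	lastConfirmed := checkpoint
	for _, id := range uncommittedTxns {
		if id < lastConfirmed {
			lastConfirmed = id
		}
	}
	return lastConfirmed
}

func main() {
	// One uncommitted txn started before the checkpoint, one after it.
	fmt.Println(recoverLastConfirmed(100, []MessageID{120, 95}))
	// No uncommitted txn: the checkpoint itself is used.
	fmt.Println(recoverLastConfirmed(100, nil))
}
```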

fix: make broadcaster ack non-cancelable by client (#45145)

issue: #45141

- ensure the broadcaster ack cannot be canceled by rpc.
- clone the assignment snapshot of the wal balancer.
- add server id for GetReplicateCheckpoint to avoid failure.

enhance: support collection and index with WAL-based DDL framework
(#45033)

issue: #43897

- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support the following message types in wal: CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.

enhance: support load/release collection/partition with WAL-based DDL
framework (#45154)

issue: #43897

- Load/Release collection/partition is implemented by WAL-based DDL
framework now.
- Support AlterLoadConfig/DropLoadConfig in wal now.
- Load/Release operation can be synced by new CDC now.
- Refactor some UT for load/release DDL.

enhance: Don't start cdc by default (#45216)

issue: https://github.com/milvus-io/milvus/issues/44123


fix: unrecoverable when replicate from old (#45224)

issue: #44962

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: yihao.dai <yihao.dai@zilliz.com>
2025-11-04 01:35:33 +08:00
Zhen Ye
318db122b8
enhance: cherry pick patch of new DDL framework and CDC (#45025)
issue: #43897, #44123
pr: #44898
related pr: #44607 #44642 #44792 #44809 #44564 #44560 #44735 #44822
#44865 #44850 #44942 #44874 #44963 #44886 #44898

enhance: remove redundant channel manager from datacoord (#44532)

issue: #41611

- After enabling streaming arch, channel manager of data coord is a
redundant component.


fix: Fix CDC OOM due to high buffer size (#44607)

Fix CDC OOM by:
1. free msg buffer manually.
2. limit max msg buffer size.
3. reduce scanner msg handler buffer size.
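
Steps 1 and 2 above can be illustrated with a byte-bounded buffer. This is a sketch under assumptions, not the CDC code itself: `boundedBuffer`, `TryAdd`, and `Drain` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// boundedBuffer is a hypothetical message buffer capped by total byte size.
type boundedBuffer struct {
	maxBytes int
	curBytes int
	msgs     [][]byte
}

// TryAdd appends msg only if the buffer stays within maxBytes. A false
// return tells the caller to flush or apply backpressure before retrying.
func (b *boundedBuffer) TryAdd(msg []byte) bool {
	if b.curBytes+len(msg) > b.maxBytes {
		return false
	}
	b.msgs = append(b.msgs, msg)
	b.curBytes += len(msg)
	return true
}

// Drain hands the buffered messages to the caller and drops the buffer's own
// references, so the memory can be reclaimed once the caller is done.
func (b *boundedBuffer) Drain() [][]byte {
	out := b.msgs
	b.msgs = nil
	b.curBytes = 0
	return out
}

func main() {
	buf := &boundedBuffer{maxBytes: 8}
	fmt.Println(buf.TryAdd([]byte("12345"))) // fits
	fmt.Println(buf.TryAdd([]byte("6789")))  // would exceed the limit
	fmt.Println(len(buf.Drain()), buf.curBytes)
}
```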

issue: https://github.com/milvus-io/milvus/issues/44123

fix: remove wrong start timetick to avoid filtering DML whose timetick
is less than it. (#44691)

issue: #41611

- introduced by #44532

enhance: support remove cluster from replicate topology (#44642)

issue: #44558, #44123
- On replicate topology (A->B, A->C), applying config (A->C) to A and C
and config (B) to B removes B from the replicate topology.
- Fix some CDC metric errors.

fix: check if qn is sqn with label and streamingnode list (#44792)

issue: #44014

- On standalone, the internal query node needs to load segments and watch
channels, so the querynode is not an embedded querynode in streamingnode
without `LabelStreamingNodeEmbeddedQueryNode`. The channel dist manager
cannot confirm that a standalone node is an embededStreamingNode.

Bug is introduced by #44099

enhance: Make GetReplicateInfo API work at the pchannel level (#44809)

issue: https://github.com/milvus-io/milvus/issues/44123

enhance: Speed up CDC scheduling (#44564)

Make CDC watch etcd replicate pchannel meta instead of listing them
periodically.

issue: https://github.com/milvus-io/milvus/issues/44123


enhance: refactor update replicate config operation using
wal-broadcast-based DDL/DCL framework (#44560)

issue: #43897

- UpdateReplicateConfig operation will broadcast AlterReplicateConfig
message into all pchannels with cluster-exclusive-lock.
- Begin txn message will use the commit message timetick now (to avoid
timetick rollback when CDC handles txn messages).
- If the current cluster is secondary, the UpdateReplicateConfig will wait
until the replicate configuration is consistent with the config
replicated from primary.


enhance: support rbac with WAL-based DDL framework (#44735)

issue: #43897

- RBAC(Roles/Users/Privileges/Privilege Groups) is implemented by
WAL-based DDL framework now.
- Support the following message types in wal: `AlterUser`, `DropUser`,
`AlterRole`, `DropRole`, `AlterUserRole`, `DropUserRole`,
`AlterPrivilege`, `DropPrivilege`, `AlterPrivilegeGroup`,
`DropPrivilegeGroup`, `RestoreRBAC`.
- RBAC can be synced by new CDC now.
- Refactor some UT for RBAC.


enhance: support database with WAL-based DDL framework (#44822)

issue: #43897

- Database related DDL is implemented by WAL-based DDL framework now.
- Support the following message types in wal: CreateDatabase,
AlterDatabase, DropDatabase.
- Database DDL can be synced by new CDC now.
- Refactor some UT for Database DDL.

enhance: support alias with WAL-based DDL framework (#44865)

issue: #43897

- Alias related DDL is implemented by WAL-based DDL framework now.
- Support the following message types in wal: AlterAlias, DropAlias.
- Alias DDL can be synced by new CDC now.
- Refactor some UT for Alias DDL.

enhance: Disable import for replicating cluster (#44850)

1. Import in replicating cluster is not supported yet, so disable it for
now.
2. Remove GetReplicateConfiguration wal API

issue: https://github.com/milvus-io/milvus/issues/44123


fix: use short debug string to avoid newline in debug logs (#44925)

issue: #44924

fix: rerank before requery if reranker didn't use field data (#44942)

issue: #44918


enhance: support resource group with WAL-based DDL framework (#44874)

issue: #43897

- Resource group related DDL is implemented by WAL-based DDL framework
now.
- Support the following message types in wal: AlterResourceGroup,
DropResourceGroup.
- Resource group DDL can be synced by new CDC now.
- Refactor some UT for resource group DDL.


fix: Fix replication txn data loss during chaos (#44963)

Only confirm CommitMsg for txn messages to prevent data loss.

issue: https://github.com/milvus-io/milvus/issues/44962,
https://github.com/milvus-io/milvus/issues/44123

fix: wrong execution order of DDL/DCL on secondary (#44886)

issue: #44697, #44696

- The DDL execution order on the secondary now matches the order of the
control channel timetick.
- Filter control channel operations in the shard manager of streamingnode
to avoid creating segments with the wrong vchannel.
- Fix the immutable txn message losing its replicate header.


fix: Fix primary-secondary replication switch blocking (#44898)

1. Fix primary-secondary replication switchover blocking by deleting
replicate pchannel meta using modRevision.
2. Stop channel replicator(scanner) when cluster role changes to prevent
continued message consumption and replication.
3. Close Milvus client to prevent goroutine leak.
4. Create Milvus client once for a channel replicator.
5. Simplify CDC controller and resources.
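
Item 1 relies on a revision-guarded delete: the key is removed only if it has not been modified since it was read. In etcd this is typically expressed as a transaction that compares the key's mod revision. The sketch below imitates that behavior with an in-memory store so it runs without an etcd dependency. `store`, `Put`, and `DeleteIfUnchanged` are hypothetical names, not part of Milvus or etcd.

```go
package main

import "fmt"

// entry records a value and the store revision at which it was last written.
type entry struct {
	value       string
	modRevision int64
}

// store is a hypothetical in-memory stand-in for a revisioned KV store.
type store struct {
	revision int64
	data     map[string]entry
}

func newStore() *store { return &store{data: make(map[string]entry)} }

// Put writes the key and returns the revision assigned to this write.
func (s *store) Put(key, value string) int64 {
	s.revision++
	s.data[key] = entry{value: value, modRevision: s.revision}
	return s.revision
}

// DeleteIfUnchanged removes the key only when its mod revision still equals
// the one the caller observed, so a concurrent update is never deleted.
func (s *store) DeleteIfUnchanged(key string, expected int64) bool {
	e, ok := s.data[key]
	if !ok || e.modRevision != expected {
		return false
	}
	delete(s.data, key)
	s.revision++
	return true
}

func main() {
	s := newStore()
	stale := s.Put("pchannel-0", "A->B")
	fresh := s.Put("pchannel-0", "A->C") // key rewritten after the first read
	fmt.Println(s.DeleteIfUnchanged("pchannel-0", stale))
	fmt.Println(s.DeleteIfUnchanged("pchannel-0", fresh))
}
```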

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: yihao.dai <yihao.dai@zilliz.com>
2025-11-03 15:39:33 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
f47ab31f23
enhance: remove redundant resource key watch operation, just keep consistency of wal (#40235)
issue: #38399
related PR: #39522

- Implement only an exclusive broadcaster between broadcast messages with
the same resource key, to keep the same order across different wal.
- After simplifying the broadcast model, the original watch-based
broadcast is too complicated and redundant, so remove it.
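
One way to picture the exclusive broadcaster is a lock per resource key: two broadcasts that share a key never interleave, so every WAL sees them in the same order. This is a minimal sketch under that assumption. `keyLocker` and `Broadcast` are hypothetical names, and the real broadcaster may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLocker hands out one mutex per resource key (hypothetical helper).
type keyLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// lockFor returns the mutex for key, creating it on first use.
func (k *keyLocker) lockFor(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.locks == nil {
		k.locks = make(map[string]*sync.Mutex)
	}
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	return l
}

// Broadcast runs appendToAllWALs while holding the key's lock, so broadcasts
// with the same resource key are appended to every WAL in one shared order.
func (k *keyLocker) Broadcast(key string, appendToAllWALs func()) {
	l := k.lockFor(key)
	l.Lock()
	defer l.Unlock()
	appendToAllWALs()
}

func main() {
	var kl keyLocker
	var walA, walB []string
	var wg sync.WaitGroup
	for _, msg := range []string{"create", "drop"} {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			kl.Broadcast("collection-1", func() {
				walA = append(walA, m)
				walB = append(walB, m)
			})
		}(msg)
	}
	wg.Wait()
	// Whichever message won the race, both WALs recorded the same order.
	fmt.Println(walA[0] == walB[0], walA[1] == walB[1])
}
```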

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-03 14:40:05 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Zhen Ye
a9e0e0a852
enhance: broadcast with event-based notification (#39522)
issue: #38399

- broadcast message can carry multiple resource keys now.
- implement event-based notification for broadcast messages
- broadcast message uses broadcast id as a unique identifier in message
- broadcasted message on vchannels keeps the broadcasted vchannel now.
- broadcasted message and broadcast message have a common broadcast
header now.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-07 11:14:43 +08:00
Zhen Ye
5669016af0
enhance: erase the rpc level when wal is located at same node (#38858)
issue: #38399

- Make the wal scanner interface the same as the streaming scanner.
- Use wal if the wal is located at current node.
- Otherwise, fall back to the old logic.
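
The selection logic described above can be sketched as a shared interface with a local fast path and an RPC fallback. The names `Scanner`, `localScanner`, `remoteScanner`, and `openScanner` are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// Scanner is a hypothetical, minimal interface shared by both read paths.
type Scanner interface {
	Source() string
}

type localScanner struct{ channel string }
type remoteScanner struct{ channel string }

func (s localScanner) Source() string  { return "local:" + s.channel }
func (s remoteScanner) Source() string { return "rpc:" + s.channel }

// openScanner reads the WAL directly when the channel's WAL lives on this
// node, and otherwise falls back to the RPC-based path.
func openScanner(channel string, localWALs map[string]bool) Scanner {
	if localWALs[channel] {
		return localScanner{channel: channel}
	}
	return remoteScanner{channel: channel}
}

func main() {
	local := map[string]bool{"pchannel-0": true}
	fmt.Println(openScanner("pchannel-0", local).Source())
	fmt.Println(openScanner("pchannel-1", local).Source())
}
```

Because both paths satisfy the same interface, callers do not need to know which one they received.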

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-05 22:25:10 +08:00
Zhen Ye
fd84ed817c
enhance: add broadcast operation for msgstream (#39040)
issue: #38399

- make broadcast service available for msgstream by reusing the
streaming service architecture

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 15:14:59 +08:00
Zhen Ye
3bcdd92915
enhance: add broadcast for streaming service (#39020)
issue: #38399 

- Add a new rpc for transferring broadcasts to streaming coord
- Add broadcast service at streaming coord to make broadcast messages
sent atomically

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-09 16:24:55 +08:00
Zhen Ye
99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration test when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00
Zhen Ye
4d69898cb2
enhance: support single pchannel level transaction (#35289)
issue: #33285

- support transaction on single wal.
- last confirmed message id can still be used when transactions are enabled.
- add fence operation for segment allocation interceptor.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-19 21:22:56 +08:00
chyezh
16b0aee97f
enhance: timetick interceptor optimization (#35287)
issue: #33285

- remove redundant goroutine by using inspector.
- remove the continuous non-message timetick persistence
- periodically push the time tick forward without persistent timetick
message.
- add 'message type filter' deliver filter.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-12 18:58:25 +08:00
chyezh
14051fed7d
enhance: streaming service client (#34656)
issue: #33285

- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-05 21:38:15 +08:00