46 Commits

Author SHA1 Message Date
Zhen Ye
369c6eb206
enhance: support remove cluster from replicate topology (#44642)
issue: #44558, #44123
- On replicate topology (A->B, A->C), updating config (A->C) on A and C and
config (B) on B removes B from the replicate topology
- Fix some metric errors of CDC

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-13 11:07:58 +08:00
Zhen Ye
19e5e9f910
enhance: broadcaster will lock resource until message acked (#44508)
issue: #43897

- Return LastConfirmedMessageID from the wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey supports shared and exclusive locks.
- Add FastAck to execute the ack right away after the broadcast is done, to
speed up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid repeatedly committing DDL and the
ABA issue.
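
The locking behaviour listed above (shared and exclusive modes per resource key, held until the broadcast is acked) can be sketched as follows. This is a minimal illustration, not the Milvus implementation: `keyLocker`, `TryLock`, and the key `collection-1` are hypothetical names, and the sketch assumes Go 1.18 or later for `sync.RWMutex.TryLock`.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLocker is a hypothetical resource-key-based locker (not the Milvus code).
type keyLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

func newKeyLocker() *keyLocker {
	return &keyLocker{locks: make(map[string]*sync.RWMutex)}
}

// get returns the RWMutex for key, creating it on first use.
func (l *keyLocker) get(key string) *sync.RWMutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	m, ok := l.locks[key]
	if !ok {
		m = &sync.RWMutex{}
		l.locks[key] = m
	}
	return m
}

// TryLock attempts to take key in exclusive or shared mode without blocking.
// On success it returns a release function, meant to be called on ack.
func (l *keyLocker) TryLock(key string, exclusive bool) (release func(), ok bool) {
	m := l.get(key)
	if exclusive {
		if !m.TryLock() {
			return nil, false
		}
		return m.Unlock, true
	}
	if !m.TryRLock() {
		return nil, false
	}
	return m.RUnlock, true
}

func main() {
	l := newKeyLocker()
	ackDDL, _ := l.TryLock("collection-1", true) // a DDL holds the key exclusively
	_, ok := l.TryLock("collection-1", false)
	fmt.Println("shared lock before ack:", ok)
	ackDDL() // the ack releases the key
	_, ok = l.TryLock("collection-1", false)
	fmt.Println("shared lock after ack:", ok)
}
```

In this sketch the release function stands in for the ack: nothing else can take the key in a conflicting mode until it is called.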

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-24 20:58:05 +08:00
Zhen Ye
c171280f63
enhance: support replicate message in wal. (#44456)
issue: #44123

- support replicate message in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-22 17:06:11 +08:00
zhenshan.cao
691a8df953
feat: Add RESTful api for rolling upgrade support (#44381)
issue: https://github.com/milvus-io/milvus/issues/43968

Co-authored-by: chyezh <ye.zhen@zilliz.com>
2025-09-16 20:08:00 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
cbe4c3d231
enhance: get cchannel before build message (#44229)
issue: #43897

- support never-expiring txn message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-10 11:09:57 +08:00
Zhen Ye
9e2d1963d4
enhance: support cchannel for streaming service (#44143)
issue: #43897

- add cchannel as a special vchannel to hold some ddl and dcl.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-02 10:05:52 +08:00
Zhen Ye
3327df72e4
enhance: make immutable message as the param of ack operation for cdc (#43900)
issue: #43897

- The original broadcast ack operation needs to recover the message from
etcd, which cannot support cdc.
- The immutable message is now passed as the ack parameter to fix it.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-01 10:21:52 +08:00
Zhen Ye
d0e3a33c37
enhance: add IsRebalanceSuspended interface for wal balancer (#44026)
issue: #43968

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-24 09:19:47 +08:00
Zhen Ye
082ca62ec1
enhance: support balancer interface for streaming client to fetch streaming node information (#43969)
issue: #43968

- Add ListStreamingNode/GetWALDistribution to fetch streaming node info
- Add SuspendRebalance/ResumeRebalance to enable or stop balance
- Add FreezeNodeIDs/DefreezeNodeIDs to freeze target node

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-21 15:55:47 +08:00
Zhen Ye
5551d99425
enhance: remove old non-streaming arch code (#43651)
issue: #41609

- remove all dml dead code at proxy
- remove dead code at l0_write_buffer
- remove msgstream dependency at proxy
- remove timetick reporter from proxy
- remove replicate stream implementation

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:41:40 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger write limiting and denial by the quota
center.
- Set the ttProtection by default.
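
A lag-based quota decision of the kind described above might look like the sketch below. The two thresholds, the `writeState` type, and the `decide` function are assumptions made for illustration; the commit message does not state how the quota center maps lag to a limit or a denial.

```go
package main

import (
	"fmt"
	"time"
)

// writeState is a hypothetical outcome of the quota decision.
type writeState int

const (
	writeAllowed writeState = iota
	writeLimited
	writeDenied
)

func (s writeState) String() string {
	switch s {
	case writeAllowed:
		return "allowed"
	case writeLimited:
		return "limited"
	default:
		return "denied"
	}
}

// decide maps a reported consuming lag to a write state.
// limitAt and denyAt are assumed thresholds, not Milvus configuration values.
func decide(lag, limitAt, denyAt time.Duration) writeState {
	switch {
	case lag >= denyAt:
		return writeDenied
	case lag >= limitAt:
		return writeLimited
	default:
		return writeAllowed
	}
}

func main() {
	limitAt, denyAt := 10*time.Second, 60*time.Second
	for _, lag := range []time.Duration{2 * time.Second, 30 * time.Second, 2 * time.Minute} {
		fmt.Printf("lag=%v -> write %v\n", lag, decide(lag, limitAt, denyAt))
	}
}
```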

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
Zhen Ye
f598ca2b4e
fix: block at msgpack adaptor and wrong metrics (#43235)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 10:14:49 +08:00
Zhen Ye
66cc194ab2
enhance: add partition gc at streaming arch (#42179)
issue: #41976

- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.
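
The retry-until-success ack described above can be sketched as a loop around a callback. The names `ackWithRetry` and `demoAck`, the fixed retry interval, and the simulated failure are all hypothetical; the real callback and its retry policy are not shown in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ackWithRetry calls the ack callback until it succeeds or ctx is cancelled.
// It returns the number of attempts made.
func ackWithRetry(ctx context.Context, interval time.Duration, ack func() error) (int, error) {
	attempts := 0
	for {
		attempts++
		if err := ack(); err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(interval):
			// wait before the next attempt
		}
	}
}

// demoAck simulates a callback (for example a partition gc) that fails
// the given number of times before succeeding.
func demoAck(failures int) (int, error) {
	calls := 0
	ack := func() error {
		calls++
		if calls <= failures {
			return errors.New("gc not finished")
		}
		return nil
	}
	return ackWithRetry(context.Background(), time.Millisecond, ack)
}

func main() {
	attempts, err := demoAck(2)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Passing a cancellable context keeps the loop from outliving the component that started it, which an unconditional retry would otherwise risk.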

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:20:30 +08:00
yihao.dai
65dd3982d8
fix: Fix ants.Pool goroutine leak (#41892)
1. Release the pool after it is no longer in use.
2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR
#287](https://github.com/panjf2000/ants/pull/287)).

issue: https://github.com/milvus-io/milvus/issues/41838

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-19 17:56:22 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye
21d6d1669e
fix: wal should be reopened if wal append receives the fence error (#41807)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 01:02:56 +08:00
Zhen Ye
de8f0af20d
enhance: use dispatcher at delegator when enable streaming (#41266)
issue: #38399

- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-06 01:12:53 +08:00
Zhen Ye
a3d621cb5e
fix: remove the concurrent limits for streaming service (#41484)
issue: #41479

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 20:36:38 +08:00
Zhen Ye
78fca7e88d
fix: transaction should retry if transaction is expired (#41379)
issue: #41248

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:38:36 +08:00
Zhen Ye
224728c2d2
fix: catchup cannot work if using StartAfter (#41201)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-10 19:04:27 +08:00
Zhen Ye
f6fb4bc442
fix: backoff will retry infinitely after reaching max elapse (#40589)
issue: #40588
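
The commit carries only a title, so the following is an illustrative sketch of the general technique rather than the actual fix: an exponential backoff that stops handing out intervals once a total budget is spent, so the caller has a clear signal to stop retrying. The `backoff` type and its fields are hypothetical, and elapsed time is tracked as the sum of intervals rather than by wall clock to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// backoff is a hypothetical exponential backoff with a total time budget.
type backoff struct {
	next        time.Duration // interval to hand out next
	maxInterval time.Duration // cap for a single interval
	maxElapsed  time.Duration // total budget across all intervals
	elapsed     time.Duration // budget already handed out
}

// Next returns the next wait interval, or false once the budget is used up.
// Callers must stop retrying when it returns false.
func (b *backoff) Next() (time.Duration, bool) {
	if b.elapsed >= b.maxElapsed || b.next <= 0 {
		return 0, false
	}
	d := b.next
	if remaining := b.maxElapsed - b.elapsed; d > remaining {
		d = remaining
	}
	b.elapsed += d
	b.next *= 2
	if b.next > b.maxInterval {
		b.next = b.maxInterval
	}
	return d, true
}

// schedule drains the backoff and returns every interval it produced.
func schedule(b *backoff) []time.Duration {
	var out []time.Duration
	for {
		d, ok := b.Next()
		if !ok {
			return out
		}
		out = append(out, d)
	}
}

func main() {
	b := &backoff{
		next:        100 * time.Millisecond,
		maxInterval: 400 * time.Millisecond,
		maxElapsed:  time.Second,
	}
	fmt.Println(schedule(b))
}
```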

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 16:24:06 +08:00
Zhen Ye
f47ab31f23
enhance: remove redundant resource key watch operation, just keep consistency of wal (#40235)
issue: #38399
related PR: #39522

- Just implement an exclusive broadcaster between broadcast messages with the
same resource key to keep the same order in different wals.
- After simplifying the broadcast model, the original watch-based broadcast is
too complicated and redundant, so remove it.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-03 14:40:05 +08:00
Zhen Ye
84df80b5e4
enhance: refactor metrics of streaming (#40031)
issue: #38399

- add metrics for broadcaster component.
- add metrics for wal flusher component.
- add metrics for wal interceptors.
- add slow log for wal.
- add more labels for some wal metrics. (local or remote/catchup or
tailing...)

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-25 12:25:56 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Zhen Ye
21724ab52c
enhance: generate guaranteets at delegator if local wal (#39799)
issue: #38399, #39892

- use mvcc timestamp of wal as guaranteets if wal and delegator are
located at the same node.
- fix: ignore growing option is lost at hybridsearch

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-17 15:22:15 +08:00
SimFG
047254665d
feat: support to replicate import msg (#39171)
- issue: #39849

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-02-16 00:08:13 +08:00
Zhen Ye
a9e0e0a852
enhance: broadcast with event-based notification (#39522)
issue: #38399

- broadcast message can carry multi resource key now.
- implement event-based notification for broadcast messages
- broadcast message use broadcast id as a unique identifier in message
- broadcasted message on vchannels keep the broadcasted vchannel now.
- broadcasted message and broadcast message have a common broadcast
header now.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-07 11:14:43 +08:00
Zhen Ye
5669016af0
enhance: erase the rpc level when wal is located at same node (#38858)
issue: #38399

- Make the wal scanner interface same with streaming scanner.
- Use wal if the wal is located at current node.
- Otherwise fall back to the old logic.
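
The dispatch described above (use the wal directly when it lives on the current node, otherwise go through the rpc path) can be sketched with a shared interface. `Scanner`, `localScanner`, `remoteScanner`, and `registry` are hypothetical names for illustration and do not mirror the Milvus types.

```go
package main

import "fmt"

// Scanner is a hypothetical minimal interface shared by both paths.
type Scanner interface {
	Kind() string
}

// localScanner reads the wal in-process, skipping the rpc layer.
type localScanner struct{ pchannel string }

// remoteScanner reads the wal through the streaming rpc client.
type remoteScanner struct{ pchannel string }

func (localScanner) Kind() string  { return "local" }
func (remoteScanner) Kind() string { return "remote" }

// registry tracks which wals are open on the current node.
type registry struct {
	local map[string]bool
}

// OpenScanner returns a local scanner when the wal for pchannel is on this
// node and falls back to the remote path otherwise.
func (r registry) OpenScanner(pchannel string) Scanner {
	if r.local[pchannel] {
		return localScanner{pchannel: pchannel}
	}
	return remoteScanner{pchannel: pchannel}
}

func main() {
	r := registry{local: map[string]bool{"pchannel-0": true}}
	for _, ch := range []string{"pchannel-0", "pchannel-1"} {
		fmt.Println(ch, "->", r.OpenScanner(ch).Kind())
	}
}
```

Because both paths satisfy the same interface, callers do not need to know which one they received.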

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-05 22:25:10 +08:00
Zhen Ye
92bde5b4f6
fix: panic when streaming release if using msgstream (#39374)
issue: #39367

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-17 11:47:02 +08:00
Zhen Ye
fd84ed817c
enhance: add broadcast operation for msgstream (#39040)
issue: #38399

- make broadcast service available for msgstream by reusing the
architecture of streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 15:14:59 +08:00
Zhen Ye
3bcdd92915
enhance: add broadcast for streaming service (#39020)
issue: #38399

- Add new rpc to transfer broadcast to streaming coord
- Add broadcast service at streaming coord to make broadcast message
sent atomically

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-09 16:24:55 +08:00
Zhen Ye
69a9fd6ead
enhance: enable rmq for streaming (#38669)
issue: #38399

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-24 20:24:48 +08:00
Zhen Ye
afac153c26
enhance: move the lifetime implementation out of server level lifetime (#38442)
issue: #38399

- move the lifetime implementation of common code out of the server
level lifetime implementation

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-17 11:42:44 +08:00
Zhen Ye
1b6edd0b4b
enhance: refactor the consumer grpc proto for reusing grpc stream for multi-consumer (#37564)
issue: #33285

- Modify the proto of consumer of streaming service.
- Make VChannel as a required option for streaming

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-11 17:24:29 +08:00
Zhen Ye
f0f5147aef
fix: streaming consumer may get stuck when handler is un-consumed (#36818)
issue: #36378

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-14 15:23:23 +08:00
Zhen Ye
2ec6e602d6
enhance: add streaming client metrics (#36523)
issue: #33285

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-08 21:25:19 +08:00
Zhen Ye
a6545b2e29
fix: refactor milvus config and change default txn timeout (#36522)
issue: #36498

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-29 11:01:15 +08:00
CharlesFeng
8787e65b1f
fix: lifeTime not released in time (#36093)
https://github.com/milvus-io/milvus/issues/36092

Signed-off-by: fengjun2016 <jornfeng@gmail.com>
2024-09-09 11:25:05 +08:00
Zhen Ye
99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration test when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00
Zhen Ye
4d69898cb2
enhance: support single pchannel level transaction (#35289)
issue: #33285

- support transaction on single wal.
- last confirmed message id can still be used when transaction is enabled.
- add fence operation for segment allocation interceptor.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-19 21:22:56 +08:00
yihao.dai
efadf22802
enhance: Append create partition msg to wal (#35398)
issue: https://github.com/milvus-io/milvus/issues/33285

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-13 14:28:20 +08:00
chyezh
16b0aee97f
enhance: timetick interceptor optimization (#35287)
issue: #33285

- remove redundant goroutine by using inspector.
- remove the continuous non-message timetick persistence
- periodically push the time tick forward without persistent timetick
message.
- add 'message type filter' deliver filter.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-12 18:58:25 +08:00
yihao.dai
72a175478f
enhance: Append drop partition msg to wal (#35326)
issue: https://github.com/milvus-io/milvus/issues/33285

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-07 17:28:16 +08:00
chyezh
c725416288
enhance: move streaming proto into pkg (#35284)
issue: #33285

- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-07 10:34:16 +08:00
chyezh
14051fed7d
enhance: streaming service client (#34656)
issue: #33285

- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-05 21:38:15 +08:00