64 Commits

Author SHA1 Message Date
Zhen Ye
7c575a18b0
enhance: support AckSyncUp for broadcaster, and enable it in truncate api (#46313)
issue: #43897
also for issue: #46166

add ack_sync_up flag into broadcast message header, which indicates that
whether the broadcast operation is need to be synced up between the
streaming node and the coordinator.
If the ack_sync_up is false, the broadcast operation will be acked once
the recovery storage see the message at current vchannel, the fast ack
operation can be applied to speed up the broadcast operation.
If the ack_sync_up is true, the broadcast operation will be acked after
the checkpoint of current vchannel reach current message.
The fast ack operation can not be applied to speed up the broadcast
operation, because the ack operation need to be synced up with streaming
node.
e.g. if truncate collection operation want to call ack once callback
after the all segment are flushed at current vchannel, it should set the
ack_sync_up to be true.

TODO: current implementation doesn't promise the ack sync up semantic,
it only promise FastAck operation will not be applied, wait for 3.0 to
implement the ack sync up semantic. only for truncate api now.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-17 16:55:17 +08:00
sijie-ni-0214
f51de1a8ab
feat: support TruncateCollection api to clear collection data (#46167)
issue: https://github.com/milvus-io/milvus/issues/46166

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-12 10:31:14 +08:00
yihao.dai
f32f2694bc
enhance: Implement new FlushAllMessage and refactor flush all (#45920)
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.

issue: https://github.com/milvus-io/milvus/issues/45919

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-10 19:27:13 +08:00
Zhen Ye
576084fe86
enhance: support alter collection/database with WAL-based DDL framework (#45266)
issue: #43897

- Alter collection/database is implemented by WAL-based DDL framework
now.
- Support AlterCollection/AlterDatabase in wal now.
- Alter operation can be synced by new CDC now.
- Refactor some UT for alter DDL.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-04 09:59:33 +08:00
Zhen Ye
309d564796
enhance: support collection and index with WAL-based DDL framework (#45033)
issue: #43897

- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-30 14:24:08 +08:00
Zhen Ye
c171280f63
enhance: support replicate message in wal. (#44456)
issue: #44123

- support replicate message  in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-22 17:06:11 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
3b01388587
fix: use recovery snapshot checkpoint if no vchannel is on-recovering (#44246)
issue: #44194

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-10 14:15:56 +08:00
Zhen Ye
cbe4c3d231
enhance: get cchannel before build message (#44229)
issue: #43897

- support never expire txn message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-10 11:09:57 +08:00
Zhen Ye
a86b6f2a54
enhance: extend the stats manage at streaming shard manager for L0 (#43371)
issue: #42416

- Rename the InsertMetric into ModifiedMetric.
- Add L0 control configuration.
- Add some L0 current state collect.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 20:41:46 +08:00
Zhen Ye
7b005c48bf
enhance: support util template generation for messages (#43881)
issue: #43880

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 01:19:44 +08:00
Zhen Ye
e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
yihao.dai
e6da4a64b5
fix: Pre-check import message to prevent pipeline block indefinitely (#42415)
Pre-check import message to prevent pipeline block indefinitely.

issue: https://github.com/milvus-io/milvus/issues/42414

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-11 13:40:38 +08:00
Zhen Ye
fc010e44a8
fix: release memory after pop from heap (#42482)
issue: #42481

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 10:00:32 +08:00
Zhen Ye
e479467582
fix: panic when upgrading from old arch (#42422)
issue: #42405

- add delete rows into header when upsert.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-31 22:56:29 +08:00
Zhen Ye
66cc194ab2
enhance: add partition gc at streaming arch (#42179)
issue: #41976

- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:20:30 +08:00
Zhen Ye
b94cee2413
fix: growing segment from old arch is not flushed after upgrading (#42164)
issue: #42162

- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stucked
- fix: growing segment from old arch is not flushed after upgrading

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:00:28 +08:00
Zhen Ye
c9b0748ff9
enhance: add delete rows into delete msg header and more metric (#41952)
issue: #41544

- add delete rows into delete messsage header
- add more insert/delete metrics
- fix non-broadcast message has broadcast header

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 20:28:26 +08:00
Zhen Ye
458ab86894
fix: stop retry if collection not found too much when get recovery from coord (#41980)
issue: #41966

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 16:22:24 +08:00
Zhen Ye
59ab274dbe
fix: use flusher and recovery checkpoint together to determine the truncate position (#41934)
issue: #41544

- unify the log field of message
- use the minimum one of flusher and recovery storage checkpoint as the
truncate position

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-20 16:10:24 +08:00
Zhen Ye
59dff668dc
enhance: schema change without manual flush (#41882)
issue: #39718

- remove the manual flush message from schema change operation
- add flush segment id handle into schema change processes

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2025-05-19 10:14:22 +08:00
Zhen Ye
a3d5ad135e
fix: recover a dropped collection from wal if create collection message can be seen (#41902)
issue: #41654

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-17 07:38:21 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye
e675da76e4
enhance: simplify the proto message, make segment assignment code more clean (#41671)
issue: #41544

- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-11 20:49:00 +08:00
Zhen Ye
452d6fb709
fix: write buffer leak if the wal flusher is cancelled when recovery (#41719)
issue: #41715

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-10 09:32:56 +08:00
Zhen Ye
de8f0af20d
enhance: use dispatcher at delegator when enable streaming (#41266)
issue: #38399

- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-06 01:12:53 +08:00
Zhen Ye
dfbb02a5f7
enhance: make streaming message as a log field for easier coding (#41545)
issue: #41544

- implement message can be logged as a field by zap.
- fix too many slow log for woodpecker.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 14:38:42 +08:00
Zhen Ye
ecfc868dcb
fix: write buffer not unregistered when datasyncservice is gone (#41496)
issue: #41495

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 19:38:38 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
Zhen Ye
9339bccccc
enhance: move sent first timeticksync, make recovery more easier (#41405)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-21 17:18:37 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Zhen Ye
224728c2d2
fix: catchup cannot work if using StartAfter (#41201)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-10 19:04:27 +08:00
Ted Xu
1bcea2a775
fix: assigning the correct storage version in sync and index tasks (#41093)
See #39663 #40667

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-08 10:14:25 +08:00
Zhen Ye
af80a4dac2
fix: auto flush all segment that is not created by streaming service (#40767)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-26 16:32:22 +08:00
Zhen Ye
f6fb4bc442
fix: backoff will retry infinitely after reaching max elapse (#40589)
issue: #40588

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 16:24:06 +08:00
Zhen Ye
5735c3ef19
fix: too many memory usage of streaming node (#40606)
issue: #40592

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 07:10:07 +08:00
sthuang
63a7c4570e
feat: storage v2 sync (#39663)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-05 11:22:15 +08:00
Zhen Ye
84df80b5e4
enhance: refactor metrics of streaming (#40031)
issue: #38399

- add metrics for broadcaster component.
- add metrics for wal flusher component.
- add metrics for wal interceptors.
- add slow log for wal.
- add more label for some wal metrics. (local or remote/catcup or
tailing...)

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-25 12:25:56 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
SimFG
047254665d
feat: support to replicate import msg (#39171)
- issue: #39849

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-02-16 00:08:13 +08:00
Zhen Ye
034575396f
fix: streaming consume checkpoint is always nil and limit resource of ci (#39781)
issue: #38399

- fix the nil pointer bug
- limit the resource usage for streaming e2e
- enhance the go test
- fix: rootcoord block when graceful stop

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-13 19:18:14 +08:00
Zhen Ye
0988807160
enhance: enable write ahead buffer for streaming service (#39771)
issue: #38399

- Make a timetick-commit-based write ahead buffer at write side.
- Add a switchable scanner at read side to transfer the state between
catchup and tailing read

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-12 20:38:46 +08:00
Zhen Ye
d3e32bb599
enhance: make pchannel level flusher (#39275)
issue: #38399

- Add a pchannel level checkpoint for flush processing
- Refactor the recovery of flushers of wal
- make a shared wal scanner first, then make multi datasyncservice on it

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-10 16:32:45 +08:00
Zhen Ye
5669016af0
enhance: erase the rpc level when wal is located at same node (#38858)
issue: #38399

- Make the wal scanner interface same with streaming scanner.
- Use wal if the wal is located at current node.
- Otherwise fallback the old logic.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-05 22:25:10 +08:00
Zhen Ye
c84a0748c4
enhance: add rw/ro streaming query node replica management (#38677)
issue: #38399

- Embed the query node into streaming node to make delegator available
at streaming node.
- The embedded query node has a special server label
`QUERYNODE_STREAMING-EMBEDDED`.
- Change the balance strategy to make the channel assigned to streaming
node as much as possible.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-24 16:55:07 +08:00
aoiasd
9cb4c4e8ac
fix: bm25 import segment without bm25 stats meta (#38855)
relate: https://github.com/milvus-io/milvus/issues/38854

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-01-21 11:09:04 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
Zhen Ye
3bcdd92915
enhance: add broadcast for streaming service (#39020)
issue: #38399 

- Add new rpc for transfer broadcast to streaming coord
- Add broadcast service at streaming coord to make broadcast message
sent automicly

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-09 16:24:55 +08:00
Zhen Ye
984a605d47
enhance: use lazy initializing client for streaming node (#38400)
issue: #38399

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-23 11:36:47 +08:00