35 Commits

Author SHA1 Message Date
yihao.dai
5b97cb70a0
enhance: Support delaying scanner startup (#46369)
Introduce a ScannerStartupDelay configuration to enable WAL write-only
recovery, allowing fence messages to be persisted during
primary–secondary switchover when the StreamingNode is trapped in crash
loops.

issue: https://github.com/milvus-io/milvus/issues/46368

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a configurable WAL scanner pause/resume and a consumer request
flag to optionally ignore pause signals.

* **Metrics**
* Added a scanner pause gauge and pause-duration tracking for WAL
scanning.

* **Tests**
* Added coverage for pause-consumption behavior and cleanup in stream
client tests.

* **Chores**
* Consolidated flush-all logging into a single field and added a helper
for bulk message conversion.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-24 11:53:19 +08:00
Zhen Ye
823c7f7e3e
fix: use remote wal when local wal shutdown (#45753)
issue: #45750

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-22 16:17:05 +08:00
Zhen Ye
00d8d2c33d
enhance: support load/release collection/partition with WAL-based DDL framework (#45154)
issue: #43897

- Load/Release collection/partition is implemented by WAL-based DDL
framework now.
- Support AlterLoadConfig/DropLoadConfig in wal now.
- Load/Release operation can be synced by new CDC now.
- Refactor some UT for load/release DDL.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-02 18:39:32 +08:00
Zhen Ye
6e5189fe19
fix: make ack of broadcaster cannot canceled by client (#45145)
issue: #45141

- make ack of broadcaster cannot canceled by rpc.
- make clone for assignment snapshot of wal balancer.
- add server id for GetReplicateCheckpoint to avoid failure.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-29 20:34:11 +08:00
Zhen Ye
19e5e9f910
enhance: broadcaster will lock resource until message acked (#44508)
issue: #43897

- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-24 20:58:05 +08:00
Zhen Ye
c171280f63
enhance: support replicate message in wal. (#44456)
issue: #44123

- support replicate message  in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-22 17:06:11 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
3327df72e4
enhance: make immutable message as the param of ack operation for cdc (#43900)
issue: #43897

- The original broadcast ack operation need to recover message from
etcd, which can not support cdc.
- immutable message will set as the ack parameter to fix it.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-01 10:21:52 +08:00
Zhen Ye
8ff118a9ff
fix: call IntoMessageProto instead of Payload when rpc (#43678)
issue: #43677

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:45:40 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
rhys
48661655d6
fix: streamingcoord and streamingnode client support internal tls (#42685)
https://github.com/milvus-io/milvus/issues/42680

streamingnode/streamingcoord support internal tls

Signed-off-by: rhys <sdbwlr@163.com>
2025-06-27 17:50:42 +08:00
Zhen Ye
0567f512b3
fix: streamingnode get stucked when stop (#42501)
issue: #42498

- fix: sealed segment cannot be flushed after upgrading
- fix: get mvcc panic when upgrading
- ignore the L0 segment when graceful stop of querynode.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-05 12:22:31 +08:00
Zhen Ye
4bad293655
enhance: make upgrading from 2.5.x less down time (#42082)
issue: #40532

- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in cluster is not upgrading to 2.6
- allow to open read-write wal after all nodes in cluster is upgrading
to 2.6

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:02:29 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye
21d6d1669e
fix: wal should be reopen if wal append receive the fence error (#41807)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 01:02:56 +08:00
Zhen Ye
f6fb4bc442
fix: backoff will retry infinitely after reaching max elapse (#40589)
issue: #40588

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 16:24:06 +08:00
Zhen Ye
84df80b5e4
enhance: refactor metrics of streaming (#40031)
issue: #38399

- add metrics for broadcaster component.
- add metrics for wal flusher component.
- add metrics for wal interceptors.
- add slow log for wal.
- add more label for some wal metrics. (local or remote/catcup or
tailing...)

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-25 12:25:56 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Zhen Ye
fd701eca71
fix: local wal perform different with remote wal (#39967)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-19 19:12:51 +08:00
Zhen Ye
21724ab52c
enhance: generate guaranteets at delegator if local wal (#39799)
issue: #38399, #39892

- use mvcc timestamp of wal as guaranteets if wal and delegator is
located at same node.
- fix: ignore growing option is lost at hibridsearch

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-17 15:22:15 +08:00
Zhen Ye
a9e0e0a852
enhance: broadcast with event-based notification (#39522)
issue: #38399

- broadcast message can carry multi resource key now.
- implement event-based notification for broadcast messages
- broadcast message use broadcast id as a unique identifier in message
- broadcasted message on vchannels keep the broadcasted vchannel now.
- broadcasted message and broadcast message have a common broadcast
header now.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-07 11:14:43 +08:00
Zhen Ye
5669016af0
enhance: erase the rpc level when wal is located at same node (#38858)
issue: #38399

- Make the wal scanner interface same with streaming scanner.
- Use wal if the wal is located at current node.
- Otherwise fallback the old logic.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-05 22:25:10 +08:00
Zhen Ye
7addeea70c
fix: wrong streaming mockery package name (#39260)
issue: #39095

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-15 15:51:00 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
Zhen Ye
afac153c26
enhance: move the lifetime implementation out of server level lifetime (#38442)
issue: #38399

- move the lifetime implementation of common code out of the server
level lifetime implementation

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-17 11:42:44 +08:00
congqixia
b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
Zhen Ye
1b6edd0b4b
enhance: refactor the consumer grpc proto for reusing grpc stream for multi-consumer (#37564)
issue: #33285

- Modify the proto of consumer of streaming service.
- Make VChannel as a required option for streaming

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-11 17:24:29 +08:00
Zhen Ye
f0f5147aef
fix: streaming consumer may get stucked when handler is un-consumed (#36818)
issue: #36378

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-14 15:23:23 +08:00
Zhen Ye
c03eb6f664
fix: streaming node consume blocks if recv message is too large (#36151)
issue: #36081

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-13 16:41:08 +08:00
Zhen Ye
4d69898cb2
enhance: support single pchannel level transaction (#35289)
issue: #33285

- support transaction on single wal.
- last confirmed message id can still be used when enable transaction.
- add fence operation for segment allocation interceptor.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-19 21:22:56 +08:00
chyezh
16b0aee97f
enhance: timetick interceptor optimization (#35287)
issue: #33285

- remove redundant goroutine by using insepctor.
- remove the coutinous non-message timetick persistence
- periodically push the time tick forward without persistent timetick
message.
- add 'message type filter' deliver filter.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-12 18:58:25 +08:00
chyezh
c725416288
enhance: move streaming proto into pkg (#35284)
issue: #33285

- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-07 10:34:16 +08:00
chyezh
14051fed7d
enhance: streaming service client (#34656)
issue: #33285

- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-05 21:38:15 +08:00
chyezh
39c7e06bc5
enhance: add message and msgstream msgpack adaptor (#34874)
issue: #33285

- make message builder and message conversion type safe
- add adaptor type and function to adapt old msgstream msgpack and
message interface

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-07-22 20:59:42 +08:00
chyezh
86eff6e589
enhance: streaming node client implementation (#34653)
issue: #33285

- add streaming node grpc client wrapper
- add unittest for streaming node grpc client side
- fix binary unsafe bug for message

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-07-19 17:37:40 +08:00