113 Commits

Author SHA1 Message Date
Zhen Ye
adbdf916e1
enhance: support proxy DML forward (#45921)
issue: #45812

- 2.6 proxy will try to forward DWL to 2.5 proxy if streaming service is
not ready

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-01 19:37:10 +08:00
Zhen Ye
8e0ae6433d
fix: LastConfirmedMessageID may be wrong if high concurrent writing (#45873)
issue: #45872

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-27 12:01:07 +08:00
tinswzy
1427825133
enhance: improve WAL retention strategy (#45350)
issue: #44369 
woodpecker related[ issue:
#59](https://github.com/zilliztech/woodpecker/issues/59)

Refactor the WAL retention logic in Milvus StreamingNode:
- Remove the simple sampling-based truncation mechanism.
- After flush, WAL data is directly truncated.
- The retention control is now delegated to the underlying message queue
(MQ) implementation.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-11-23 21:41:05 +08:00
Zhen Ye
40e2042728
enhance: add more metrics for DDL framework (#45558)
issue: #43897

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-14 15:19:37 +08:00
junjiejiangjjj
102481e53f
feat: Support add_function/alter_function/drop_function (#44895)
https://github.com/milvus-io/milvus/issues/44053

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-11-13 20:53:39 +08:00
Xiaofan
a9895bb904
enhance: add robust handle etcd servercrash (#45304)
related to #45303
fix milvus pod may restart when etcd pod start

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-13 10:23:36 +08:00
Zhen Ye
b7fb8ed38c
fix: use the right resource key lock for ddl and use new ddl in transfer replica (#45506)
issue: #45452

- alias/rename related DDL should use database level exclusive lock
- alias cannot use as the resource key of lock, use collection name
instead
- transfer replica should use WAL-based framework

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-12 19:01:38 +08:00
Zhen Ye
4797bb6ab2
fix: wrong update timetick of collection meta info (#45461)
issue: #45403, #45463

- fix the Nightly E2E failures.
- fix the wrong update timetick of altering collection to fix the
related load failure.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-11 16:01:36 +08:00
Zhen Ye
31a609c21d
fix: kafka should auto reset the offset from earliest to read (#45237)
issue: #44172, #45210, #44851

kafka will auto reset the offset to "latest" if the offset is
Out-of-range. the recovery of milvus wal cannot read any message from
that. So once the offset is out-of-range, kafka should read from eariest
to read the latest uncleared data.


https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-03 21:07:33 +08:00
Zhen Ye
00d8d2c33d
enhance: support load/release collection/partition with WAL-based DDL framework (#45154)
issue: #43897

- Load/Release collection/partition is implemented by WAL-based DDL
framework now.
- Support AlterLoadConfig/DropLoadConfig in wal now.
- Load/Release operation can be synced by new CDC now.
- Refactor some UT for load/release DDL.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-02 18:39:32 +08:00
Zhen Ye
309d564796
enhance: support collection and index with WAL-based DDL framework (#45033)
issue: #43897

- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-30 14:24:08 +08:00
Zhen Ye
ce164db1f3
fix: wal state may be unconsistent after recovering from crash (#45092)
issue: #45088, #45086

- Message on control channel should trigger the checkpoint update.
- LastConfrimedMessageID should be recovered from the minimum of
checkpoint or the LastConfirmedMessageID of uncommitted txn.
- Add more log info for wal debugging.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-29 16:26:10 +08:00
Zhen Ye
2aa48bf4ca
fix: wrong execution order of DDL/DCL on secondary (#44886)
issue: #44697, #44696

- The DDL executing order of secondary keep same with order of control
channel timetick now.
- filtering the control channel operation on shard manager of
streamingnode to avoid wrong vchannel of create segment.
- fix that the immutable txn message lost replicate header.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-21 22:38:05 +08:00
Zhen Ye
8bf7d6ae72
enhance: refactor update replicate config operation using wal-broadcast-based DDL/DCL framework (#44560)
issue: #43897

- UpdateReplicateConfig operation will broadcast AlterReplicateConfig
message into all pchannels with cluster-exclusive-lock.
- Begin txn message will use commit message timetick now (to avoid
timetick rollback when CDC with txn message).
- If current cluster is secondary, the UpdateReplicateConfig will wait
until the replicate configuration is consistent with the config
replicated from primary.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-15 15:26:01 +08:00
tinswzy
f342f49b32
enhance: add support for Azure Blob Storage in wp (#44592)
#44485 
add support for blob in woodpecker

#43638 
upgrade wp v0.1.6

related wp [issue#11](https://github.com/zilliztech/woodpecker/issues/11
)

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-09-29 09:51:44 +08:00
Zhen Ye
19e5e9f910
enhance: broadcaster will lock resource until message acked (#44508)
issue: #43897

- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-24 20:58:05 +08:00
Zhen Ye
c171280f63
enhance: support replicate message in wal. (#44456)
issue: #44123

- support replicate message  in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-22 17:06:11 +08:00
tinswzy
c7f21d5a06
enhance: purge small files right after wp segment compaction (#44473)
#43638 
improve wp log output
[wp#43](https://github.com/zilliztech/woodpecker/issues/43)
intro purge small files right after segment compaction
[wp#47](https://github.com/zilliztech/woodpecker/issues/47)
The rootpath configured by milvus is uniformly used as the base for wp
local fs storage.
update to v0.1.5

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-09-21 16:32:01 +08:00
Zhen Ye
ba289891c0
enhance: add all ddl message into messages (#44407)
issue: #43897

- add ddl messages proto and add some message utilities.
- support shard/exclusive resource-key-lock.
- add all ddl callbacks future into broadcast registry.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-18 10:08:00 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
cbe4c3d231
enhance: get cchannel before build message (#44229)
issue: #43897

- support never expire txn message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-10 11:09:57 +08:00
Zhen Ye
9e2d1963d4
enhance: support cchannel for streaming service (#44143)
issue: #43897

- add cchannel as a special vchannel to hold some ddl and dcl.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-02 10:05:52 +08:00
Zhen Ye
3327df72e4
enhance: make immutable message as the param of ack operation for cdc (#43900)
issue: #43897

- The original broadcast ack operation need to recover message from
etcd, which can not support cdc.
- immutable message will set as the ack parameter to fix it.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-01 10:21:52 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Zhen Ye
5bdc593b8a
enhance: use v0.15.1 official pulsar client and add logging for pulsar client (#43913)
issue: #43785

- pulsar client will print log into milvus logger now.
- pulsar client open the metric by default.
- upgrade the pulsar client to v0.15.1, and use offical repo.
- the fixing of milvus-io/pulsar-client-go is already covered by
official v0.15.1.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-26 16:45:53 +08:00
Zhen Ye
d0e3a33c37
enhance: add IsRebalanceSuspended interface for wal balancer (#44026)
issue: #43968

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-24 09:19:47 +08:00
Zhen Ye
082ca62ec1
enhance: support balancer interface for streaming client to fetch streaming node information (#43969)
issue: #43968

- Add ListStreamingNode/GetWALDistribution to  fetch streaming node info
- Add SuspendRebalance/ResumeRebalance to enable or stop balance
- Add FreezeNodeIDs/DefreezeNodeIDs to freeze target node

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-21 15:55:47 +08:00
Zhen Ye
f5cee0012a
fix: remove panic for message type in recovery storage and marshal log (#43976)
issue: #43897

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-21 14:23:47 +08:00
tinswzy
6a342edc5a
fix: empty error returned on append timeout when MinIO is unavailable (#43926)
#43810 
Fixed the issue where the result err returned by append timeout was
empty when objectstorage was unavailable, causing the client to
mistakenly believe that the write was successful.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-08-19 10:47:45 +08:00
Zhen Ye
a86b6f2a54
enhance: extend the stats manage at streaming shard manager for L0 (#43371)
issue: #42416

- Rename the InsertMetric into ModifiedMetric.
- Add L0 control configuration.
- Add some L0 current state collect.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 20:41:46 +08:00
Zhen Ye
7b005c48bf
enhance: support util template generation for messages (#43881)
issue: #43880

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 01:19:44 +08:00
tinswzy
084f777552
enhance: use wp internal writer without lock (#43775)
#43638  #43810 
add internal writer without session lock;
refactor and unify read state and log entry
refactor data reading related methods;
fix bug where a closed writer is reused for finalize;

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-08-18 01:15:44 +08:00
Zhen Ye
8ff118a9ff
fix: call IntoMessageProto instead of Payload when rpc (#43678)
issue: #43677

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:45:40 +08:00
Zhen Ye
3e3775fb81
fix: panics when describe collection internal failure (#43630)
issue: #43629

- also fix the scanner_switchable panic underlying wal scanner return
context error.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-29 20:33:36 +08:00
tinswzy
173efe2b98
enhance: wp metrics and update deps to v0.1.0 (#43569)
#43574   #43604 #43431  #43603 
Fix wp metrics not registered bug;
Update the version dependent on wp to v0.1.2-rc1;
improve advanced reader with concurrent prefetch blks;
add the segment rolling policy based on the number of blocks;
improve concurrent compaction
release lock failed bug

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-29 14:51:35 +08:00
Zhen Ye
648994182f
fix: pulsar use more memory for queue (#43565)
issue: #43564

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-28 14:00:56 +08:00
Zhen Ye
070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix the data lost when streaming resuming from old arch message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
Zhen Ye
e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
Zhen Ye
07fa2cbdd3
enhance: wal balance consider the wal status on streamingnode (#43265)
issue: #42995

- don't balance the wal if the producing-consuming lag is too long.
- don't balance if the rebalance is set as false.
- don't balance if the wal is balanced recently.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:10:51 +08:00
tinswzy
7da62698e0
enhance: improve WP parallel sync mechanism and fencing logic (#42892)
related: #42595 
improve WP parallel sync mechanism and fencing logic; remove redundant
metrics and labels

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-13 23:04:49 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
Zhen Ye
f598ca2b4e
fix: block at msgpack adaptor and wrong metrics (#43235)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 10:14:49 +08:00
Zhen Ye
490c5d5088
fix: lost message version after compatible message modification (#43217)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-10 10:36:48 +08:00
Zhen Ye
6798fdc3b3
fix: rocksmq cannot graceful stop (#42841)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-19 19:38:39 +08:00
Zhen Ye
593662970b
fix: reuse consumer for backlog clear and use shared consumer (#42822)
issue: #42820

- fix that ro pulsar cannot be closed when upgrading milvus.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-19 19:36:48 +08:00
Zhen Ye
4f5409e1fe
fix: panic when schema change (#42727)
issue: #42723

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 17:43:07 +08:00
Zhen Ye
1f66b650e9
fix: pulsar cannot work properly if backlog exceed (#42653)
issue: #42649

- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 14:28:37 +08:00
yihao.dai
e6da4a64b5
fix: Pre-check import message to prevent pipeline block indefinitely (#42415)
Pre-check import message to prevent pipeline block indefinitely.

issue: https://github.com/milvus-io/milvus/issues/42414

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-11 13:40:38 +08:00
tinswzy
36a4b74fc0
enhance: Adjust the default parameters of WP according to performance tests (#42598)
#42595

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-06-10 16:30:35 +08:00
tinswzy
f55f900c85
fix insert hang caused by WAL writer writing to a closing logfile (#42078)
related issue #42049 
wp commit
[94de4](94de4cbc60)

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-06-03 09:58:36 +08:00