milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-28 22:45:26 +08:00

Author	SHA1	Message	Date
Zhen Ye	7c575a18b0	enhance: support AckSyncUp for broadcaster, and enable it in truncate api (#46313 ) issue: #43897 also for issue: #46166 add ack_sync_up flag into broadcast message header, which indicates whether the broadcast operation needs to be synced up between the streaming node and the coordinator. If ack_sync_up is false, the broadcast operation will be acked once the recovery storage sees the message at the current vchannel, and the fast ack operation can be applied to speed up the broadcast operation. If ack_sync_up is true, the broadcast operation will be acked after the checkpoint of the current vchannel reaches the current message. The fast ack operation cannot be applied to speed up the broadcast operation, because the ack operation needs to be synced up with the streaming node. e.g. if the truncate collection operation wants to call the ack once callback after all segments are flushed at the current vchannel, it should set ack_sync_up to true. TODO: the current implementation doesn't guarantee the ack sync up semantic, it only guarantees that the FastAck operation will not be applied; wait for 3.0 to implement the ack sync up semantic. only for truncate api now. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-17 16:55:17 +08:00
sijie-ni-0214	f51de1a8ab	feat: support TruncateCollection api to clear collection data (#46167 ) issue: https://github.com/milvus-io/milvus/issues/46166 --------- Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>	2025-12-12 10:31:14 +08:00
yihao.dai	f32f2694bc	enhance: Implement new FlushAllMessage and refactor flush all (#45920 ) This PR: 1. Define and implement the new FlushAllMessage. 2. Refactor FlushAll to flush the entire cluster. issue: https://github.com/milvus-io/milvus/issues/45919 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-12-10 19:27:13 +08:00
tinswzy	1917bb720f	enhance: add fallback mechanism for WP when accessing object storage without Condition Write support (#45735 ) related issue: #45733 related [wp issue: #60](https://github.com/zilliztech/woodpecker/issues/60) Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-12-07 21:59:11 +08:00
Zhen Ye	adbdf916e1	enhance: support proxy DML forward (#45921 ) issue: #45812 - 2.6 proxy will try to forward DML to 2.5 proxy if streaming service is not ready Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-01 19:37:10 +08:00
Zhen Ye	8e0ae6433d	fix: LastConfirmedMessageID may be wrong if high concurrent writing (#45873 ) issue: #45872 Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-27 12:01:07 +08:00
tinswzy	1427825133	enhance: improve WAL retention strategy (#45350 ) issue: #44369 woodpecker related [issue: #59](https://github.com/zilliztech/woodpecker/issues/59) Refactor the WAL retention logic in Milvus StreamingNode: - Remove the simple sampling-based truncation mechanism. - After flush, WAL data is directly truncated. - The retention control is now delegated to the underlying message queue (MQ) implementation. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-11-23 21:41:05 +08:00
Zhen Ye	40e2042728	enhance: add more metrics for DDL framework (#45558 ) issue: #43897 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-14 15:19:37 +08:00
junjiejiangjjj	102481e53f	feat: Support add_function/alter_function/drop_function (#44895 ) https://github.com/milvus-io/milvus/issues/44053 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-11-13 20:53:39 +08:00
Xiaofan	a9895bb904	enhance: add robust handle etcd servercrash (#45304 ) related to #45303 fix milvus pod may restart when etcd pod start Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-11-13 10:23:36 +08:00
Zhen Ye	b7fb8ed38c	fix: use the right resource key lock for ddl and use new ddl in transfer replica (#45506 ) issue: #45452 - alias/rename related DDL should use database level exclusive lock - alias cannot use as the resource key of lock, use collection name instead - transfer replica should use WAL-based framework Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-12 19:01:38 +08:00
Zhen Ye	4797bb6ab2	fix: wrong update timetick of collection meta info (#45461 ) issue: #45403, #45463 - fix the Nightly E2E failures. - fix the wrong update timetick of altering collection to fix the related load failure. Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-11 16:01:36 +08:00
Zhen Ye	31a609c21d	fix: kafka should auto reset the offset from earliest to read (#45237 ) issue: #44172, #45210, #44851 kafka will auto reset the offset to "latest" if the offset is out of range, and the recovery of milvus wal cannot read any message from that position. So once the offset is out of range, kafka should read from earliest to read the latest uncleared data. https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-03 21:07:33 +08:00
Zhen Ye	00d8d2c33d	enhance: support load/release collection/partition with WAL-based DDL framework (#45154 ) issue: #43897 - Load/Release collection/partition is implemented by WAL-based DDL framework now. - Support AlterLoadConfig/DropLoadConfig in wal now. - Load/Release operation can be synced by new CDC now. - Refactor some UT for load/release DDL. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-02 18:39:32 +08:00
Zhen Ye	309d564796	enhance: support collection and index with WAL-based DDL framework (#45033 ) issue: #43897 - Part of collection/index related DDL is implemented by WAL-based DDL framework now. - Support following message type in wal, CreateCollection, DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex, DropIndex. - Part of collection/index related DDL can be synced by new CDC now. - Refactor some UT for collection/index DDL. - Add Tombstone scheduler to manage the tombstone GC for collection or partition meta. - Move the vchannel allocation into streaming pchannel manager. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-30 14:24:08 +08:00
Zhen Ye	ce164db1f3	fix: wal state may be inconsistent after recovering from crash (#45092 ) issue: #45088, #45086 - Message on control channel should trigger the checkpoint update. - LastConfirmedMessageID should be recovered from the minimum of checkpoint or the LastConfirmedMessageID of uncommitted txn. - Add more log info for wal debugging. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-29 16:26:10 +08:00
Zhen Ye	2aa48bf4ca	fix: wrong execution order of DDL/DCL on secondary (#44886 ) issue: #44697, #44696 - The DDL executing order of secondary keep same with order of control channel timetick now. - filtering the control channel operation on shard manager of streamingnode to avoid wrong vchannel of create segment. - fix that the immutable txn message lost replicate header. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-21 22:38:05 +08:00
Zhen Ye	8bf7d6ae72	enhance: refactor update replicate config operation using wal-broadcast-based DDL/DCL framework (#44560 ) issue: #43897 - UpdateReplicateConfig operation will broadcast AlterReplicateConfig message into all pchannels with cluster-exclusive-lock. - Begin txn message will use commit message timetick now (to avoid timetick rollback when CDC with txn message). - If current cluster is secondary, the UpdateReplicateConfig will wait until the replicate configuration is consistent with the config replicated from primary. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-15 15:26:01 +08:00
tinswzy	f342f49b32	enhance: add support for Azure Blob Storage in wp (#44592 ) #44485 add support for blob in woodpecker #43638 upgrade wp v0.1.6 related wp [issue#11](https://github.com/zilliztech/woodpecker/issues/11 ) Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-09-29 09:51:44 +08:00
Zhen Ye	19e5e9f910	enhance: broadcaster will lock resource until message acked (#44508 ) issue: #43897 - Return LastConfirmedMessageID when wal append operation. - Add resource-key-based locker for broadcast-ack operation to protect the coord state when executing ddl. - Resource-key-based locker is held until the broadcast operation is acked. - ResourceKey support shared and exclusive lock. - Add FastAck execute ack right away after the broadcast done to speed up ddl. - Ack callback will support broadcast message result now. - Add tombstone for broadcaster to avoid to repeatedly commit DDL and ABA issue. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-24 20:58:05 +08:00
Zhen Ye	c171280f63	enhance: support replicate message in wal. (#44456 ) issue: #44123 - support replicate message in wal of milvus. - support CDC-replicate recovery from wal. - fix some CDC replicator bugs Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-22 17:06:11 +08:00
tinswzy	c7f21d5a06	enhance: purge small files right after wp segment compaction (#44473 ) #43638 improve wp log output [wp#43](https://github.com/zilliztech/woodpecker/issues/43) intro purge small files right after segment compaction [wp#47](https://github.com/zilliztech/woodpecker/issues/47) The rootpath configured by milvus is uniformly used as the base for wp local fs storage. update to v0.1.5 Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-09-21 16:32:01 +08:00
Zhen Ye	ba289891c0	enhance: add all ddl message into messages (#44407 ) issue: #43897 - add ddl messages proto and add some message utilities. - support shared/exclusive resource-key-lock. - add all ddl callbacks future into broadcast registry. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-18 10:08:00 +08:00
yihao.dai	51f69f32d0	feat: Add CDC support (#44124 ) This PR implements a new CDC service for Milvus 2.6, providing log-based cross-cluster replication. issue: https://github.com/milvus-io/milvus/issues/44123 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com> Signed-off-by: chyezh <chyezh@outlook.com> Co-authored-by: chyezh <chyezh@outlook.com>	2025-09-16 16:32:01 +08:00
Zhen Ye	cbe4c3d231	enhance: get cchannel before build message (#44229 ) issue: #43897 - support never expire txn message. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-10 11:09:57 +08:00
Zhen Ye	9e2d1963d4	enhance: support cchannel for streaming service (#44143 ) issue: #43897 - add cchannel as a special vchannel to hold some ddl and dcl. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-02 10:05:52 +08:00
Zhen Ye	3327df72e4	enhance: make immutable message as the param of ack operation for cdc (#43900 ) issue: #43897 - The original broadcast ack operation need to recover message from etcd, which can not support cdc. - immutable message will set as the ack parameter to fix it. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-01 10:21:52 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Zhen Ye	5bdc593b8a	enhance: use v0.15.1 official pulsar client and add logging for pulsar client (#43913 ) issue: #43785 - pulsar client will print log into milvus logger now. - pulsar client open the metric by default. - upgrade the pulsar client to v0.15.1, and use official repo. - the fixing of milvus-io/pulsar-client-go is already covered by official v0.15.1. Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-26 16:45:53 +08:00
Zhen Ye	d0e3a33c37	enhance: add IsRebalanceSuspended interface for wal balancer (#44026 ) issue: #43968 Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-24 09:19:47 +08:00
Zhen Ye	082ca62ec1	enhance: support balancer interface for streaming client to fetch streaming node information (#43969 ) issue: #43968 - Add ListStreamingNode/GetWALDistribution to fetch streaming node info - Add SuspendRebalance/ResumeRebalance to enable or stop balance - Add FreezeNodeIDs/DefreezeNodeIDs to freeze target node Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-21 15:55:47 +08:00
Zhen Ye	f5cee0012a	fix: remove panic for message type in recovery storage and marshal log (#43976 ) issue: #43897 Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-21 14:23:47 +08:00
tinswzy	6a342edc5a	fix: empty error returned on append timeout when MinIO is unavailable (#43926 ) #43810 Fixed the issue where the result err returned by append timeout was empty when objectstorage was unavailable, causing the client to mistakenly believe that the write was successful. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-08-19 10:47:45 +08:00
Zhen Ye	a86b6f2a54	enhance: extend the stats manage at streaming shard manager for L0 (#43371 ) issue: #42416 - Rename the InsertMetric into ModifiedMetric. - Add L0 control configuration. - Add some L0 current state collect. Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-18 20:41:46 +08:00
Zhen Ye	7b005c48bf	enhance: support util template generation for messages (#43881 ) issue: #43880 Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-18 01:19:44 +08:00
tinswzy	084f777552	enhance: use wp internal writer without lock (#43775 ) #43638 #43810 add internal writer without session lock; refactor and unify read state and log entry refactor data reading related methods; fix bug where a closed writer is reused for finalize; Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-08-18 01:15:44 +08:00
Zhen Ye	8ff118a9ff	fix: call IntoMessageProto instead of Payload when rpc (#43678 ) issue: #43677 Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:45:40 +08:00
Zhen Ye	3e3775fb81	fix: panics when describe collection internal failure (#43630 ) issue: #43629 - also fix the scanner_switchable panic underlying wal scanner return context error. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-29 20:33:36 +08:00
tinswzy	173efe2b98	enhance: wp metrics and update deps to v0.1.0 (#43569 ) #43574 #43604 #43431 #43603 Fix wp metrics not registered bug; Update the version dependent on wp to v0.1.2-rc1; improve advanced reader with concurrent prefetch blks; add the segment rolling policy based on the number of blocks; improve concurrent compaction release lock failed bug Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-07-29 14:51:35 +08:00
Zhen Ye	648994182f	fix: pulsar use more memory for queue (#43565 ) issue: #43564 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-28 14:00:56 +08:00
Zhen Ye	070aabd27e	enhance: fix remove flushing state of segment (#43560 ) issue: #43559, #42884 - also fix the data loss when streaming resumes from old arch message. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-25 18:08:54 +08:00
Zhen Ye	e9ab73e93d	enhance: add schema version at recovery storage (#43500 ) issue: #43072, #43289 - manage the schema version at recovery storage. - update the schema when creating collection or alter schema. - get schema at write buffer based on version. - recover the schema when upgrading from 2.5. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-23 21:38:54 +08:00
Zhen Ye	07fa2cbdd3	enhance: wal balance consider the wal status on streamingnode (#43265 ) issue: #42995 - don't balance the wal if the producing-consuming lag is too long. - don't balance if the rebalance is set as false. - don't balance if the wal is balanced recently. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-18 11:10:51 +08:00
tinswzy	7da62698e0	enhance: improve WP parallel sync mechanism and fencing logic (#42892 ) related: #42595 improve WP parallel sync mechanism and fencing logic; remove redundant metrics and labels Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2025-07-13 23:04:49 +08:00
Zhen Ye	15a6631147	enhance: add quota limit based on sn consuming lag (#43105 ) issue: #42995 - The consuming lag at streaming node will be reported to coordinator. - The consuming lag will trigger the write limit and deny by quota center. - Set the ttProtection by default. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-11 14:10:49 +08:00
Zhen Ye	f598ca2b4e	fix: block at msgpack adaptor and wrong metrics (#43235 ) issue: #43018 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-11 10:14:49 +08:00
Zhen Ye	490c5d5088	fix: lost message version after compatible message modification (#43217 ) issue: #43018 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-10 10:36:48 +08:00
Zhen Ye	6798fdc3b3	fix: rocksmq cannot graceful stop (#42841 ) issue: #40532 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-19 19:38:39 +08:00
Zhen Ye	593662970b	fix: reuse consumer for backlog clear and use shared consumer (#42822 ) issue: #42820 - fix that ro pulsar cannot be closed when upgrading milvus. Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-19 19:36:48 +08:00
Zhen Ye	4f5409e1fe	fix: panic when schema change (#42727 ) issue: #42723 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-13 17:43:07 +08:00

117 Commits