milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
wei liu	975c91df16	feat: Add comprehensive snapshot functionality for collections (#44361 ) issue: #44358 Implement complete snapshot management system including creation, deletion, listing, description, and restoration capabilities across all system components. Key features: - Create snapshots for entire collections - Drop snapshots by name with proper cleanup - List snapshots with collection filtering - Describe snapshot details and metadata Components added/modified: - Client SDK with full snapshot API support and options - DataCoord snapshot service with metadata management - Proxy layer with task-based snapshot operations - Protocol buffer definitions for snapshot RPCs - Comprehensive unit tests with mockey framework - Integration tests for end-to-end validation Technical implementation: - Snapshot metadata storage in etcd with proper indexing - File-based snapshot data persistence in object storage - Garbage collection integration for snapshot cleanup - Error handling and validation across all operations - Thread-safe operations with proper locking mechanisms - Core invariant/assumption: snapshots are immutable point‑in‑time captures identified by (collection, snapshot name/ID); etcd snapshot metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING) and per‑segment manifests live in object storage (Avro / StorageV2). GC and restore logic must see snapshotRefIndex loaded (snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on segment/index files. - New capability added: full end‑to‑end snapshot subsystem — client SDK APIs (Create/Drop/List/Describe/Restore + restore job queries), DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests), snapshotMeta in meta, SnapshotManager orchestration (create/drop/describe/list/restore), copy‑segment restore tasks/inspector/checker, proxy & RPC surface, GC integration, and docs/tests — enabling point‑in‑time collection snapshots persisted to object storage and restorations orchestrated across components. - Logic removed/simplified and why: duplicated recursive compaction/delta‑log traversal and ad‑hoc lookup code were consolidated behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for delta traversal and SnapshotManager/SnapshotReader for snapshot I/O). MixCoord/coordinator broker paths were converted to thin RPC proxies. This eliminates multiple implementations of the same traversal/lookup, reducing divergence and simplifying responsibility boundaries. - Why this does NOT introduce data loss or regressions: snapshot create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING) with SnapshotWriter writing manifests and metadata before commit; GC uses snapshotRefIndex guards and IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid removing referenced files; restore flow pre‑allocates job IDs, validates resources (partitions/indexes), performs rollback on failure (rollbackRestoreSnapshot), and converts/updates segment/index metadata only after successful copy tasks. Extensive unit and integration tests exercise pending/deleting/GC/restore/error paths to ensure idempotence and protection against premature deletion. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2026-01-06 10:15:24 +08:00
Zhen Ye	7c575a18b0	enhance: support AckSyncUp for broadcaster, and enable it in truncate api (#46313 ) issue: #43897 also for issue: #46166 add ack_sync_up flag into broadcast message header, which indicates that whether the broadcast operation is need to be synced up between the streaming node and the coordinator. If the ack_sync_up is false, the broadcast operation will be acked once the recovery storage see the message at current vchannel, the fast ack operation can be applied to speed up the broadcast operation. If the ack_sync_up is true, the broadcast operation will be acked after the checkpoint of current vchannel reach current message. The fast ack operation can not be applied to speed up the broadcast operation, because the ack operation need to be synced up with streaming node. e.g. if truncate collection operation want to call ack once callback after the all segment are flushed at current vchannel, it should set the ack_sync_up to be true. TODO: current implementation doesn't promise the ack sync up semantic, it only promise FastAck operation will not be applied, wait for 3.0 to implement the ack sync up semantic. only for truncate api now. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-17 16:55:17 +08:00
Zhen Ye	97c987f313	fix: remove wrong start timetick to avoid filtering DML whose timetick is less than it. (#44691 ) issue: #41611 - introduced by #44532 Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-09 20:39:58 +08:00
Zhen Ye	30091a3bb7	enhance: remove redundant channel manager from datacoord (#44532 ) issue: #41611 - After enabling streaming arch, channel manager of data coord is a redundant component. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-09 11:01:57 +08:00
Zhen Ye	7877aaa96c	fix: dirty cp metrics after drop (#43567 ) issue: #42688 - The channel cp is dropped by garbage collector - The channel is dropped and the cp is marked as math.Uint64 - If we drop it here, the update channel checkpoints will write the dirty cp back. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-27 23:22:55 +08:00
Bingyi Sun	1bf960b1a8	enhance: Check loaded segments before gc (#42639 ) issue: https://github.com/milvus-io/milvus/issues/42412 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-13 17:44:38 +08:00
Xianhui Lin	98067f5fc6	fix: datacoord stop get stuck After upgrading from 2.5 to 2.6 (#42674 ) datacoord stop get stuck After upgrading from 2.5 to 2.6 issue:https://github.com/milvus-io/milvus/issues/42656 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-12 16:56:36 +08:00
cai.zhang	57c60af00d	fix: Unsorted small segments should not be considered as indexed (#42614 ) issue: #42143 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-12 14:30:35 +08:00
Zhen Ye	0567f512b3	fix: streamingnode get stucked when stop (#42501 ) issue: #42498 - fix: sealed segment cannot be flushed after upgrading - fix: get mvcc panic when upgrading - ignore the L0 segment when graceful stop of querynode. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-05 12:22:31 +08:00
Zhen Ye	7dca7ef4d0	fix: empty growing segment cannot be recovered by streamingnode (#41666 ) issue: #41665 Signed-off-by: chyezh <chyezh@outlook.com>	2025-05-08 14:50:53 +08:00
wei liu	0420dc1eb1	fix: use correct delete checkpoint to prevent premature data cleanup (#40366 ) issue: #40292 related to #39552 - Fix incorrect delete checkpoint usage in SyncDistribution - Change checkpoint parameter from action.GetCheckpoint() to action.GetDeleteCP() in SyncTargetVersion call - This resolves the issue where delete buffer data was being cleaned prematurely due to wrong checkpoint reference Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-12 15:00:08 +08:00
Zhen Ye	9ca5088f62	fix: duplicate consuming from stream for invisble segment (#40316 ) issue: #40207 Signed-off-by: chyezh <chyezh@outlook.com>	2025-03-04 15:54:00 +08:00
wei liu	69b8b89369	enhance: Remove QueryCoord's scheduling of L0 segments (#39552 ) issue: #39551 This PR remove querycoord's scheduling of l0 segments: - only load l0 segment when watch channel - only release l0 segment when release channel or sync data distribution --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-26 21:38:00 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
cai.zhang	3a6408b237	fix: Record a map to avoid repeatedly traversing the CompactionFrom (#38925 ) issue: #38811 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-01-15 10:02:58 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
cai.zhang	a348122758	fix: Support get segments from current segments view (#38512 ) issue: #38511 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-12-18 18:00:54 +08:00
tinswzy	1dbb6cd7cb	enhance: refine the datacoord meta related interfaces (#37957 ) issue: #35917 This PR refines the meta-related APIs in datacoord to allow the ctx to be passed down to the catalog operation interfaces Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-26 19:46:34 +08:00
cai.zhang	4dc684126e	enhance: Handoff growing segment after sorted (#37385 ) issue: #33744 1. Segments generated from inserts will be loaded as growing until they are sorted by primary key. 2. This PR may increase memory pressure on the delegator, but we need to test the performance of stats. In local testing, the speed of stats is greater than the insert speed. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-11-07 16:08:24 +08:00
cai.zhang	ac8c5fcd5d	enhance: Remove pre-marking segments as L2 during clustering compaction (#36799 ) issue: #36686 This pr will remove pre-marking segments as L2 during clustering compaction in version 2.5, and ensure compatibility with version 2.4. The core of this change is to ensure that the many-to-many lineage derivation logic is correct, making sure that both the parent and child cannot simultaneously exist in the target segment view. feature: - Clustering compaction no longer marks the input segments as L2. - Add a new field `is_invisible` to `segmentInfo`, and mark segments that have completed clustering but have not yet built indexes as `is_invisible` to prevent them from being loaded prematurely." - Do not mark the input segment as `Dropped` before the clustering compaction is completed. - After compaction fails, only the result segment needs to be marked as Dropped. compatibility: - If the upgraded task has not failed, there are no compatibility issues. - If the status after the upgrade is `MetaSaved`, then skip the stats task based on whether TmpSegments is empty. - If the failure occurs before `MetaSaved`: - there are no ResultSegments, and InputSegments have not been marked as dropped yet. - the level of input segments need to revert to LastLevel - If the failure occurs after `MetaSaved`: - ResultSegments have already been generated, and InputSegments have been marked as Dropped. At this point, simply make the ResultSegments visible. - the level of ResultSegments needs to be set to L1（in order to participate in mixCompaction） --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-10-23 17:15:28 +08:00
jaime	ef1832ff9c	enhance: enable manual compaction for collections without indexes (#36577 ) issue: #36576 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-08 19:57:18 +08:00
XuanYang-cn	4e0ea39235	fix: Remove neighbors if compactTo is unindexed (#36503 ) See also: #36360 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-10-08 14:15:19 +08:00
XuanYang-cn	82743c5c50	fix: Clear channelcp meta and metrics ASAP (#35658 ) See also: #35588 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-08-26 12:22:57 +08:00
cai.zhang	497afcb897	enhance: Refine code for GetRecoveryInfo (#34973 ) issue: #34495 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-07-29 00:01:46 +08:00
Chun Han	96dcee5dff	fix:load major compaction partial result(#34051 ) (#34052 ) related: #34051 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-06-28 14:04:05 +08:00
Chun Han	f7af323d1e	fix: sync partitiion stats blocking balance task(#33741 ) (#33742 ) related: #33741 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-11 14:21:56 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
yiwangdr	180d754158	fix: speed up segment lookup via channel name in datacoord (#33530 ) issue: #33342 Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-06-03 14:47:47 +08:00
congqixia	cedb33ceec	enhance: Improve datacoord segment filtering with collection (#32831 ) See also #32165 This PR modify the `SelectSegments` interface to utilizing collection id information when selecting segment with provided collection --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-08 21:37:29 +08:00
zhagnlu	e2c38750c7	fix: modify retry error (#32351 ) #32322 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-04-18 14:25:14 +08:00
zhagnlu	4586bcef9f	fix: correct AssignSegmentID return and add retry for loadCollectionF… (#32335 ) #32322 #31942 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-04-16 10:20:10 -07:00
wei liu	0d849a6c0a	fix: fix collectionInfo leak in datacoord (#32175 ) issue: #32029 lack of logic to clean collection info in datacoord's meta, This PR clean collection info after drop channel, to avoid collection info leak in datacoord --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-15 16:33:19 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) "-1" as `InvalidPartitionID` previously used as All partition place holder in delete cases. It's confusing and hard to maintain when a const var has more than one meaning. This PR add `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
congqixia	aa967de0a8	enhance: Explicitly pass LevelZero segment ids in vchan info (#29612 ) See also #27675 For `GetRecoveryInfo` & `GetRecoveryInfoV2`, Level zero segment ids shall be specified in vchan info so that querycoord could re-fetch current segment info during watch procedure without having all segment info Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 16:46:45 +08:00
XuanYang-cn	a153950b10	Change channel to Interface (#27839 ) This PR changes `*channel` into RWChannel interface See also: #25309 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2023-11-13 11:16:18 +08:00
jaime	6749957e71	Refine RPC call in unwatch drop channel (#27864 ) Signed-off-by: jaime <yun.zhang@zilliz.com>	2023-10-24 17:46:15 +08:00
SimFG	26f06dd732	Format the code (#27275 ) Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-09-21 09:45:27 +08:00
Jiquan Long	61c7b0990d	Workaround fix ChannelManager holding mutex too long (#26870 ) Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-09-06 14:29:40 +08:00
SimFG	3be4ac4022	Fix datanode/datacoord continuous restart (#26470 ) Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-08-20 21:20:24 +08:00
congqixia	8d13717cac	Fill Collection start position timestamp in WatchInfo (#26370 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-08-16 09:05:32 +08:00
congqixia	1045c88102	Support replace indexed field in QueryCoord (#25747 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-07-19 21:22:58 +08:00
Enwei Jiao	66fdc71479	Refactor logs in DataCoord & DataNode (#25574 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-07-14 15:56:31 +08:00
wei liu	cba0feb119	add coordinator broker, to unify rootcoord api access (#25187 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-07-03 18:28:25 +08:00
congqixia	597a4d9227	Treat small segment without index as sealed (#25237 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-07-02 19:50:23 +08:00
congqixia	41af0a98fa	Use go-api/v2 for milvus-proto (#24770 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-06-09 01:28:37 +08:00
Ikko Eltociear Ashimine	eb9ef23cca	Fix typo in handler.go (#24125 ) postitions -> positions Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>	2023-05-17 14:25:23 +08:00
yihao.dai	fe0a1bc2d9	Fix panic caused by wrong logic of getting unindexed segments (#24044 ) Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-05-15 20:59:28 +08:00
congqixia	5aa9db0d38	Add collection level auto compaction enabled config (#24013 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-05-10 17:45:20 +08:00
Xiaofan	016311ad48	Fix Has Collection Failed on datacoord (#23710 ) Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>	2023-04-27 09:50:34 +08:00
wei liu	cbfe7a45ef	fix pull target (#23491 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-04-18 18:30:32 +08:00


91 Commits