milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 09:38:39 +08:00

Author	SHA1	Message	Date
congqixia	f94b04e642	feat: [2.6] integrate Loon FFI for manifest-based segment loading and index building (#46076 ) Cherry-pick from master pr: #45061 #45488 #45803 #46017 #44991 #45132 #45723 #45726 #45798 #45897 #45918 #44998 This feature integrates the Storage V2 (Loon) FFI interface as a unified storage layer for segment loading and index building in Milvus. It enables manifest-based data access, replacing the traditional binlog-based approach with a more efficient columnar storage format. Key changes: ### Segment Self-Managed Loading Architecture - Move segment loading orchestration from Go layer to C++ segcore - Add NewSegmentWithLoadInfo() API for passing load info during segment creation - Implement SetLoadInfo() and Load() methods in SegmentInterface - Support parallel loading of indexed and non-indexed fields - Enable both sealed and growing segments to self-manage loading ### Storage V2 FFI Integration - Integrate milvus-storage library's FFI interface for packed columnar data - Add manifest path support throughout the data path (SegmentInfo, LoadInfo) - Implement ManifestReader for generating manifests from binlogs - Support zero-copy data exchange using Arrow C Data Interface - Add ToCStorageConfig() for Go-to-C storage config conversion ### Manifest-Based Index Building - Extend FileManagerContext to carry loon_ffi_properties - Implement GetFieldDatasFromManifest() using Arrow C Stream interface - Support manifest-based reading in DiskFileManagerImpl and MemFileManagerImpl - Add fallback to traditional segment insert files when manifest unavailable ### Compaction Pipeline Updates - Include manifest path in all compaction task builders (clustering, L0, mix) - Update BulkPackWriterV2 to return manifest path - Propagate manifest metadata through compaction pipeline ### Configuration & Protocol - Add common.storageV2.useLoonFFI config option (default: false) - Add manifest_path field to SegmentLoadInfo and related proto messages - Add manifest field to compaction segment messages ### Bug Fixes - Fix mmap settings not applied during segment load (key typo fix) - Populate index info after segment loading to prevent redundant load tasks - Fix memory corruption by removing premature transaction handle destruction Related issues: #44956, #45060, #39173 ## Individual Cherry-Picked Commits 1. e1c923b5cc - fix: apply mmap settings correctly during segment load (#46017) 2. 63b912370b - enhance: use milvus-storage internal C++ Reader API for Loon FFI (#45897) 3. bfc192faa5 - enhance: Resolve issues integrating loon FFI (#45918) 4. fb18564631 - enhance: support manifest-based index building with Loon FFI reader (#45726) 5. b9ec2392b9 - enhance: integrate StorageV2 FFI interface for manifest-based segment loading (#45798) 6. 66db3c32e6 - enhance: integrate Storage V2 FFI interface for unified storage access (#45723) 7. ae789273ac - fix: populate index info after segment loading to prevent redundant load tasks (#45803) 8. 49688b0be2 - enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488) 9. 5b2df88bac - enhance: [StorageV2] Integrate FFI interface for packed reader (#45132) 10. 91ff5706ac - enhance: [StorageV2] add manifest path support for FFI integration (#44991) 11. 2192bb4a85 - enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061) 12. 4296b01da0 - enhance: update delta log serialization APIs to integrate storage V2 (#44998) ## Technical Details ### Architecture Changes - Before: Go layer orchestrated segment loading, making multiple CGO calls - After: Segments autonomously manage loading in C++ layer with single entry point ### Storage Access Pattern - Before: Read individual binlog files through Go storage layer - After: Read manifest file that references packed columnar data via FFI ### Benefits - Reduced cross-language call overhead - Better resource management at C++ level - Improved I/O performance through batched streaming reads - Cleaner separation of concerns between Go and C++ layers - Foundation for proactive schema evolution handling --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: Ted Xu <ted.xu@zilliz.com>	2025-12-04 17:09:12 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578 ) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
Ted Xu	078ccf5e08	fix: the underlying record got released in clustering compaction (#43551 ) See: #43186 In this PR: 1. Flush renamed to FlushChunk, while a new Flush primitive is introduced to serialize values to records. 2. Segment mapping in clustering compaction now process data by records instead of values, it calls flush to all buffers after each record is processed. Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-25 15:04:54 +08:00
congqixia	1cf8ed505f	fix: Implement `NeededFields` feature in `RecordReader` (#43523 ) Related to #43522 Currently, passing partial schema to storage v2 packed reader may trigger SEGV during clustering compaction unit test. This patch implement `NeededFields` differently in each `RecordReader` imlementation. For now, v2 will implemented as no-op. This will be supported after packed reader support this API. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-24 00:22:54 +08:00
cai.zhang	e26a532504	enhance: Only download necessary fields during clustering analyze phase (#43322 ) issue: #43310 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-22 16:40:52 +08:00
yihao.dai	b69e601fe1	fix: [StorageV2] Correct read and write buffer size (#43335 ) Correct read and buffer size to 64MB to prevent OOM during clustering compaction. issue: https://github.com/milvus-io/milvus/issues/43310 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-16 14:28:52 +08:00
congqixia	5a9efb3f81	enhance: [StorageV2] Refine storage rw option usage & validation (#43175 ) Related to #39173 This PR: - Make all datanode task passes storage config via storage config option - Remove legacy comments, rootPath & bucketName parameters - Fix clustering compaction option behavior - Add validation logic for `rwOptions` - Use correct storageType from storageConfig - Add storage config in sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-11 01:14:48 +08:00
cai.zhang	6989e18599	enhance: Move sort stats task to sort compaction (#42562 ) issue: #42560 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-08 20:22:47 +08:00
congqixia	a9aaa86193	enhance: [StorageV2] Pass bucket name for compaction readers (#42607 ) Related to #39173 Like logic in #41919, storage v2 fs shall use complete paths with bucketName prefix to be compatible with its definition. This PR fills bucket name from config when creating reader for compaction tasks. NOTE: the bucket name shall be read from task params config for compaction task pooling. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-10 10:20:35 +08:00
cai.zhang	80fe573c76	enhance: Pass the compaction configuration through request parameters (#41979 ) issue: #41123 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-05-26 11:52:27 +08:00
yihao.dai	65dd3982d8	fix: Fix ants.Pool goroutine leak (#41892 ) 1. Release the pool after it is no longer in use. 2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR #287](https://github.com/panjf2000/ants/pull/287)). issue: https://github.com/milvus-io/milvus/issues/41838 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-05-19 17:56:22 +08:00
yihao.dai	dccfc69660	enhance: Get compaction params from request (#41125 ) Make DataNode use compaction parameters from request instead of configuration. issue: https://github.com/milvus-io/milvus/issues/41123 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-15 10:28:53 +08:00
yihao.dai	5b78ef0a49	fix: Fix delete data loss due to duplicate binlogID (#40960 ) With concurrenct L0 compaction (https://github.com/milvus-io/milvus/pull/36816), delta logs might be written to the same L1 segment, causing logID duplication when using the incremental beginLogID. This PR removes the beginLogID mechanism and instead passes a log ID range, where the number of IDs in the range equals the number of compaction segment binlogs multiplied by an expansion factor. issue: https://github.com/milvus-io/milvus/issues/40207 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-01 10:36:22 +08:00
sthuang	d7df78a6c9	feat: Storage v2 compaction (#40667 ) - Feat: Support Mix compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact with duplicate primary key test. - Compact with bm25 segments. - Compact with merge sort segments. - Compact with no expiration segments. - Compact with lack binlog segments. - Compact with nullable field segments. - Feat: Support Clustering compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact bm25 segments with v2 format. - Compact with memory limit. - Enhance: Use serdeMap serialize in BuildRecord function to support all Milvus data types. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-03-21 10:16:12 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
XuanYang-cn	837ac295fa	enhance: Remove iterators in datanode (#40301 ) Iterators are long deprecated, but sort are still using it. This PR unifies stats task with the latest compaction common functions and remove the usage of iterators. 1. Rename `datanode/compaction` to `datanode/compactor` 2. Add `internal/compaction` and move some compaction commons into it. 3. Replace `DeltalogIterators` with `ComposeDeleteFromDeltalogs` 4. Remove `datanode/iterators` See also: #39242 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-03-04 12:14:00 +08:00

18 Commits