Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
throughout the Milvus codebase, enabling unified storage access through
the Loon FFI layer. This is a significant step towards standardizing
storage operations across storage versions.
1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag to the
`common.storage.file` section (alongside `splitByAvgSize`)
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
- Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure
Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
- Accepts a manifest locator JSON containing `base_path` and `ver` fields
(see the sketch after this list)
- Calls `get_latest_column_groups()` FFI function
- Returns column group information as string
- Comprehensive error handling for JSON parsing and FFI errors
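For illustration, a minimal Go sketch of the locator JSON consumed by
`GetManifest()`; the struct and helper names are hypothetical, only the
`base_path` and `ver` field names come from the description above:

```go
// Hypothetical Go-side helper that builds the locator JSON parsed by the
// C++ util before it calls get_latest_column_groups().
package loonffi

import "encoding/json"

type manifestLocator struct {
	BasePath string `json:"base_path"` // root path of the segment data
	Ver      int64  `json:"ver"`       // manifest version to resolve
}

func encodeManifestLocator(basePath string, ver int64) (string, error) {
	b, err := json.Marshal(manifestLocator{BasePath: basePath, Ver: ver})
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```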
3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
- Updated milvus-storage version from `0883026` to `302143c`
- Ensures compatibility with latest FFI interfaces
4. Data Coordinator Changes
All compaction task builders now include the manifest path in segment
binlogs (a shared sketch follows this list):
- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments
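The shared pattern, roughly sketched; the datapb import path and existing
struct fields are assumed, only the `Manifest` field and the
`GetManifestPath()` accessor are introduced by this change, and the helper
name is hypothetical:

```go
package datacoord

import "github.com/milvus-io/milvus/pkg/proto/datapb"

// toCompactionSegmentBinlogs is an illustrative helper mirroring the
// pattern used by the compaction task builders above.
func toCompactionSegmentBinlogs(segInfo *datapb.SegmentInfo) *datapb.CompactionSegmentBinlogs {
	return &datapb.CompactionSegmentBinlogs{
		SegmentID:    segInfo.GetID(),
		FieldBinlogs: segInfo.GetBinlogs(),
		Deltalogs:    segInfo.GetDeltalogs(),
		// New: carry the manifest path so FFI-based readers can locate the
		// storage V2 manifest for this segment.
		Manifest: segInfo.GetManifestPath(),
	}
}
```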
5. Data Node Compactor Enhancements
All compactors have been updated to support dual-mode reading (binlog vs.
manifest).
6. Flush & Sync Manager Updates
Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
- Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
- Write packed data using FFI-based writer
- Return manifest path along with binlogs, deltas, and stats
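A simplified sketch of the extended `Write()` contract, assuming stand-in
types; the real return values are the sync-task binlog structs:

```go
package syncmgr

import (
	"context"
	"path"
	"strconv"
)

// SyncPack is a stand-in for the real sync pack type.
type SyncPack struct{ segmentID int64 }

type BulkPackWriterV2 struct{}

// Write now also returns the manifest path derived from the segment ID.
func (w *BulkPackWriterV2) Write(ctx context.Context, pack *SyncPack) (
	binlogs, deltas, stats map[int64]string, manifest string, err error,
) {
	// ... write packed columnar data through the FFI-based writer ...
	manifest = path.Join(strconv.FormatInt(pack.segmentID, 10), "manifest.json")
	return binlogs, deltas, stats, manifest, nil
}
```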
Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest
7. Go Storage Layer Integration
New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing
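A rough sketch of the unified reader/writer shapes described above; the
Arrow Go v12 import is assumed, and the actual interfaces in
`record_reader.go` and `record_writer.go` may differ in detail:

```go
package storage

import "github.com/apache/arrow/go/v12/arrow" // Arrow Go module version assumed

// Record is a batch of rows backed by Arrow arrays (simplified).
type Record interface {
	Schema() *arrow.Schema
	Len() int
	Release()
}

// RecordReader yields Records regardless of the underlying storage version
// (binlog-based V1 or manifest-based V2); Next returns io.EOF at the end.
type RecordReader interface {
	Next() (Record, error)
	Close() error
}

// RecordWriter consumes Records; binlog_record_writer.go provides the
// traditional binlog-backed implementation.
type RecordWriter interface {
	Write(r Record) error
	Close() error
}
```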
Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage
Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format
FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format
Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes
Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation
9. Protocol Buffer Updates
data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline
worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator
10. Parameter Configuration
component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout
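A hedged, self-contained sketch of the new parameter; `ParamItem` here is
a local stand-in for the real paramtable type, and only the config key and
default value come from the description above:

```go
package paramtable

// ParamItem is a simplified stand-in for the real paramtable item type.
type ParamItem struct {
	Key          string
	DefaultValue string
	Doc          string
	Export       bool
}

type compactionConfig struct {
	UseLoonFFI ParamItem
}

func (c *compactionConfig) init() {
	c.UseLoonFFI = ParamItem{
		Key:          "common.storage.file.useLoonFFI",
		DefaultValue: "false", // safe rollout: legacy binlog path by default
		Doc:          "enable Loon FFI manifest-based readers in compaction",
		Export:       true,
	}
}
```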
11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing
This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)
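A minimal sketch of the mode selection; the names are illustrative, only
the `useLoonFFI` flag and the manifest field come from this change:

```go
package compactor

type readMode int

const (
	modeBinlog readMode = iota // legacy path: binlog-based reading
	modeFFI                    // FFI path: manifest-based reading via Loon
)

// selectReadMode mirrors the dual-mode decision described above.
func selectReadMode(useLoonFFI bool, manifest string) readMode {
	if useLoonFFI && manifest != "" {
		return modeFFI
	}
	return modeBinlog
}
```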
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
Integrate the StorageV2 FFI interface as the unified storage layer for
reading packed columnar data, replacing the custom iterative reader with
a manifest-based approach using the milvus-storage library.
Changes:
- Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C
Stream interface
- Implement utility functions to convert CStorageConfig to
milvus-storage Properties
- Create ManifestReader in Go that generates manifests from binlogs
- Add FFI packed reader CGO bindings (packed_reader_ffi.go)
- Refactor NewBinlogRecordReader to use ManifestReader for V2 storage
- Support both manifest file paths and direct manifest content
- Enable configurable buffer sizes and column projection
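A hedged sketch of the reader options implied by the last three items
above; field and method names are illustrative, not the actual API:

```go
package storage

// manifestReaderOptions is an illustrative option set for the V2 path.
type manifestReaderOptions struct {
	manifestPath    string  // path to a manifest file
	manifestContent string  // or the manifest content passed directly
	bufferSize      int64   // configurable buffer for batched streaming reads
	neededFields    []int64 // optional column projection by field ID
}

// manifestSource returns either the inline content or the file path,
// letting the FFI layer resolve whichever is provided.
func (o manifestReaderOptions) manifestSource() (value string, isPath bool) {
	if o.manifestContent != "" {
		return o.manifestContent, false
	}
	return o.manifestPath, true
}
```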
Technical improvements:
- Zero-copy data exchange using Arrow C Data Interface
- Optimized I/O operations through milvus-storage library
- Simplified code path with manifest-based reading
- Better performance with batched streaming reads
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Remove stopTask.
Replace the multiple task-tracking maps with a single unified taskState map.
Fix slot tracking, improve state transitions, and add comprehensive tests.
See also: #44714
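A rough sketch of the unified tracking structure; names and states are
illustrative, not the actual executor code:

```go
package compaction

import "sync"

type taskState int

const (
	statePending taskState = iota
	stateExecuting
	stateCompleted
	stateFailed
)

type executor struct {
	mu sync.Mutex
	// Single source of truth replacing the separate executing/completed/
	// dropped maps, keyed by plan ID.
	taskStates map[int64]taskState
	usedSlots  int64
}

func (e *executor) transit(planID int64, to taskState) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.taskStates[planID] = to
}
```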
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Related to #44620
This PR:
- Use `Eventually` instead of `time.Sleep` in the accesslog writer unit
test (see the example below)
- Make sure compaction task results have only one state from the executor
API
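A minimal example of the polling pattern, using testify's `Eventually` in
place of a fixed sleep; the accesslog writer details are omitted:

```go
package accesslog_test

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestWriterFlushed(t *testing.T) {
	// ... trigger an async write ...
	assert.Eventually(t, func() bool {
		// replace with the real readiness check, e.g. the log file exists
		return true
	}, time.Second, 10*time.Millisecond, "writer should flush within 1s")
}
```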
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Use a goroutine pool instead of a semaphore (a generic pool sketch
follows below).
2. Remove the compaction executor from the pipeline, since in streaming
mode the pipeline should be decoupled from compaction.
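A generic illustration of the pool-vs-semaphore change; this hand-rolled
pool is only a stand-in for the actual pool utility used by the datanode:

```go
package compaction

import "sync"

// pool runs submitted tasks on a fixed set of worker goroutines instead of
// gating ad-hoc goroutines behind a semaphore.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

func (p *pool) Submit(task func()) { p.tasks <- task }

func (p *pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}
```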
issue: https://github.com/milvus-io/milvus/issues/44541
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #44257
This PR:
- Utilize a configurable split policy for storage v2, enabling the system
field policy
- Store the split result in the field binlog struct
- Adapt legacy binlogs without child fields
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
https://github.com/milvus-io/milvus/issues/44011
This is to support compaction that sorts records by partition key and
primary key in the future.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
https://github.com/milvus-io/milvus/pull/42406 implements the segcore part
of storage for handling VectorArray.
This PR:
1. implements the Go part of storage for VectorArray
2. implements collection creation with StructArrayField and VectorArray
3. supports inserting and retrieving data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
This parameter determines whether the returned value should be a copy of,
or a reference into, the Arrow array. The updates enhance memory
management and provide more control over data handling during
deserialization.
See #43186
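A simplified illustration of the copy-vs-reference choice; the function
and parameter names are hypothetical, only the behaviour matches the
description above:

```go
package storage

// valueAt returns the element at idx of an Arrow-backed byte column. With
// shouldCopy=true the caller owns an independent copy; with false it holds
// a reference into the underlying buffer, which must outlive the slice.
func valueAt(column [][]byte, idx int, shouldCopy bool) []byte {
	v := column[idx]
	if !shouldCopy {
		return v // zero-copy reference into the underlying array
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out
}
```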
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
See: #43186
In this PR:
1. Flush is renamed to FlushChunk, while a new Flush primitive is
introduced to serialize values to records.
2. Segment mapping in clustering compaction now processes data by records
instead of values; it calls Flush on all buffers after each record is
processed.
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Related to #43520
DataNode may leak memory when the reader returns an error. In the
previously mentioned issue, datanodes were OOM-killed due to continuous
errors in the read path.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43522
Currently, passing a partial schema to the storage v2 packed reader may
trigger a SEGV during the clustering compaction unit test.
This patch implements `NeededFields` differently in each `RecordReader`
implementation. For now, v2 is implemented as a no-op; this will be
supported once the packed reader supports this API.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43072, #43289
- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Correct read and buffer size to 64MB to prevent OOM during clustering
compaction.
issue: https://github.com/milvus-io/milvus/issues/43310
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
Like the logic in #41919, the storage v2 filesystem shall use complete
paths with the bucketName prefix to be compatible with its definition.
This PR fills in the bucket name from config when creating the reader for
compaction tasks.
NOTE: the bucket name shall be read from the task params config for
compaction task pooling.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Remove the hardcoded batchSize of 100,000 and instead trigger a write
every 64MB based on actual data size. This prevents sort stats from
generating excessively large binlog files.
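A minimal sketch of the size-based trigger replacing the fixed 100,000-row
batch; names are illustrative:

```go
package compaction

const flushThreshold = 64 << 20 // 64MB

type sortedWriter struct {
	bufferedBytes int64
}

// add accumulates a row and reports whether the buffer should be flushed
// to a new binlog based on actual data size rather than row count.
func (w *sortedWriter) add(rowSize int64) (shouldFlush bool) {
	w.bufferedBytes += rowSize
	if w.bufferedBytes >= flushThreshold {
		w.bufferedBytes = 0
		return true
	}
	return false
}
```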
issue: https://github.com/milvus-io/milvus/issues/42400
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.
issue: https://github.com/milvus-io/milvus/issues/41123
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
Related to #39173
This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes the real auto-generated IDs to
determine the target vchannel. This mismatch can lead to inaccurate row
distribution estimation in corner cases such as:
- Importing 1 row into 2 vchannels:
• Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
• Import: real autoID (e.g., 457975852966809057) hashes to v1
→ actual result: v0 = 0, v1 = 1
To resolve this corner case, we now allocate at least one segment for
each vchannel when autoID is enabled, ensuring all vchannels are prepared
to receive data even if no rows are estimated for them.
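A simplified sketch of the allocation rule; the names and the
rows-per-segment parameter are illustrative:

```go
package datacoord

// estimateSegmentsPerVChannel returns how many import segments to prepare
// for each vchannel. With autoID enabled, at least one segment is
// allocated per vchannel even when the even-split estimate rounds down to
// zero rows. rowsPerSegment is assumed to be > 0.
func estimateSegmentsPerVChannel(numRows, vchannelNum, rowsPerSegment int64, autoID bool) int64 {
	estimatedRows := numRows / vchannelNum
	segments := (estimatedRows + rowsPerSegment - 1) / rowsPerSegment
	if autoID && segments == 0 {
		return 1
	}
	return segments
}
```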
issue: https://github.com/milvus-io/milvus/issues/41759
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Make DataNode use compaction parameters from request instead of
configuration.
issue: https://github.com/milvus-io/milvus/issues/41123
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
With concurrent L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.
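A rough sketch of the range sizing; the expansion factor value is
illustrative:

```go
package datacoord

const logIDExpansionFactor = 10 // illustrative headroom multiplier

// logIDRangeSize computes how many log IDs to reserve for a compaction
// task: a contiguous range sized by segment binlog count, replacing the
// single incremental beginLogID that could collide under concurrent L0
// compaction.
func logIDRangeSize(numSegmentBinlogs int64) int64 {
	return numSegmentBinlogs * logIDExpansionFactor
}
```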
issue: https://github.com/milvus-io/milvus/issues/40207
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact with duplicate primary key test.
- Compact with bm25 segments.
- Compact with merge sort segments.
- Compact with no expiration segments.
- Compact with segments lacking binlogs.
- Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact bm25 segments with v2 format.
- Compact with memory limit.
- Enhance: Use serdeMap serialization in the BuildRecord function to
support all Milvus data types.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>