milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-06 17:18:35 +08:00

Author	SHA1	Message	Date
congqixia	c01fd94a6a	enhance: integrate Storage V2 FFI interface for unified storage access (#45723)
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface) interface throughout the Milvus codebase, enabling unified storage access through the Loon FFI layer. This is a significant step towards standardizing storage operations across different storage versions.
1. Configuration Support - configs/milvus.yaml: Added `useLoonFFI` configuration flag under `common.storage.file.splitByAvgSize` section - Allows runtime toggle between traditional binlog readers and new FFI-based manifest readers - Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h) - ToCStorageConfig(): Converts Go's `StorageConfig` to C's `CStorageConfig` struct for FFI calls - GetManifest(): Parses manifest JSON and retrieves latest column groups using FFI - Accepts manifest path with `base_path` and `ver` fields - Calls `get_latest_column_groups()` FFI function - Returns column group information as string - Comprehensive error handling for JSON parsing and FFI errors
3. Dependency Updates - internal/core/thirdparty/milvus-storage/CMakeLists.txt: - Updated milvus-storage version from `0883026` to `302143c` - Ensures compatibility with latest FFI interfaces
4. Data Coordinator Changes All compaction task builders now include manifest path in segment binlogs: - compaction_task_clustering.go: Added `Manifest: segInfo.GetManifestPath()` to segment binlogs - compaction_task_l0.go: Added manifest path to both L0 segment selection and compaction plan building - compaction_task_mix.go: Added manifest path to mixed compaction segment binlogs - meta.go: Updated metadata completion logic: - `completeClusterCompactionMutation()`: Set `ManifestPath` in new segment info - `completeMixCompactionMutation()`: Preserve manifest path in compacted segments - `completeSortCompactionMutation()`: Include manifest path in sorted segments
5. Data Node Compactor Enhancements All compactors updated to support dual-mode reading (binlog vs manifest):
6. Flush & Sync Manager Updates Pack Writer V2 (pack_writer_v2.go) - BulkPackWriterV2.Write(): Extended return signature to include `manifest string` - Implementation: - Generate manifest path: `path.Join(pack.segmentID, "manifest.json")` - Write packed data using FFI-based writer - Return manifest path along with binlogs, deltas, and stats Task Handling (task.go) - Updated all sync task result handling to accommodate new manifest return value - Ensured backward compatibility for callers not using manifest
7. Go Storage Layer Integration New Interfaces and Implementations - record_reader.go: Interface for unified record reading across storage versions - record_writer.go: Interface for unified record writing across storage versions - binlog_record_writer.go: Concrete implementation for traditional binlog-based writing Enhanced Schema Support (schema.go, schema_test.go) - Schema conversion utilities to support FFI-based storage operations - Ensures proper Arrow schema mapping for V2 storage Serialization Updates - serde.go, serde_events.go, serde_events_v2.go: Updated to work with new reader/writer interfaces - Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format FFI Common (storagev2/packed/ffi_common.go) - Common FFI utilities and type conversions for packed storage format Packed Writer FFI (storagev2/packed/packed_writer_ffi.go) - FFI-based implementation of packed writer - Integrates with Loon storage layer for efficient columnar writes Packed Reader FFI (storagev2/packed/packed_reader_ffi.go) - Already existed, now complemented by writer implementation
9. Protocol Buffer Updates data_coord.proto & datapb/data_coord.pb.go - Added `manifest` field to compaction segment messages - Enables passing manifest metadata through compaction pipeline worker.proto & workerpb/worker.pb.go - Added compaction parameter for `useLoonFFI` flag - Allows workers to receive FFI configuration from coordinator
10. Parameter Configuration component_param.go - Added `UseLoonFFI` parameter to compaction configuration - Reads from `common.storage.file.useLoonFFI` config path - Default: `false` for safe rollout
11. Test Updates - clustering_compactor_storage_v2_test.go: Updated signatures to handle manifest return value - mix_compactor_storage_v2_test.go: Updated test helpers for manifest support - namespace_compactor_test.go: Adjusted writer calls to expect manifest - pack_writer_v2_test.go: Validated manifest generation in pack writing
This integration follows a dual-mode approach:
1. Legacy Path: Traditional binlog-based reading/writing (when `useLoonFFI=false` or no manifest)
2. FFI Path: Manifest-based reading/writing through Loon FFI (when `useLoonFFI=true` and manifest exists)
--------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-24 19:57:07 +08:00
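The dual-mode rule stated at the end of this commit message can be sketched in Go roughly as follows. This is a minimal illustration and not code from the commit: `ReadPath`, `LegacyBinlog`, `LoonFFI`, and `chooseReadPath` are hypothetical names, and the sample manifest path is made up. Only the condition itself (flag enabled and a manifest present) is taken from the message.

```go
package main

import "fmt"

// ReadPath names the I/O path a reader or writer would take.
// The type and its values are hypothetical, for illustration only.
type ReadPath string

const (
	LegacyBinlog ReadPath = "legacy-binlog" // traditional binlog-based path
	LoonFFI      ReadPath = "loon-ffi"      // manifest-based path through the FFI layer
)

// chooseReadPath applies the rule from the commit message: the FFI path is
// used only when the useLoonFFI flag is on AND the segment has a manifest;
// every other combination falls back to the legacy binlog path.
func chooseReadPath(useLoonFFI bool, manifestPath string) ReadPath {
	if useLoonFFI && manifestPath != "" {
		return LoonFFI
	}
	return LegacyBinlog
}

func main() {
	// The manifest path below is a made-up example value.
	fmt.Println(chooseReadPath(true, "seg-100/manifest.json"))
	fmt.Println(chooseReadPath(true, ""))
	fmt.Println(chooseReadPath(false, "seg-100/manifest.json"))
}
```

Keeping the decision in one small function means a segment written before the flag was enabled (no manifest) is still read through the legacy path, which matches the backward-compatibility goal the message describes.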
XuanYang-cn	623a9e5156	fix: Accurate size estimation for sliced arrow arrays in compaction (#45294) Sliced arrow arrays "incorrectly" returned the original array's size via SizeInBytes(), causing inaccurate memory estimates during compaction. This resulted in segments closing prematurely in mergeSplit mode - expected 500MB compactions produced 4x100+MB segments instead. Fixed by calculating actual byte size of sliced arrays, ensuring proper segment sizing and more accurate memory usage tracking. See also: #45293 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-11-06 14:57:34 +08:00
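The over-estimation described in this fix can be illustrated with a toy Go model of a sliced fixed-width column. The sketch deliberately does not use the Arrow Go API: `int64Column`, `newInt64Column`, `backingSize`, and `slicedSize` are hypothetical names, and the 8-bytes-per-value accounting is a simplifying assumption that ignores validity bitmaps and variable-length buffers.

```go
package main

import "fmt"

// int64Column is a toy stand-in for a fixed-width columnar array.
// A slice shares the parent's backing buffer and only narrows the
// (offset, length) window, which is the property that makes a
// buffer-based size report misleading for slices.
type int64Column struct {
	values []int64 // full backing buffer, shared by all slices
	offset int     // index of the first visible value
	length int     // number of visible values
}

func newInt64Column(n int) int64Column {
	return int64Column{values: make([]int64, n), offset: 0, length: n}
}

// Slice returns a zero-copy view of rows [i, j) relative to this column.
func (c int64Column) Slice(i, j int) int64Column {
	return int64Column{values: c.values, offset: c.offset + i, length: j - i}
}

// backingSize reports the size of the whole shared buffer, which is what
// a naive estimate sees even when only a small slice is in use.
func (c int64Column) backingSize() int { return len(c.values) * 8 }

// slicedSize reports only the bytes covered by the visible window.
func (c int64Column) slicedSize() int { return c.length * 8 }

func main() {
	col := newInt64Column(1000)
	quarter := col.Slice(0, 250)
	fmt.Println("backing bytes:", quarter.backingSize())
	fmt.Println("sliced bytes: ", quarter.slicedSize())
}
```

If a writer adds up the backing size for each of four such slices, it counts the same buffer four times, so a size threshold is reached early. That is consistent with the premature segment closing the message reports, although the actual fix in the commit may compute sizes differently.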
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized the serialization path from "Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory" to "Go VectorArray → Arrow ListArray → Memory". --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 implements the segcore part of storage for handling VectorArray. This PR: 1. implements the Go part of storage for VectorArray; 2. implements collection creation with StructArrayField and VectorArray; 3. inserts and retrieves data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
sthuang	238bd30f42	fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948) Fix issues in end-to-end tests: 1. Split column groups based on schema, rather than estimating by average chunk row size. Ensure column group consistency within a segment, to avoid errors caused by loading multiple column group chunks simultaneously. 2. Use sorted segmentId when generating the stats binlog path, to ensure consistent and correct file path resolution. 3. Determine field IDs as follows: For multi-column column groups, retrieve the field ID list from metadata. For single-column column groups, use the column group ID directly as the field ID. related: #39173 fix: #42862 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-27 14:44:42 +08:00
sthuang	ed5dbf3eaa	enhance: [StorageV2] sync separate vector datatype into its own column group (#42638) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-16 11:48:37 +08:00
sthuang	9439eaef52	fix: [StorageV2] sync with int8 vector data type core dumped (#42616) related: https://github.com/milvus-io/milvus/issues/42613, #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-10 11:42:35 +08:00
XuanYang-cn	540456041f	enhance: Remove not inuse binlog iterator (#41359) See also: #41466 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-04-24 12:04:38 +08:00
Ted Xu	128efaa3e3	enhance: simplify size calculation in file writers (#40808) See: #40342 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-26 20:04:22 +08:00
sthuang	d7df78a6c9	feat: Storage v2 compaction (#40667) - Feat: Support Mix compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact with duplicate primary key test. - Compact with bm25 segments. - Compact with merge sort segments. - Compact with no expiration segments. - Compact with lack binlog segments. - Compact with nullable field segments. - Feat: Support Clustering compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact bm25 segments with v2 format. - Compact with memory limit. - Enhance: Use serdeMap serialize in BuildRecord function to support all Milvus data types. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-03-21 10:16:12 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
sthuang	90acc8a58f	enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916) related: https://github.com/milvus-io/milvus/issues/39915 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-25 10:30:02 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Ted Xu	2978b0890e	enhance: iterative download data during compaction to reduce memory cost (#39724) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-13 10:36:47 +08:00
Ted Xu	427b6a4c94	enhance: reduce stats task cost by skipping ser/de (#39568) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-06 17:14:45 +08:00
zhagnlu	6ee94d00b9	fix:fix calculate arrow nest type and add ut (#38527) #37767 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-12-18 11:54:44 +08:00
shaoting-huang	f4dd7c7efb	enhance: add delta log stream new format reader and writer (#34116) issue: #34123 Benchmark case: The benchmark runs the Go benchmark function `BenchmarkDeltalogFormat`, which is included in the changed files. It tests the performance of serializing and deserializing two different data formats on a dataset of 10 million delete logs. Metrics: The benchmarks measure the average time taken per operation (ns/op), memory allocated per operation (MB/op), and the number of memory allocations per operation (allocs/op).

| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation (MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation Comparison |
|----------------------------------|------------------|-----------------|---------------------------|-------------------|------------------------------|------------------------|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline | 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14% | 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 | Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer | 798,591,584 | -85.43% | 2,178 | -84.34% | 30,270,488 | -56.78% |

Both read and write operations show significant improvements in both speed and memory allocation. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-06 09:08:09 +08:00
Ted Xu	6d5747cb3e	feat: adding deltalog stream reader and writer (#33844) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-06-19 14:42:01 +08:00
Ted Xu	066c8ea175	feat: stream reader/writer to support nulls (#33080) See: #31728 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-27 16:27:42 +08:00
Ted Xu	a8bd9bea39	fix: adding blob memory size in binlog serde (#33324) See: #33280 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-24 10:33:40 +08:00
Ted Xu	a9c7ce72b8	enhance: enable stream writer in compactions (#32612) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-17 15:05:37 +08:00
Ted Xu	dc5ea6f17c	feat: adding binlog streaming writer (#31537) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-04-11 10:33:20 +08:00
Buqian Zheng	d7dbc3c9d8	fix: [sparse float vector] support the new streaming deserialize reader (#31325) issue: https://github.com/milvus-io/milvus/issues/31324 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-17 13:59:04 +08:00
Ted Xu	987d9023a5	enhance: Enable binlog deserialize reader in datanode compaction (#31036) See #30863 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-03-08 18:25:02 +08:00
Ted Xu	71adafa933	enhance: adding a streaming deserialize reader for binlogs (#30860) See #30863 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-03-04 19:31:09 +08:00

27 Commits