milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-06 09:08:43 +08:00

Author	SHA1	Message	Date
congqixia	c01fd94a6a	enhance: integrate Storage V2 FFI interface for unified storage access (#45723)
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface) throughout the Milvus codebase, enabling unified storage access through the Loon FFI layer. This is a significant step towards standardizing storage operations across different storage versions.
1. Configuration Support
- configs/milvus.yaml: Added the `useLoonFFI` configuration flag to the `common.storage.file` section (next to `splitByAvgSize`)
- Allows a runtime toggle between the traditional binlog readers and the new FFI-based manifest readers
- Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure
Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- ToCStorageConfig(): Converts Go's `StorageConfig` to C's `CStorageConfig` struct for FFI calls
- GetManifest(): Parses the manifest JSON and retrieves the latest column groups using FFI
  - Accepts a manifest path with `base_path` and `ver` fields
  - Calls the `get_latest_column_groups()` FFI function
  - Returns the column group information as a string
  - Handles both JSON parsing errors and FFI errors
3. Dependency Updates
- internal/core/thirdparty/milvus-storage/CMakeLists.txt:
  - Updated the milvus-storage version from `0883026` to `302143c`
  - Ensures compatibility with the latest FFI interfaces
4. Data Coordinator Changes
All compaction task builders now include the manifest path in segment binlogs:
- compaction_task_clustering.go: Added `Manifest: segInfo.GetManifestPath()` to segment binlogs
- compaction_task_l0.go: Added the manifest path to both L0 segment selection and compaction plan building
- compaction_task_mix.go: Added the manifest path to mixed compaction segment binlogs
- meta.go: Updated the metadata completion logic:
  - `completeClusterCompactionMutation()`: Sets `ManifestPath` in the new segment info
  - `completeMixCompactionMutation()`: Preserves the manifest path in compacted segments
  - `completeSortCompactionMutation()`: Includes the manifest path in sorted segments
5. Data Node Compactor Enhancements
All compactors updated to support dual-mode reading (binlog vs manifest):
6. Flush & Sync Manager Updates
Pack Writer V2 (pack_writer_v2.go)
- BulkPackWriterV2.Write(): Extended the return signature to include `manifest string`
- Implementation:
  - Generates the manifest path: `path.Join(pack.segmentID, "manifest.json")`
  - Writes packed data using the FFI-based writer
  - Returns the manifest path along with binlogs, deltas, and stats
Task Handling (task.go)
- Updated all sync task result handling to accommodate the new manifest return value
- Ensured backward compatibility for callers not using the manifest
7. Go Storage Layer Integration
New Interfaces and Implementations
- record_reader.go: Interface for unified record reading across storage versions
- record_writer.go: Interface for unified record writing across storage versions
- binlog_record_writer.go: Concrete implementation for traditional binlog-based writing
Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage
Serialization Updates
- serde.go, serde_events.go, serde_events_v2.go: Updated to work with the new reader/writer interfaces
- Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format
FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for the packed storage format
Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of the packed writer
- Integrates with the Loon storage layer for efficient columnar writes
Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed; now complemented by the writer implementation
9. Protocol Buffer Updates
data_coord.proto & datapb/data_coord.pb.go
- Added a `manifest` field to compaction segment messages
- Enables passing manifest metadata through the compaction pipeline
worker.proto & workerpb/worker.pb.go
- Added a compaction parameter for the `useLoonFFI` flag
- Allows workers to receive the FFI configuration from the coordinator
10. Parameter Configuration
component_param.go
- Added the `UseLoonFFI` parameter to the compaction configuration
- Reads from the `common.storage.file.useLoonFFI` config path
- Default: `false` for a safe rollout
11. Test Updates
- clustering_compactor_storage_v2_test.go: Updated signatures to handle the manifest return value
- mix_compactor_storage_v2_test.go: Updated test helpers for manifest support
- namespace_compactor_test.go: Adjusted writer calls to expect a manifest
- pack_writer_v2_test.go: Validates manifest generation in pack writing
This integration follows a dual-mode approach:
1. Legacy Path: Traditional binlog-based reading/writing (when `useLoonFFI=false` or no manifest exists)
2. FFI Path: Manifest-based reading/writing through the Loon FFI (when `useLoonFFI=true` and a manifest exists)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-24 19:57:07 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 implemented the segcore part of storage for handling VectorArray. This PR: 1. implements the Go part of storage for VectorArray; 2. implements collection creation with StructArrayField and VectorArray; 3. supports inserting data into and retrieving data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578) This parameter determines whether the returned value is a copy of, or a reference into, the Arrow array. The change gives callers more control over memory management and data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
Ted Xu	be86d31ea3	feat: compaction to support add field (#40415) See: #39718 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-18 11:32:12 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
Ted Xu	878ce56079	fix: correct memory size estimation on arrays (#40312) See: #40342 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-05 16:54:09 +08:00
sthuang	90acc8a58f	enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916) related: https://github.com/milvus-io/milvus/issues/39915 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-25 10:30:02 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990) Related to #39095 https://go.dev/doc/modules/version-numbers Update the pkg module version according to the Go module major-version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
smellthemoon	8b974c5742	enhance: support compact if lack of binlog (#40000) https://github.com/milvus-io/milvus/issues/39718 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-02-22 10:51:56 +08:00
Ted Xu	2978b0890e	enhance: iterative download data during compaction to reduce memory cost (#39724) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-13 10:36:47 +08:00
Ted Xu	427b6a4c94	enhance: reduce stats task cost by skipping ser/de (#39568) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-06 17:14:45 +08:00
Ted Xu	bc9562feb1	enhance: avoid memory copy and serde in mix compaction (#37479) See: #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-11-07 16:30:57 -08:00
Ted Xu	41646c8439	feat: integrate new deltalog format (#35522) See #34123 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-08-20 19:06:56 +08:00
shaoting-huang	88b373b024	enhance: binlog primary key turn off dict encoding (#34358) issue: #34357 Go Parquet uses dictionary encoding by default and falls back to plain encoding if the dictionary size exceeds the dictionary page size limit. Users can specify a custom fallback encoding by using `parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However, Go Parquet falls back to plain encoding (`e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238)`) rather than to the custom encoding method the user provides. Therefore, this patch only turns off dictionary encoding for the primary key. In a benchmark with 5 million auto-ID primary keys, the Parquet file size drops from 13.93 MB to 8.36 MB when dictionary encoding is turned off, reducing primary key storage space by 40%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-17 17:47:44 +08:00
shaoting-huang	f4dd7c7efb	enhance: add delta log stream new format reader and writer (#34116)
issue: #34123
Benchmark case: The benchmark runs the Go benchmark function `BenchmarkDeltalogFormat`, which is included in the changed files. It measures the performance of serializing and deserializing two different data formats on a dataset of 10 million delete logs.
Metrics: The benchmarks measure the average time per operation (ns/op), the memory allocated per operation (MB/op), and the number of memory allocations per operation (allocs/op).
| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation (MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation Comparison |
|---|---|---|---|---|---|---|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline | 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14% | 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 | Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer | 798,591,584 | -85.43% | 2,178 | -84.34% | 30,270,488 | -56.78% |
Both read and write operations are significantly faster and allocate less memory; the reader's allocation count is essentially unchanged (+0.30%), while the writer's drops by 56.78%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-06 09:08:09 +08:00
Ted Xu	6d5747cb3e	feat: adding deltalog stream reader and writer (#33844) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-06-19 14:42:01 +08:00

16 Commits
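Commit c01fd94a6a describes a dual-mode rule: manifest-based reading through the Loon FFI only when `useLoonFFI=true` and a manifest exists, and the legacy binlog path otherwise. The Go sketch below expresses that rule as a small selector. `ReadPath`, `chooseReadPath`, and the plain-string manifest argument are hypothetical names chosen for illustration, not Milvus APIs.

```go
package main

import "fmt"

// ReadPath identifies which reader a compactor would use (hypothetical type).
type ReadPath int

const (
	LegacyBinlogPath ReadPath = iota // traditional binlog readers
	LoonFFIPath                      // manifest-based reading through the Loon FFI
)

func (p ReadPath) String() string {
	if p == LoonFFIPath {
		return "ffi"
	}
	return "legacy"
}

// chooseReadPath applies the rule from the commit message: the FFI path is
// taken only when the flag is on AND the segment carries a manifest path.
func chooseReadPath(useLoonFFI bool, manifestPath string) ReadPath {
	if useLoonFFI && manifestPath != "" {
		return LoonFFIPath
	}
	return LegacyBinlogPath
}

func main() {
	fmt.Println(chooseReadPath(false, "100/manifest.json")) // legacy: flag off
	fmt.Println(chooseReadPath(true, ""))                   // legacy: no manifest
	fmt.Println(chooseReadPath(true, "100/manifest.json"))  // ffi
}
```

Requiring both conditions means segments written before the flag was enabled (which carry no manifest) keep working through the legacy readers, which matches the backward-compatibility goal stated in the commit.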
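Commit c01fd94a6a says `GetManifest()` accepts a manifest path carrying `base_path` and `ver` fields, and that the pack writer derives a manifest path with `path.Join(pack.segmentID, "manifest.json")`. The Go sketch below illustrates both steps using only the standard library. The `manifestRef` struct, the helper names, the `int64` types for `ver` and the segment ID, and the "missing base_path" check are assumptions made for this example; the real code lives in C++ (`util.cpp`) and calls the FFI.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"path"
	"strconv"
)

// manifestRef is a hypothetical Go view of the manifest reference. Only the
// JSON keys `base_path` and `ver` come from the commit message.
type manifestRef struct {
	BasePath string `json:"base_path"`
	Ver      int64  `json:"ver"` // assumed to be an integer version
}

// manifestPathFor builds a per-segment manifest path in the spirit of
// path.Join(pack.segmentID, "manifest.json"); the segment ID is assumed to be
// an int64, so it is formatted as a decimal string first.
func manifestPathFor(segmentID int64) string {
	return path.Join(strconv.FormatInt(segmentID, 10), "manifest.json")
}

// parseManifestRef decodes a manifest reference and rejects one without a base path.
func parseManifestRef(raw string) (manifestRef, error) {
	var ref manifestRef
	if err := json.Unmarshal([]byte(raw), &ref); err != nil {
		return manifestRef{}, fmt.Errorf("parse manifest: %w", err)
	}
	if ref.BasePath == "" {
		return manifestRef{}, errors.New("parse manifest: missing base_path")
	}
	return ref, nil
}

func main() {
	fmt.Println(manifestPathFor(449)) // 449/manifest.json

	ref, err := parseManifestRef(`{"base_path":"example/base","ver":3}`)
	fmt.Println(ref.BasePath, ref.Ver, err) // example/base 3 <nil>

	_, err = parseManifestRef(`{"ver":3}`)
	fmt.Println(err) // parse manifest: missing base_path
}
```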
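Commit 9041bf1b9a adds a `shouldCopy` parameter that decides whether a reader returns a copy of a value or a reference into the Arrow array. The Go sketch below shows the trade-off with a plain byte slice standing in for an Arrow buffer; `valueAt` and its signature are hypothetical and do not reproduce the Milvus reader API.

```go
package main

import "fmt"

// valueAt returns the bytes of one value stored in a shared column buffer.
// With shouldCopy=false the result aliases the buffer: no allocation, but it
// is only valid while the buffer is alive and unchanged. With shouldCopy=true
// the caller gets an independent copy it can keep after the buffer is reused.
func valueAt(buf []byte, start, end int, shouldCopy bool) []byte {
	v := buf[start:end]
	if !shouldCopy {
		return v
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out
}

func main() {
	buf := []byte("helloworld")
	ref := valueAt(buf, 0, 5, false)
	cp := valueAt(buf, 0, 5, true)

	buf[0] = 'J' // simulate the reader reusing its buffer for the next batch

	fmt.Println(string(ref), string(cp)) // Jello hello
}
```

Returning references avoids an allocation per value, which matters for large batches, while copying is the safe choice whenever values outlive the batch that produced them.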
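The comparison percentages in f4dd7c7efb and the roughly 40% primary-key saving in 88b373b024 follow from the relative-change formula (value - baseline) / baseline × 100. The helpers below are a hypothetical check written for illustration, not part of the Milvus benchmark code; they reproduce three of the quoted figures from the raw numbers.

```go
package main

import (
	"fmt"
	"math"
)

// relChangePct returns the relative change of value versus baseline, in percent.
// A negative result means the value is smaller than the baseline, which is an
// improvement for time, memory, and file-size metrics.
func relChangePct(baseline, value float64) float64 {
	return (value - baseline) / baseline * 100
}

// round2 rounds to two decimal places, the precision used in the commit messages.
func round2(x float64) float64 {
	return math.Round(x*100) / 100
}

func main() {
	// f4dd7c7efb: reader time, pk_ts_separate_format_reader vs one_string_format_reader (ns/op)
	fmt.Println(round2(relChangePct(2781990000, 480682639))) // -82.72
	// f4dd7c7efb: reader allocation count (allocs/op)
	fmt.Println(round2(relChangePct(20336539, 20396958))) // 0.3
	// 88b373b024: primary key Parquet file size (MB)
	fmt.Println(round2(relChangePct(13.93, 8.36))) // -39.99, i.e. about 40%
}
```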
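Commit f4dd7c7efb compares a deltalog layout that stores one string per delete with one that keeps primary keys and timestamps in separate columns. The Go sketch below shows the shape of that conversion under a loud assumption: each legacy row is modeled as a JSON object with `pk` and `ts` keys, an encoding chosen for illustration that may not match the exact Milvus format. `deleteRecord`, `deltaColumns`, and `toColumns` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteRecord is a hypothetical legacy row: one serialized string per delete.
type deleteRecord struct {
	PK int64  `json:"pk"`
	Ts uint64 `json:"ts"`
}

// deltaColumns is a hypothetical column-oriented layout: primary keys and
// timestamps kept in two parallel, typed slices.
type deltaColumns struct {
	PKs []int64
	Tss []uint64
}

// toColumns parses each string row once and appends its fields to the typed columns.
func toColumns(rows []string) (deltaColumns, error) {
	cols := deltaColumns{
		PKs: make([]int64, 0, len(rows)),
		Tss: make([]uint64, 0, len(rows)),
	}
	for i, row := range rows {
		var r deleteRecord
		if err := json.Unmarshal([]byte(row), &r); err != nil {
			return deltaColumns{}, fmt.Errorf("row %d: %w", i, err)
		}
		cols.PKs = append(cols.PKs, r.PK)
		cols.Tss = append(cols.Tss, r.Ts)
	}
	return cols, nil
}

func main() {
	cols, err := toColumns([]string{`{"pk":1,"ts":100}`, `{"pk":7,"ts":105}`})
	fmt.Println(cols.PKs, cols.Tss, err) // [1 7] [100 105] <nil>
}
```

With typed columns, a reader can consume primary keys and timestamps directly instead of decoding a string per row, which is one plausible reason the separate format performs better in the benchmark table.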