Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.
1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
- Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure
Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
- Accepts manifest path with `base_path` and `ver` fields
- Calls `get_latest_column_groups()` FFI function
- Returns column group information as string
- Comprehensive error handling for JSON parsing and FFI errors
3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
- Updated milvus-storage version from `0883026` to `302143c`
- Ensures compatibility with latest FFI interfaces
4. Data Coordinator Changes
All compaction task builders now include manifest path in segment
binlogs:
- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments
5. Data Node Compactor Enhancements
All compactors updated to support dual-mode reading (binlog vs
manifest):
6. Flush & Sync Manager Updates
Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
- Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
- Write packed data using FFI-based writer
- Return manifest path along with binlogs, deltas, and stats
Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest
7. Go Storage Layer Integration
New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing
Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage
Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format
FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format
Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes
Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation
9. Protocol Buffer Updates
data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline
worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator
10. Parameter Configuration
component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout
11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing
This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Sliced arrow arrays "incorrectly" returned the original array's size via
SizeInBytes(), causing inaccurate memory estimates during compaction.
This resulted in segments closing prematurely in mergeSplit mode -
expected 500MB compactions produced 4x100+MB segments instead.
Fixed by calculating actual byte size of sliced arrays, ensuring proper
segment sizing and more accurate memory usage tracking.
See also: #45293
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray → Memory
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Ref https://github.com/milvus-io/milvus/issues/42148https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.
See #43186
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Fix issues in end-to-end tests:
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
related: #39173fix: #42862
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact with duplicate primary key test.
- Compact with bm25 segments.
- Compact with merge sort segments.
- Compact with no expiration segments.
- Compact with lack binlog segments.
- Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact bm25 segments with v2 format.
- Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #34123
Benchmark case: The benchmark run the go benchmark function
`BenchmarkDeltalogFormat` which is put in the Files changed. It tests
the performance of serializing and deserializing from two different data
formats under a 10 million delete log dataset.
Metrics: The benchmarks measure the average time taken per operation
(ns/op), memory allocated per operation (MB/op), and the number of
memory allocations per operation (allocs/op).
| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation
(MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation
Comparison |
|---------------------------------|------------------|-----------------|---------------------------|-------------------|------------------------------|------------------------|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline
| 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14%
| 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 |
Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer| 798,591,584 | -85.43% | 2,178 |
-84.34% | 30,270,488 | -56.78% |
Both read and write operations show significant improvements in both
speed and memory allocation.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>