Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.
1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
- Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure
Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
- Accepts manifest path with `base_path` and `ver` fields
- Calls `get_latest_column_groups()` FFI function
- Returns column group information as string
- Comprehensive error handling for JSON parsing and FFI errors
3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
- Updated milvus-storage version from `0883026` to `302143c`
- Ensures compatibility with latest FFI interfaces
4. Data Coordinator Changes
All compaction task builders now include manifest path in segment
binlogs:
- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments
5. Data Node Compactor Enhancements
All compactors updated to support dual-mode reading (binlog vs
manifest):
6. Flush & Sync Manager Updates
Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
- Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
- Write packed data using FFI-based writer
- Return manifest path along with binlogs, deltas, and stats
Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest
7. Go Storage Layer Integration
New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing
Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage
Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format
FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format
Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes
Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation
9. Protocol Buffer Updates
data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline
worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator
10. Parameter Configuration
component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout
11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing
This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43897
- Alter collection/database is implemented by WAL-based DDL framework
now.
- Support AlterCollection/AlterDatabase in wal now.
- Alter operation can be synced by new CDC now.
- Refactor some UT for alter DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44956
Add manifest_path field throughout the data path to support LOON Storage
V2 manifest tracking. The manifest stores metadata for segment data
files and enables the unified Storage V2 FFI interface.
Changes include:
- Add manifest_path field to SegmentInfo and SaveBinlogPathsRequest
proto messages
- Add UpdateManifest operator to datacoord meta operations
- Update metacache, sync manager, and meta writer to propagate manifest
paths
- Include manifest_path in segment load info for query coordinator
This is part of the Storage V2 FFI interface integration.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See #39173
In this PR:
- Adjusted the delta log serialization APIs.
- Refactored the stats collector to improve the collection and digest of
primary key and BM25 statistics.
- Introduced new tests for the delta log reader/writer and stats
collectors to ensure functionality and correctness.
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
issue: #44730
Fix the issue where logs were not outputting as expected due to
incorrect log package imports across multiple components.
Changes include:
- Add golangci-lint rule to forbid github.com/pingcap/log usage
- Replace github.com/pingcap/log with
github.com/milvus-io/milvus/pkg/v2/log
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Use goroutine pool instead of sem.
2. Remove compaction executor from pipeline, since in streaming mode
pipeline should be decoupled from compaction.
issue: https://github.com/milvus-io/milvus/issues/44541
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
backlog issue that memory size and log size shared same value. This
patch add `GetFileSize` api to get remote compressed binlog size as meta
log file size to calculate usage more accurate.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44257
This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
issue: #43072, #43289
- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43434
- the segment start position can be carried by other segment sync
operation. so the sync start position operation can happens before
insert.
- TODO: It's a wired design should be removed.
Signed-off-by: chyezh <chyezh@outlook.com>
- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count
issue: https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Fix issues in end-to-end tests:
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
related: #39173fix: #42862
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #42084
Embedding node cached schema when created, causing schema mismatch after
schema change. This PR make embeddingNode use schema from metacache,
which will be updated.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39173
This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #41544
- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #39718
This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when schema version increases
- Add FinishLoad api for loading progress
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>