milvus/pkg/proto/segcore.proto
congqixia 03f5d7c0a5
enhance: integrate StorageV2 FFI interface for manifest-based segment loading (#45798)
Related to #44956

**New Translator (C++)**
- Added `ManifestGroupTranslator`
(`internal/core/src/segcore/storagev2translator/`)
  - Translates manifest-based column groups to Milvus internal format
  - Implements `GroupCTMeta` interface for chunk-based column access
  - Supports both memory and mmap storage modes
  - Handles cache warmup policies for vector and scalar data

**ChunkedSegmentSealedImpl**
(`internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:333`)
- Added `LoadColumnGroups(const std::string& manifest_path)`: Main entry
point for manifest-based loading
  - Creates milvus-storage Reader from manifest file
  - Parallelizes column group loading using thread pool
  - Aggregates loading exceptions and reports errors
- Added `LoadColumnGroup()`: Loads individual column group
  - Extracts field IDs from column group metadata
  - Creates ManifestGroupTranslator for each column group
  - Builds ProxyChunkColumn for field access
  - Special handling for timestamp field index construction

**SegmentGrowingImpl**
(`internal/core/src/segcore/SegmentGrowingImpl.cpp`)
- Added similar `LoadColumnGroups()` and `LoadColumnGroup()` methods for
growing segments
- Maintains consistency with sealed segment loading path

Storage FFI Utilities

**loon_ffi/util** (`internal/core/src/storage/loon_ffi/util.cpp`)
- Added `MakeInternalPropertiesFromStorageConfig()`: Converts C storage
config to internal Properties
  - Maps all storage configuration fields (S3, GCS, Azure, local)
  - Handles SSL, IAM, virtual host settings
  - Configures connection timeouts and max connections
- Added `MakeInternalLocalProperies()`: Creates local filesystem
properties
- Added `ToCStorageConfig()`: Converts Go StorageConfig to C
representation
- Added `GetColumnGroups()`: Extracts column groups from manifest file
using Transaction API
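The config conversion described above flattens a storage config (endpoint, bucket, SSL/IAM/virtual-host flags, timeouts, connection limits) into a property set. A minimal Go sketch of that idea follows, loosely modeled on `MakeInternalPropertiesFromStorageConfig()` and `MakeInternalLocalProperies()`. The struct fields, the `fs.*` key names and both function names are hypothetical illustrations, not the milvus-storage property keys or API.

```go
package main

import "strconv"

// StorageConfig is a hypothetical, trimmed-down stand-in for the storage
// configuration passed across the FFI boundary; the real one has more fields.
type StorageConfig struct {
	StorageType      string // "local" or a remote object store
	Address          string
	BucketName       string
	RootPath         string
	CloudProvider    string // e.g. "aws", "gcp", "azure"
	UseSSL           bool
	UseIAM           bool
	UseVirtualHost   bool
	RequestTimeoutMs int64
	MaxConnections   int64
}

// Properties is a flat key/value view of the configuration.
type Properties map[string]string

// makePropertiesFromStorageConfig maps every config field to a string
// property. The "fs.*" key names are illustrative only (hypothetical).
func makePropertiesFromStorageConfig(c StorageConfig) Properties {
	return Properties{
		"fs.storage_type":       c.StorageType,
		"fs.address":            c.Address,
		"fs.bucket_name":        c.BucketName,
		"fs.root_path":          c.RootPath,
		"fs.cloud_provider":     c.CloudProvider,
		"fs.use_ssl":            strconv.FormatBool(c.UseSSL),
		"fs.use_iam":            strconv.FormatBool(c.UseIAM),
		"fs.use_virtual_host":   strconv.FormatBool(c.UseVirtualHost),
		"fs.request_timeout_ms": strconv.FormatInt(c.RequestTimeoutMs, 10),
		"fs.max_connections":    strconv.FormatInt(c.MaxConnections, 10),
	}
}

// makeLocalProperties builds properties for a local filesystem root.
func makeLocalProperties(rootPath string) Properties {
	return Properties{
		"fs.storage_type": "local",
		"fs.root_path":    rootPath,
	}
}
```

Encoding every value, including booleans and integers, as a string keeps the property set uniform in this sketch; the real `Properties` type may store typed values instead.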

Protocol Buffer Changes

**segcore.proto** (`pkg/proto/segcore.proto:121`)
- Added `manifest_path` field to `SegmentLoadInfo` message
- Enables passing manifest file path from Go layer to C++ core

Go Integration

**segment.go** (`internal/util/segcore/segment.go:372`)
- Updated `ConvertToSegcoreSegmentLoadInfo()` to propagate
`ManifestPath` field
- Bridges QueryNode segment load info to Segcore format
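Propagating `ManifestPath` in `ConvertToSegcoreSegmentLoadInfo()` matters because a field-by-field converter silently drops any field it does not copy. The sketch below uses hypothetical, trimmed mirror structs instead of the generated `segcorepb` types. The `hasManifest` helper is also hypothetical; it relies only on the proto3 rule that an unset `string` field decodes as `""`. How the core actually chooses between manifest-based and binlog-based loading is not stated in this commit.

```go
package main

// queryNodeLoadInfo and segcoreLoadInfo are hypothetical, trimmed mirrors of
// the QueryNode-side and segcore-side load-info messages.
type queryNodeLoadInfo struct {
	SegmentID      int64
	CollectionID   int64
	NumOfRows      int64
	StorageVersion int64
	ManifestPath   string
}

type segcoreLoadInfo struct {
	SegmentID      int64
	CollectionID   int64
	NumOfRows      int64
	StorageVersion int64
	ManifestPath   string // corresponds to `string manifest_path = 21;`
}

// convertToSegcoreLoadInfo copies fields one by one; any field left out here
// would reach the C++ side as its zero value, so ManifestPath is copied too.
func convertToSegcoreLoadInfo(src *queryNodeLoadInfo) *segcoreLoadInfo {
	if src == nil {
		return nil
	}
	return &segcoreLoadInfo{
		SegmentID:      src.SegmentID,
		CollectionID:   src.CollectionID,
		NumOfRows:      src.NumOfRows,
		StorageVersion: src.StorageVersion,
		ManifestPath:   src.ManifestPath,
	}
}

// hasManifest reports whether a manifest path was supplied; in proto3 an
// unset string field is indistinguishable from "".
func hasManifest(info *segcoreLoadInfo) bool {
	return info != nil && info.ManifestPath != ""
}
```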

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-25 17:27:07 +08:00


syntax = "proto3";

package milvus.proto.segcore;

option go_package = "github.com/milvus-io/milvus/pkg/v2/proto/segcorepb";

import "schema.proto";
import "common.proto";

message Binlog {
  int64 entries_num = 1;
  uint64 timestamp_from = 2;
  uint64 timestamp_to = 3;
  string log_path = 4;
  int64 log_size = 5;
  int64 logID = 6;
  int64 memory_size = 7;
}

message FieldBinlog {
  int64 fieldID = 1;
  repeated Binlog binlogs = 2;
  repeated int64 child_fields = 3;
}

message TextIndexStats {
  int64 fieldID = 1;
  int64 version = 2;
  repeated string files = 3;
  int64 log_size = 4;
  int64 memory_size = 5;
  int64 buildID = 6;
}

message JsonKeyStats {
  int64 fieldID = 1;
  int64 version = 2;
  repeated string files = 3;
  int64 log_size = 4;
  int64 memory_size = 5;
  int64 buildID = 6;
  int64 json_key_stats_data_format = 7;
}

message RetrieveResults {
  schema.IDs ids = 1;
  repeated int64 offset = 2;
  repeated schema.FieldData fields_data = 3;
  int64 all_retrieve_count = 4;
  bool has_more_result = 5;
  int64 scanned_remote_bytes = 6;
  int64 scanned_total_bytes = 7;
}

message LoadFieldMeta {
  int64 min_timestamp = 1;
  int64 max_timestamp = 2;
  int64 row_count = 3;
}

message LoadSegmentMeta {
  // TODOs
  repeated LoadFieldMeta metas = 1;
  int64 total_size = 2;
}

message InsertRecord {
  repeated schema.FieldData fields_data = 1;
  int64 num_rows = 2;
}

message FieldIndexMeta {
  int64 fieldID = 1;
  int64 collectionID = 2;
  string index_name = 3;
  repeated common.KeyValuePair type_params = 4;
  repeated common.KeyValuePair index_params = 5;
  bool is_auto_index = 6;
  repeated common.KeyValuePair user_index_params = 7;
}

message CollectionIndexMeta {
  int64 maxIndexRowCount = 1;
  repeated FieldIndexMeta index_metas = 2;
}

message FieldIndexInfo {
  int64 fieldID = 1;
  bool enable_index = 2 [deprecated = true];
  string index_name = 3;
  int64 indexID = 4;
  int64 buildID = 5;
  repeated common.KeyValuePair index_params = 6;
  repeated string index_file_paths = 7;
  int64 index_size = 8;
  int64 index_version = 9;
  int64 num_rows = 10;
  int32 current_index_version = 11;
  int64 index_store_version = 12;
}

message SegmentLoadInfo {
  int64 segmentID = 1;
  int64 partitionID = 2;
  int64 collectionID = 3;
  int64 dbID = 4;
  int64 flush_time = 5;
  repeated FieldBinlog binlog_paths = 6;
  int64 num_of_rows = 7;
  repeated FieldBinlog statslogs = 8;
  repeated FieldBinlog deltalogs = 9;
  repeated int64 compactionFrom = 10; // segmentIDs compacted from
  repeated FieldIndexInfo index_infos = 11;
  int64 segment_size = 12 [deprecated = true];
  string insert_channel = 13;
  int64 readableVersion = 14;
  int64 storageVersion = 15;
  bool is_sorted = 16;
  map<int64, TextIndexStats> textStatsLogs = 17;
  repeated FieldBinlog bm25logs = 18;
  map<int64, JsonKeyStats> jsonKeyStatsLogs = 19;
  common.LoadPriority priority = 20;
  string manifest_path = 21;
}