milvus/pkg/proto/worker.proto
congqixia c01fd94a6a
enhance: integrate Storage V2 FFI interface for unified storage access (#45723)
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.

1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
  - Default: `false` (maintains backward compatibility)

2. Core FFI Infrastructure

Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
  - Accepts manifest path with `base_path` and `ver` fields
  - Calls `get_latest_column_groups()` FFI function
  - Returns column group information as string
  - Comprehensive error handling for JSON parsing and FFI errors

3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
  - Updated milvus-storage version from `0883026` to `302143c`
  - Ensures compatibility with latest FFI interfaces

4. Data Coordinator Changes

All compaction task builders now include manifest path in segment
binlogs:

- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments

5. Data Node Compactor Enhancements

All compactors updated to support dual-mode reading (binlog vs
manifest):

6. Flush & Sync Manager Updates

Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
  - Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
  - Write packed data using FFI-based writer
  - Return manifest path along with binlogs, deltas, and stats

Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest

7. Go Storage Layer Integration

New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing

Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage

Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization

8. Storage V2 Packed Format

FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format

Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes

Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation

9. Protocol Buffer Updates

data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline

worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator

10. Parameter Configuration

component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout

11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing

This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-24 19:57:07 +08:00

286 lines
7.3 KiB
Protocol Buffer

syntax = "proto3";
package milvus.proto.index;
option go_package = "github.com/milvus-io/milvus/pkg/v2/proto/workerpb";
import "common.proto";
import "schema.proto";
import "data_coord.proto";
import "index_coord.proto";
import "milvus.proto";
service IndexNode {
// Deprecated
rpc CreateJob(CreateJobRequest) returns (common.Status) {}
// Deprecated
rpc QueryJobs(QueryJobsRequest) returns (QueryJobsResponse) {}
// Deprecated
rpc DropJobs(DropJobsRequest) returns (common.Status) {}
// Deprecated
rpc GetJobStats(GetJobStatsRequest) returns (GetJobStatsResponse) {}
// Deprecated
rpc CreateJobV2(CreateJobV2Request) returns (common.Status) {}
// Deprecated
rpc QueryJobsV2(QueryJobsV2Request) returns (QueryJobsV2Response) {}
// Deprecated
rpc DropJobsV2(DropJobsV2Request) returns (common.Status) {}
// https://wiki.lfaidata.foundation/display/MIL/MEP+8+--+Add+metrics+for+proxy
rpc GetMetrics(milvus.GetMetricsRequest)
returns (milvus.GetMetricsResponse) {
}
rpc CreateTask(CreateTaskRequest) returns (common.Status) {}
rpc QueryTask(QueryTaskRequest) returns (QueryTaskResponse) {}
rpc DropTask(DropTaskRequest) returns (common.Status) {}
}
message CreateTaskRequest {
// request body
bytes payload = 1;
// request properties, must contain:
// - clusterID
// - taskID
// - taskType
// - taskSlot
map<string, string> properties = 2;
}
message QueryTaskRequest {
// request properties, which contain:
// - clusterID
// - taskID
// - taskType
map<string, string> properties = 1;
}
message QueryTaskResponse {
common.Status status = 1;
// response body
bytes payload = 2;
// response properties, which contain:
// - taskState
map<string, string> properties = 3;
}
message DropTaskRequest {
// request properties, which contain:
// - clusterID
// - taskID
// - taskType
map<string, string> properties = 1;
}
// CreateJobRequest is CreateIndexRequest
message CreateJobRequest {
string clusterID = 1;
string index_file_prefix = 2;
int64 buildID = 3;
repeated string data_paths = 4;
int64 index_version = 5;
int64 indexID = 6;
string index_name = 7;
index.StorageConfig storage_config = 8;
repeated common.KeyValuePair index_params = 9;
repeated common.KeyValuePair type_params = 10;
int64 num_rows = 11;
int32 current_index_version = 12;
int64 collectionID = 13;
int64 partitionID = 14;
int64 segmentID = 15;
int64 fieldID = 16;
string field_name = 17;
schema.DataType field_type = 18;
string store_path = 19;
int64 store_version = 20;
string index_store_path = 21;
int64 dim = 22;
repeated int64 data_ids = 23;
repeated index.OptionalFieldInfo optional_scalar_fields = 24;
schema.FieldSchema field = 25;
bool partition_key_isolation = 26;
int32 current_scalar_index_version = 27;
int64 task_slot = 28;
int64 storage_version = 29;
int64 lack_binlog_rows = 30;
repeated data.FieldBinlog insert_logs = 31;
repeated common.KeyValuePair plugin_context = 32;
string manifest = 33;
}
message QueryJobsRequest {
string clusterID = 1;
repeated int64 taskIDs = 2;
}
message QueryJobsResponse {
common.Status status = 1;
string clusterID = 2;
repeated IndexTaskInfo index_infos = 3;
}
message DropJobsRequest {
string clusterID = 1;
repeated int64 taskIDs = 2;
}
message GetJobStatsRequest {
}
message GetJobStatsResponse {
common.Status status = 1;
int64 total_job_num = 2;
int64 in_progress_job_num = 3;
int64 enqueue_job_num = 4;
int64 available_slots = 5;
repeated index.JobInfo job_infos = 6;
// deprecated
bool enable_disk = 7;
int64 total_slots = 8;
}
message AnalyzeRequest {
string clusterID = 1;
int64 taskID = 2;
int64 collectionID = 3;
int64 partitionID = 4;
int64 fieldID = 5;
string fieldName = 6;
schema.DataType field_type = 7;
map<int64, index.SegmentStats> segment_stats = 8;
int64 version = 9;
index.StorageConfig storage_config = 10;
int64 dim = 11;
double max_train_size_ratio = 12;
int64 num_clusters = 13;
schema.FieldSchema field = 14;
double min_cluster_size_ratio = 15;
double max_cluster_size_ratio = 16;
int64 max_cluster_size = 17;
int64 task_slot = 18;
repeated common.KeyValuePair plugin_context = 19;
}
message CreateStatsRequest {
string clusterID = 1;
int64 taskID = 2;
int64 collectionID = 3;
int64 partitionID = 4;
string insert_channel = 5;
int64 segmentID = 6;
repeated data.FieldBinlog insert_logs = 7;
// deprecated, after sort stats moved, its not used.
repeated data.FieldBinlog delta_logs = 8;
index.StorageConfig storage_config = 9;
schema.CollectionSchema schema = 10;
index.StatsSubJob subJobType = 11;
int64 targetSegmentID = 12;
int64 startLogID = 13;
int64 endLogID = 14;
int64 num_rows = 15;
// deprecated, after sort stats moved, its not used.
int64 collection_ttl = 16;
// deprecated, after sort stats moved, its not used.
uint64 current_ts = 17;
int64 task_version = 18;
// deprecated, after sort stats moved, its not used.
uint64 binlogMaxSize = 19;
bool enable_json_key_stats = 20;
int64 json_key_stats_tantivy_memory = 21;
int64 json_key_stats_data_format = 22;
// deprecated, the sort logic has been moved into the compaction process.
bool enable_json_key_stats_in_sort = 23;
int64 task_slot = 24;
int64 storage_version = 25;
int32 current_scalar_index_version = 26;
repeated common.KeyValuePair plugin_context = 27;
int64 json_stats_max_shredding_columns = 28;
double json_stats_shredding_ratio_threshold = 29;
int64 json_stats_write_batch_size = 30;
}
message CreateJobV2Request {
string clusterID = 1;
int64 taskID = 2;
index.JobType job_type = 3;
oneof request {
AnalyzeRequest analyze_request = 4;
CreateJobRequest index_request = 5;
CreateStatsRequest stats_request = 6;
}
}
message QueryJobsV2Request {
string clusterID = 1;
repeated int64 taskIDs = 2;
index.JobType job_type = 3;
}
message IndexTaskInfo {
int64 buildID = 1;
common.IndexState state = 2;
repeated string index_file_keys = 3;
uint64 serialized_size = 4;
string fail_reason = 5;
int32 current_index_version = 6;
int64 index_store_version = 7;
uint64 mem_size = 8;
int32 current_scalar_index_version = 9;
}
message IndexJobResults {
repeated IndexTaskInfo results = 1;
}
message AnalyzeResult {
int64 taskID = 1;
index.JobState state = 2;
string fail_reason = 3;
string centroids_file = 4;
}
message AnalyzeResults {
repeated AnalyzeResult results = 1;
}
message StatsResult {
int64 taskID = 1;
index.JobState state = 2;
string fail_reason = 3;
int64 collectionID = 4;
int64 partitionID = 5;
int64 segmentID = 6;
string channel = 7;
repeated data.FieldBinlog insert_logs = 8;
repeated data.FieldBinlog stats_logs = 9;
map<int64, data.TextIndexStats> text_stats_logs = 10;
int64 num_rows = 11;
repeated data.FieldBinlog bm25_logs = 12;
map<int64, data.JsonKeyStats> json_key_stats_logs = 13;
}
message StatsResults {
repeated StatsResult results = 1;
}
message QueryJobsV2Response {
common.Status status = 1;
string clusterID = 2;
oneof result {
IndexJobResults index_job_results = 3;
AnalyzeResults analyze_job_results = 4;
StatsResults stats_job_results = 5;
}
}
message DropJobsV2Request {
string clusterID = 1;
repeated int64 taskIDs = 2;
index.JobType job_type = 3;
}