enhance: support pk isolation optional field data loading from manifest for index build (#46480)

### **User description**
Related to #44956

Add a manifest-based data loading path for optional fields in
`cache_opt_field_memory_v2`. When the config provides a non-empty manifest
path (`SEGMENT_MANIFEST_KEY`), the function now loads each optional field
directly from the manifest with `GetFieldDatasFromManifest` instead of
reading segment insert files. This makes building indexes with optional
fields compatible with storage v2.
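The branch described above can be sketched in isolation. This is a minimal, self-contained C++17 sketch, not the Milvus implementation: the string-map `Config`, `FfiProperties`, `kManifestKey`, `LoadPath` and `ChooseLoadPath` are hypothetical stand-ins for the real `Config`, `loon_ffi_properties_`, `SEGMENT_MANIFEST_KEY` and the inline `if` in `cache_opt_field_memory_v2`, and `assert` plays the role of `AssertInfo`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-ins: the real code reads typed values from the index
// build Config via index::GetValueFromConfig and keeps the loon FFI
// properties in the member loon_ffi_properties_.
using Config = std::map<std::string, std::string>;
struct FfiProperties {};

// Illustrative key name; the real constant is SEGMENT_MANIFEST_KEY.
const std::string kManifestKey = "segment_manifest";

enum class LoadPath { kManifest, kInsertFiles };

// Returns the value for `key`, or std::nullopt when the key is absent.
std::optional<std::string> GetValue(const Config& config,
                                    const std::string& key) {
    auto it = config.find(key);
    if (it == config.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Mirrors the branch added to cache_opt_field_memory_v2: a non-empty manifest
// path selects the storage v2 path (which requires FFI properties);
// otherwise the function falls back to segment insert files.
LoadPath ChooseLoadPath(const Config& config,
                        const std::shared_ptr<FfiProperties>& props) {
    auto manifest_path = GetValue(config, kManifestKey).value_or("");
    if (manifest_path.empty()) {
        return LoadPath::kInsertFiles;
    }
    assert(props != nullptr && "loon ffi properties is null");
    return LoadPath::kManifest;
}
```

Treating an empty string the same as a missing key matches the `value_or("")` check in the diff, so a config without a manifest keeps using the insert-file path.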


___

### **PR Type**
Enhancement


___

### **Description**
- Add manifest-based data loading for optional fields during index builds

- Support storage v2 through the `GetFieldDatasFromManifest` function

- Handle PK isolation optional fields without requiring segment insert files
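The return type stored per field in the diff, `std::vector<std::vector<uint32_t>>`, suggests lists of row offsets; one plausible reading for PK isolation is one list per distinct field value. The sketch below is an assumption about what that grouping could look like, not the actual `GetOptFieldIvfData`: `Column`, `GroupRowsByValue` and `BuildOptFieldData` are hypothetical, the column is simplified to nullable `int64_t` values (mirroring the always-nullable schema the commit composes), and skipping null rows and ordering groups by value are choices made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplified column: one nullable int64 value per row, standing
// in for the field data chunks returned by GetFieldDatasFromManifest.
using Column = std::vector<std::optional<int64_t>>;

// Hypothetical grouping (assumption, not GetOptFieldIvfData): rows that share
// a value form one list of row offsets; null rows are skipped, and lists are
// ordered by value because std::map iterates keys in ascending order.
std::vector<std::vector<uint32_t>> GroupRowsByValue(const Column& column) {
    // Row offsets are stored as uint32_t, so every offset must fit.
    assert(column.size() <= std::numeric_limits<uint32_t>::max());
    std::map<int64_t, std::vector<uint32_t>> groups;
    for (std::size_t row = 0; row < column.size(); ++row) {
        if (column[row].has_value()) {
            groups[*column[row]].push_back(static_cast<uint32_t>(row));
        }
    }
    std::vector<std::vector<uint32_t>> result;
    result.reserve(groups.size());
    for (auto& entry : groups) {
        result.push_back(std::move(entry.second));
    }
    return result;
}

// Mirrors the shape of the loop in the diff: one result entry per optional
// field id, keyed the same way as `res`.
std::unordered_map<int64_t, std::vector<std::vector<uint32_t>>>
BuildOptFieldData(const std::unordered_map<int64_t, Column>& columns_by_field) {
    std::unordered_map<int64_t, std::vector<std::vector<uint32_t>>> res;
    for (const auto& [field_id, column] : columns_by_field) {
        res[field_id] = GroupRowsByValue(column);
    }
    return res;
}
```

With a column `{7, 3, null, 7}`, this sketch yields `{{1}, {0, 3}}`: row 1 holds value 3, rows 0 and 3 hold value 7, and the null row is dropped.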


___

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia 2025-12-23 14:55:21 +08:00 committed by GitHub
parent 674ac8a006
commit d3b15ac136


@@ -373,6 +373,39 @@ MemFileManagerImpl::cache_opt_field_memory_v2(const Config& config) {
            "vector index build with multiple fields is not supported yet");
    }
    auto manifest =
        index::GetValueFromConfig<std::string>(config, SEGMENT_MANIFEST_KEY);
    // use manifest file for storage v2
    auto manifest_path_str = manifest.value_or("");
    if (manifest_path_str != "") {
        AssertInfo(loon_ffi_properties_ != nullptr,
                   "[StorageV2] loon ffi properties is null when building "
                   "index with manifest");
        std::unordered_map<int64_t, std::vector<std::vector<uint32_t>>> res;
        for (auto& [field_id, tup] : fields_map) {
            const auto& field_type = std::get<1>(tup);
            const auto& element_type = std::get<2>(tup);
            // compose field schema for optional field
            proto::schema::FieldSchema field_schema;
            field_schema.set_fieldid(field_id);
            field_schema.set_nullable(true);  // always treat as nullable
            milvus::storage::FieldDataMeta field_meta{field_meta_.collection_id,
                                                      field_meta_.partition_id,
                                                      field_meta_.segment_id,
                                                      field_id,
                                                      field_schema};
            auto field_datas = GetFieldDatasFromManifest(manifest_path_str,
                                                         loon_ffi_properties_,
                                                         field_meta,  // this optional field's meta
                                                         field_type,
                                                         1,  // scalar field
                                                         element_type);
            res[field_id] = GetOptFieldIvfData(field_type, field_datas);
        }
        return res;
    }
    auto segment_insert_files =
        index::GetValueFromConfig<std::vector<std::vector<std::string>>>(
            config, SEGMENT_INSERT_FILES_KEY);