issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist
that returns an error when duplicate primary keys are detected in the
same batch, instead of silently deduplicating.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44320
This change adds deduplication logic to handle duplicate primary keys
within a single upsert batch, keeping the last occurrence of each
primary key.
Key changes:
- Add DeduplicateFieldData function to remove duplicate PKs from field
data, supporting both Int64 and VarChar primary keys
- Refactor fillFieldPropertiesBySchema into two separate functions:
validateFieldDataColumns for validation and fillFieldPropertiesOnly for
property filling, improving code clarity and reusability
- Integrate deduplication logic in upsertTask.PreExecute to
automatically deduplicate data before processing
- Add comprehensive unit tests for deduplication with various PK types
(Int64, VarChar) and field types (scalar, vector)
- Add Python integration tests to verify end-to-end behavior
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44800
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:
- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Comprehensive unit tests covering all new data type handling scenarios
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
#44212
Also, record metrics only when storageUsageTracking is enabled.
Use MB for scanned_remote counter and scanned_total counter metrics to
avoid overflow.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
issue: #44212
Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
issue: #43980
This commit optimizes the partial update merge logic by standardizing
nullable field representation before merge operations to avoid corner
cases during the merge process.
Key changes:
- Unify nullable field data format to FULL FORMAT before merge execution
- Add extensive unit tests for bounds checking and edge cases
The optimization ensures:
- Consistent nullable field representation across SDK and internal
- Proper handling of null values during merge operations
- Prevention of index out-of-bounds errors in vector field updates
- Better error handling and validation for partial update scenarios
This resolves issues where different nullable field formats could cause
merge failures or data corruption during partial update operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43980
Fix panic issue caused by incorrect nullable field merging logic when
upsert converts to insert operation on empty tables.
- Add AppendFieldDataWithNullData to handle nullable field merging
- Fix existing data merge with skipAppendNullData=false
- Fix insert data merge with skipAppendNullData=true
- Add unit tests for nullable field data appending scenarios
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43980
Fixes a panic that occurred when a partial update was converted to an
insert due to a non-existent primary key. The panic was caused by
missing nullable fields that were not provided in the original partial
update request.
The upsert pre-execution logic is refactored to handle this correctly:
- Explicitly splits upsert data into 'insert' and 'update' batches.
- Automatically generates data for missing nullable or default-value
fields during inserts, preventing the panic.
- Enhances `typeutil.UpdateFieldData` to support different source and
destination indexes for flexible data merging.
- Adds comprehensive unit tests for mixed upsert, pure insert, and pure
update scenarios.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.
The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
issue: #29735
Implement partial field update functionality for upsert operations,
supporting scalar, vector, and dynamic JSON fields without requiring all
collection fields.
Changes:
- Add queryPreExecute to retrieve existing records before upsert
- Implement UpdateFieldData function for merging data
- Add IDsChecker utility for efficient primary key lookups
- Fix JSON data creation in tests using proper map marshaling
- Add test cases for partial updates of different field types
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
Related to #41858
This PR add log while debugging schema mismatch between pymilvus cache
and proxy schema.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/39818
This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily.
Storage changes for Text data type will be in the following PRs.
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
enhance: Add schema update time verification for insert and upsert to
use cache
issue: https://github.com/milvus-io/milvus/issues/39093
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
"-1" as `InvalidPartitionID` previously used as All partition place
holder in delete cases. It's confusing and hard to maintain when a const
var has more than one meaning.
This PR add `AllPartitionsID` to replace these usages in delete
scenarios.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #29113
The collection schema is crucial when performing search/query but some
of the information is calculated for every request.
This PR change schema field of cached collection info into a utility
`schemaInfo` type to store some stable result, say pk field,
partitionKeyEnabled, etc. And provided field name to id map for
search/query services.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>