74 Commits

Author SHA1 Message Date
wei liu
f85e86a6ec
fix: change upsert duplicate PK behavior from dedup to error (#45997)
issue: #44320

Replace the DeduplicateFieldData function with CheckDuplicatePkExist
that returns an error when duplicate primary keys are detected in the
same batch, instead of silently deduplicating.

Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-04 10:23:11 +08:00
wei liu
7aed88113c
enhance: Deduplicate primary keys in upsert request batch (#45249)
issue: #44320

This change adds deduplication logic to handle duplicate primary keys
within a single upsert batch, keeping the last occurrence of each
primary key.

Key changes:
- Add DeduplicateFieldData function to remove duplicate PKs from field
data, supporting both Int64 and VarChar primary keys
- Refactor fillFieldPropertiesBySchema into two separate functions:
validateFieldDataColumns for validation and fillFieldPropertiesOnly for
property filling, improving code clarity and reusability
- Integrate deduplication logic in upsertTask.PreExecute to
automatically deduplicate data before processing
- Add comprehensive unit tests for deduplication with various PK types
(Int64, VarChar) and field types (scalar, vector)
- Add Python integration tests to verify end-to-end behavior

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-17 21:35:40 +08:00
zhenshan.cao
6327c9a514
fix: Fix bugs related to TimestampTz (#45111)
issue: https://github.com/milvus-io/milvus/issues/44527
https://github.com/milvus-io/milvus/issues/44537
https://github.com/milvus-io/milvus/issues/44538
https://github.com/milvus-io/milvus/issues/44585
https://github.com/milvus-io/milvus/issues/44622

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-04 16:51:33 +08:00
Bingyi Sun
633cae9461
enhance: add namespace for query and search request (#44343)
issue: #44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-16 17:52:01 +08:00
wei liu
529c98520c
enhance: Add nullable support for Geometry and Timestamptz types (#44846)
issue: #44800
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:

- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Comprehensive unit tests covering all new data type handling scenarios

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-15 14:04:00 +08:00
cai.zhang
7a93cfe890
fix: Fix bug for nullable geometry (#44732)
issue: #44648

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-11 11:27:57 +08:00
Gao
3cc59a0d69
enhance: add storage usage for delete/upsert/restful (#44512)
#44212 

Also, record metrics only when storageUsageTracking is enabled.
Use MB for scanned_remote counter and scanned_total counter metrics to
avoid overflow.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-30 00:31:06 +08:00
junjiejiangjjj
f07979f91d
enhance: add support for controlling function output field insertion (#44162)
#44053

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-09-24 17:26:04 +08:00
Tianx
4d5afec9a8
fix: upsert error for timestamptz (#44548)
issue: https://github.com/milvus-io/milvus/issues/44527

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-24 10:28:04 +08:00
Bingyi Sun
94d53a5ac6
feat: encode cluster id in auto id (#44471)
https://github.com/milvus-io/milvus/issues/44326
prev:
[physical_ts][logical_ts]
after
[sign_bit][cluster_id][physical_ts][logical_ts]

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-22 10:40:02 +08:00
Gao
d3784c6515
enhance: add storage resource usage for vector search (#44308)
issue: #44212 

Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-09-19 20:20:02 +08:00
Bingyi Sun
5cd2d99799
enhance: Revert "feat: encode cluster id in auto id (#44324)" (#44426)
This reverts commit 7af159410395f0e7079d4875d96544c01f1d477b
2025-09-17 17:56:01 +08:00
Bingyi Sun
7af1594103
feat: encode cluster id in auto id (#44324)
https://github.com/milvus-io/milvus/issues/44326
prev:
`[physical_ts][logical_ts]`
after
`[sign_bit][cluster_id][physical_ts][logical_ts]`

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-17 16:56:01 +08:00
wei liu
18371773dd
enhance: Optimize partial update merge logic by unifying nullable format (#44197)
issue: #43980
This commit optimizes the partial update merge logic by standardizing
nullable field representation before merge operations to avoid corner
cases during the merge process.

Key changes:
- Unify nullable field data format to FULL FORMAT before merge execution
- Add extensive unit tests for bounds checking and edge cases

The optimization ensures:
- Consistent nullable field representation across SDK and internal
- Proper handling of null values during merge operations
- Prevention of index out-of-bounds errors in vector field updates
- Better error handling and validation for partial update scenarios

This resolves issues where different nullable field formats could cause
merge failures or data corruption during partial update operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-10 17:27:56 +08:00
Bingyi Sun
e2eb8562f1
feat: Auto add namespace field data if namespace is enabled (#44198)
issue: #44011

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-09 16:17:56 +08:00
wei liu
5ef793c393
fix: Fix panic when upsert with partial_update=true on empty table (#44155)
issue: #43980
Fix panic issue caused by incorrect nullable field merging logic when
upsert converts to insert operation on empty tables.

- Add AppendFieldDataWithNullData to handle nullable field merging
- Fix existing data merge with skipAppendNullData=false
- Fix insert data merge with skipAppendNullData=true
- Add unit tests for nullable field data appending scenarios

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-02 16:47:52 +08:00
wei liu
16af4e230a
fix: Prevent panic in upsert due to missing nullable fields [Proxy] (#44070)
issue: #43980
Fixes a panic that occurred when a partial update was converted to an
insert due to a non-existent primary key. The panic was caused by
missing nullable fields that were not provided in the original partial
update request.

The upsert pre-execution logic is refactored to handle this correctly:
- Explicitly splits upsert data into 'insert' and 'update' batches.
- Automatically generates data for missing nullable or default-value
fields during inserts, preventing the panic.
- Enhances `typeutil.UpdateFieldData` to support different source and
destination indexes for flexible data merging.
- Adds comprehensive unit tests for mixed upsert, pure insert, and pure
update scenarios.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-29 18:33:51 +08:00
junjiejiangjjj
f3d7e47227
feat: Supports more rerankers (#43270)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjiejiangjjj <junjie.jiang@zilliz.com>
2025-08-22 17:29:47 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
wei liu
d3c95eaa77
enhance: Support partial field updates with upsert API (#42877)
issue: #29735
Implement partial field update functionality for upsert operations,
supporting scalar, vector, and dynamic JSON fields without requiring all
collection fields.

Changes:
- Add queryPreExecute to retrieve existing records before upsert
- Implement UpdateFieldData function for merging data
- Add IDsChecker utility for efficient primary key lookups
- Fix JSON data creation in tests using proper map marshaling
- Add test cases for partial updates of different field types

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-19 11:15:45 +08:00
Zhen Ye
5551d99425
enhance: remove old arch non-streaming arch code (#43651)
issue: #41609

- remove all dml dead code at proxy
- remove dead code at l0_write_buffer
- remove msgstream dependency at proxy
- remove timetick reporter from proxy
- remove replicate stream implementation

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:41:40 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
congqixia
880915e08b
enhance: Print out-of-date schema ts when returning ErrSchemaMismatch (#42790)
Related to #41858

This PR add log while debugging schema mismatch between pymilvus cache
and proxy schema.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 10:38:37 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Ted Xu
688505ab1c
enhance: cleanup lint check exclusions (#40829)
See: #40828

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-21 18:12:14 +08:00
junjiejiangjjj
359e7efd8e
feat: Add function running monitoring (#40358)
#35856 
#40004 
1. Optimize model verification logic
2. Add profiling code

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-03-10 22:28:05 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
SimFG
ad36347fb3
fix: add BeginTimestamp and EndTimestamp to insert and upsert messages (#40110)
- issue: #40109
- caused by: #38656

Signed-off-by: SimFG <bang.fu@zilliz.com>
2025-02-22 12:29:53 +08:00
Patrick Weizhi Xu
04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00
Xianhui Lin
82f9689711
enhance: Add schema update time verification for insert and upsert to use cache (#39096)
enhance: Add schema update time verification for insert and upsert to
use cache
issue: https://github.com/milvus-io/milvus/issues/39093

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-02-07 14:10:45 +08:00
aoiasd
2b4caba76e
fix: check utf-8 format for varchar with analyzer open (#39299)
relate: https://github.com/milvus-io/milvus/issues/39285

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-02-06 17:11:51 +08:00
junjiejiangjjj
16cbdfb3b1
feat: Add Text Embedding Function (#36366)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-01-24 14:23:06 +08:00
SimFG
2afe2eaf3e
feat: support to replicate collection when the services contains the system tt msg (#37559)
- issue: #37105

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-17 09:08:46 +08:00
tinswzy
27229f7907
enhance: refine exists log print with ctx (#38080)
issue: #35917 
Refines exists log print with ctx

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
tinswzy
5768dbbb5d
enhance: refine pular related mq interfaces (#38007)
issue: #35917 
Refines the pulsar-related mq APIs to allow the ctx to be passed down

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-04 20:50:39 +08:00
SimFG
302650ae0e
fix: use the default partition for the limit quota when the request partition name is empty (#38005)
- issue: #37685

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-27 11:00:36 +08:00
jaime
5686a9a024
fix: unhandle error in upsert task (#36604)
issue: #36611

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-30 12:59:16 +08:00
jaime
52cce4de58
fix: iaccurate size estimation for encoded array data (#36373)
issue: #36029

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-24 14:51:14 +08:00
congqixia
fe20366b5c
enhance: Remove duplicated schema helper creation in proxy (#35489)
Related to PRs of #35415

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-15 19:18:53 +08:00
smellthemoon
6106a48acb
fix: upsert result use the previous pk (#34672)
#34668

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-31 15:25:51 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
Jiquan Long
a2ac84bd64
feat: record the duration waiting in the proxy queue (#34744)
fix: https://github.com/milvus-io/milvus/issues/34743

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-07-23 14:23:52 +08:00
smellthemoon
07b94b4615
enhance: support upsert autoid==true (#30342)
related with: #29258

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-11 16:53:35 +08:00
aoiasd
186757e622
enhance: support mark error as user error (#33498)
relate: https://github.com/milvus-io/milvus/issues/33492

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-07-01 14:56:12 +08:00
congqixia
a647b84f3e
enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438)
"-1" as `InvalidPartitionID` previously used as All partition place
holder in delete cases. It's confusing and hard to maintain when a const
var has more than one meaning.

This PR add `AllPartitionsID` to replace these usages in delete
scenarios.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-20 19:01:05 +08:00
zhagnlu
c5363c70db
fix: fix upsert using wrong field to compute partition key (#30772)
#30607

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-23 09:54:53 +08:00
cai.zhang
40ca98f57f
enhance: Skip timestamp allocation when search/query consistency level is eventually (#29773)
issue: #29772 
1. Skip timestamp allocation when search/query consistency level is
eventually.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-21 09:52:59 +08:00
XuanYang-cn
7f059b1025
fix: record apply pk latency metric to ms (#29987)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-17 10:11:03 +08:00
congqixia
4f8c540c77
enhance: cache collection schema attributes to reduce proxy cpu (#29668)
See also #29113

The collection schema is crucial when performing search/query but some
of the information is calculated for every request.

This PR change schema field of cached collection info into a utility
`schemaInfo` type to store some stable result, say pk field,
partitionKeyEnabled, etc. And provided field name to id map for
search/query services.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 17:28:46 +08:00
yah01
be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00