milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 01:28:27 +08:00

Author	SHA1	Message	Date
wei liu	f85e86a6ec	fix: change upsert duplicate PK behavior from dedup to error (#45997 ) issue: #44320 Replace the DeduplicateFieldData function with CheckDuplicatePkExist that returns an error when duplicate primary keys are detected in the same batch, instead of silently deduplicating. Changes: - Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go - Update upsertTask.PreExecute to return error on duplicate PKs - Simplify helper function from findLastOccurrenceIndices to hasDuplicates - Update unit tests to verify the new error behavior - Add Python integration tests for duplicate PK error cases Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-04 10:23:11 +08:00
wei liu	7aed88113c	enhance: Deduplicate primary keys in upsert request batch (#45249 ) issue: #44320 This change adds deduplication logic to handle duplicate primary keys within a single upsert batch, keeping the last occurrence of each primary key. Key changes: - Add DeduplicateFieldData function to remove duplicate PKs from field data, supporting both Int64 and VarChar primary keys - Refactor fillFieldPropertiesBySchema into two separate functions: validateFieldDataColumns for validation and fillFieldPropertiesOnly for property filling, improving code clarity and reusability - Integrate deduplication logic in upsertTask.PreExecute to automatically deduplicate data before processing - Add comprehensive unit tests for deduplication with various PK types (Int64, VarChar) and field types (scalar, vector) - Add Python integration tests to verify end-to-end behavior --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-11-17 21:35:40 +08:00
zhenshan.cao	6327c9a514	fix: Fix bugs related to TimestampTz (#45111 ) issue: https://github.com/milvus-io/milvus/issues/44527 https://github.com/milvus-io/milvus/issues/44537 https://github.com/milvus-io/milvus/issues/44538 https://github.com/milvus-io/milvus/issues/44585 https://github.com/milvus-io/milvus/issues/44622 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-04 16:51:33 +08:00
Bingyi Sun	633cae9461	enhance: add namespace for query and search request (#44343 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-16 17:52:01 +08:00
wei liu	529c98520c	enhance: Add nullable support for Geometry and Timestamptz types (#44846 ) issue: #44800 This commit enhances the upsert and validation logic to properly handle nullable Geometry (WKT/WKB) and Timestamptz data types: - Add ToCompressedFormatNullable support for TimestamptzData, GeometryWktData, and GeometryData to filter out null values during data compression - Implement GenNullableFieldData for Timestamptz and Geometry types to generate nullable field data structures - Update FillWithNullValue to handle both GeometryData and GeometryWktData with null value filling logic - Add UpdateFieldData support for Timestamptz, GeometryData, and GeometryWktData field updates - Comprehensive unit tests covering all new data type handling scenarios Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-10-15 14:04:00 +08:00
cai.zhang	7a93cfe890	fix: Fix bug for nullable geometry (#44732 ) issue: #44648 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-11 11:27:57 +08:00
Gao	3cc59a0d69	enhance: add storage usage for delete/upsert/restful (#44512 ) #44212 Also, record metrics only when storageUsageTracking is enabled. Use MB for scanned_remote counter and scanned_total counter metrics to avoid overflow. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-30 00:31:06 +08:00
junjiejiangjjj	f07979f91d	enhance: add support for controlling function output field insertion (#44162 ) #44053 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-09-24 17:26:04 +08:00
Tianx	4d5afec9a8	fix: upsert error for timestamptz (#44548 ) issue: https://github.com/milvus-io/milvus/issues/44527 Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-24 10:28:04 +08:00
Bingyi Sun	94d53a5ac6	feat: encode cluster id in auto id (#44471 ) https://github.com/milvus-io/milvus/issues/44326 prev: [physical_ts][logical_ts] after [sign_bit][cluster_id][physical_ts][logical_ts] --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-22 10:40:02 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics on the Go side (result reduce); storage usage is recorded only in the vector search C++ path. The query C++ path still needs to be implemented in follow-up PRs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
Bingyi Sun	5cd2d99799	enhance: Revert "feat: encode cluster id in auto id (#44324 )" (#44426 ) This reverts commit 7af159410395f0e7079d4875d96544c01f1d477b	2025-09-17 17:56:01 +08:00
Bingyi Sun	7af1594103	feat: encode cluster id in auto id (#44324 ) https://github.com/milvus-io/milvus/issues/44326 prev: `[physical_ts][logical_ts]` after `[sign_bit][cluster_id][physical_ts][logical_ts]` --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-17 16:56:01 +08:00
wei liu	18371773dd	enhance: Optimize partial update merge logic by unifying nullable format (#44197 ) issue: #43980 This commit optimizes the partial update merge logic by standardizing nullable field representation before merge operations to avoid corner cases during the merge process. Key changes: - Unify nullable field data format to FULL FORMAT before merge execution - Add extensive unit tests for bounds checking and edge cases The optimization ensures: - Consistent nullable field representation across SDK and internal - Proper handling of null values during merge operations - Prevention of index out-of-bounds errors in vector field updates - Better error handling and validation for partial update scenarios This resolves issues where different nullable field formats could cause merge failures or data corruption during partial update operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-09-10 17:27:56 +08:00
Bingyi Sun	e2eb8562f1	feat: Auto add namespace field data if namespace is enabled (#44198 ) issue: #44011 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-09 16:17:56 +08:00
wei liu	5ef793c393	fix: Fix panic when upsert with partial_update=true on empty table (#44155 ) issue: #43980 Fix panic issue caused by incorrect nullable field merging logic when upsert converts to insert operation on empty tables. - Add AppendFieldDataWithNullData to handle nullable field merging - Fix existing data merge with skipAppendNullData=false - Fix insert data merge with skipAppendNullData=true - Add unit tests for nullable field data appending scenarios Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-09-02 16:47:52 +08:00
wei liu	16af4e230a	fix: Prevent panic in upsert due to missing nullable fields [Proxy] (#44070 ) issue: #43980 Fixes a panic that occurred when a partial update was converted to an insert due to a non-existent primary key. The panic was caused by missing nullable fields that were not provided in the original partial update request. The upsert pre-execution logic is refactored to handle this correctly: - Explicitly splits upsert data into 'insert' and 'update' batches. - Automatically generates data for missing nullable or default-value fields during inserts, preventing the panic. - Enhances `typeutil.UpdateFieldData` to support different source and destination indexes for flexible data merging. - Adds comprehensive unit tests for mixed upsert, pure insert, and pure update scenarios. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-29 18:33:51 +08:00
junjiejiangjjj	f3d7e47227	feat: Supports more rerankers (#43270 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjiejiangjjj <junjie.jiang@zilliz.com>	2025-08-22 17:29:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
wei liu	d3c95eaa77	enhance: Support partial field updates with upsert API (#42877 ) issue: #29735 Implement partial field update functionality for upsert operations, supporting scalar, vector, and dynamic JSON fields without requiring all collection fields. Changes: - Add queryPreExecute to retrieve existing records before upsert - Implement UpdateFieldData function for merging data - Add IDsChecker utility for efficient primary key lookups - Fix JSON data creation in tests using proper map marshaling - Add test cases for partial updates of different field types Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-19 11:15:45 +08:00
Zhen Ye	5551d99425	enhance: remove old arch non-streaming arch code (#43651 ) issue: #41609 - remove all dml dead code at proxy - remove dead code at l0_write_buffer - remove msgstream dependency at proxy - remove timetick reporter from proxy - remove replicate stream implementation --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:41:40 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
congqixia	880915e08b	enhance: Print out-of-date schema ts when returning ErrSchemaMismatch (#42790 ) Related to #41858 This PR adds a log to help debug schema mismatches between the pymilvus cache and the proxy schema. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-17 10:38:37 +08:00
Xianhui Lin	f9febe3bae	enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006 ) Merge RootCoord, DataCoord, and QueryCoord into MixCoord and consolidate their sessions into one. issue: https://github.com/milvus-io/milvus/issues/37764 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-11 16:36:30 +08:00
Ted Xu	688505ab1c	enhance: cleanup lint check exclusions (#40829 ) See: #40828 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-21 18:12:14 +08:00
junjiejiangjjj	359e7efd8e	feat: Add function running monitoring (#40358 ) #35856 #40004 1. Optimize model verification logic 2. Add profiling code Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-03-10 22:28:05 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
SimFG	ad36347fb3	fix: add BeginTimestamp and EndTimestamp to insert and upsert messages (#40110 ) - issue: #40109 - caused by: #38656 Signed-off-by: SimFG <bang.fu@zilliz.com>	2025-02-22 12:29:53 +08:00
Patrick Weizhi Xu	04fff74a56	feat: introduce Text data type (#39874 ) issue: https://github.com/milvus-io/milvus/issues/39818 This PR mimics Varchar data type, allows insert, search, query, delete, full-text search and others. Functionalities related to filter expressions are disabled temporarily. Storage changes for Text data type will be in the following PRs. Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2025-02-19 11:04:51 +08:00
Xianhui Lin	82f9689711	enhance: Add schema update time verification for insert and upsert to use cache (#39096 ) issue: https://github.com/milvus-io/milvus/issues/39093 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-07 14:10:45 +08:00
aoiasd	2b4caba76e	fix: check utf-8 format for varchar with analyzer open (#39299 ) relate: https://github.com/milvus-io/milvus/issues/39285 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-02-06 17:11:51 +08:00
junjiejiangjjj	16cbdfb3b1	feat: Add Text Embedding Function (#36366 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-01-24 14:23:06 +08:00
SimFG	2afe2eaf3e	feat: support to replicate collection when the services contain the system tt msg (#37559 ) - issue: #37105 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-12-17 09:08:46 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080 ) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
tinswzy	5768dbbb5d	enhance: refine pulsar-related mq interfaces (#38007 ) issue: #35917 Refines the pulsar-related mq APIs to allow the ctx to be passed down Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-04 20:50:39 +08:00
SimFG	302650ae0e	fix: use the default partition for the limit quota when the request partition name is empty (#38005 ) - issue: #37685 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-11-27 11:00:36 +08:00
jaime	5686a9a024	fix: unhandled error in upsert task (#36604 ) issue: #36611 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-09-30 12:59:16 +08:00
jaime	52cce4de58	fix: inaccurate size estimation for encoded array data (#36373 ) issue: #36029 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-09-24 14:51:14 +08:00
congqixia	fe20366b5c	enhance: Remove duplicated schema helper creation in proxy (#35489 ) Related to PRs of #35415 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-15 19:18:53 +08:00
smellthemoon	6106a48acb	fix: upsert result use the previous pk (#34672 ) #34668 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-31 15:25:51 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084 ) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
Jiquan Long	a2ac84bd64	feat: record the duration waiting in the proxy queue (#34744 ) fix: https://github.com/milvus-io/milvus/issues/34743 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-07-23 14:23:52 +08:00
smellthemoon	07b94b4615	enhance: support upsert autoid==true (#30342 ) related with: #29258 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-11 16:53:35 +08:00
aoiasd	186757e622	enhance: support mark error as user error (#33498 ) relate: https://github.com/milvus-io/milvus/issues/33492 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-07-01 14:56:12 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) "-1" as `InvalidPartitionID` was previously used as the all-partitions placeholder in delete cases. A constant with more than one meaning is confusing and hard to maintain. This PR adds `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
zhagnlu	c5363c70db	fix: fix upsert using wrong field to compute partition key (#30772 ) #30607 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-23 09:54:53 +08:00
cai.zhang	40ca98f57f	enhance: Skip timestamp allocation when search/query consistency level is eventually (#29773 ) issue: #29772 1. Skip timestamp allocation when search/query consistency level is eventually. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-02-21 09:52:59 +08:00
XuanYang-cn	7f059b1025	fix: record apply pk latency metric to ms (#29987 ) Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-01-17 10:11:03 +08:00
congqixia	4f8c540c77	enhance: cache collection schema attributes to reduce proxy cpu (#29668 ) See also #29113 The collection schema is crucial when performing search/query, but some of its information is recalculated for every request. This PR changes the schema field of the cached collection info into a utility `schemaInfo` type that stores stable results such as the pk field and partitionKeyEnabled, and provides a field-name-to-ID map for search/query services. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 17:28:46 +08:00
yah01	be980fbc38	Refine state check (#27541 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-10-11 21:01:35 +08:00


74 Commits
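
Note on the two upsert duplicate-PK behaviors recorded in this history (#45249 deduplicates and keeps the last occurrence; #45997 returns an error instead). The sketch below is a minimal Python illustration of the difference, not the Milvus Go code. The functions `find_last_occurrence_indices`, `deduplicate_rows`, `has_duplicates`, and `check_duplicate_pk_exist` are hypothetical stand-ins that only echo the names in the commit messages. The row-as-dict layout, the output ordering, and the use of `ValueError` are assumptions made for brevity.

```python
from typing import Any, Dict, Hashable, List, Sequence

Row = Dict[str, Any]


def find_last_occurrence_indices(pks: Sequence[Hashable]) -> List[int]:
    """Return, in ascending order, the index of the last occurrence of each PK."""
    last: Dict[Hashable, int] = {}
    for i, pk in enumerate(pks):
        last[pk] = i  # a later row with the same PK overwrites the earlier index
    return sorted(last.values())


def deduplicate_rows(rows: List[Row], pk_field: str) -> List[Row]:
    """Older behavior (assumed shape): keep only the last row for each PK."""
    keep = find_last_occurrence_indices([row[pk_field] for row in rows])
    return [rows[i] for i in keep]


def has_duplicates(pks: Sequence[Hashable]) -> bool:
    """Return True as soon as any PK is seen a second time."""
    seen = set()
    for pk in pks:
        if pk in seen:
            return True
        seen.add(pk)
    return False


def check_duplicate_pk_exist(rows: List[Row], pk_field: str) -> None:
    """Newer behavior (assumed shape): reject the whole batch on a duplicate PK."""
    if has_duplicates([row[pk_field] for row in rows]):
        raise ValueError(f"duplicate primary keys in upsert batch (field {pk_field!r})")


batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
print(deduplicate_rows(batch, "id"))  # rows at indices 1 and 2 survive
try:
    check_duplicate_pk_exist(batch, "id")
except ValueError as err:
    print("rejected:", err)
```

The error-returning variant needs only a yes/no answer, which matches the commit's note that the helper was simplified from `findLastOccurrenceIndices` to `hasDuplicates`.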