milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-28 14:35:27 +08:00

Author	SHA1	Message	Date
marcelo-cjl	3b599441fd	feat: Add nullable vector support for proxy and querynode (#46305 ) related: #45993 This commit extends nullable vector support to the proxy layer, querynode, and adds comprehensive validation, search reduce, and field data handling for nullable vectors with sparse storage. Proxy layer changes: - Update validate_util.go checkAligned() with getExpectedVectorRows() helper to validate nullable vector field alignment using valid data count - Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for nullable vector validation with proper row count expectations - Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical index translation during search reduce operations - Update search_reduce_util.go reduceSearchResultData to use idxComputers for correct field data indexing with nullable vectors - Update task.go, task_query.go, task_upsert.go for nullable vector handling - Update msg_pack.go with nullable vector field data processing QueryNode layer changes: - Update segments/result.go for nullable vector result handling - Update segments/search_reduce.go with nullable vector offset translation Storage and index changes: - Update data_codec.go and utils.go for nullable vector serialization - Update indexcgowrapper/dataset.go and index.go for nullable vector indexing Utility changes: - Add FieldDataIdxComputer struct with Compute() method for efficient logical-to-physical index mapping across multiple field data - Update EstimateEntitySize() and AppendFieldData() with fieldIdxs parameter - Update funcutil.go with nullable vector support functions ## Summary by CodeRabbit * New Features * Full support for nullable vector fields (float, binary, float16, bfloat16, int8, sparse) across ingest, storage, indexing, search and retrieval; logical↔physical offset mapping preserves row semantics. * Client: compaction control and compaction-state APIs.
* Bug Fixes * Improved validation for adding vector fields (nullable + dimension checks) and corrected search/query behavior for nullable vectors. * Chores * Persisted validity maps with indexes and on-disk formats. * Tests * Extensive new and updated end-to-end nullable-vector tests. --------- Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>	2025-12-24 10:13:19 +08:00
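The logical-to-physical index translation described in #46305 can be illustrated with a small sketch. This is a Python illustration only, not the Go `FieldDataIdxComputer`: it assumes a nullable vector column stores vectors only for non-null rows (sparse storage) next to a per-row validity mask, and the class and method names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


class LogicalToPhysicalIndex:
    """Hypothetical helper: maps logical row offsets to physical offsets in a
    nullable vector column that stores only the non-null rows."""

    def __init__(self, valid: List[bool]):
        self.valid = valid
        # prefix[i] = number of valid (non-null) rows before logical row i
        self.prefix = [0] + list(accumulate(int(v) for v in valid))

    def expected_vector_rows(self) -> int:
        # Only non-null rows carry a stored vector, so alignment checks should
        # compare the vector count against this, not the logical row count.
        return self.prefix[-1]

    def physical(self, logical: int) -> Optional[int]:
        # A null row has no stored vector; otherwise its physical slot is
        # the number of non-null rows that precede it.
        if not self.valid[logical]:
            return None
        return self.prefix[logical]


idx = LogicalToPhysicalIndex([True, False, True, True, False])
print([idx.physical(i) for i in range(5)])  # [0, None, 1, 2, None]
print(idx.expected_vector_rows())           # 3
```

Precomputing the prefix sum costs one pass over the mask and then makes each lookup constant time, which is the kind of property that matters when a reduce step translates many result offsets.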
wei liu	d7050c417f	fix: Add field data alignment validation to prevent partial update panic (#46177 ) issue: #46176 - Add checkAligned validation before processing partial update field data to prevent index out of range panic when field data arrays have mismatched lengths - Fix GetNumRowOfFieldDataWithSchema to handle Timestamptz string format and Geometry WKT format properly - Add unit tests for empty data array scenarios in partial update --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-09 14:17:12 +08:00
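The alignment check added in #46177 rejects a batch whose field data arrays have mismatched lengths before rows are merged by position. A minimal Python sketch, assuming each field arrives as a plain list of per-row values; the function name echoes `checkAligned`, but this signature is hypothetical.

```python
from typing import Dict, List


def check_aligned(num_rows: int, fields: Dict[str, List[object]]) -> None:
    """Hypothetical sketch: reject a batch whose field columns differ in length."""
    for name, column in fields.items():
        if len(column) != num_rows:
            # Failing early avoids a later index-out-of-range error when
            # rows from different fields are combined by position.
            raise ValueError(
                f"field {name!r} has {len(column)} rows, expected {num_rows}"
            )


check_aligned(2, {"id": [1, 2], "vec": [[0.1], [0.2]]})  # passes silently
```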
wei liu	f85e86a6ec	fix: change upsert duplicate PK behavior from dedup to error (#45997 ) issue: #44320 Replace the DeduplicateFieldData function with CheckDuplicatePkExist that returns an error when duplicate primary keys are detected in the same batch, instead of silently deduplicating. Changes: - Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go - Update upsertTask.PreExecute to return error on duplicate PKs - Simplify helper function from findLastOccurrenceIndices to hasDuplicates - Update unit tests to verify the new error behavior - Add Python integration tests for duplicate PK error cases Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-04 10:23:11 +08:00
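The behavior change in #45997 (error instead of silent dedup) amounts to a set-membership scan over the batch's primary keys. A minimal Python sketch; `check_duplicate_pk_exist` and `has_duplicates` are hypothetical stand-ins for the Go `CheckDuplicatePkExist` and `hasDuplicates`, and work for both Int64 and VarChar keys because both are hashable.

```python
from typing import Hashable, Sequence


def has_duplicates(pks: Sequence[Hashable]) -> bool:
    """Return True as soon as any primary key is seen twice."""
    seen = set()
    for pk in pks:
        if pk in seen:
            return True
        seen.add(pk)
    return False


def check_duplicate_pk_exist(pks: Sequence[Hashable]) -> None:
    # Hypothetical Python counterpart: fail the whole upsert batch instead
    # of silently dropping earlier rows with the same key.
    if has_duplicates(pks):
        raise ValueError("duplicate primary keys found in upsert batch")


check_duplicate_pk_exist([1, 2, 3])   # ok (Int64-style keys)
check_duplicate_pk_exist(["a", "b"])  # ok (VarChar-style keys)
```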
wei liu	7aed88113c	enhance: Deduplicate primary keys in upsert request batch (#45249 ) issue: #44320 This change adds deduplication logic to handle duplicate primary keys within a single upsert batch, keeping the last occurrence of each primary key. Key changes: - Add DeduplicateFieldData function to remove duplicate PKs from field data, supporting both Int64 and VarChar primary keys - Refactor fillFieldPropertiesBySchema into two separate functions: validateFieldDataColumns for validation and fillFieldPropertiesOnly for property filling, improving code clarity and reusability - Integrate deduplication logic in upsertTask.PreExecute to automatically deduplicate data before processing - Add comprehensive unit tests for deduplication with various PK types (Int64, VarChar) and field types (scalar, vector) - Add Python integration tests to verify end-to-end behavior --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-11-17 21:35:40 +08:00
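The "keep the last occurrence" rule from #45249 (later replaced by an error in #45997, listed above) can be sketched by recording the last row index seen for each key and then selecting those rows from every column. This is a Python illustration under the assumption that field data is columnar (one list per field); the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def last_occurrence_indices(pks: Sequence[Hashable]) -> List[int]:
    """Row indices of the last occurrence of each primary key, in row order."""
    last: Dict[Hashable, int] = {}
    for i, pk in enumerate(pks):
        last[pk] = i  # a later row overwrites an earlier one
    return sorted(last.values())


def deduplicate(
    pks: Sequence[Hashable], columns: Dict[str, Sequence[object]]
) -> Tuple[List[Hashable], Dict[str, List[object]]]:
    """Keep only the surviving rows in the PK list and in every field column."""
    keep = last_occurrence_indices(pks)
    return [pks[i] for i in keep], {
        name: [col[i] for i in keep] for name, col in columns.items()
    }


pks, cols = deduplicate([1, 2, 1, 3], {"score": ["a", "b", "c", "d"]})
print(pks, cols)  # [2, 1, 3] {'score': ['b', 'c', 'd']}
```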
zhenshan.cao	6327c9a514	fix: Fix bugs related to TimestampTz (#45111 ) issue: https://github.com/milvus-io/milvus/issues/44527 https://github.com/milvus-io/milvus/issues/44537 https://github.com/milvus-io/milvus/issues/44538 https://github.com/milvus-io/milvus/issues/44585 https://github.com/milvus-io/milvus/issues/44622 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-04 16:51:33 +08:00
Bingyi Sun	633cae9461	enhance: add namespace for query and search request (#44343 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-16 17:52:01 +08:00
wei liu	529c98520c	enhance: Add nullable support for Geometry and Timestamptz types (#44846 ) issue: #44800 This commit enhances the upsert and validation logic to properly handle nullable Geometry (WKT/WKB) and Timestamptz data types: - Add ToCompressedFormatNullable support for TimestamptzData, GeometryWktData, and GeometryData to filter out null values during data compression - Implement GenNullableFieldData for Timestamptz and Geometry types to generate nullable field data structures - Update FillWithNullValue to handle both GeometryData and GeometryWktData with null value filling logic - Add UpdateFieldData support for Timestamptz, GeometryData, and GeometryWktData field updates - Comprehensive unit tests covering all new data type handling scenarios Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-10-15 14:04:00 +08:00
cai.zhang	7a93cfe890	fix: Fix bug for nullable geometry (#44732 ) issue: #44648 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-11 11:27:57 +08:00
Gao	3cc59a0d69	enhance: add storage usage for delete/upsert/restful (#44512 ) #44212 Also, record metrics only when storageUsageTracking is enabled. Use MB for scanned_remote counter and scanned_total counter metrics to avoid overflow. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-30 00:31:06 +08:00
junjiejiangjjj	f07979f91d	enhance: add support for controlling function output field insertion (#44162 ) #44053 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-09-24 17:26:04 +08:00
Tianx	4d5afec9a8	fix: upsert error for timestamptz (#44548 ) issue: https://github.com/milvus-io/milvus/issues/44527 Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-24 10:28:04 +08:00
Bingyi Sun	94d53a5ac6	feat: encode cluster id in auto id (#44471 ) https://github.com/milvus-io/milvus/issues/44326 prev: [physical_ts][logical_ts] after [sign_bit][cluster_id][physical_ts][logical_ts] --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-22 10:40:02 +08:00
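The auto ID layout in #44471 (`[sign_bit][cluster_id][physical_ts][logical_ts]`) is ordinary bit packing into a 64-bit integer. The commit gives only the field order, so the widths below (3 bits of cluster ID, 18 logical bits) are hypothetical placeholders; the sketch shows the packing technique, not Milvus's actual constants.

```python
# Hypothetical field widths: only the field order comes from the commit.
SIGN_BITS = 1
CLUSTER_BITS = 3
LOGICAL_BITS = 18
PHYSICAL_BITS = 64 - SIGN_BITS - CLUSTER_BITS - LOGICAL_BITS  # 42


def encode_auto_id(cluster_id: int, physical_ts: int, logical_ts: int) -> int:
    for value, bits, name in (
        (cluster_id, CLUSTER_BITS, "cluster_id"),
        (physical_ts, PHYSICAL_BITS, "physical_ts"),
        (logical_ts, LOGICAL_BITS, "logical_ts"),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    # The sign bit stays 0, so the result is a non-negative int64.
    return (
        (cluster_id << (PHYSICAL_BITS + LOGICAL_BITS))
        | (physical_ts << LOGICAL_BITS)
        | logical_ts
    )


def decode_auto_id(auto_id: int) -> tuple:
    logical_ts = auto_id & ((1 << LOGICAL_BITS) - 1)
    physical_ts = (auto_id >> LOGICAL_BITS) & ((1 << PHYSICAL_BITS) - 1)
    cluster_id = (auto_id >> (PHYSICAL_BITS + LOGICAL_BITS)) & ((1 << CLUSTER_BITS) - 1)
    return cluster_id, physical_ts, logical_ts
```

Placing the cluster ID above the timestamp bits means IDs from different clusters never collide, while IDs from one cluster still increase with time.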
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics on the Go side (result reduce); storage usage is currently recorded only in the vector search C++ path. The query C++ path will be covered in follow-up PRs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
Bingyi Sun	5cd2d99799	enhance: Revert "feat: encode cluster id in auto id (#44324 )" (#44426 ) This reverts commit 7af159410395f0e7079d4875d96544c01f1d477b	2025-09-17 17:56:01 +08:00
Bingyi Sun	7af1594103	feat: encode cluster id in auto id (#44324 ) https://github.com/milvus-io/milvus/issues/44326 prev: `[physical_ts][logical_ts]` after `[sign_bit][cluster_id][physical_ts][logical_ts]` --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-17 16:56:01 +08:00
wei liu	18371773dd	enhance: Optimize partial update merge logic by unifying nullable format (#44197 ) issue: #43980 This commit optimizes the partial update merge logic by standardizing nullable field representation before merge operations to avoid corner cases during the merge process. Key changes: - Unify nullable field data format to FULL FORMAT before merge execution - Add extensive unit tests for bounds checking and edge cases The optimization ensures: - Consistent nullable field representation across SDK and internal - Proper handling of null values during merge operations - Prevention of index out-of-bounds errors in vector field updates - Better error handling and validation for partial update scenarios This resolves issues where different nullable field formats could cause merge failures or data corruption during partial update operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-09-10 17:27:56 +08:00
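A sketch of the two nullable representations involved in #44197, under an assumption the commit does not spell out: the compressed format (cf. `ToCompressedFormatNullable` in #44846) keeps values only for non-null rows, while the "FULL FORMAT" keeps one slot per logical row with a placeholder at null positions. Once every column is in full format, row `i` lives at index `i` in every field, so merging by row index cannot go out of bounds. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def to_full_format(
    compressed: Sequence[T], valid: Sequence[bool], placeholder: Optional[T] = None
) -> List[Optional[T]]:
    """Expand values stored only for non-null rows into one slot per row."""
    if sum(valid) != len(compressed):
        raise ValueError("valid mask does not match number of stored values")
    it = iter(compressed)
    return [next(it) if v else placeholder for v in valid]


def to_compressed_format(full: Sequence[T], valid: Sequence[bool]) -> List[T]:
    """Keep only the values of non-null rows."""
    if len(full) != len(valid):
        raise ValueError("valid mask length must equal row count")
    return [x for x, v in zip(full, valid) if v]


valid = [True, False, True]
full = to_full_format([10, 30], valid)
print(full)                               # [10, None, 30]
print(to_compressed_format(full, valid))  # [10, 30]
```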
Bingyi Sun	e2eb8562f1	feat: Auto add namespace field data if namespace is enabled (#44198 ) issue: #44011 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-09 16:17:56 +08:00
wei liu	5ef793c393	fix: Fix panic when upsert with partial_update=true on empty table (#44155 ) issue: #43980 Fix panic issue caused by incorrect nullable field merging logic when upsert converts to insert operation on empty tables. - Add AppendFieldDataWithNullData to handle nullable field merging - Fix existing data merge with skipAppendNullData=false - Fix insert data merge with skipAppendNullData=true - Add unit tests for nullable field data appending scenarios Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-09-02 16:47:52 +08:00
wei liu	16af4e230a	fix: Prevent panic in upsert due to missing nullable fields [Proxy] (#44070 ) issue: #43980 Fixes a panic that occurred when a partial update was converted to an insert due to a non-existent primary key. The panic was caused by missing nullable fields that were not provided in the original partial update request. The upsert pre-execution logic is refactored to handle this correctly: - Explicitly splits upsert data into 'insert' and 'update' batches. - Automatically generates data for missing nullable or default-value fields during inserts, preventing the panic. - Enhances `typeutil.UpdateFieldData` to support different source and destination indexes for flexible data merging. - Adds comprehensive unit tests for mixed upsert, pure insert, and pure update scenarios. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-29 18:33:51 +08:00
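The refactor in #44070 splits an upsert batch by whether each primary key already exists, then completes rows that became inserts. A minimal Python sketch, assuming the set of existing keys has already been fetched and that the schema is a plain dict of field specs; all names here are hypothetical and are not Milvus APIs.

```python
from typing import Any, Dict, Hashable, List, Sequence, Set, Tuple


def split_upsert(
    pks: Sequence[Hashable], existing: Set[Hashable]
) -> Tuple[List[int], List[int]]:
    """Return (insert_rows, update_rows) as row indices into the batch."""
    insert_rows = [i for i, pk in enumerate(pks) if pk not in existing]
    update_rows = [i for i, pk in enumerate(pks) if pk in existing]
    return insert_rows, update_rows


def fill_missing_fields(
    row: Dict[str, Any], schema: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Complete a partial-update row that turned into an insert.

    schema maps field name -> {"nullable": bool, optional "default": value}.
    """
    out = dict(row)
    for name, spec in schema.items():
        if name in out:
            continue
        if "default" in spec:
            out[name] = spec["default"]
        elif spec.get("nullable", False):
            out[name] = None
        else:
            raise ValueError(f"field {name!r} is required for insert")
    return out


inserts, updates = split_upsert([1, 2, 3], existing={2})  # ([0, 2], [1])
schema = {"id": {}, "tag": {"nullable": True}, "score": {"default": 0.0}}
print(fill_missing_fields({"id": 1}, schema))  # {'id': 1, 'tag': None, 'score': 0.0}
```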
junjiejiangjjj	f3d7e47227	feat: Supports more rerankers (#43270 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjiejiangjjj <junjie.jiang@zilliz.com>	2025-08-22 17:29:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports creating an index on a vector array (currently only for `DataType.FLOAT_VECTOR`) and searching on it. The only supported index type is `EMB_LIST_HNSW`, and the only supported metric type is `MAX_SIM`. Usage: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to pass vector-array offsets to Knowhere. The vectors of multiple vector arrays are concatenated, so offsets are needed to identify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
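The offsets idea behind `Lims` in #43726 can be sketched as a concatenated vector list plus a prefix-sum offsets array. This assumes a CSR-style convention (`len(arrays) + 1` offsets, array `i` owning `flat[lims[i]:lims[i + 1]]`); the actual layout Knowhere expects is not stated in the commit, and the helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Vector = List[float]


def build_lims(arrays: Sequence[Sequence[Vector]]) -> Tuple[List[Vector], List[int]]:
    """Concatenate vector arrays and return (flat_vectors, lims).

    lims has len(arrays) + 1 entries; array i owns flat[lims[i]:lims[i + 1]].
    """
    flat = [vec for arr in arrays for vec in arr]
    lims = [0] + list(accumulate(len(arr) for arr in arrays))
    return flat, lims


def vectors_of(flat: List[Vector], lims: List[int], i: int) -> List[Vector]:
    """Recover the vectors that belong to vector array i."""
    return flat[lims[i]:lims[i + 1]]


flat, lims = build_lims([[[0.1], [0.2]], [[0.3]], [[0.4], [0.5], [0.6]]])
print(lims)                       # [0, 2, 3, 6]
print(vectors_of(flat, lims, 2))  # [[0.4], [0.5], [0.6]]
```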
wei liu	d3c95eaa77	enhance: Support partial field updates with upsert API (#42877 ) issue: #29735 Implement partial field update functionality for upsert operations, supporting scalar, vector, and dynamic JSON fields without requiring all collection fields. Changes: - Add queryPreExecute to retrieve existing records before upsert - Implement UpdateFieldData function for merging data - Add IDsChecker utility for efficient primary key lookups - Fix JSON data creation in tests using proper map marshaling - Add test cases for partial updates of different field types Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-19 11:15:45 +08:00
Zhen Ye	5551d99425	enhance: remove old arch non-streaming arch code (#43651 ) issue: #41609 - remove all dml dead code at proxy - remove dead code at l0_write_buffer - remove msgstream dependency at proxy - remove timetick reporter from proxy - remove replicate stream implementation --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:41:40 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
congqixia	880915e08b	enhance: Print out-of-date schema ts when returning ErrSchemaMismatch (#42790 ) Related to #41858 This PR adds a log to help debug schema mismatches between the pymilvus cache and the proxy schema. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-17 10:38:37 +08:00
Xianhui Lin	f9febe3bae	enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006 ) Merge RootCoord, DataCoord And QueryCoord into MixCoord Make Session into one issue : https://github.com/milvus-io/milvus/issues/37764 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-11 16:36:30 +08:00
Ted Xu	688505ab1c	enhance: cleanup lint check exclusions (#40829 ) See: #40828 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-21 18:12:14 +08:00
junjiejiangjjj	359e7efd8e	feat: Add function running monitoring (#40358 ) #35856 #40004 1. Optimize model verification logic 2. Add profiling code Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-03-10 22:28:05 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
SimFG	ad36347fb3	fix: add BeginTimestamp and EndTimestamp to insert and upsert messages (#40110 ) - issue: #40109 - caused by: #38656 Signed-off-by: SimFG <bang.fu@zilliz.com>	2025-02-22 12:29:53 +08:00
Patrick Weizhi Xu	04fff74a56	feat: introduce Text data type (#39874 ) issue: https://github.com/milvus-io/milvus/issues/39818 This PR mimics Varchar data type, allows insert, search, query, delete, full-text search and others. Functionalities related to filter expressions are disabled temporarily. Storage changes for Text data type will be in the following PRs. Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2025-02-19 11:04:51 +08:00
Xianhui Lin	82f9689711	enhance: Add schema update time verification for insert and upsert to use cache (#39096 ) issue: https://github.com/milvus-io/milvus/issues/39093 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-07 14:10:45 +08:00
aoiasd	2b4caba76e	fix: check utf-8 format for varchar with analyzer open (#39299 ) relate: https://github.com/milvus-io/milvus/issues/39285 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-02-06 17:11:51 +08:00
junjiejiangjjj	16cbdfb3b1	feat: Add Text Embedding Function (#36366 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-01-24 14:23:06 +08:00
SimFG	2afe2eaf3e	feat: support replicating collections when the services contain the system tt msg (#37559 ) - issue: #37105 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-12-17 09:08:46 +08:00
tinswzy	27229f7907	enhance: refine existing log prints with ctx (#38080 ) issue: #35917 Refines existing log prints with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
tinswzy	5768dbbb5d	enhance: refine pulsar-related mq interfaces (#38007 ) issue: #35917 Refines the pulsar-related mq APIs to allow the ctx to be passed down Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-04 20:50:39 +08:00
SimFG	302650ae0e	fix: use the default partition for the limit quota when the request partition name is empty (#38005 ) - issue: #37685 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-11-27 11:00:36 +08:00
jaime	5686a9a024	fix: unhandled error in upsert task (#36604 ) issue: #36611 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-09-30 12:59:16 +08:00
jaime	52cce4de58	fix: inaccurate size estimation for encoded array data (#36373 ) issue: #36029 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-09-24 14:51:14 +08:00
congqixia	fe20366b5c	enhance: Remove duplicated schema helper creation in proxy (#35489 ) Related to PRs of #35415 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-15 19:18:53 +08:00
smellthemoon	6106a48acb	fix: upsert result use the previous pk (#34672 ) #34668 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-31 15:25:51 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084 ) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
Jiquan Long	a2ac84bd64	feat: record the duration waiting in the proxy queue (#34744 ) fix: https://github.com/milvus-io/milvus/issues/34743 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-07-23 14:23:52 +08:00
smellthemoon	07b94b4615	enhance: support upsert autoid==true (#30342 ) related with: #29258 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-11 16:53:35 +08:00
aoiasd	186757e622	enhance: support mark error as user error (#33498 ) relate: https://github.com/milvus-io/milvus/issues/33492 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-07-01 14:56:12 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) Previously, "-1" (`InvalidPartitionID`) was also used as the all-partitions placeholder in delete cases. A constant with more than one meaning is confusing and hard to maintain. This PR adds `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
zhagnlu	c5363c70db	fix: fix upsert using wrong field to compute partition key (#30772 ) #30607 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-23 09:54:53 +08:00
cai.zhang	40ca98f57f	enhance: Skip timestamp allocation when search/query consistency level is eventually (#29773 ) issue: #29772 1. Skip timestamp allocation when search/query consistency level is eventually. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-02-21 09:52:59 +08:00
XuanYang-cn	7f059b1025	fix: record apply pk latency metric to ms (#29987 ) Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-01-17 10:11:03 +08:00


76 Commits