milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-28 22:45:26 +08:00

Author	SHA1	Message	Date
aoiasd	df80f54151	feat: support use user's file as dictionary for analyzer filter (#46145 ) relate: https://github.com/milvus-io/milvus/issues/43687 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-16 11:45:16 +08:00
congqixia	bb2a08ed71	enhance: pass manifest path to stats task for storage v2 support (#46350 ) Related #44956 Add manifest_path field to CreateStatsRequest and propagate it through the stats task pipeline. This enables stats tasks and text index building to access segment manifest for storage v2 format operations. - Add manifest_path field to CreateStatsRequest proto - Set ManifestPath from segment metadata in DataCoord - Pass manifest to BuildIndexInfo in stats task builder - Include manifest in compaction text index creation Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-16 11:11:16 +08:00
yihao.dai	889505872a	enhance: Return FlushAllMsg in response (#46347 ) issue: https://github.com/milvus-io/milvus/issues/45919 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-12-16 10:35:16 +08:00
Zhen Ye	675a6b9ba0	fix: illegal reference count of record in binlog writer (#46344 ) issue: #46205 Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-15 22:51:15 +08:00
Lanqing Yang	3e15604f2e	fix: use rlock for pinindex (#45932 ) fixes: https://github.com/milvus-io/milvus/issues/45934 pinIndex is const and only performs read operations, so a read lock (rlock) is the right choice for performance. Signed-off-by: Lanqing Yang <lanqingy93@gmail.com>	2025-12-15 22:33:16 +08:00
congqixia	ab90dd287f	fix: bump milvus-storage to fix initialization race condition (#46336 ) Related to #44647 Update milvus-storage from 91df193 to 839a8e5 to include milvus-io/milvus-storage#342, which fixes a race condition in S3GlobalContext initialization. The fix moves the is_initialized_ flag update from before DoInitialize() to after it completes. This ensures the initialization flag is only set to true after the actual initialization is done, preventing potential issues if DoInitialize() fails or if other code checks the flag during initialization. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-15 19:51:15 +08:00
congqixia	18fbaaca0a	enhance: support specified version manifest write (#46331 ) Related to #44956 Support specified version manifest write - Add `baseVersion` parameter to `NewPackedRecordManifestWriter` and `NewFFIPackedWriter` to support writing manifest based on a specific version instead of always overwriting the latest - Add `manifestPath` tracking in `BulkPackWriterV2` to maintain manifest state across writes - Add `GetManifestInfo` method to parse existing manifest path and extract base path and version - Add `UpdateManifestPath` metacache action to track manifest path in segment info - Update `transaction_begin` FFI call to use the specified base version --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-15 19:49:14 +08:00
Feilong Hou	971085b033	test: enable debug_mode to observe test case instability. (#46341 ) Issue: #46333 Signed-off-by: Eric Hou <eric.hou@zilliz.com> Co-authored-by: Eric Hou <eric.hou@zilliz.com>	2025-12-15 17:55:16 +08:00
zhuwenxing	5f8daa0f6d	test: Add geometry operations test suite for RESTful API (#46174 ) /kind improvement --------- Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-15 15:45:15 +08:00
Zhen Ye	9ce5f08cc7	fix: lost broadcasting persisted before making message broadcast (#46328 ) issue: #43897 Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-15 13:59:15 +08:00
Spade A	f6f716bcfd	feat: impl StructArray -- support embedding searches embeddings in embedding list with element level filter expression (#45830 ) issue: https://github.com/milvus-io/milvus/issues/42148 For a vector field inside a STRUCT, since a STRUCT can only appear as the element type of an ARRAY field, the vector field in STRUCT is effectively an array of vectors, i.e. an embedding list. Milvus already supports searching embedding lists with metrics whose names start with the prefix MAX_SIM_. This PR allows Milvus to search embeddings inside an embedding list using the same metrics as normal embedding fields. Each embedding in the list is treated as an independent vector and participates in ANN search. Further, since STRUCT may contain scalar fields that are highly related to the embedding field, this PR introduces an element-level filter expression to refine search results. The grammar of the element-level filter is: element_filter(structFieldName, $[subFieldName] == 3) where $[subFieldName] refers to the value of subFieldName in each element of the STRUCT array structFieldName. It can be combined with existing filter expressions, for example: "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3)" A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...

filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO: 1. When an `element_filter` expression is used, a regular filter expression must also be present. Remove this restriction. 2. Implement `element_filter` expressions in the `query`. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-12-15 12:01:15 +08:00
Xiaofan	ca2e27f576	enhance: remove unnecessary segment size estimation and make it configurable (#46302 ) fix #46300 remove unused segment size estimation, and make size estimation configurable Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-12-13 02:58:46 +08:00
Zhen Ye	05b8b3b4c6	fix: stack overflow when gc json or json key (#46317 ) issue: #46316 Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-12 20:05:15 +08:00
huanghaoyuanhhy	addb66f89c	fix: fix DescribeCollection always returning db_id = 0 (#46092 ) fix: #46089 Signed-off-by: huanghaoyuanhhy <haoyuan.huang@zilliz.com>	2025-12-12 20:03:14 +08:00
Zhen Ye	d24cd6200b	fix: always retry when writing binlog (#46309 ) issue: #46205 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-12 18:27:15 +08:00
Buqian Zheng	76aa00a4c6	fix: fix CanUseIndexForJson (#46286 ) issue: https://github.com/milvus-io/milvus/issues/46269 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-12-12 18:25:20 +08:00
aoiasd	0c54875832	enhance: ValidateAnalyzer return ValidateAnalyzerResponse instead of common.Status (#46292 ) Prepares for returning more information when validating an analyzer. relate: https://github.com/milvus-io/milvus/issues/43687 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-12 10:35:14 +08:00
sijie-ni-0214	f51de1a8ab	feat: support TruncateCollection api to clear collection data (#46167 ) issue: https://github.com/milvus-io/milvus/issues/46166 --------- Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>	2025-12-12 10:31:14 +08:00
wei liu	d2c403ce4b	enhance: Improve disk quota metrics update when cluster quota changes (#46278 ) issue: #46277 - Update db/collection/partition disk quota metrics when cluster disk quota changes, since they use cluster quota as default value - Fix incorrect label "collection" to "partition" in disk quota per partition watcher Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-11 20:45:14 +08:00
aoiasd	82e1dfc7d0	fix: highlight queries not work when not BM25 search (#46288 ) Highlight queries should always be initialized. relate: https://github.com/milvus-io/milvus/issues/42589 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-11 20:07:14 +08:00
wei liu	a195c33b71	fix: Prevent target update blocking when replica lacks nodes during scaling (#46088 ) issue: #46087 The previous implementation checked if the total number of ready delegators >= replicaNum per channel. This could cause target updates to block indefinitely when dynamically increasing replicas, because some replicas might lack nodes while the total count still met the threshold. This change switches to a replica-based check approach: - Iterate through each replica individually - For each replica, verify all channels have at least one ready delegator - Only sync delegators from fully ready replicas - Skip replicas that are not ready (e.g., missing nodes for some channels) This ensures target updates can proceed with ready replicas while replicas that lack nodes during dynamic scaling are gracefully skipped. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-11 17:09:14 +08:00
Feilong Hou	224a7943ad	test: add e2e case for timestamptz on restful (#46254 ) Issue: #46253 On branch feature/timestamps Changes to be committed: new file: testcases/test_timestamptz.py Signed-off-by: Eric Hou <eric.hou@zilliz.com> Co-authored-by: Eric Hou <eric.hou@zilliz.com>	2025-12-11 14:21:13 +08:00
zhagnlu	a86b8b7a12	enhance: move jsonshredding meta from parquet to meta.json (#46130 ) #42533 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-12-11 14:01:13 +08:00
zhuwenxing	3aa0b769e5	test: add unique error message collection in chaos checker (#46262 ) /kind improvement - Add normalize_error_message function to extract and normalize error text - Collect unique error messages during chaos test execution - Display error details in assertion messages for better debugging Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-11 13:49:12 +08:00
zhuwenxing	75d6f0d509	test: add ST_ISVALID geometry function test cases (#46232 ) /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-11 13:47:21 +08:00
congqixia	b2c49d0197	enhance: bump milvus-storage to resolve credentials provider namespace conflict (#46263 ) Upgrade milvus-storage from 33bf815 to 91df193. This includes the fix from milvus-io/milvus-storage#337, which resolves a namespace collision where both Milvus and milvus-storage defined identical credentials provider classes in the same namespace. Although no compile-time redefinition errors occurred, the dynamic linker could resolve to the wrong implementation at runtime, potentially causing cloud authentication failures due to configuration mismatches. The fix changes milvus-storage's credentials provider namespace to `milvus_storage`, ensuring each project uses its own implementation. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-11 10:09:13 +08:00
Zhen Ye	15f8dfc7ad	enhance: introduce a tolerance duration to delay the drop operation (#46251 ) issue: #46214 Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-10 19:57:13 +08:00
congqixia	416113d11a	enhance: Change RootCoord default port to non-ephemeral port (#46256 ) Related to #46255 Change RootCoord default gRPC port from 53100 to 22125 to avoid conflicts with ephemeral port range. The previous port 53100 falls within the Linux ephemeral port range (32768-60999), which could cause conflicts with other connections including Milvus's own outbound connections. Port 22125 is outside the ephemeral range and provides more reliable service binding. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-10 19:47:13 +08:00
yihao.dai	f32f2694bc	enhance: Implement new FlushAllMessage and refactor flush all (#45920 ) This PR: 1. Define and implement the new FlushAllMessage. 2. Refactor FlushAll to flush the entire cluster. issue: https://github.com/milvus-io/milvus/issues/45919 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-12-10 19:27:13 +08:00
congqixia	8780e12570	fix: use assertion instead of modifying schema under shared lock (#46242 ) Related to #46225 Replace the heterogeneous insert data handling logic that modified schema_ while holding a shared lock with an assertion. The previous implementation had a concurrency bug where schema modification operations were performed under a shared_lock, which violates mutex semantics and can lead to data races. Issue: #46225 reported two problems: 1. Schema modification under shared_lock (not exclusive lock) 2. Access to schema_ not protected by mutex in growing segment The removed code attempted to handle "added fields" by: - Adding new field to schema (schema_->AddField) - Appending field metadata to insert_record_ - Setting default data for existing rows All these write operations were performed while holding only a shared_lock, which is incorrect since shared_locks are meant for read-only operations. This fix replaces the unsafe modification with an assertion that fails if an unexpected new field is encountered in a growing segment with existing data. The proper handling of schema changes should go through the Reopen() path which correctly acquires a unique_lock before modifying schema_. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-10 16:25:13 +08:00
Chun Han	d9f8e38d6a	fix: query failed for int value on edge(#46075 ) (#46126 ) related: #46075 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-12-10 15:59:12 +08:00
Buqian Zheng	ab2e51b1c7	fix: VectorArrayChunkWriter::calculate_size (#46244 ) issue: #46238 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-12-10 15:27:14 +08:00
sparknack	5fb420b156	fix: milvus-common update (#45929 ) issue: #41435 fix some usage tracking bugs in caching layer. Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-12-10 14:53:13 +08:00
aoiasd	c84b6d56f8	fix: char_group tokenizer only support one byte char as delimiters (#46193 ) relate: https://github.com/milvus-io/milvus/issues/46192 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-10 14:33:13 +08:00
wgcn	6e2872c982	fix: wrong reduce latency metric (#46233 ) #46248 Signed-off-by: wgcn <wangg48@chinatelecom.cn> Co-authored-by: wgcn <wangg48@chinatelecom.cn>	2025-12-10 14:17:13 +08:00
Buqian Zheng	85a7a7b1e3	fix: skip json path index if the query path includes number (#46200 ) issue: #45511 Our tantivy inverted index currently does not include the item index when the value is an array, so lookups such as `a[0] == 'b'` cannot be served from the inverted index. In such cases we need to skip the index and use brute-force search. We may improve the index in the future, so this is a temporary solution. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-12-10 13:59:13 +08:00
cai.zhang	bb486c0db3	fix: Fix path concatenation error when rootPath = "." in minio (#46220 ) issue: #46219 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-12-10 13:53:13 +08:00
liliu-z	3f063a29b0	feat: Support Search By PK (#45820 ) issue: #39157 Overview: Support search by PK by resolving IDs to vectors on Proxy side. Upgrade go-api to adapt to new proto definitions. Design: - Upgrade milvus-proto/go-api to latest master. - Implement handleIfSearchByPK in Proxy: resolve IDs to vectors via internal Query, then rewrite SearchRequest. - Adapt to 'SearchInput' oneof field in SearchRequest across client and handlers. - Fix binary vector stride calculation bug in placeholder utils. Compatibility: - Old Pymilvus can still work without this feature What is included: - Dense and Sparse - Multi vector fields - Rejection on BM25 What is not included: - Hybrid Search - EmbeddingList - Restful API Signed-off-by: Li Liu <li.liu@zilliz.com>	2025-12-10 10:59:14 +08:00
cai.zhang	b5e11f810d	fix: Fix panic when search empty result with output geometry field (#46230 ) issue: #46146 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-12-09 20:37:13 +08:00
zhagnlu	8f0b7983ec	enhance: add jemalloc cached monitor (#46041 ) #46133 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-12-09 19:53:13 +08:00
zhuwenxing	f9ff0e8402	test: add testcases for add/alter/drop text embedding function (#46229 ) /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-09 19:23:14 +08:00
zhuwenxing	abe0318bec	test: use predefined fake_de instead of creating new Faker instances to reduce run time (#46194 ) related: https://github.com/milvus-io/milvus/issues/46014 /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-09 17:59:14 +08:00
wei liu	046693eaf7	test: [skip e2e] fix race condition in TestQueryNodePipeline/TestBasic (#46218 ) issue: #46217 The test was failing intermittently because it didn't wait for the pipeline to finish processing messages before exiting. The test sent a message to the pipeline and immediately returned, causing the deferred Close() to execute before ProcessInsert, ProcessDelete, and UpdateTSafe could be called. Fix by: - Moving message construction before mock expectations setup - Adding a done channel to synchronize on UpdateTSafe completion - Waiting for the signal before test exits Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-09 17:57:14 +08:00
zhenshan.cao	765768b0e4	fix: restfulv2 parsing fixes and schema defaults support with timestamptz (#46057 ) issue: https://github.com/milvus-io/milvus/issues/44585 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-12-09 17:53:17 +08:00
Feilong Hou	624147740b	test: fix timestamptz e2e case failure on Jenkins Weekly (#46210 ) Issue: #46188 Bug was caused by inconsistent version of tzdata as well as wrong month assignment in convert_timestamptz function. Also fix when debug_mode=True the compare function can correctly return True or False. --------- Signed-off-by: Eric Hou <eric.hou@zilliz.com> Co-authored-by: Eric Hou <eric.hou@zilliz.com>	2025-12-09 16:09:15 +08:00
wei liu	d7050c417f	fix: Add field data alignment validation to prevent partial update panic (#46177 ) issue: #46176 - Add checkAligned validation before processing partial update field data to prevent index out of range panic when field data arrays have mismatched lengths - Fix GetNumRowOfFieldDataWithSchema to handle Timestamptz string format and Geometry WKT format properly - Add unit tests for empty data array scenarios in partial update --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-09 14:17:12 +08:00
congqixia	728cdc15b2	fix: fill partition_id in load index info and close RemoteOutputStream properly (#46203 ) This PR fixes two issues related to segment loading and index deserialization: 1. Fill partition_id in LoadIndexInfo when converting field index info, which is required by cardinal (DiskANN) index deserialization. 2. Close RemoteOutputStream in destructor to ensure buffer flushed and resources released properly. issue: #46141 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-09 13:27:13 +08:00
zhuwenxing	8fac376afd	test: upgrade minio sdk from 7.1.5 to 7.2.0 (#46186 ) /kind improvement fix minio sdk error ``` 2025-12-08T07:54:48Z {container="step-test"} def _put_object(self, bucket_name, object_name, data, headers, 2025-12-08T07:54:48Z {container="step-test"} query_params=None): 2025-12-08T07:54:48Z {container="step-test"} """Execute PutObject S3 API.""" 2025-12-08T07:54:48Z {container="step-test"} response = self._execute( 2025-12-08T07:54:48Z {container="step-test"} "PUT", 2025-12-08T07:54:48Z {container="step-test"} bucket_name, 2025-12-08T07:54:48Z {container="step-test"} object_name, 2025-12-08T07:54:48Z {container="step-test"} body=data, 2025-12-08T07:54:48Z {container="step-test"} headers=headers, 2025-12-08T07:54:48Z {container="step-test"} query_params=query_params, 2025-12-08T07:54:48Z {container="step-test"} no_body_trace=True, 2025-12-08T07:54:48Z {container="step-test"} ) 2025-12-08T07:54:48Z {container="step-test"} return ObjectWriteResult( 2025-12-08T07:54:48Z {container="step-test"} bucket_name, 2025-12-08T07:54:48Z {container="step-test"} object_name, 2025-12-08T07:54:48Z {container="step-test"} > response.getheader("x-amz-version-id"), 2025-12-08T07:54:48Z {container="step-test"} response.getheader("etag").replace('"', ""), 2025-12-08T07:54:48Z {container="step-test"} response.getheaders(), 2025-12-08T07:54:48Z {container="step-test"} ) 2025-12-08T07:54:48Z {container="step-test"} E AttributeError: 'HTTPResponse' object has no attribute 'getheader' 2025-12-08T07:54:48Z {container="step-test"} 2025-12-08T07:54:48Z {container="step-test"} /usr/local/lib/python3.10/site-packages/minio/api.py:1582: AttributeError ``` --------- Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-12-09 11:39:12 +08:00
Zhen Ye	b8086cb62b	fix: lost database in restful v2 (#46171 ) issue: #45812 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-09 10:59:13 +08:00
Zhen Ye	459425ac84	fix: wrong context using by session of grpc client (#46183 ) issue: #46182 Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-08 21:47:12 +08:00
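
The port change described in commit 416113d11a can be checked with a few lines of Python. This is only an illustrative sketch: the range 32768-60999 and the two port numbers come from that commit message, while the helper name `in_ephemeral_range` is hypothetical and not part of Milvus.

```python
# Linux ephemeral port range as quoted in the commit message (inclusive bounds).
EPHEMERAL_RANGE = (32768, 60999)


def in_ephemeral_range(port, port_range=EPHEMERAL_RANGE):
    """Return True if `port` falls inside the given inclusive port range.

    Hypothetical helper for illustration; not a Milvus API.
    """
    low, high = port_range
    return low <= port <= high


# Compare the old and new RootCoord default gRPC ports from the commit message.
for label, port in (("old default", 53100), ("new default", 22125)):
    position = "inside" if in_ephemeral_range(port) else "outside"
    print(f"{label} {port} is {position} the ephemeral range")
```

On a real Linux host the active range may differ from the default quoted above; it can be read from `/proc/sys/net/ipv4/ip_local_port_range`.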

23668 Commits