milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
Bingyi Sun	8622a2726a	enhance: Fix operator apply in UpdateSegment (#44158 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-03 00:27:53 +08:00
Bingyi Sun	0c0630cc38	feat: support dropping index without releasing collection (#42941 ) issue: #42942 This pr includes the following changes: 1. Added checks for index checker in querycoord to generate drop index tasks 2. Added drop index interface to querynode 3. To avoid search failure after dropping the index, the querynode allows the use of lazy mode (warmup=disable) to load raw data even when indexes contain raw data. 4. In segcore, loading the index no longer deletes raw data; instead, it evicts it. 5. In expr, the index is pinned to prevent concurrent errors. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-02 16:17:52 +08:00
Zhen Ye	9e2d1963d4	enhance: support cchannel for streaming service (#44143 ) issue: #43897 - add cchannel as a special vchannel to hold some ddl and dcl. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-02 10:05:52 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534 ) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
Zhen Ye	3327df72e4	enhance: make immutable message as the param of ack operation for cdc (#43900 ) issue: #43897 - The original broadcast ack operation need to recover message from etcd, which can not support cdc. - immutable message will set as the ack parameter to fix it. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-01 10:21:52 +08:00
cai.zhang	c16296a53f	fix: Handle compaction retry state (#44119 ) issue: #43776 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-08-29 13:31:51 +08:00
wei liu	d84c4c580a	enhance: [DataCoord] Remove full-collection index work from metrics (#43859 ) issue: #43858 - Remove full-collection index handling in getCollectionMetrics - Avoid heavy metadata scans and RPC calls during metrics - Reduce latency and CPU/memory usage on large datasets - No functional change to metrics semantics Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-29 12:05:50 +08:00
congqixia	ba88cfa7a9	enhance: Add unified GRPC latency metrics in inteceptor (#44089 ) Related to #43966 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-28 09:53:51 +08:00
cai.zhang	7f470e6bd3	fix: Fix retry state with palyload is not nil (#44068 ) issue: #43776 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-08-27 18:11:49 +08:00
Chun Han	da156981c6	feat: milvus support posix-compatible mode(milvus-io#43942) (#43944 ) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
wei liu	3e9e830074	enhance: Implement rewatch mechanism for etcd failure scenarios (#43829 ) issue: #43828 Implement robust rewatch mechanism to handle etcd connection failures and node reconnection scenarios in DataCoord and QueryCoord, along with heartbeat lag monitoring capabilities. Changes include: - Implement rewatchDataNodes/rewatchQueryNodes callbacks for etcd reconnection scenarios - Add idempotent rewatchNodes method to handle etcd session recovery gracefully - Add QueryCoordLastHeartbeatTimeStamp metric for monitoring node heartbeat lag - Clean up heartbeat metrics when nodes go down to prevent metric leaks --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-14 10:31:44 +08:00
cai.zhang	77f2fb562f	fix: Fix task state is InProgress but payload is nil (#43777 ) issue: #43776 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-08-11 14:13:42 +08:00
aoiasd	eca51ed2c6	enhance: add file resource api (#43766 ) relate: https://github.com/milvus-io/milvus/issues/43687 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-08-08 14:17:41 +08:00
Zhen Ye	5551d99425	enhance: remove old arch non-streaming arch code (#43651 ) issue: #41609 - remove all dml dead code at proxy - remove dead code at l0_write_buffer - remove msgstream dependency at proxy - remove timetick reporter from proxy - remove replicate stream implementation --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:41:40 +08:00
wei liu	1fae8f5ae3	enhance: Optimize FlushAll performance for multi-table scenarios (#43339 ) Replace multiple per-table flush RPC calls with single FlushAll RPC to improve performance in multi-table scenarios. issue: #43338 - Implement server-side FlushAll request processing in DataCoord/MixCoord - Add flushAllTask to handle unified flush operations across all tables - Replace proxy-side per-table flush iteration with single RPC call - Support both streaming and non-streaming service execution paths - Add comprehensive unit tests for new FlushAll implementation --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-30 15:37:37 +08:00
Zhen Ye	cd38d65417	fix: make savebinlogpath idompotent at binlog level (#43615 ) issue: #43574 - update all binlog every time when calling udpate savebinlogpath. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-29 19:47:36 +08:00
yihao.dai	a29b3272b0	fix: Improve import memory management to prevent OOM (#43568 ) 1. Use blocking memory allocation to wait until memory becomes available 2. Perform memory allocation at the file level instead of per task 3. Limit Parquet file reader batch size to prevent excessive memory consumption 4. Limit import buffer size from 20% to 10% of total memory issue: https://github.com/milvus-io/milvus/issues/43387, https://github.com/milvus-io/milvus/issues/43131 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-28 21:25:35 +08:00
yihao.dai	192521c6bd	enhance: Fix unbalanced task scheduling (#43581 ) Make scheduler always pick the node with the most available slots. issue: https://github.com/milvus-io/milvus/issues/43580 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-28 12:58:55 +08:00
Zhen Ye	7877aaa96c	fix: dirty cp metrics after drop (#43567 ) issue: #42688 - The channel cp is dropped by garbage collector - The channel is dropped and the cp is marked as math.Uint64 - If we drop it here, the update channel checkpoints will write the dirty cp back. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-27 23:22:55 +08:00
Zhen Ye	feb5db60f2	fix: make flush save binlog paths idempotent (#43579 ) issue: #43574 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-27 23:14:55 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Zhen Ye	070aabd27e	enhance: fix remove flushing state of segment (#43560 ) issue: #43559, #42884 - also fix the data lost when streaming resuming from old arch message. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-25 18:08:54 +08:00
yihao.dai	0e1f367164	enhance: Fail compaction task to prevent data loss (#43545 ) We’ve frequently observed data loss caused by broken mutual exclusion in compaction tasks. This PR introduces a post-check: before modifying metadata upon compaction task completion, it verifies the state of the input segments. If any input segment has been dropped, the compaction task will be marked as failed. issue: https://github.com/milvus-io/milvus/issues/43513 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-25 16:24:54 +08:00
yihao.dai	804a7692a6	fix: Fix delete loss caused by missing mutual exclusion in sort compaction (#43540 ) issue: https://github.com/milvus-io/milvus/issues/43513 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-24 14:53:34 +08:00
Zhen Ye	e9ab73e93d	enhance: add schema version at recovery storage (#43500 ) issue: #43072, #43289 - manage the schema version at recovery storage. - update the schema when creating collection or alter schema. - get schema at write buffer based on version. - recover the schema when upgrading from 2.5. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-23 21:38:54 +08:00
cai.zhang	74c08069ef	fix: Set result storage version for sort compaction (#43521 ) issue: #43520 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-23 19:04:53 +08:00
cai.zhang	f19e0ef6e4	fix: Ensure task execution order by using a priority queue (#43271 ) issue: #43260 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-22 17:42:53 +08:00
yihao.dai	a839017e81	fix: Handle retry state in import task (#43474 ) issue: https://github.com/milvus-io/milvus/issues/43473 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-22 14:52:53 +08:00
yihao.dai	5124ed9758	fix: Fix import fileStats incorrectly set to nil (#43463 ) 1. Ensure that tasks in the InProgress state return valid fileStats. 2. Enhance import logs. issue: https://github.com/milvus-io/milvus/issues/43387 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-22 12:37:01 +08:00
Bingyi Sun	09b6407e63	enhance: optimize error msg for json index inconsistent parameters (#43345 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-21 00:32:52 +08:00
yihao.dai	df8ceb123b	enhance: Support parallel execution of L0 import tasks (#43213 ) issue: https://github.com/milvus-io/milvus/issues/43212 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-17 10:14:50 +08:00
XuanYang-cn	4dcaa97682	fix: Use diskSegmentMaxSize for coll with sparse and dense vectors (#43194 ) Previous code uses diskSegmentMaxSize if and only if all of the collection's vector fields are indexed with DiskANN index. When introducing sparse vectors, since sparse vector cannot be indexed with DiskANN index, collections with both dense and sparse vectors will use maxSize instead. This PR changes the requirments of using diskSegmentMaxSize to all dense vectors are indexed with DiskANN indexs, ignoring sparse vector fields. See also: #43193 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-07-16 18:04:52 +08:00
cai.zhang	c54a04c71c	fix: L2 segments remain as L2 even after sort compaction (#43237 ) issue: #43186 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-11 11:30:48 +08:00
congqixia	5a9efb3f81	enhance: [StorageV2] Refine storage rw option usage & validation (#43175 ) Related to #39173 This PR: - Make all datanode task passes storage config via storage config option - Remove legacy comments, rootPath & bucketName parameters - Fix clustering compaction option behavior - Add validation logic for `rwOptions` - Use correct storageType from storageConfig - Add storage config in sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-11 01:14:48 +08:00
cai.zhang	3ffd44f302	fix: Fix remaining issues with Datanode pooling and StorageV2 (#43147 ) issue: #43146 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-10 14:26:48 +08:00
yihao.dai	ee9a95189a	enhance: Print segments info after import done (#43200 ) issue: https://github.com/milvus-io/milvus/issues/42488 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-10 12:38:47 +08:00
cai.zhang	47144429bf	fix: Fix regeneratePartitionStats failed after restore clusteringCompactionTask (#43205 ) issue: #43186 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-10 10:40:47 +08:00
cai.zhang	6989e18599	enhance: Move sort stats task to sort compaction (#42562 ) issue: #42560 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-08 20:22:47 +08:00
yihao.dai	9cbd194c6b	fix: Prevent import from generating small binlogs (#43132 ) - Introduce dynamic buffer sizing to avoid generating small binlogs during import - Refactor import slot calculation based on CPU and memory constraints - Implement dynamic pool sizing for sync manager and import tasks according to CPU core count issue: https://github.com/milvus-io/milvus/issues/43131 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-07 21:32:47 +08:00
Zhen Ye	e97e44d56e	enhance: limit the gc concurrency when cpu is high (#43059 ) issue: #42833 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-04 09:22:43 +08:00
cai.zhang	f6b2a71c95	enhance: Remove chunkmanager-related dependencies from datanode (#43021 ) issue: #41611 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-03 14:44:45 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803 ) Ref #42053 This is the first PR for optimizing `LIKE` with ngram inverted index. Now, only VARCHAR data type is supported and only InnerMatch LIKE (%xxx%) query is supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram controls how we tokenize the documents. For example, for min_gram=2 and max_gram=4, we will tokenize each document with 2-gram, 3-gram and 4-gram. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
Zhen Ye	2d73e6eaa8	fix: mixcoord will not handle timetick anymore (#42965 ) issue: #42954 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-26 19:14:42 +08:00
yihao.dai	d7c9914eff	fix: Consider fields number when preallocating ids for import (#42810 ) In corner cases where there are many fields but only a small number of rows to import, the default preallocated IDs may be insufficient. To address this, consider the number of fields when preallocating IDs. issue: https://github.com/milvus-io/milvus/issues/42518 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-06-25 23:38:41 +08:00
XuanYang-cn	0adf44e6f8	enhance: Check if segment has too many deletions together (#42668 ) This PR moves the deltalog file count check inside hasTooManyDeletions check. Unifies the logic on checking if a segment has too many deletions including: delta log count, deleted rows ratio and deltalog size. This change removes several uncessary traverse through segment's binlogs and deltalogs. And add more clear trigger logs Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-06-24 16:30:49 +08:00
Zhen Ye	2fd8f910b0	fix: data duplicated when msgdispatcher make splitting (#42827 ) issue: #41570 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-19 16:32:39 +08:00
cai.zhang	a9dcd4a380	enhance: ChunkManager is no longer created during datanode initialization (#42791 ) issue: #41611 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-17 17:06:38 +08:00

1 2 3 4 5 ...

1477 Commits