issue: #44212
Implement search/query storage usage statistics on the Go side (result
reduce); for now, storage usage is recorded only in the vector search C++
path. The query C++ path will be covered in upcoming PRs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Related to #39173
According to the current design, the datanode shall create the fs from the
storage config in the request instead of using the singleton fs. This PR
upgrades milvus-storage and makes the packed reader/writer compose a new fs
from the storage config.
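As a rough sketch of the intended wiring (the `StorageConfig`, `FileSystem`,
and `MakeFileSystem` names below are assumptions, not the actual
milvus-storage API):
```cpp
#include <memory>
#include <string>

// Assumed shapes, for illustration only.
struct StorageConfig {
    std::string address;
    std::string bucket_name;
    std::string access_key_id;
    std::string access_key_value;
};
struct FileSystem {};

// Hypothetical factory composing a fs from the config carried in the request.
std::shared_ptr<FileSystem> MakeFileSystem(const StorageConfig& cfg) {
    // Stub: the real factory would build an S3/MinIO/local client from cfg.
    (void)cfg;
    return std::make_shared<FileSystem>();
}

struct PackedWriterLike {
    // Each writer builds its own fs from the request's storage config,
    // instead of grabbing a process-wide singleton.
    explicit PackedWriterLike(const StorageConfig& cfg)
        : fs_(MakeFileSystem(cfg)) {
    }
    std::shared_ptr<FileSystem> fs_;
};
```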
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray → Memory
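For illustration, building an Arrow `ListArray` directly from float vectors
with the Arrow C++ builders could look like the sketch below (a minimal
example of the new path, not the PR's actual conversion code):
```cpp
#include <arrow/api.h>

#include <memory>
#include <vector>

// Build a ListArray of float vectors directly; each Append() starts a new
// list entry and AppendValues() writes that vector's elements.
arrow::Result<std::shared_ptr<arrow::Array>> BuildVectorList(
    const std::vector<std::vector<float>>& vectors) {
    auto pool = arrow::default_memory_pool();
    auto value_builder = std::make_shared<arrow::FloatBuilder>(pool);
    arrow::ListBuilder list_builder(pool, value_builder);
    for (const auto& vec : vectors) {
        ARROW_RETURN_NOT_OK(list_builder.Append());
        ARROW_RETURN_NOT_OK(value_builder->AppendValues(vec));
    }
    std::shared_ptr<arrow::Array> out;
    ARROW_RETURN_NOT_OK(list_builder.Finish(&out));
    return out;
}
```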
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
1. Enable Milvus to read cipher configs
2. Enable the cipher plugin in the binlog reader and writer (sketched below)
3. Add a testCipher for unit tests
4. Support pooling for the datanode
5. Add encryption in storage v2
See also: #40321
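A purely illustrative sketch of item 2; the `Cipher` and `BinlogWriter`
shapes below are assumptions, not the actual plugin interface:
```cpp
#include <string>

// Payload bytes pass through the cipher plugin before the binlog hits object
// storage, and back through it on read.
struct Cipher {
    virtual ~Cipher() = default;
    virtual std::string Encrypt(const std::string& plain) = 0;
    virtual std::string Decrypt(const std::string& encrypted) = 0;
};

struct BinlogWriter {
    void Write(const std::string& bytes) {
        // Stub: the real writer serializes bytes into the binlog file.
        (void)bytes;
    }
};

void WriteBinlog(BinlogWriter& writer, Cipher* cipher,
                 const std::string& payload) {
    // Encrypt only when a cipher is configured; plaintext path is unchanged.
    writer.Write(cipher != nullptr ? cipher->Encrypt(payload) : payload);
}
```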
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
#42032
Also fixes the cacheoptfield method to work in storage v2.
Also changes the sparse-related interface for the Knowhere version bump
(#43974).
Also includes https://github.com/milvus-io/milvus/pull/44046 for the lost
metric issue.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports creating an index on vector arrays (currently only for
`DataType.FLOAT_VECTOR`) and searching on it.
The only index type supported in this PR is `EMB_LIST_HNSW`, and the only
metric type is `MAX_SIM`.
Usage:
```python
from pymilvus import DataType, MilvusClient

milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: this PR uses `Lims` to convey the offsets of each vector array to
Knowhere. Vectors from multiple vector arrays are concatenated, so offsets
are needed to specify which vectors belong to which vector array.
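A small illustration of the `lims` layout (dims and counts invented for the
example):
```cpp
#include <cstdint>
#include <vector>

int main() {
    // Three vector arrays holding 2, 3 and 1 vectors respectively, dim = 4.
    const int64_t dim = 4;
    std::vector<float> concatenated((2 + 3 + 1) * dim);  // vectors back to back
    // lims[i]..lims[i+1] delimit the vectors of the i-th vector array, so
    // array i starts at concatenated.data() + lims[i] * dim.
    std::vector<int64_t> lims = {0, 2, 5, 6};
    (void)concatenated;
    (void)lims;
    return 0;
}
```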
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #43936
This PR:
- Use `folly::SharedMutex` instead of `std::shared_mutex` to prevent writer
starvation
- Use `folly::SharedMutex::WriteHolder`/`ReadHolder` instead of
`std::shared_lock` and `std::unique_lock` for better performance (see the
sketch below)
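For reference, the holder usage pattern looks roughly like this (a minimal
sketch, not an excerpt of the changed call sites):
```cpp
#include <folly/SharedMutex.h>

folly::SharedMutex mutex_;
int shared_state_ = 0;

int Read() {
    folly::SharedMutex::ReadHolder guard(mutex_);   // shared (reader) access
    return shared_state_;
}

void Write(int value) {
    folly::SharedMutex::WriteHolder guard(mutex_);  // exclusive (writer) access
    shared_state_ = value;
}
```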
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43660
This patch reduces the unwanted offset & ts entries that share the same
timestamp as the delete record. Under a large volume of upserts, these false
hits could significantly increase memory usage while applying deletes.
A next step could be passing a callback to `search_pk_func_` to handle hit
entries in a streaming fashion.
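A hedged sketch of the filtering idea with illustrative names (not the
actual segcore code):
```cpp
#include <cstdint>
#include <utility>
#include <vector>

// When applying a delete at del_ts, a PK hit inserted at that same timestamp
// (the upsert's own insert) is a false hit and must not be collected.
std::vector<std::pair<int64_t, uint64_t>>  // (offset, insert timestamp)
CollectHits(const std::vector<std::pair<int64_t, uint64_t>>& pk_hits,
            uint64_t del_ts) {
    std::vector<std::pair<int64_t, uint64_t>> deleted;
    for (const auto& [offset, ts] : pk_hits) {
        if (ts < del_ts) {  // keep only entries strictly older than the delete
            deleted.emplace_back(offset, ts);
        }
    }
    return deleted;
}
```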
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43592
When there are many delete records, searching PKs one by one results in many
`PinCells` calls, each of which creates lots of futures.
This patch makes PK search execute in batches to reduce this cost.
It also adds a `GetAllChunks` API that utilizes `PinAllCells` to reduce the
number of pins.
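Roughly, the batching looks like this (`Slot`, `GetAllChunks`, and `SearchPk`
usage here are illustrative, not the exact signatures):
```cpp
#include <vector>

// Pin every needed chunk once up front, then probe all PKs against the
// pinned chunks, instead of paying one pin round trip (plus its futures)
// per PK.
template <typename Slot, typename PK>
void BatchSearchPk(Slot& slot, const std::vector<PK>& pks) {
    auto chunks = slot.GetAllChunks();  // internally uses PinAllCells
    for (const auto& pk : pks) {
        for (auto& chunk : chunks) {
            chunk.SearchPk(pk);         // no per-PK pinning cost
        }
    }
}
```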
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43113
This PR:
- Change the member of FieldIndex from `FieldMeta &` to the needed
`DataType` and dim members, resolving a dangling reference after schema
change
- Add a double check after acquiring the lock to avoid repeated assignment
(see the sketch below)
- Change `auto schema` to `auto& schema` to avoid copying the schema
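A minimal sketch of the double-check in the second bullet; `Index`,
`BuildIndex`, and the members here are illustrative stand-ins for the real
ones:
```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Index {};

std::shared_ptr<Index> BuildIndex(int data_type, int dim) {
    // Stub: the real code builds a field index for the given type and dim.
    (void)data_type;
    (void)dim;
    return std::make_shared<Index>();
}

struct FieldIndexHolder {
    std::shared_ptr<Index> GetOrBuild() {
        {
            std::shared_lock lock(mutex_);
            if (index_) return index_;   // fast path: already built
        }
        std::unique_lock lock(mutex_);
        if (!index_) {                   // double check after acquiring the lock
            index_ = BuildIndex(data_type_, dim_);
        }
        return index_;
    }

    std::shared_mutex mutex_;
    std::shared_ptr<Index> index_;
    int data_type_ = 0;
    int dim_ = 0;
};
```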
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The root cause of the issue is that when a sealed segment contains multiple
row groups, the get_cells function may receive unordered cids. This can
result in row groups being written into incorrect cells during data
retrieval.
Previously, this issue was hard to reproduce because the old Storage V2
writer had a bug that caused it to write row groups larger than 1 MB. These
large row groups could lead to uncontrolled memory usage and eventually an
OOM (out-of-memory) error. Additionally, compaction typically produced a
single large row group, which avoided the incorrect cell-filling issue
during query execution.
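The fix class can be illustrated as follows (names are invented; the point
is that cells must be placed by cid, not by arrival order):
```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Fetched cells must land in the slot of their cid, because get_cells may
// deliver results for the requested cids in any order.
template <typename Cell>
std::vector<Cell> PlaceByCid(std::vector<std::pair<int64_t, Cell>> fetched,
                             const std::vector<int64_t>& cids) {
    std::unordered_map<int64_t, Cell> by_cid;
    for (auto& [cid, cell] : fetched) {
        by_cid[cid] = std::move(cell);
    }
    std::vector<Cell> out(cids.size());
    for (size_t i = 0; i < cids.size(); ++i) {
        out[i] = std::move(by_cid[cids[i]]);  // right slot regardless of order
    }
    return out;
}
```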
related: https://github.com/milvus-io/milvus/issues/43388,
https://github.com/milvus-io/milvus/issues/43372,
https://github.com/milvus-io/milvus/issues/43464, #43446, #43453
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #41435
This is to prevent AI tools from interpreting our exception throwing as a
dangerous PANIC operation that terminates the program.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #43113
When a schema change happens, inserts shall not happen; otherwise:
- A data race may occur, causing insertion failure
- The data schema may become inconsistent
This PR adds a shared_lock to prevent this data race, as sketched below.
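A sketch of the locking scheme with illustrative names (the PR itself may
use a different mutex type):
```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Schema {};

struct SegmentLike {
    void Insert(/* insert data */) {
        std::shared_lock lock(schema_mutex_);  // inserts may run concurrently
        // ... write rows under the current schema ...
    }

    void ChangeSchema(std::shared_ptr<Schema> new_schema) {
        std::unique_lock lock(schema_mutex_);  // waits for in-flight inserts
        schema_ = std::move(new_schema);
    }

    std::shared_mutex schema_mutex_;
    std::shared_ptr<Schema> schema_;
};
```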
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39178
This PR adds logs for segment schema change operations.
It also fixes the nit comments from PR #42490.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Fix issues in end-to-end tests:
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows** (see the sketch after this list):
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
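A hedged sketch of rule 3, with assumed names:
```cpp
#include <cstdint>
#include <vector>

struct ColumnGroup {
    int64_t id;
    int num_columns;
    std::vector<int64_t> metadata_field_ids;  // set for multi-column groups
};

std::vector<int64_t> ResolveFieldIds(const ColumnGroup& group) {
    if (group.num_columns > 1) {
        return group.metadata_field_ids;  // multi-column: from metadata
    }
    return {group.id};                    // single-column: group id is the field id
}
```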
related: #39173
fix: #42862
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #42856
Data under a mmap-ed growing segment shall be treated respecting the
growingMmap setting. Otherwise, the varchar data type could hit a logic
error.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #42773
The growing segment fills all known meta into the `InsertRecord` data, so
even when a field is missing, its field data still exists.
This PR updates the logic that runs when a growing segment finishes loading
to check whether the field data is empty instead of whether it exists.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #42640
The search/query plan held a reference to the schema, which could be
destructed after a schema change. This PR makes the plan hold a shared ptr
to it, fixing the dangling-reference problem under concurrent read & schema
change (see the sketch below).
This PR also removes the field binlog check when loading an index, since an
old segment with an old schema may lack some binlogs.
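The ownership change, sketched with illustrative types:
```cpp
#include <memory>
#include <utility>

struct Schema {};

struct Plan {
    std::shared_ptr<Schema> schema;  // was: const Schema& schema
};

Plan CreatePlan(std::shared_ptr<Schema> schema) {
    // The plan co-owns the schema, so a concurrent schema change cannot
    // destroy it while the plan is still in use.
    return Plan{std::move(schema)};
}
```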
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR mainly enables segcore to support array of vector (read and write,
but not indexing). Currently, only float vector is supported as the element
type.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #42489
See also #41435
This PR's main target is to make the partial load field list work as a
caching layer warmup policy hint. If the user specifies a load field list,
the fields not included in the list shall use the `disabled` warmup policy
and be lazily loaded when any read op uses them (see the sketch after this
list).
The major changes are listed here:
- Pass the load list to segcore when creating the collection & schema
- Add util functions to check whether a field shall be proactively loaded
- Adapt the storage v2 column group, where the hint may fail if columns
share the same group
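A hedged sketch of the policy decision; `WarmupPolicy` and the helper below
are illustrative names, not the actual segcore API:
```cpp
#include <cstdint>
#include <set>

enum class WarmupPolicy { kSync, kDisabled };

WarmupPolicy PolicyForField(int64_t field_id,
                            const std::set<int64_t>& load_list) {
    if (load_list.empty() || load_list.count(field_id) > 0) {
        return WarmupPolicy::kSync;   // proactively loaded at load time
    }
    return WarmupPolicy::kDisabled;   // lazily loaded on first read access
}
```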
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/41435
This PR is based on https://github.com/milvus-io/milvus/pull/41436.
Improvements include:
- Lazy load support for Storage v1
- Use low/high watermarks to control eviction (see the sketch after this
list)
- Caching layer related config changes
- Removed ChunkCache related configs and code in golang
- Add a `PinAllCells` helper method to the CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to bulk versions, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator now stores a PinWrapper when searching on
ChunkedColumn; also removed an unused constructor.
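A minimal sketch of the watermark mechanism in the second bullet, with
assumed names:
```cpp
#include <cstdint>

// Evict only after usage crosses the high watermark, and keep evicting
// until it drops below the low one.
struct CacheLike {
    int64_t used = 0;
    int64_t high = 0;
    int64_t low = 0;
    void EvictOne() {
        // Stub: a real cache would drop the least recently used cell.
        used -= 1;
    }
};

void MaybeEvict(CacheLike& cache) {
    if (cache.used < cache.high) {
        return;  // below the high watermark: nothing to do
    }
    while (cache.used > cache.low) {
        cache.EvictOne();
    }
}
```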
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Storage v2 chunked sealed segment loading is based on the caching layer. A
cell unit in storage v2 is a Parquet row group in remote object storage,
containing all fields. Therefore, each field needs a proxy to perform its
single-field operations.
<img width="965" alt="Screenshot 2025-04-28 at 10 59 30"
src="https://github.com/user-attachments/assets/83e93a10-3b1d-4066-ac17-b996d5650416"
/>
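The proxy idea can be sketched as follows (`PinCell` and `Column` are
illustrative names; the real cell access goes through the caching layer
APIs):
```cpp
#include <cstdint>

// One cached cell holds a whole row group with all fields, so each field
// goes through a proxy that pins the cell and projects out its own column.
template <typename CacheSlotT, typename ColumnT>
struct FieldProxy {
    int64_t field_id;
    CacheSlotT* slot;  // shared by all field proxies of the segment

    ColumnT ReadChunk(int64_t row_group_id) {
        auto cell = slot->PinCell(row_group_id);  // pins the full row group
        return cell->Column(field_id);            // this field's column only
    }
};
```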
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #39718
Fixes milvus-io/pymilvus#2771
This PR:
- Make the AsyncRetrieve task trigger the "schema check" logic as well
- Rename `AddField`-related methods to align with the code standard
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39718
This PR:
- Add reopen logic for growing & sealed segments
- Lazily reopen when the schema version increases (see the sketch below)
- Add a FinishLoad API for loading progress
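A sketch of the lazy reopen in the second bullet, with illustrative names:
```cpp
#include <cstdint>

struct SegmentLike {
    // A read path compares the segment's schema version against the latest
    // one and reopens only when it lags behind.
    void LazyReopenIfNeeded(int64_t latest_version) {
        if (schema_version_ >= latest_version) {
            return;              // already on the newest schema
        }
        Reopen(latest_version);  // reload fields under the new schema
    }

    void Reopen(int64_t version) {
        // Stub: the real code rebuilds segment state for the new schema.
        schema_version_ = version;
    }

    int64_t schema_version_ = 0;
};
```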
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Support parallel loading of sealed and growing segments in the storage v2
format by asynchronously reading row groups.
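A hedged sketch of the async pattern (`Reader` and `RowGroup` are assumed
names):
```cpp
#include <future>
#include <vector>

// Kick off one async read per row group, then collect results in order.
template <typename Reader, typename RowGroup>
std::vector<RowGroup> LoadAllRowGroups(Reader& reader, int num_row_groups) {
    std::vector<std::future<RowGroup>> tasks;
    tasks.reserve(num_row_groups);
    for (int i = 0; i < num_row_groups; ++i) {
        tasks.push_back(std::async(std::launch::async, [&reader, i] {
            return reader.ReadRowGroup(i);
        }));
    }
    std::vector<RowGroup> out;
    out.reserve(num_row_groups);
    for (auto& task : tasks) {
        out.push_back(task.get());  // order of row groups is preserved
    }
    return out;
}
```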
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
After this PR is merged, insert, upsert, index building, query, and search
are supported on the added field.
These operations can only be performed on the added field after the
add-field request completes, which is a synchronous operation.
Compaction will be supported in the next PR.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>