milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-06 17:18:35 +08:00

Author	SHA1	Message	Date
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
congqixia	34d3f0c0f8	enhance: Reserve builder space for ValueSerializer (#43570 ) Add `arrowBuild.Reserve` call for `ValueSerializer` to reduce repeated resizing buffer when write size is large Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-28 11:02:55 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578 ) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
sthuang	a0c9f499ee	fix: [StorageV2] sync panic with nullable add field (#43142 ) related: https://github.com/milvus-io/milvus/pull/42932 fix: https://github.com/milvus-io/milvus/issues/43072 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-25 10:08:53 +08:00
congqixia	1cf8ed505f	fix: Implement `NeededFields` feature in `RecordReader` (#43523 ) Related to #43522 Currently, passing partial schema to storage v2 packed reader may trigger SEGV during clustering compaction unit test. This patch implement `NeededFields` differently in each `RecordReader` imlementation. For now, v2 will implemented as no-op. This will be supported after packed reader support this API. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-24 00:22:54 +08:00
congqixia	ee056f0bff	fix: [AddField] Fill default value in serde logic when field missing (#42891 ) Related to #42856 Default value will be missing after segment get sorted/compacted. This PR is a temp workaround since in long term default value shall be filled with storage engine instead. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-23 14:20:41 +08:00
SimFG	91d40fa558	fix: Update logging context and upgrade dependencies (#41318 ) - issue: #41291 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-04-23 10:52:38 +08:00
Ted Xu	1bcea2a775	fix: assigning the correct storage version in sync and index tasks (#41093 ) See #39663 #40667 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-04-08 10:14:25 +08:00
Ted Xu	688505ab1c	enhance: cleanup lint check exclusions (#40829 ) See: #40828 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-21 18:12:14 +08:00
sthuang	d7df78a6c9	feat: Storage v2 compaction (#40667 ) - Feat: Support Mix compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact with duplicate primary key test. - Compact with bm25 segments. - Compact with merge sort segments. - Compact with no expiration segments. - Compact with lack binlog segments. - Compact with nullable field segments. - Feat: Support Clustering compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact bm25 segments with v2 format. - Compact with memory limit. - Enhance: Use serdeMap serialize in BuildRecord function to support all Milvus data types. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-03-21 10:16:12 +08:00
Ted Xu	be86d31ea3	feat: compaction to support add field (#40415 ) See: #39718 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-18 11:32:12 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
Ted Xu	878ce56079	fix: correct memory size estimation on arrays (#40312 ) See: #40342 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-05 16:54:09 +08:00
sthuang	90acc8a58f	enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916 ) related: https://github.com/milvus-io/milvus/issues/39915 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-25 10:30:02 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Ted Xu	8562a102ec	enhance: API integration with storage v2 in mix-compactions (#40008 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-22 14:23:54 +08:00
smellthemoon	8b974c5742	enhance: support compact if lack of binlog (#40000 ) https://github.com/milvus-io/milvus/issues/39718 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-02-22 10:51:56 +08:00
sthuang	3eb3af5f08	feat: explicitly specify column groups for storage v2 api (#39790 ) * use the new packed reader and writer api to be compatible with current etcd meta * For the new packed writer API: column groups and paths are explicitly defined by users and won't split column groups by memory in storage v2. Packed writer follows the user-defined column groups to split arrow record and write into the corresponding file path. * For the new packed reader API: read paths are explicitly defined by users. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-21 22:03:54 +08:00
Ted Xu	2978b0890e	enhance: iterative download data during compaction to reduce memory cost (#39724 ) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-13 10:36:47 +08:00
sthuang	15c8798b93	feat: storage v2 serde reader and writer (#39667 ) related: https://github.com/milvus-io/milvus/issues/39173 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-11 16:00:46 +08:00
Ted Xu	427b6a4c94	enhance: reduce stats task cost by skipping ser/de (#39568 ) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-02-06 17:14:45 +08:00
congqixia	b0bd290a6e	enhance: Use internal json(sonic) to replace std json lib (#37708 ) Related to #35020 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-18 10:46:31 +08:00
Ted Xu	31d0c84f67	fix: panic calclulating data size on writing binlogs (#37575 ) See: #37568 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-11-11 14:08:27 +08:00
Ted Xu	bc9562feb1	enhance: avoid memory copy and serde in mix compaction (#37479 ) See: #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-11-07 16:30:57 -08:00
Ted Xu	262a994d6d	enhance: generally improve the performance of mix compactions (#37163 ) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-10-29 18:12:20 +08:00
smellthemoon	a3f2f044d6	fix: not set nullable when stream writer write headers (#35799 ) #35802 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-29 20:59:00 +08:00
Ted Xu	41646c8439	feat: integrate new deltalog format (#35522 ) See #34123 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-08-20 19:06:56 +08:00
shaoting-huang	88b373b024	enhance: binlog primary key turn off dict encoding (#34358 ) issue: #34357 Go Parquet uses dictionary encoding by default, and it will fall back to plain encoding if the dictionary size exceeds the dictionary size page limit. Users can specify custom fallback encoding by using `parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However, Go Parquet [fallbacks to plain encoding](`e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238)`) rather than custom encoding method users provide. Therefore, this patch only turns off dictionary encoding for the primary key. With a 5 million auto ID primary key benchmark, the parquet file size improves from 13.93 MB to 8.36 MB when dictionary encoding is turned off, reducing primary key storage space by 40%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-17 17:47:44 +08:00
congqixia	3333160b8d	enhance: Fix lint issues from recent PRs (#34482 ) See also #34483 Some lint issues are introduced due to lack of static check run. This PR fixes these problems. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-09 10:06:24 +08:00
shaoting-huang	f4dd7c7efb	enhance: add delta log stream new format reader and writer (#34116 ) issue: #34123 Benchmark case: The benchmark run the go benchmark function `BenchmarkDeltalogFormat` which is put in the Files changed. It tests the performance of serializing and deserializing from two different data formats under a 10 million delete log dataset. Metrics: The benchmarks measure the average time taken per operation (ns/op), memory allocated per operation (MB/op), and the number of memory allocations per operation (allocs/op). \| Test Name \| Avg Time (ns/op) \| Time Comparison \| Memory Allocation (MB/op) \| Memory Comparison \| Allocation Count (allocs/op) \| Allocation Comparison \| \|---------------------------------\|------------------\|-----------------\|---------------------------\|-------------------\|------------------------------\|------------------------\| \| one_string_format_reader \| 2,781,990,000 \| Baseline \| 2,422 \| Baseline \| 20,336,539 \| Baseline \| \| pk_ts_separate_format_reader \| 480,682,639 \| -82.72% \| 1,765 \| -27.14% \| 20,396,958 \| +0.30% \| \| one_string_format_writer \| 5,483,436,041 \| Baseline \| 13,900 \| Baseline \| 70,057,473 \| Baseline \| \| pk_and_ts_separate_format_writer\| 798,591,584 \| -85.43% \| 2,178 \| -84.34% \| 30,270,488 \| -56.78% \| Both read and write operations show significant improvements in both speed and memory allocation. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-06 09:08:09 +08:00
congqixia	2f691f1e67	enhance: Unify DeleteLog parsing code (#34009 ) See also #33787 The parsing delete log is distributed in lots of places, which is not recommended and hard to maintain. This PR abstract common parsing logic into `DeleteLog.Parse` method to unify implementation and make it easier to replace json parsing lib. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-21 16:54:01 +08:00
Ted Xu	6d5747cb3e	feat: adding deltalog stream reader and writer (#33844 ) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-06-19 14:42:01 +08:00

35 Commits