milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-28 22:45:26 +08:00

Author	SHA1	Message	Date
XuanYang-cn	0bbb134e39	feat: Enable to backup and reload ez (#46332 ) see also: #40013 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-12-16 17:19:16 +08:00
Spade A	eb793531b9	feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201 ) Ref https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-15 20:41:59 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
wei liu	46dfe260da	enhance: Add timestamp filtering support to L0Reader (#43747 ) issue: #43745 Add timestamp filtering capability to L0Reader to match the functionality available in the regular Reader. This enhancement allows filtering delete records based on timestamp range during L0 import operations. Changes include: - Add tsStart and tsEnd fields to l0Reader struct for timestamp filtering - Modify NewL0Reader function signature to accept tsStart and tsEnd parameters - Implement timestamp filtering logic in Read method to skip records outside the specified range - Update L0ImportTask and L0PreImportTask to parse timestamp parameters from request options and pass them to NewL0Reader - Add comprehensive test case TestL0Reader_ReadWithTsFilter to verify ts filtering functionality using mockey framework Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-06 16:49:39 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling VectorArray. This PR: 1. impls the Go part of storage for VectorArray 2. impls collection creation with StructArrayField and VectorArray 3. inserts and retrieves data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578 ) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
yihao.dai	9fbd41a97d	fix: Adjust binlog and parquet reader buffer size for import (#43495 ) 1. Modify the binlog reader to stop reading a fixed 4096 rows and instead use the calculated bufferSize to avoid generating small binlogs. 2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM. issue: https://github.com/milvus-io/milvus/issues/43387 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-23 21:28:54 +08:00
yihao.dai	1984be646c	fix: Fix storagev2 binlog import (#43221 ) issue: https://github.com/milvus-io/milvus/issues/43218 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-13 22:52:49 +08:00
congqixia	5a9efb3f81	enhance: [StorageV2] Refine storage rw option usage & validation (#43175 ) Related to #39173 This PR: - Make all datanode tasks pass storage config via the storage config option - Remove legacy comments, rootPath & bucketName parameters - Fix clustering compaction option behavior - Add validation logic for `rwOptions` - Use correct storageType from storageConfig - Add storage config in sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-11 01:14:48 +08:00
groot	1ee8cea35b	enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956 ) issue: https://github.com/milvus-io/milvus/issues/42173 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-07-04 14:20:44 +08:00
Zhen Ye	43f0c56ce7	fix: limit the concurrency of zstd compression and decrease the memory usage of binlog generation (#42630 ) issue: #42028 - limit the concurrency of zstd compression. - zstd.go modified from `github.com/apache/arrow/go/v17/parquet/compress/zstd.go` - may be related to #42129 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-11 09:06:34 +08:00
Ted Xu	7660be0993	feat: bulk insert support storage v2 (#41843 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-05-19 10:34:24 +08:00
yihao.dai	6c1a37fca1	fix: Fix import reader goroutine leak (#41869 ) Close the chunk manager's reader after the import completes to prevent goroutine leaks. issues: https://github.com/milvus-io/milvus/issues/41868 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-05-16 10:18:35 +08:00
yihao.dai	16eb5eb921	enhance: Accelerate delete filtering during binlog import (#41551 ) Use map for deleteData instead of slice to accelerate delete filtering during binlog import. issue: https://github.com/milvus-io/milvus/issues/41550 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-27 18:56:38 +08:00
Buqian Zheng	03b63bf982	fix: use NewInsertDataWithFunctionOutputField when importing binlog file (#40741 ) issue: https://github.com/milvus-io/milvus/issues/40740 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-03-19 10:50:14 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Cai Yudong	7476eb3625	feat: Support bulk insert for Int8Vector (#39499 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-23 10:19:06 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
Buqian Zheng	82c5cf2fa2	feat: add bulk insert support for Functions (#36715 ) issue: https://github.com/milvus-io/milvus/issues/35853 and https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-10-12 17:19:20 +08:00
smellthemoon	b60164b882	enhance: support null in bulk insert of binlog to help backup null (#36526 ) https://github.com/milvus-io/milvus/issues/36341 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-09-26 14:35:14 +08:00
Ted Xu	41646c8439	feat: integrate new deltalog format (#35522 ) See #34123 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-08-20 19:06:56 +08:00
smellthemoon	80a7c78f28	enhance: import supports null in parquet and json formats (#35558 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-20 16:50:55 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
shaoting-huang	88b373b024	enhance: binlog primary key turn off dict encoding (#34358 ) issue: #34357 Go Parquet uses dictionary encoding by default, and it falls back to plain encoding if the dictionary size exceeds the dictionary page size limit. Users can specify a custom fallback encoding by using `parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However, Go Parquet [falls back to plain encoding](`e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238)`) rather than to the custom encoding method users provide. Therefore, this patch only turns off dictionary encoding for the primary key. With a 5 million auto ID primary key benchmark, the parquet file size shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned off, reducing primary key storage space by 40%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-17 17:47:44 +08:00
congqixia	2f691f1e67	enhance: Unify DeleteLog parsing code (#34009 ) See also #33787 Delete log parsing is scattered across many places, which is not recommended and is hard to maintain. This PR abstracts the common parsing logic into the `DeleteLog.Parse` method to unify the implementation and make it easier to replace the json parsing lib. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-21 16:54:01 +08:00
smellthemoon	2a1356985d	enhance: support null in go payload (#32296 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-06-19 17:08:00 +08:00
congqixia	512ea6be5f	enhance: Avoid merging insert data when buffering insert msgs (#33562 ) See also #33561 This PR: - Use zero copy when buffering insert messages - Make `storage.InsertCodec` support serializing multiple insert data chunks into the same batch of binlog files Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-13 11:15:56 +08:00
yihao.dai	3540eee977	enhance: Support L0 import (#33514 ) issue: https://github.com/milvus-io/milvus/issues/33157 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-07 14:17:20 +08:00
Cai Yudong	dc89c6f810	enhance: remove duplicated data generation APIs for bulk insert test (#32889 ) Issue: #22837 including following changes: 1. Add API CreateInsertData() and BuildArrayData() in internal/util/testutil 2. Remove duplicated test APIs from importutilv2 unittest and bulk insert integration test Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-10 15:27:31 +08:00
Cai Yudong	bcdbd1966e	feat: Support sparse float vector bulk insert for binlog/json/parquet (#32649 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-07 18:43:30 +08:00
chyezh	2586c2f1b3	enhance: use WalkWithPrefix api for oss, enable pipelined file gc (#31740 ) issue: #19095,#29655,#31718 - Change `ListWithPrefix` to `WalkWithPrefix` of OSS into a pipeline mode. - File garbage collection is performed in another goroutine. - Segment Index Recycle cleans index files too. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 20:41:27 +08:00
Cai Yudong	5fc439c600	feat: Bulk insert support fp16/bf16 (#32157 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-22 10:05:22 +08:00
yihao.dai	4e264003bf	enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629 ) Feature Introduced: 1. Ensure ImportV2 waits for the index to be built Enhancements Introduced: 1. Utilization of local time for timeout ts instead of allocating ts from rootcoord. 3. Enhanced input file length check for binlog import. 4. Removal of duplicated manager in datanode. 5. Renaming of executor to scheduler in datanode. 6. Utilization of a thread pool in the scheduler in datanode. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-04-01 20:09:13 +08:00
yihao.dai	31cf849f68	enhance: Support retrieving file size from importutilv2.Reader (#31533 ) To reduce the overhead caused by listing the S3 objects, add an interface to importutil.Reader to retrieve file sizes. issue: https://github.com/milvus-io/milvus/issues/31532, https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-25 20:29:07 +08:00
yihao.dai	87b3c25b15	fix: Fix binlog import (#31205 ) 1. File type validation is omitted during binlog import. 2. System fields are appended to the schema during binlog import. issue: https://github.com/milvus-io/milvus/issues/28521 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-13 10:35:04 +08:00
yihao.dai	c5918290e6	feat: Add import executor and manager for datanode (#29438 ) This PR introduces new importv2 roles for datanode: 1. Executor: executes tasks; an import task is divided into the following steps: read data -> hash data -> sync data. 2. Manager: manages all the tasks. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-31 20:45:04 +08:00
yihao.dai	3561586edf	feat: Add import reader for binlog (#28910 ) This PR defines the new import reader interfaces and implements a binlog reader for import. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-05 11:48:47 +08:00

38 Commits