17 Commits

Author SHA1 Message Date
yihao.dai
9fbd41a97d
fix: Adjust binlog and parquet reader buffer size for import (#43495)
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.
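
As a rough illustration of point 1, the batch row count can be derived from a byte budget rather than hard-coded. This is only a sketch: `rowsPerBatch` and its row-size estimate are hypothetical names, not taken from the Milvus code; only the 32MB figure comes from the message above.

```go
package main

import "fmt"

// parquetBufferSize mirrors the fixed 32MB figure from the commit message.
const parquetBufferSize = 32 * 1024 * 1024

// rowsPerBatch is a hypothetical helper (not the Milvus implementation).
// It returns how many rows of roughly rowSize bytes fit into bufferSize
// bytes, and never returns less than 1 so a read always makes progress.
func rowsPerBatch(bufferSize, rowSize int) int {
	if rowSize <= 0 {
		return 1
	}
	n := bufferSize / rowSize
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	// A 16MB budget with 512-byte rows yields far more than a fixed 4096 rows.
	fmt.Println(rowsPerBatch(16*1024*1024, 512)) // 32768
	fmt.Println(rowsPerBatch(parquetBufferSize, 1024))
}
```

Sizing batches by bytes keeps each output file close to the intended size regardless of how wide the rows are.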

issue: https://github.com/milvus-io/milvus/issues/43387

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-23 21:28:54 +08:00
yihao.dai
1984be646c
fix: Fix storagev2 binlog import (#43221)
issue: https://github.com/milvus-io/milvus/issues/43218

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-13 22:52:49 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode tasks pass storage config via the storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
groot
1ee8cea35b
enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956)
issue: https://github.com/milvus-io/milvus/issues/42173

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-07-04 14:20:44 +08:00
Ted Xu
7660be0993
feat: bulk insert support storage v2 (#41843)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-05-19 10:34:24 +08:00
yihao.dai
6c1a37fca1
fix: Fix import reader goroutine leak (#41869)
Close the chunk manager's reader after the import completes to prevent
goroutine leaks.

issues: https://github.com/milvus-io/milvus/issues/41868

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-16 10:18:35 +08:00
yihao.dai
16eb5eb921
enhance: Accelerate delete filtering during binlog import (#41551)
Use map for deleteData instead of slice to accelerate delete filtering
during binlog import.
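
A minimal sketch of the idea, with hypothetical types: indexing deletes by primary key turns each row check into a single map lookup instead of a scan over a slice. The rule assumed here (a row is dropped when a delete for the same key has an equal or later timestamp) is an illustration, not a statement of the exact Milvus semantics.

```go
package main

import "fmt"

// row is a hypothetical record carrying a primary key and a timestamp.
type row struct {
	PK int64
	TS uint64
}

// buildDeleteIndex collapses delete records into a map from primary key
// to the latest delete timestamp seen for that key.
func buildDeleteIndex(deletes []row) map[int64]uint64 {
	idx := make(map[int64]uint64, len(deletes))
	for _, d := range deletes {
		if ts, ok := idx[d.PK]; !ok || d.TS > ts {
			idx[d.PK] = d.TS
		}
	}
	return idx
}

// filterDeleted keeps rows that have no delete at or after their own
// timestamp. Each check is one map lookup.
func filterDeleted(rows []row, idx map[int64]uint64) []row {
	kept := make([]row, 0, len(rows))
	for _, r := range rows {
		if ts, ok := idx[r.PK]; ok && r.TS <= ts {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	deletes := []row{{PK: 1, TS: 10}, {PK: 3, TS: 5}}
	rows := []row{{PK: 1, TS: 7}, {PK: 2, TS: 7}, {PK: 3, TS: 9}}
	fmt.Println(filterDeleted(rows, buildDeleteIndex(deletes))) // [{2 7} {3 9}]
}
```

With n rows and m deletes, the slice approach costs on the order of n*m comparisons, while the map costs about n+m operations.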

issue: https://github.com/milvus-io/milvus/issues/41550

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-27 18:56:38 +08:00
Buqian Zheng
03b63bf982
fix: use NewInsertDataWithFunctionOutputField when importing binlog file (#40741)
issue: https://github.com/milvus-io/milvus/issues/40740

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-03-19 10:50:14 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
smellthemoon
b60164b882
enhance: support null in bulk insert of binlog to help backup null (#36526)
https://github.com/milvus-io/milvus/issues/36341

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-26 14:35:14 +08:00
congqixia
2f691f1e67
enhance: Unify DeleteLog parsing code (#34009)
See also #33787

Delete log parsing is scattered across many places, which is not
recommended and is hard to maintain.

This PR abstracts the common parsing logic into the `DeleteLog.Parse` method to
unify the implementation and make it easier to replace the JSON parsing library.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-21 16:54:01 +08:00
chyezh
2586c2f1b3
enhance: use WalkWithPrefix api for oss, enable pipelined file gc (#31740)
issue: #19095,#29655,#31718

- Change `ListWithPrefix` of OSS to `WalkWithPrefix`, which works in a
pipeline mode.

- File garbage collection is performed in another goroutine.

- Segment index recycling cleans index files too.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-25 20:41:27 +08:00
yihao.dai
4e264003bf
enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629)
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built

Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
2. Enhanced input file length check for binlog import.
3. Removal of duplicated manager in datanode.
4. Renaming of executor to scheduler in datanode.
5. Utilization of a thread pool in the scheduler in datanode.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-01 20:09:13 +08:00
yihao.dai
31cf849f68
enhance: Support retrieving file size from importutilv2.Reader (#31533)
To reduce the overhead caused by listing the S3 objects, add an
interface to importutilv2.Reader to retrieve file sizes.

issue: https://github.com/milvus-io/milvus/issues/31532,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-25 20:29:07 +08:00
yihao.dai
87b3c25b15
fix: Fix binlog import (#31205)
1. File type validation is omitted during binlog import.
2. System fields are appended to the schema during binlog import.

issue: https://github.com/milvus-io/milvus/issues/28521

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-13 10:35:04 +08:00
yihao.dai
c5918290e6
feat: Add import executor and manager for datanode (#29438)
This PR introduces new importv2 roles for datanode:
1. Executor: To execute tasks. An import task is divided into the
following steps: read data -> hash data -> sync data.
2. Manager: To manage all the tasks.
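
The read -> hash -> sync flow can be sketched as three small stages. Everything here is hypothetical: the function names, the use of bare primary keys as rows, and the modulo bucketing are assumptions chosen to keep the example short, not the datanode implementation.

```go
package main

import "fmt"

// readData stands in for the read step and returns primary keys only.
func readData() []int64 {
	return []int64{1, 2, 3, 4, 5, 6}
}

// hashData assigns each primary key to one of n buckets. Modulo on the
// unsigned value is an assumed scheme that also handles negative keys.
func hashData(pks []int64, n int) [][]int64 {
	buckets := make([][]int64, n)
	for _, pk := range pks {
		i := int(uint64(pk) % uint64(n))
		buckets[i] = append(buckets[i], pk)
	}
	return buckets
}

// syncData stands in for the sync step and reports how many rows each
// bucket would flush.
func syncData(buckets [][]int64) []int {
	counts := make([]int, len(buckets))
	for i, b := range buckets {
		counts[i] = len(b)
	}
	return counts
}

func main() {
	buckets := hashData(readData(), 2)
	fmt.Println(buckets)           // [[2 4 6] [1 3 5]]
	fmt.Println(syncData(buckets)) // [3 3]
}
```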

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-31 20:45:04 +08:00
yihao.dai
3561586edf
feat: Add import reader for binlog (#28910)
This PR defines the new import reader interfaces and implements a binlog
reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 11:48:47 +08:00