80 Commits

Author SHA1 Message Date
yihao.dai
9fbd41a97d
fix: Adjust binlog and parquet reader buffer size for import (#43495)
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.

issue: https://github.com/milvus-io/milvus/issues/43387

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-23 21:28:54 +08:00
yihao.dai
1984be646c
fix: Fix storagev2 binlog import (#43221)
issue: https://github.com/milvus-io/milvus/issues/43218

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-13 22:52:49 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode task passes storage config via storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
groot
1ee8cea35b
enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956)
issue: https://github.com/milvus-io/milvus/issues/42173

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-07-04 14:20:44 +08:00
cai.zhang
ebe1c95bb1
enhance: Add Size interface to FileReader to eliminate the StatObject call during Read (#42908)
issue: #42907

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-25 14:36:41 +08:00
Zhen Ye
43f0c56ce7
fix: limit the concurency of zstd compression and decrease the memory usage of binlog generation (#42630)
issue: #42028

- limit the concurrency of zstd compression.
- zstd.go modified from
`github.com/apache/arrow/go/v17/parquet/compress/ztsd.go`
- may be related to #42129

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-11 09:06:34 +08:00
groot
14563ad2b3
enhance: bulkinsert handles nullable/default (#42127)
issue: https://github.com/milvus-io/milvus/issues/42096,
https://github.com/milvus-io/milvus/issues/42130

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-05-28 18:02:28 +08:00
Ted Xu
7660be0993
feat: bulk insert support storage v2 (#41843)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-05-19 10:34:24 +08:00
yihao.dai
6c1a37fca1
fix: Fix import reader goroutine leak (#41869)
Close the chunk manager's reader after the import completes to prevent
goroutine leaks.

issues: https://github.com/milvus-io/milvus/issues/41868

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-16 10:18:35 +08:00
yihao.dai
71b14fc32b
enhance: Skip disk quota check for l0 import (#41571)
issue: https://github.com/milvus-io/milvus/issues/41569

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-29 10:46:54 +08:00
yihao.dai
16eb5eb921
enhance: Accelerate delete filtering during binlog import (#41551)
Use map for deleteData instead of slice to accelerate delete filtering
during binlog import.

issue: https://github.com/milvus-io/milvus/issues/41550

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-27 18:56:38 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
yihao.dai
b4cb8a4b13
enhance: Add UTF-8 string validation for import (#40694)
issue: https://github.com/milvus-io/milvus/issues/40684

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 19:04:21 +08:00
groot
aae3a3598e
enhance: bulkinsert supports parsing sparse vector form parquet struct (#40927)
issue: https://github.com/milvus-io/milvus/issues/40777

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-03-31 14:20:30 +08:00
Buqian Zheng
03b63bf982
fix: use NewInsertDataWithFunctionOutputField when importing binlog file (#40741)
issue: https://github.com/milvus-io/milvus/issues/40740

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-03-19 10:50:14 +08:00
groot
9fbfcda48e
fix: Fix a crash issue of bulkinsert (#40331)
issue: https://github.com/milvus-io/milvus/issues/40291
pr: https://github.com/milvus-io/milvus/pull/40304

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-03-14 18:14:07 +08:00
yihao.dai
bab30a41bf
enhance: Improve import error msgs (#40567)
issue: https://github.com/milvus-io/milvus/issues/40208

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-13 21:02:07 +08:00
Ted Xu
df4285c9ef
enhance: API integration with storage v2 in clustering-compactions (#40133)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-13 14:12:06 +08:00
Xiaofan
fb48b3c7ac
fix: empty sparse row in importer (#40585)
fix #40584

parquet bulk writer can not finish 0 dim sparse vector.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-03-13 01:29:41 +08:00
jaime
c8a96377bb
enhance: move object storage client creation to pkg package (#40440)
issue: #40439

Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-03-12 20:38:07 +08:00
yihao.dai
2ca2e2dbc8
fix: Fix parsing import endTs (#40332)
Parsing import beginTs, endTs as a hybrid timestamp.

issue: https://github.com/milvus-io/milvus/issues/40326

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-10 17:38:04 +08:00
sthuang
90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916)
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Cai Yudong
7476eb3625
feat: Support bulk insert for Int8Vector (#39499)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-23 10:19:06 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
smellthemoon
92a2d608ac
fix: Bulk insert failed when the nullable/default_value field is not exist (#39063)
#39036

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-01-09 19:27:03 +08:00
yihao.dai
2b53b0905e
fix: Fix 0 read count during import (#38694)
issue: https://github.com/milvus-io/milvus/issues/38693

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-24 20:06:49 +08:00
congqixia
b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
zhenshan.cao
63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
yihao.dai
b45cf2d49f
enhance: Add max length check for csv import (#37077)
1. Add max length check for csv import.
2. Tidy import options.
3. Tidy common import util functions.

issue: https://github.com/milvus-io/milvus/issues/34150

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 14:37:29 +08:00
OxalisCu
60e51f1076
fix: unicode replacement character (0xFFFD) are not supported as csv delimiter (#36310)
https://github.com/milvus-io/milvus/issues/36309

Signed-off-by: OxalisCu <2127298698@qq.com>
2024-10-17 14:45:40 +08:00
smellthemoon
463c47ced1
enhance: support default value in import (#36700)
https://github.com/milvus-io/milvus/issues/31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 12:05:24 +08:00
Buqian Zheng
82c5cf2fa2
feat: add bulk insert support for Functions (#36715)
issue: https://github.com/milvus-io/milvus/issues/35853 and
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-12 17:19:20 +08:00
smellthemoon
b60164b882
enhance: support null in bulk insert of binlog to help backup null (#36526)
https://github.com/milvus-io/milvus/issues/36341

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-26 14:35:14 +08:00
smellthemoon
89397d1e66
enhance: adjust parquet reader type check with null type (#36266)
#36252 
remove no need type check. if users use null type writer to write
parquet, hope it successfully.

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-19 18:43:10 +08:00
smellthemoon
fc1bdd4c84
fix: to forbid bulk insert with nullable field in numpy files (#36246)
#36241

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-14 15:35:07 +08:00
OxalisCu
3a381bc247
enhance: Bulkinsert supports null in csv formats (#35912)
see details in this issue
https://github.com/milvus-io/milvus/issues/35911

---------

Signed-off-by: OxalisCu <2127298698@qq.com>
2024-09-09 19:17:07 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
Xiaofan
50fcfe8ef1
enhance: add nan and inf check (#35683)
fix #35594
add float check on files

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-08-25 15:22:57 +08:00
OxalisCu
ed4eaffc9d
enhance: add csv support for bulkinsert (#34938)
See this issue for details: #34937

---------

Signed-off-by: OxalisCu <2127298698@qq.com>
2024-08-21 17:47:01 +08:00
Ted Xu
41646c8439
feat: integrate new deltalog format (#35522)
See #34123

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-20 19:06:56 +08:00
smellthemoon
80a7c78f28
enhance: import supports null in parquet and json formats (#35558)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-20 16:50:55 +08:00
nish112022
3948bd4e79
fix: Added check for validating varchar,array max length (#35499)
issue : https://github.com/milvus-io/milvus/issues/34150

This is for numpy,parquet,json readers.

---------

Signed-off-by: Nischay Yadav <nischay.yadav@ibm.com>
2024-08-20 11:42:55 +08:00
yihao.dai
b71e058bc5
enhance: Add import option to skip disk quota check (#35274)
Add an option to skip the disk quota check for backup-restore import.

issue: https://github.com/milvus-io/milvus/issues/33775

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-05 16:40:16 +08:00
yihao.dai
678c0a708b
enhance: Refine import error log (#35067)
issue: https://github.com/milvus-io/milvus/issues/35060

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-05 15:30:16 +08:00
zhenshan.cao
aa247f192d
enhance: remove unused code for StorageV2 (#35132)
issue: https://github.com/milvus-io/milvus/issues/34168

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-01 12:08:13 +08:00
shaoting-huang
88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default, and it will fall back to
plain encoding if the dictionary size exceeds the dictionary size page
limit. Users can specify custom fallback encoding by using
`parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However,
Go Parquet [fallbacks to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than custom encoding method users provide. Therefore, this patch
only turns off dictionary encoding for the primary key.

With a 5 million auto ID primary key benchmark, the parquet file size
improves from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
congqixia
3333160b8d
enhance: Fix lint issues from recent PRs (#34482)
See also #34483
Some lint issues are introduced due to lack of static check run. This PR
fixes these problems.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-09 10:06:24 +08:00
congqixia
2f691f1e67
enhance: Unify DeleteLog parsing code (#34009)
See also #33787

The parsing delete log is distributed in lots of places, which is not
recommended and hard to maintain.

This PR abstract common parsing logic into `DeleteLog.Parse` method to
unify implementation and make it easier to replace json parsing lib.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-21 16:54:01 +08:00