588 Commits

Author SHA1 Message Date
congqixia
6f7318a731
enhance: [StorageV2] Use compressed size as log file size (#44402)
Related to #39173

backlog issue that memory size and log size shared same value. This
patch add `GetFileSize` api to get remote compressed binlog size as meta
log file size to calculate usage more accurate.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 21:20:02 +08:00
congqixia
aa861f55e6
enhance: [StorageV2] Reverts #44232 bucket name change (#44390)
Related to #39173

- Put bucket name concatenation logic back for azure support

This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 10:10:00 +08:00
Spade A
eb793531b9
feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201)
Ref https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-15 20:41:59 +08:00
congqixia
9cfa013ec6
enhance: [StorageV2] Store column group info in compaction result (#44327)
Related to #44257

This PR store split result in compaction segment binlog struct to make
querynode could utilize it later.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-11 19:49:57 +08:00
congqixia
fc968ff1c2
enhance: [StorageV2] Pass args for avg size split policy (#44301)
Related to #44257

This PR
- Pass column stats for avg size split policy
- Add param items for policy configuration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-11 10:43:57 +08:00
congqixia
f5618d5153
enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282)
Related to #44257

This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-10 14:47:57 +08:00
cai.zhang
fb43651a74
fix: Fix MultiSegmentWrite only write one segment (#44256)
issue: #44254

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-09 14:28:10 +08:00
Bingyi Sun
c5bb4f7483
fix: Fix merge sort out of range (#44230)
issue: https://github.com/milvus-io/milvus/issues/44227

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-08 18:01:55 +08:00
congqixia
8f97eb355f
enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232)
Related to #39173

This PR:
- Bump milvus-storage commit to handle bucket name concatenation logic
in multipart s3 fs
- Remove all user-side bucket name concatenation code

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-08 10:15:55 +08:00
Bingyi Sun
ef392bb1b2
enhance: merge sort support multiple fields (#44191)
issue: #44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-04 19:53:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Bingyi Sun
6624011927
enhance: storage sort can sort by multiple fields (#43994)
https://github.com/milvus-io/milvus/issues/44011
this is to support compaction that sorts records by partition key and pk
in the future

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-03 10:11:52 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Ted Xu
e37cd19da2
enhance: enable storage v2 by default (#43652)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-08-01 08:59:36 +08:00
sthuang
a2c7ed2780
fix: [StorageV2] sort field binlogs paths for packed reader and writer (#43585)
key changes:
* fix unstable storage v2 compaction unit test by guaranteeing the order
of paths during sync.
* bump milvus-storage version, include
https://github.com/milvus-io/milvus-storage/pull/222
https://github.com/milvus-io/milvus-storage/pull/223
https://github.com/milvus-io/milvus-storage/pull/224
https://github.com/milvus-io/milvus-storage/pull/225
https://github.com/milvus-io/milvus-storage/pull/226
* Also fix the below related oom issue.
related: https://github.com/milvus-io/milvus/issues/43310

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-30 08:09:36 +08:00
congqixia
34d3f0c0f8
enhance: Reserve builder space for ValueSerializer (#43570)
Add `arrowBuild.Reserve` call for `ValueSerializer` to reduce repeated
resizing buffer when write size is large

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-28 11:02:55 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Ted Xu
9041bf1b9a
fix: including shouldCopy parameter in file readers (#43578)
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.

See #43186

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-26 17:30:55 +08:00
sthuang
a0c9f499ee
fix: [StorageV2] sync panic with nullable add field (#43142)
related: https://github.com/milvus-io/milvus/pull/42932
fix: https://github.com/milvus-io/milvus/issues/43072

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 10:08:53 +08:00
congqixia
1cf8ed505f
fix: Implement NeededFields feature in RecordReader (#43523)
Related to #43522

Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.

This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 00:22:54 +08:00
congqixia
563e2935c5
enhance: [StorageV2] Fill ts range default values for PackedBinlogRecordWriter (#43454)
This PR fill default value for `PackedBinlogRecordWriter` timestamp
range so target segment meta will contains correct timestamp range

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-22 12:04:53 +08:00
yihao.dai
b69e601fe1
fix: [StorageV2] Correct read and write buffer size (#43335)
Correct read and buffer size to 64MB to prevent OOM during clustering
compaction.

issue: https://github.com/milvus-io/milvus/issues/43310

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-16 14:28:52 +08:00
yihao.dai
1984be646c
fix: Fix storagev2 binlog import (#43221)
issue: https://github.com/milvus-io/milvus/issues/43218

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-13 22:52:49 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode task passes storage config via storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
cai.zhang
95e767611a
fix: Fix merge sort loss data when last row in a record is deleted (#43216)
issue: #43207

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-09 22:18:48 +08:00
cai.zhang
8720feeb79
fix: Fix enqueuing when current batch is fully deleted (#43174)
issue: #43045

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 12:20:46 +08:00
congqixia
ab818dcbca
fix: [StorageV2] Pass storage config for compaction rw (#43167)
Related to #43148

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-07 15:32:46 +08:00
congqixia
d09764508a
fix: [Storagev2] Close segment readers in mergeSort (#43116)
Related to #43062

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-04 23:56:44 +08:00
cai.zhang
4133e3b8fd
fix: Enable merge sort and fix sort bug (#43080)
issue: #42980, #43034

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-04 10:18:44 +08:00
congqixia
8962b0058d
fix: [StorageV2] Check writer nil when closing not written one (#43056)
Related to #43047

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 14:22:43 +08:00
congqixia
9b06ecb72f
enhance: [StorageV2] Release record and close reader (#42983)
Related to #39173

This PR
- Close packed reader after sort
- Release arrow.Record preventing memory leakage
- Invoke `pack_reader->Close()` for CloseReader

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-27 14:46:43 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
cai.zhang
ebe1c95bb1
enhance: Add Size interface to FileReader to eliminate the StatObject call during Read (#42908)
issue: #42907

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-25 14:36:41 +08:00
congqixia
ee056f0bff
fix: [AddField] Fill default value in serde logic when field missing (#42891)
Related to #42856

Default value will be missing after segment get sorted/compacted. This
PR is a temp workaround since in long term default value shall be filled
with storage engine instead.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-23 14:20:41 +08:00
sthuang
4a0a2441f2
enhance: [StorageV2] field id as meta path for wide column (#42787)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-19 15:00:38 +08:00
congqixia
f01ff57f3f
fix: [StorageV2] Use correct offset filling null bitmap (#42774)
Related to #39173

`null_bitmap_data()` returns raw pointer of null bitmap of Array. While
after slicing, this bitmap is not rewritten due to zero copy
implementation, so the current start pos maybe non-zero while
FillFieldData generating column `valid_data` array.

This PR add `offset` param for `FillFieldData` method, and force all
invocation pass correct offset of `null_bitmap_data` ptr.

Also update milvus-storage commit fixing reader failed to return data
when buffer size smaller than row group size problem.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 10:08:38 +08:00
sthuang
ed5dbf3eaa
enhance: [StorageV2] sync separate vector datatype into its own column group (#42638)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-16 11:48:37 +08:00
congqixia
ef8829c5bc
fix: [AddField] Skip missing nullable field in insertCodec (#42724)
Related to #42723

Previous PR #42684 permit insert msg transformation but insertCodec did
not adapt the same skip logic, whic causes panicking.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-13 19:56:36 +08:00
Zhen Ye
1f66b650e9
fix: pulsar cannot work properly if backlog exceed (#42653)
issue: #42649

- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 14:28:37 +08:00
congqixia
cbed31933a
fix: [AddField] Permit missing new nullable field in InsertMsg (#42684)
Related to #41858 #41951 #42084

When insert msg consumer (pipeline/flowgraph) have newer schema than
insertMsg, it have to adapter the insert msg used old schema(missing
newly added field)

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-13 13:52:35 +08:00
Zhen Ye
43f0c56ce7
fix: limit the concurency of zstd compression and decrease the memory usage of binlog generation (#42630)
issue: #42028

- limit the concurrency of zstd compression.
- zstd.go modified from
`github.com/apache/arrow/go/v17/parquet/compress/ztsd.go`
- may be related to #42129

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-11 09:06:34 +08:00
sthuang
9439eaef52
fix: [StorageV2] sync with int8 vector data type core dumped (#42616)
related: https://github.com/milvus-io/milvus/issues/42613, #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-10 11:42:35 +08:00
sthuang
89c3afb12e
fix: [StorageV2] index/stats task level storage v2 fs (#42191)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-10 11:06:35 +08:00
congqixia
118684afbb
enhance: [storageV2] Pass nullable converting insertMsg fieldData (#42584)
Related to #39173

`nullable` flag is crucial for serde logic of v2 writer, missing this
flag causes logic bug for v2 nullalbe data.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-10 10:06:34 +08:00
Chun Han
e9b5d9e8bc
enhance: refine compaction trigger to reduce read/write amplifaction(#41336) (#41728)
related: #41336

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-06-04 11:24:38 +08:00
yihao.dai
e0113b375e
fix: Fix sort stats generates large binlogs (#42456)
Remove the hardcoded batchSize of 100,000 and instead trigger a write
every 64MB based on actual data size. This prevents sort stats from
generating excessively large binlog files.

issue: https://github.com/milvus-io/milvus/issues/42400

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-04 09:56:39 +08:00
congqixia
a22088a380
enhance: [StorageV2] Make packed reader use correct path (#41919)
Related to #39173

This PR
- Use updated path with bucketName for packedReader
- Update milvus-storage commit to report reader/writer initialization
failure, see also milvus-io/milvus-storage#192

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-20 10:36:23 +08:00