27 Commits

Author SHA1 Message Date
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Ted Xu
9041bf1b9a
fix: including shouldCopy parameter in file readers (#43578)
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.

See #43186

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-26 17:30:55 +08:00
congqixia
1cf8ed505f
fix: Implement NeededFields feature in RecordReader (#43523)
Related to #43522

Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.

This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 00:22:54 +08:00
congqixia
563e2935c5
enhance: [StorageV2] Fill ts range default values for PackedBinlogRecordWriter (#43454)
This PR fill default value for `PackedBinlogRecordWriter` timestamp
range so target segment meta will contains correct timestamp range

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-22 12:04:53 +08:00
yihao.dai
b69e601fe1
fix: [StorageV2] Correct read and write buffer size (#43335)
Correct read and buffer size to 64MB to prevent OOM during clustering
compaction.

issue: https://github.com/milvus-io/milvus/issues/43310

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-16 14:28:52 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode task passes storage config via storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
congqixia
8962b0058d
fix: [StorageV2] Check writer nil when closing not written one (#43056)
Related to #43047

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 14:22:43 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
sthuang
4a0a2441f2
enhance: [StorageV2] field id as meta path for wide column (#42787)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-19 15:00:38 +08:00
sthuang
ed5dbf3eaa
enhance: [StorageV2] sync separate vector datatype into its own column group (#42638)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-16 11:48:37 +08:00
sthuang
89c3afb12e
fix: [StorageV2] index/stats task level storage v2 fs (#42191)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-10 11:06:35 +08:00
congqixia
a6d09ff4cd
enhance: [StorageV2] fix issues integrating basic RW operations (#41834)
Related to #39173

This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-15 09:52:23 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
Ted Xu
1bcea2a775
fix: assigning the correct storage version in sync and index tasks (#41093)
See #39663 #40667

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-08 10:14:25 +08:00
Ted Xu
128efaa3e3
enhance: simplify size calculation in file writers (#40808)
See: #40342

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-26 20:04:22 +08:00
sthuang
d7df78a6c9
feat: Storage v2 compaction (#40667)
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact with duplicate primary key test.
  - Compact with bm25 segments.
  - Compact with merge sort segments.
  - Compact with no expiration segments.
  - Compact with lack binlog segments.
  - Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact bm25 segments with v2 format.
  - Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-21 10:16:12 +08:00
Ted Xu
df4285c9ef
enhance: API integration with storage v2 in clustering-compactions (#40133)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-13 14:12:06 +08:00
sthuang
63a7c4570e
feat: storage v2 sync (#39663)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-05 11:22:15 +08:00
sthuang
de02a3ebcc
feat: Storage v2 binlog packed record reader and writer (#40221)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-03 10:24:02 +08:00
sthuang
90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916)
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Ted Xu
8562a102ec
enhance: API integration with storage v2 in mix-compactions (#40008)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-22 14:23:54 +08:00
smellthemoon
8b974c5742
enhance: support compact if lack of binlog (#40000)
https://github.com/milvus-io/milvus/issues/39718

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-02-22 10:51:56 +08:00
sthuang
3eb3af5f08
feat: explicitly specify column groups for storage v2 api (#39790)
* use the new packed reader and writer api to be compatible with current
etcd meta
* For the new packed writer API: column groups and paths are explicitly
defined by users and won't split column groups by memory in storage v2.
Packed writer follows the user-defined column groups to split arrow
record and write into the corresponding file path.
* For the new packed reader API: read paths are explicitly defined by
users.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-21 22:03:54 +08:00
sthuang
15c8798b93
feat: storage v2 serde reader and writer (#39667)
related: https://github.com/milvus-io/milvus/issues/39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-11 16:00:46 +08:00