26 Commits

Author SHA1 Message Date
XuanYang-cn
623a9e5156
fix: Accurate size estimation for sliced arrow arrays in compaction (#45294)
Sliced arrow arrays "incorrectly" returned the original array's size via
SizeInBytes(), causing inaccurate memory estimates during compaction.

This resulted in segments closing prematurely in mergeSplit mode -
expected 500MB compactions produced 4x100+MB segments instead.

Fixed by calculating actual byte size of sliced arrays, ensuring proper
segment sizing and more accurate memory usage tracking.

See also: #45293

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-06 14:57:34 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Ted Xu
9041bf1b9a
fix: including shouldCopy parameter in file readers (#43578)
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.

See #43186

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-26 17:30:55 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
sthuang
ed5dbf3eaa
enhance: [StorageV2] sync separate vector datatype into its own column group (#42638)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-16 11:48:37 +08:00
sthuang
9439eaef52
fix: [StorageV2] sync with int8 vector data type core dumped (#42616)
related: https://github.com/milvus-io/milvus/issues/42613, #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-10 11:42:35 +08:00
XuanYang-cn
540456041f
enhance: Remove not inuse binlog iterator (#41359)
See also: #41466

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-04-24 12:04:38 +08:00
Ted Xu
128efaa3e3
enhance: simplify size calculation in file writers (#40808)
See: #40342

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-26 20:04:22 +08:00
sthuang
d7df78a6c9
feat: Storage v2 compaction (#40667)
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact with duplicate primary key test.
  - Compact with bm25 segments.
  - Compact with merge sort segments.
  - Compact with no expiration segments.
  - Compact with lack binlog segments.
  - Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact bm25 segments with v2 format.
  - Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-21 10:16:12 +08:00
Ted Xu
df4285c9ef
enhance: API integration with storage v2 in clustering-compactions (#40133)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-13 14:12:06 +08:00
sthuang
90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916)
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Ted Xu
2978b0890e
enhance: iterative download data during compaction to reduce memory cost (#39724)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-13 10:36:47 +08:00
Ted Xu
427b6a4c94
enhance: reduce stats task cost by skipping ser/de (#39568)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-06 17:14:45 +08:00
zhagnlu
6ee94d00b9
fix:fix calculate arrow nest type and add ut (#38527)
#37767

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-18 11:54:44 +08:00
shaoting-huang
f4dd7c7efb
enhance: add delta log stream new format reader and writer (#34116)
issue: #34123

Benchmark case: The benchmark run the go benchmark function
`BenchmarkDeltalogFormat` which is put in the Files changed. It tests
the performance of serializing and deserializing from two different data
formats under a 10 million delete log dataset.

Metrics: The benchmarks measure the average time taken per operation
(ns/op), memory allocated per operation (MB/op), and the number of
memory allocations per operation (allocs/op).
| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation
(MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation
Comparison |

|---------------------------------|------------------|-----------------|---------------------------|-------------------|------------------------------|------------------------|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline
| 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14%
| 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 |
Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer| 798,591,584 | -85.43% | 2,178 |
-84.34% | 30,270,488 | -56.78% |

Both read and write operations show significant improvements in both
speed and memory allocation.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-06 09:08:09 +08:00
Ted Xu
6d5747cb3e
feat: adding deltalog stream reader and writer (#33844)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-06-19 14:42:01 +08:00
Ted Xu
066c8ea175
feat: stream reader/writer to support nulls (#33080)
See: #31728

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-27 16:27:42 +08:00
Ted Xu
a8bd9bea39
fix: adding blob memory size in binlog serde (#33324)
See: #33280

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-24 10:33:40 +08:00
Ted Xu
a9c7ce72b8
enhance: enable stream writer in compactions (#32612)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-17 15:05:37 +08:00
Ted Xu
dc5ea6f17c
feat: adding binlog streaming writer (#31537)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-04-11 10:33:20 +08:00
Buqian Zheng
d7dbc3c9d8
fix: [sparse float vector] support the new streaming deserialize reader (#31325)
issue: https://github.com/milvus-io/milvus/issues/31324

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-17 13:59:04 +08:00
Ted Xu
987d9023a5
enhance: Enable binlog deserialize reader in datanode compaction (#31036)
See #30863

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-03-08 18:25:02 +08:00
Ted Xu
71adafa933
enhance: adding a streaming deserialize reader for binlogs (#30860)
See #30863

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-03-04 19:31:09 +08:00