607 Commits

Author SHA1 Message Date
congqixia
c01fd94a6a
enhance: integrate Storage V2 FFI interface for unified storage access (#45723)
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.

1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
  - Default: `false` (maintains backward compatibility)

2. Core FFI Infrastructure

Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
  - Accepts manifest path with `base_path` and `ver` fields
  - Calls `get_latest_column_groups()` FFI function
  - Returns column group information as string
  - Comprehensive error handling for JSON parsing and FFI errors

3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
  - Updated milvus-storage version from `0883026` to `302143c`
  - Ensures compatibility with latest FFI interfaces

4. Data Coordinator Changes

All compaction task builders now include manifest path in segment
binlogs:

- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments

5. Data Node Compactor Enhancements

All compactors updated to support dual-mode reading (binlog vs
manifest):

6. Flush & Sync Manager Updates

Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
  - Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
  - Write packed data using FFI-based writer
  - Return manifest path along with binlogs, deltas, and stats

Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest

7. Go Storage Layer Integration

New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing

Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage

Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization

8. Storage V2 Packed Format

FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format

Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes

Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation

9. Protocol Buffer Updates

data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline

worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator

10. Parameter Configuration

component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout

11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing

This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-24 19:57:07 +08:00
XuanYang-cn
c082317681
fix: Use base64 to encode not utf-8 bytes (#45655)
See also: #45654

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-24 18:23:06 +08:00
congqixia
e9506f1d64
fix: Handle default values correctly during compaction for added fields (#45572)
Related to #45543

When a field with a default value is added to a collection, the default
value becomes null after compaction instead of retaining the expected
default value.

**Root Cause**
The `appendValueAt` function in `internal/storage/arrow_util.go`
incorrectly checked if the entire arrow.Array was nil before handling
default values. This meant that default values were only applied when
the array itself was nil, not when individual field values were null
(which is the correct condition).

**Changes**
1. **Early nil check**: Added a guard at the function entry to detect
nil arrow.Array and return an error immediately, as this is an
unexpected condition that should not occur during normal operation.

2. **Refactored default value handling**: Removed the per-type nil array
checks and moved default value logic to handle individual null values
within the array (when `IsNull(idx)` returns true).

3. **Applied to all types**: Updated the logic consistently across all
builder types:
   - BooleanBuilder
   - Int8Builder, Int16Builder, Int32Builder, Int64Builder
   - Float32Builder
   - StringBuilder
   - BinaryBuilder (added default value support for internal $meta json)
   - ListBuilder (removed unnecessary nil check)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-17 19:03:38 +08:00
XuanYang-cn
623a9e5156
fix: Accurate size estimation for sliced arrow arrays in compaction (#45294)
Sliced arrow arrays "incorrectly" returned the original array's size via
SizeInBytes(), causing inaccurate memory estimates during compaction.

This resulted in segments closing prematurely in mergeSplit mode -
expected 500MB compactions produced 4x100+MB segments instead.

Fixed by calculating actual byte size of sliced arrays, ensuring proper
segment sizing and more accurate memory usage tracking.

See also: #45293

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-06 14:57:34 +08:00
congqixia
1cf00c6d32
fix: Support JSON default value in compaction (#45330)
Related to #45329

Fix compaction failure when handling newly added dynamic fields with
storage v1 binlogs. The issue occurred because the
`GenerateEmptyArrayFromSchema` function did not support JSON data type
default values, causing "Unexpected default value" errors during
compaction.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-06 10:15:34 +08:00
congqixia
55bfd610b6
enhance: [StorageV2] Integrate FFI interface for packed reader (#45132)
Related to #44956

Integrate the StorageV2 FFI interface as the unified storage layer for
reading packed columnar data, replacing the custom iterative reader with
a manifest-based approach using the milvus-storage library.

Changes:
- Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C
Stream interface
- Implement utility functions to convert CStorageConfig to
milvus-storage Properties
- Create ManifestReader in Go that generates manifests from binlogs
- Add FFI packed reader CGO bindings (packed_reader_ffi.go)
- Refactor NewBinlogRecordReader to use ManifestReader for V2 storage
- Support both manifest file paths and direct manifest content
- Enable configurable buffer sizes and column projection

Technical improvements:
- Zero-copy data exchange using Arrow C Data Interface
- Optimized I/O operations through milvus-storage library
- Simplified code path with manifest-based reading
- Better performance with batched streaming reads

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-05 19:57:34 +08:00
zhenshan.cao
6327c9a514
fix: Fix bugs related to TimestampTz (#45111)
issue: https://github.com/milvus-io/milvus/issues/44527
https://github.com/milvus-io/milvus/issues/44537
https://github.com/milvus-io/milvus/issues/44538
https://github.com/milvus-io/milvus/issues/44585
https://github.com/milvus-io/milvus/issues/44622

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-04 16:51:33 +08:00
cai.zhang
3c9aa3e784
fix: Fix import null geometry data (#45161)
issue: #44787

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-30 14:18:07 +08:00
cai.zhang
c33d221536
fix: Fix bug for importing Geometry data (#45089)
issue: #44787 , #45012

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-27 20:34:11 +08:00
aoiasd
cfeb095ad7
enhance: forbid build analyzer at proxy (#44067)
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
move all analyzer calculations to the streaming node.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-23 10:58:12 +08:00
Ted Xu
196006b4ce
enhance: update delta log serialization APIs to integrate storage V2 (#44998)
See #39173

In this PR:

- Adjusted the delta log serialization APIs.
- Refactored the stats collector to improve the collection and digest of
primary key and BM25 statistics.
- Introduced new tests for the delta log reader/writer and stats
collectors to ensure functionality and correctness.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-10-22 15:58:12 +08:00
congqixia
8f16afd5e7
fix: Handle JSON field default values in storage layer (#44999)
Related to #44995
Added missing case for JSON data type in GetDefaultValue function to
properly retrieve default values for JSON fields. This prevents crashes
when enabling dynamic fields with default values during concurrent
insert operations.

Changes:
- Added JSON data type case in GetDefaultValue to return BytesData
- Added comprehensive tests for fillMissingFields covering JSON and
other data types with default values
- Added tests for nullable fields, required fields validation, and edge
cases

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-21 15:14:03 +08:00
cai.zhang
d5ecb63f53
enhance: Support import geometry data by json/csv (#44826)
issue: #44787

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-17 17:08:02 +08:00
Spade A
6c8e353439
feat: impl StructArray -- ban non-float-vector for now (#44875)
ref https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-10-17 10:26:09 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
cai.zhang
7a93cfe890
fix: Fix bug for nullable geometry (#44732)
issue: #44648

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-11 11:27:57 +08:00
aoiasd
78ee76f018
enhance: support preload sealed segment bm25 stats and optimize bm25 stats serialize (#44279)
relate: https://github.com/milvus-io/milvus/issues/41424

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-29 16:35:05 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This pr's main goal is merge #37417 to milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display  geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
 >- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".

## M6: Extract

We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.

--- 

After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
congqixia
6f7318a731
enhance: [StorageV2] Use compressed size as log file size (#44402)
Related to #39173

backlog issue that memory size and log size shared same value. This
patch add `GetFileSize` api to get remote compressed binlog size as meta
log file size to calculate usage more accurate.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 21:20:02 +08:00
congqixia
aa861f55e6
enhance: [StorageV2] Reverts #44232 bucket name change (#44390)
Related to #39173

- Put bucket name concatenation logic back for azure support

This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 10:10:00 +08:00
Spade A
eb793531b9
feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201)
Ref https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-15 20:41:59 +08:00
congqixia
9cfa013ec6
enhance: [StorageV2] Store column group info in compaction result (#44327)
Related to #44257

This PR store split result in compaction segment binlog struct to make
querynode could utilize it later.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-11 19:49:57 +08:00
congqixia
fc968ff1c2
enhance: [StorageV2] Pass args for avg size split policy (#44301)
Related to #44257

This PR
- Pass column stats for avg size split policy
- Add param items for policy configuration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-11 10:43:57 +08:00
congqixia
f5618d5153
enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282)
Related to #44257

This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-10 14:47:57 +08:00
cai.zhang
fb43651a74
fix: Fix MultiSegmentWrite only write one segment (#44256)
issue: #44254

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-09 14:28:10 +08:00
Bingyi Sun
c5bb4f7483
fix: Fix merge sort out of range (#44230)
issue: https://github.com/milvus-io/milvus/issues/44227

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-08 18:01:55 +08:00
congqixia
8f97eb355f
enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232)
Related to #39173

This PR:
- Bump milvus-storage commit to handle bucket name concatenation logic
in multipart s3 fs
- Remove all user-side bucket name concatenation code

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-08 10:15:55 +08:00
Bingyi Sun
ef392bb1b2
enhance: merge sort support multiple fields (#44191)
issue: #44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-04 19:53:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Bingyi Sun
6624011927
enhance: storage sort can sort by multiple fields (#43994)
https://github.com/milvus-io/milvus/issues/44011
this is to support compaction that sorts records by partition key and pk
in the future

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-03 10:11:52 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Ted Xu
e37cd19da2
enhance: enable storage v2 by default (#43652)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-08-01 08:59:36 +08:00
sthuang
a2c7ed2780
fix: [StorageV2] sort field binlogs paths for packed reader and writer (#43585)
key changes:
* fix unstable storage v2 compaction unit test by guaranteeing the order
of paths during sync.
* bump milvus-storage version, include
https://github.com/milvus-io/milvus-storage/pull/222
https://github.com/milvus-io/milvus-storage/pull/223
https://github.com/milvus-io/milvus-storage/pull/224
https://github.com/milvus-io/milvus-storage/pull/225
https://github.com/milvus-io/milvus-storage/pull/226
* Also fix the below related oom issue.
related: https://github.com/milvus-io/milvus/issues/43310

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-30 08:09:36 +08:00
congqixia
34d3f0c0f8
enhance: Reserve builder space for ValueSerializer (#43570)
Add `arrowBuild.Reserve` call for `ValueSerializer` to reduce repeated
resizing buffer when write size is large

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-28 11:02:55 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Ted Xu
9041bf1b9a
fix: including shouldCopy parameter in file readers (#43578)
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.

See #43186

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-26 17:30:55 +08:00
sthuang
a0c9f499ee
fix: [StorageV2] sync panic with nullable add field (#43142)
related: https://github.com/milvus-io/milvus/pull/42932
fix: https://github.com/milvus-io/milvus/issues/43072

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 10:08:53 +08:00
congqixia
1cf8ed505f
fix: Implement NeededFields feature in RecordReader (#43523)
Related to #43522

Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.

This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 00:22:54 +08:00
congqixia
563e2935c5
enhance: [StorageV2] Fill ts range default values for PackedBinlogRecordWriter (#43454)
This PR fill default value for `PackedBinlogRecordWriter` timestamp
range so target segment meta will contains correct timestamp range

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-22 12:04:53 +08:00
yihao.dai
b69e601fe1
fix: [StorageV2] Correct read and write buffer size (#43335)
Correct read and buffer size to 64MB to prevent OOM during clustering
compaction.

issue: https://github.com/milvus-io/milvus/issues/43310

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-16 14:28:52 +08:00
yihao.dai
1984be646c
fix: Fix storagev2 binlog import (#43221)
issue: https://github.com/milvus-io/milvus/issues/43218

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-13 22:52:49 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode task passes storage config via storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
cai.zhang
95e767611a
fix: Fix merge sort loss data when last row in a record is deleted (#43216)
issue: #43207

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-09 22:18:48 +08:00
cai.zhang
8720feeb79
fix: Fix enqueuing when current batch is fully deleted (#43174)
issue: #43045

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 12:20:46 +08:00
congqixia
ab818dcbca
fix: [StorageV2] Pass storage config for compaction rw (#43167)
Related to #43148

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-07 15:32:46 +08:00
congqixia
d09764508a
fix: [Storagev2] Close segment readers in mergeSort (#43116)
Related to #43062

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-04 23:56:44 +08:00