13 Commits

Author SHA1 Message Date
marcelo-cjl
3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
    for nullable vectors with sparse storage.
    
    Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
      to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
      nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
      index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
      for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
    - Update msg_pack.go with nullable vector field data processing
    
    QueryNode layer changes:
    - Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
    
    Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
    
    Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
      logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
    - Update funcutil.go with nullable vector support functions

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
congqixia
e9506f1d64
fix: Handle default values correctly during compaction for added fields (#45572)
Related to #45543

When a field with a default value is added to a collection, the default
value becomes null after compaction instead of retaining the expected
default value.

**Root Cause**
The `appendValueAt` function in `internal/storage/arrow_util.go`
incorrectly checked if the entire arrow.Array was nil before handling
default values. This meant that default values were only applied when
the array itself was nil, not when individual field values were null
(which is the correct condition).

**Changes**
1. **Early nil check**: Added a guard at the function entry to detect
nil arrow.Array and return an error immediately, as this is an
unexpected condition that should not occur during normal operation.

2. **Refactored default value handling**: Removed the per-type nil array
checks and moved default value logic to handle individual null values
within the array (when `IsNull(idx)` returns true).

3. **Applied to all types**: Updated the logic consistently across all
builder types:
   - BooleanBuilder
   - Int8Builder, Int16Builder, Int32Builder, Int64Builder
   - Float32Builder
   - StringBuilder
   - BinaryBuilder (added default value support for internal $meta json)
   - ListBuilder (removed unnecessary nil check)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-17 19:03:38 +08:00
congqixia
1cf00c6d32
fix: Support JSON default value in compaction (#45330)
Related to #45329

Fix compaction failure when handling newly added dynamic fields with
storage v1 binlogs. The issue occurred because the
`GenerateEmptyArrayFromSchema` function did not support JSON data type
default values, causing "Unexpected default value" errors during
compaction.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-06 10:15:34 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
congqixia
ee056f0bff
fix: [AddField] Fill default value in serde logic when field missing (#42891)
Related to #42856

Default value will be missing after segment get sorted/compacted. This
PR is a temp workaround since in long term default value shall be filled
with storage engine instead.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-23 14:20:41 +08:00
yihao.dai
e0113b375e
fix: Fix sort stats generates large binlogs (#42456)
Remove the hardcoded batchSize of 100,000 and instead trigger a write
every 64MB based on actual data size. This prevents sort stats from
generating excessively large binlog files.

issue: https://github.com/milvus-io/milvus/issues/42400

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-04 09:56:39 +08:00
Ted Xu
128efaa3e3
enhance: simplify size calculation in file writers (#40808)
See: #40342

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-26 20:04:22 +08:00
sthuang
90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916)
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
smellthemoon
8b974c5742
enhance: support compact if lack of binlog (#40000)
https://github.com/milvus-io/milvus/issues/39718

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-02-22 10:51:56 +08:00
Ted Xu
53a4207f46
enhance: improve sort performance by writing with batch record (#39685)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-12 15:08:47 +08:00