45 Commits

Author SHA1 Message Date
marcelo-cjl
3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
    for nullable vectors with sparse storage.
    
    Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
      to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
      nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
      index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
      for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
    - Update msg_pack.go with nullable vector field data processing
    
    QueryNode layer changes:
    - Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
    
    Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
    
    Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
      logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
    - Update funcutil.go with nullable vector support functions

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
sthuang
90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916)
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
shaoting-huang
88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default, and it will fall back to
plain encoding if the dictionary size exceeds the dictionary size page
limit. Users can specify custom fallback encoding by using
`parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However,
Go Parquet [fallbacks to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than custom encoding method users provide. Therefore, this patch
only turns off dictionary encoding for the primary key.

With a 5 million auto ID primary key benchmark, the parquet file size
improves from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
smellthemoon
2a1356985d
enhance: support null in go payload (#32296)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-19 17:08:00 +08:00
Buqian Zheng
8a1017a152
enhance: add helpers to parse sparse float vector in JSON (#32543)
issue: #29419

added helper functions to parse JSON representation of sparse float
vectors, will be used by both the restful server and the import utils.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-04-25 14:47:24 +08:00
Buqian Zheng
3c80083f51
feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630)
add sparse float vector support to different milvus components,
including proxy, data node to receive and write sparse float vectors to
binlog, query node to handle search requests, index node to build index
for sparse float column, etc.

https://github.com/milvus-io/milvus/issues/29419

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-13 14:32:54 -07:00
Ted Xu
12acaf3e4f
enhance: Adding a generic stream payload reader (#30682)
See: #30404

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-02-21 17:10:52 +08:00
aoiasd
a0537156c0
enhance: delete codc deserialize data by stream batch (#30407)
relate: https://github.com/milvus-io/milvus/issues/30404

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-06 17:04:25 +08:00
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
yah01
a9dccec03a
Add go payload writer (#24656) (#24762)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-09 13:52:39 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
yah01
ebd0279d3f
Check error by Error() and NoError() for better report message (#24736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
Enwei Jiao
967a97b9bd
Support json & array types (#23408)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: yah01 <yang.cen@zilliz.com>
2023-04-20 11:32:31 +08:00
congqixia
732986aa04
Remove fmt.Print from internal package (#22722)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-03-14 17:36:05 +08:00
SimFG
a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>

Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
SimFG
d7f38a803d
Separate some proto files (#19218)
Signed-off-by: SimFG <bang.fu@zilliz.com>

Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-09-16 16:56:49 +08:00
groot
b161aec95e
Support input empty string (#19111) (#19144)
Signed-off-by: yhmo <yihua.mo@zilliz.com>

Signed-off-by: yhmo <yihua.mo@zilliz.com>

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2022-09-13 13:36:29 +08:00
xige-16
4de1bfe5bc
Add cpp data codec (#18538)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: zhagnlu lu.zhang@zilliz.com

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-09 22:12:34 +08:00
Letian Jiang
4ae1ca2cac
Fix arrow builder nullptr check in FinishPayloadWriter (#17873)
Signed-off-by: Letian Jiang <letian.jiang@zilliz.com>
2022-06-28 20:04:17 +08:00
Letian Jiang
72bbe40254
Make PayloadReader read column data in batch (#16826)
Signed-off-by: Letian Jiang <letian.jiang@zilliz.com>
2022-05-10 11:37:52 +08:00
Xiaofan
801eeffbcc
Replace cgo parquet reader to go parquet reader (#16199)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-03-30 15:21:28 +08:00
XuanYang-cn
1567aa3f53
[skip e2e]Update license for storage payload (#14347)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-12-28 09:43:26 +08:00
godchen
febdda90f4
Change binlog writer close behavior (#13046)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-12-09 12:37:06 +08:00
godchen
5c44ee8c0f
Close binlog event when error occur (#12995)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-12-08 21:11:39 +08:00
congqixia
8c107ab15f
Set segment DroppedAt when compacted (#12182)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2021-11-22 20:11:14 +08:00
Bingyi Sun
a3d4cbdd4c
Convert c array to a golang slice (#12066)
issue: #11974
Signed-off-by: sunby <bingyi.sun@zilliz.com>

Co-authored-by: sunby <bingyi.sun@zilliz.com>
2021-11-22 10:05:14 +08:00
Cai Yudong
3387b07dfd
Optimize code under storage (#6335)
* rename AddOneStringToPayload/GetOneStringFromPayload to AddStringToPayload/GetStringFromPayload

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* code optimize

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* rename print_binglog_test to print_binlog_test

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update chap08_binlog.md

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix unittest

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* use SetEventTimestamp() to replace SetStartTimestamp() and SetEndTimestamp()

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* code optimize

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* rename AddStringToPayload/GetStringFromPayload to AddOneStringToPayload/GetOneStringFromPayload

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-07-07 19:10:07 +08:00
Xiangyu Wang
82ccd4cec0
Rename module (#4988)
* Rename module

Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>
2021-04-22 14:45:57 +08:00
godchen
0dfcb90881 Add storage copyright
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-04-19 11:32:24 +08:00
godchen
a5ad70a5ab Add unittest for storage
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-04-19 10:36:19 +08:00
godchen
f3649f0419 Refactor interface and proto
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-03-12 14:22:09 +08:00
XuanYang-cn
32977e270c Add impl cgo of parquet
Signed-off-by: XuanYang-cn <xuan.yang@zilliz.com>
2020-12-08 14:41:04 +08:00
zhenshan.cao
7bbbc14637 Fix bug: address already used
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2020-12-07 15:22:20 +08:00
xige-16
37d7526d31 Fix search test failure
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2020-12-07 14:37:42 +08:00
neza2017
0d8273c7cc Add string type payload cgo wrapper
Signed-off-by: neza2017 <yefu.chen@zilliz.com>
2020-12-06 15:15:11 +08:00
neza2017
70710dee47 Add parquet payload
Signed-off-by: neza2017 <yefu.chen@zilliz.com>
2020-12-05 16:11:03 +08:00
FluorineDog
63c8f60c6e Enable term parser and executor
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-12-05 09:51:27 +08:00
FluorineDog
6412ebc0d4 Add support of metric type in schema, enable binary vector, fix segfault
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-12-05 06:46:01 +08:00