787 Commits

Author SHA1 Message Date
marcelo-cjl
3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
    for nullable vectors with sparse storage.
    
    Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
      to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
      nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
      index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
      for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
    - Update msg_pack.go with nullable vector field data processing
    
    QueryNode layer changes:
    - Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
    
    Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
    
    Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
      logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
    - Update funcutil.go with nullable vector support functions

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
Buqian Zheng
e379b1f0f4
enhance: moved query optimization to proxy, added various optimizations (#45526)
issue: https://github.com/milvus-io/milvus/issues/45525

see added README.md for added optimizations

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators.

* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-24 00:39:19 +08:00
Spade A
f6f716bcfd
feat: impl StructArray -- support embedding searches embeddings in embedding list with element level filter expression (#45830)
issue: https://github.com/milvus-io/milvus/issues/42148

For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.

This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.

Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:

element_filter(structFieldName, $[subFieldName] == 3)

where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.

It can be combined with existing filter expressions, for example:

"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"

A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)

schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...

filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)

```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-15 12:01:15 +08:00
zhagnlu
a86b8b7a12
enhance: move jsonshredding meta from parquet to meta.json (#46130)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-11 14:01:13 +08:00
cai.zhang
bb486c0db3
fix: Fix path concatenation error when rootPath = "." in minio (#46220)
issue: #46219

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-10 13:53:13 +08:00
Buqian Zheng
1372e84d7f
fix: move cursor after skip index skipped a chunk (#46054)
issue: #46053

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-05 10:47:11 +08:00
Xinyi7
59752f216d
fix: add check in batch_score function to prevent query node seg fault (#46025)
previously we saw that when doing reranker with phrase matching, the
query node throws a segmentation fault error.

github issue link: https://github.com/milvus-io/milvus/issues/45990

---------

Signed-off-by: Xinyi Jiang <xinyi.jiang@reddit.com>
Co-authored-by: Xinyi Jiang <xinyi.jiang@reddit.com>
2025-12-04 17:35:17 +08:00
Ted Xu
20ce9fdc23
feat: bump loon version (#46029)
See: #44956

This PR upgrades loon to the latest version and resolves building
conflicts.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-04 10:57:12 +08:00
zhagnlu
d5bd17315c
enhance: remove some meta cache for json shredding (#45888)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-01 11:57:09 +08:00
sparknack
0392db6976
enhance: add cancellation checking in each operator and expr (#45354)
issue: #45353

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-26 10:15:07 +08:00
Gao
09a3195867
enhance: support max_connections config for remote storage (#45225)
related: https://github.com/milvus-io/milvus/issues/45344

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-13 15:37:38 +08:00
Spade A
929dc65882
fix: fix index compatibility after upgrade (#45373)
issue: https://github.com/milvus-io/milvus/issues/45380

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-13 12:59:38 +08:00
Buqian Zheng
515a939edf
enhance: remove obsolete code (#45307)
issue: #44452

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-11-07 16:07:35 +08:00
cai.zhang
7527ddf50f
enhance: [test] Move R-Tree index tests into the implementation package (#45355)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-07 10:03:33 +08:00
Spade A
cd0b36c39e
feat: impl StructArray -- support diskann index (#45223)
issue: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-04 11:57:33 +08:00
zhagnlu
653e95aaad
fix: fix bug for shredding json when empty json but not null (#45221)
#45157

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-11-04 11:11:33 +08:00
Buqian Zheng
c284e8c4a8
enhance: some minor code cleanup, prepare for scalar benchmark (#45008)
issue: https://github.com/milvus-io/milvus/issues/44452

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-10-24 14:22:05 +08:00
cqy123456
822588302a
enhance: embedding_list support mmap in MemVectorIndex (#44764)
issue: https://github.com/milvus-io/milvus/issues/44702

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-10-15 15:22:00 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
Gao
d3dfb90587
enhance: move tracer test to milvus-common (#44605)
#43931

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-30 15:07:06 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This pr's main goal is merge #37417 to milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display  geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
sparknack
14c085374e
fix: set mmap_file_raii_ to nullptr when mmap is disabled (#44516)
issue: #44510
related: #44501

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-24 11:50:03 +08:00
Gao
539f17f1ad
enhance: tiered index updates (#44433)
issue: #42032 #44212 

- special case for warmup param and cell storage size for tiered index
- add a config to enable/disable storage usage tracking

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-22 21:34:11 +08:00
sparknack
ab64afba2f
enhance: add storage resource usage for scalar search (#44414)
issue: #44212

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-22 14:28:06 +08:00
Gao
d3784c6515
enhance: add storage resource usage for vector search (#44308)
issue: #44212 

Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-09-19 20:20:02 +08:00
congqixia
b532a3e026
enhance: Move c API unittest aside to src files (#44458)
Related to #43931

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-19 10:30:01 +08:00
congqixia
7b83314bf3
enhance: [StorageV2] Make datanode use non-singleton fs (#44418)
Related to #39173

According to the current design, datanode shall create fs from storage
config in request instead of using singleton fs. This PR upgrade
milvus-storage and make packed reader/writer compose new fs from storage
config.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-18 20:06:00 +08:00
zhagnlu
9b6703626d
fix:fix unescaped bug for json stats (#44421)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-17 20:54:01 +08:00
sthuang
2f70a73258
fix: turn on azure by default (#44377)
related: #44354, #44138, #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-17 10:12:01 +08:00
sthuang
b38013352d
enhance: [StorageV2] enable build with azure (#44177)
related: #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-14 08:05:58 +08:00
zhagnlu
16e6b6aa8a
fix:fix build json stats bug for nested object (#44303)
issue: #44132

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-11 14:13:56 +08:00
sparknack
4a01c726f3
enhance: cachinglayer: some metric and params update (#44276)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-10 11:03:57 +08:00
Spade A
45adf2d426
fix: load resource considers ngram index (#44237)
fix https://github.com/milvus-io/milvus/issues/44236

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-10 10:27:56 +08:00
Chun Han
26a024625d
feat: support search by on json field and dynamic field(#43124) (#43203)
related: #43124

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-09-09 21:51:56 +08:00
Spade A
575d490af6
fix: ngram index is mistakenly used for unsopported operations 2 (#44142)
issue: https://github.com/milvus-io/milvus/issues/44020
https://github.com/milvus-io/milvus/pull/43955 only fixed unary
expression
This fixes all expressions and add more tests.

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-09 19:05:56 +08:00
Buqian Zheng
9bf2b5c10c
enhance: moved more segcore unit test files (#44210)
issue: https://github.com/milvus-io/milvus/issues/43931

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-08 10:21:55 +08:00
Spade A
ba4cd68edb
fix: adjust params to make CPP UT run faster (#44223)
fix: https://github.com/milvus-io/milvus/issues/44224

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-06 14:13:54 +08:00
cqy123456
1d4d721859
test: Reduce the run time of interim index cpp ut (#44200)
issue: https://github.com/milvus-io/milvus/issues/44176

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-09-05 16:45:53 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index (#44171)
issue: https://github.com/milvus-io/milvus/issues/42032

- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
Buqian Zheng
b76bf13fc3
enhance: move c++ unit test file to aside of the production code (#43932)
issue: https://github.com/milvus-io/milvus/issues/43931

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-03 23:45:53 +08:00
Spade A
825a134739
feat: impl StructArray -- reject json types for struct (#44190)
issue: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 19:33:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
foxspy
d55bf49bf1
enhance: update knowhere version (#44144)
issue: #42937

---------

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-09-03 01:31:53 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
Bingyi Sun
c420e7bd27
enhance: align the behavior of exist expr between brute force and index (#44030)
https://github.com/milvus-io/milvus/issues/44031

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-01 15:03:52 +08:00
zhagnlu
fc876639cf
enhance: support json stats with shredding design (#42534)
#42533

Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-01 10:49:52 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading (#43846)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
congqixia
e3b3502287
fix: Use correct regex for cppcheck (#44077)
Related to #44076

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-27 20:57:50 +08:00
marcelo-cjl
e13e19cd2c
enhance: add sparse_u32_f32 data type for sparse vertor (#43974)
issue: #43973

Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-08-27 16:47:50 +08:00