2268 Commits

Author SHA1 Message Date
congqixia
0a208d7224
enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488)
Related to #45060

Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).

**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
  - Separates indexed fields from non-indexed fields
  - Loads indexes concurrently using thread pools
  - Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`

**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper
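
The parallel loading strategy described above can be sketched roughly as follows. This is an illustration in Python rather than the actual C++ code: `load_segment`, `load_index`, and `load_field_data` are hypothetical names, and submitting both groups to one pool is an assumption, since the commit message does not say whether index loads and field-data loads overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def load_segment(fields, load_index, load_field_data, max_workers=4):
    """Load indexed and non-indexed fields concurrently.

    `fields` maps a field name to True when the field has an index.
    `load_index` and `load_field_data` are hypothetical callables that
    stand in for the real loaders.
    """
    # Separate indexed fields from non-indexed fields.
    indexed = [name for name, has_index in fields.items() if has_index]
    raw = [name for name, has_index in fields.items() if not has_index]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every task up front so the loads can run in parallel.
        futures = {name: pool.submit(load_index, name) for name in indexed}
        futures.update({name: pool.submit(load_field_data, name) for name in raw})
        # result() re-raises any exception raised in a worker thread.
        return {name: fut.result() for name, fut in futures.items()}
```

Collecting results through `result()` means a failed load surfaces as an exception to the caller instead of being silently dropped.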

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-14 11:21:37 +08:00
Gao
09a3195867
enhance: support max_connections config for remote storage (#45225)
related: https://github.com/milvus-io/milvus/issues/45344

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-13 15:37:38 +08:00
Spade A
929dc65882
fix: fix index compatibility after upgrade (#45373)
issue: https://github.com/milvus-io/milvus/issues/45380

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-13 12:59:38 +08:00
Chun Han
406fa7b694
fix: failed to get raw data for hybrid index(#45318) (#45411)
related: #45318

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-13 10:17:37 +08:00
sparknack
9d75d0393e
enhance: some optimization of scalar field fetching in tiered storage scenarios (#45360)
issue: #43611

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-11 17:17:41 +08:00
cai.zhang
e3c1673191
fix: Fix filter geometry for growing with mmap (#45464)
issue: #45450

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 15:39:36 +08:00
Chun Han
69f3aab229
feat: milvus support huawei cloud iam verification(#45298) (#45457)
related: #45298

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-11 14:41:41 +08:00
congqixia
8d1ea751a6
fix: Support JSON default values in FillFieldData (#45455)
Related to #45445

Previously, FillFieldData for JSON fields would assert and fail when a
default_value was provided, blocking index creation for JSON fields with
default values (including dynamic fields like $meta).

This change enables JSON default value support by:
- Removing the assertion that blocked default values
- Parsing bytes_data into Json objects when default_value is present
- Properly filling data_ array and setting valid_data_ bitset to true
- Maintaining null behavior when no default_value is provided

Impact:
- Fixes index creation failure for JSON fields with default values
- Resolves upgrade issues from 2.5 to 2.6.5 where dynamic fields with
default values couldn't be indexed
- Index builds that were stuck in InProgress state can now complete
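
The fill behavior listed above can be illustrated with a small Python sketch. It is not the C++ implementation: `fill_json_field` is a hypothetical name, and plain lists stand in for the `data_` array and `valid_data_` bitset.

```python
import json


def fill_json_field(num_rows, default_bytes=None):
    """Return (data, valid) for a JSON column with no stored values."""
    if default_bytes is None:
        # No default value: every row stays null.
        return [None] * num_rows, [False] * num_rows
    # Parse the serialized default for each row so rows do not share
    # one mutable object, and mark every row as valid.
    data = [json.loads(default_bytes) for _ in range(num_rows)]
    return data, [True] * num_rows
```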

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-11 10:35:36 +08:00
Gao
e9a875f7ac
enhance: override index_type while creating segment index (#45416)
issue: #44752

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-11 07:27:36 +08:00
congqixia
0e1de0073a
enhance: Update tantivy-binding with cargo build result (#45458)
Related to #44988

This PR commits the newly updated tantivy-binding.h generated by the
cargo build, which should pass the format check.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-10 18:09:36 +08:00
aoiasd
e82bf0e54f
enhance: fix typo of analyzer params (#45299)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-10 14:35:35 +08:00
aoiasd
a38a0deb43
enhance: prevent panic by adding null pointer check when clearing InsertRecord _pk2offset_ (#45281)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-10 11:37:35 +08:00
Xiaofan
7aa0ca5d4e
enhance: Clean unused conan dependency (#45366)
fix #45365

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-07 17:07:34 +08:00
Buqian Zheng
515a939edf
enhance: remove obsolete code (#45307)
issue: #44452

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-11-07 16:07:35 +08:00
Amit Kumar
388d56fdc7
enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)

See https://github.com/milvus-io/milvus/issues/44593 for the background.

This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant,
so that PR can be closed. Review comments on the original implementation
suggested a better alternative approach, which this PR implements.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.

Motivation
- Provide a way to require an expression to match a minimum number of
tokens in lexical search.
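
As a rough illustration of the intended semantics, here is a Python sketch that counts distinct matching tokens. It assumes pre-tokenized input and a default of one required token; the real analyzer, tokenization, and Tantivy query construction are not shown, and `text_match` here is a stand-in function, not the Milvus expression itself.

```python
def text_match(doc_tokens, query_tokens, minimum_should_match=1):
    """Return True when enough distinct query tokens occur in the document."""
    # Count each query token at most once, however often it repeats.
    matched = len(set(query_tokens) & set(doc_tokens))
    return matched >= minimum_should_match
```

With `minimum_should_match=2`, a document containing only one of the query tokens no longer matches, while the default keeps the match-any behavior.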

What changed
- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and
    `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*`
    (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match`
    integer; propagate as an extra value on the `TextMatch` expression so
    downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify `text_match(..., minimum_should_match=...)`
    appears in the generated DSL and is accepted by client code:
    `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client
    testcase suite: `tests/go_client/testcases/full_text_search_test.go`
    (exercise min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage:
    `client/milvusclient/read_example_test.go` (example name conforms to
    godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to
    `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
    that construct boolean queries with minimum required clauses.

Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed
(internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating the ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked Search
    request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test:
    `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3
    and a large min).
  - C++ unit tests for index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$` (client
    package)
  - Go testcases:
    `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch`
    (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions (the
    change touches core index code).
  - Rust binding tests:
    `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if
    developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
2025-11-07 16:07:11 +08:00
cai.zhang
7527ddf50f
enhance: [test] Move R-Tree index tests into the implementation package (#45355)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-07 10:03:33 +08:00
zhagnlu
59c64bee07
fix: not use json_shredding for json path is null (#45310)
#45284

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-11-06 11:43:33 +08:00
sparknack
9032bb7668
enhance: unify the aligned buffer for both buffered and direct I/O (#45323)
issue: #43040

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-06 10:53:33 +08:00
yihao.dai
121eb912ba
fix: Fix load segment failed due to get disk usage error (#45255)
When getting disk usage, files or directories may be removed
concurrently due to segment release. This PR ignores “file or directory
does not exist” errors in such cases.
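
A minimal Python sketch of the same idea, assuming a simple recursive size sum; the actual fix lives in the Milvus code base and `disk_usage` is a hypothetical helper name.

```python
import os


def disk_usage(root):
    """Sum file sizes under `root`, skipping entries removed concurrently."""
    total = 0
    # os.walk ignores directory listing errors by default (onerror=None),
    # so a directory deleted during the walk is simply skipped.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except FileNotFoundError:
                # The file vanished between listing and stat; ignore it.
                continue
    return total
```

Only the "does not exist" case is swallowed here; other errors such as permission failures still propagate.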

issue: https://github.com/milvus-io/milvus/issues/45239

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-11-06 08:51:33 +08:00
congqixia
55bfd610b6
enhance: [StorageV2] Integrate FFI interface for packed reader (#45132)
Related to #44956

Integrate the StorageV2 FFI interface as the unified storage layer for
reading packed columnar data, replacing the custom iterative reader with
a manifest-based approach using the milvus-storage library.

Changes:
- Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C
Stream interface
- Implement utility functions to convert CStorageConfig to
milvus-storage Properties
- Create ManifestReader in Go that generates manifests from binlogs
- Add FFI packed reader CGO bindings (packed_reader_ffi.go)
- Refactor NewBinlogRecordReader to use ManifestReader for V2 storage
- Support both manifest file paths and direct manifest content
- Enable configurable buffer sizes and column projection

Technical improvements:
- Zero-copy data exchange using Arrow C Data Interface
- Optimized I/O operations through milvus-storage library
- Simplified code path with manifest-based reading
- Better performance with batched streaming reads

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-05 19:57:34 +08:00
foxspy
95d7302cf4
enhance: update knowhere version (#45270)
issue: #42937

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-11-05 11:09:32 +08:00
cai.zhang
fa3d4ebfbe
fix: Compute the correct batch size for the geometry index of the growing segment (#45253)
issue: #44648

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-04 20:25:37 +08:00
zhenshan.cao
6327c9a514
fix: Fix bugs related to TimestampTz (#45111)
issue: https://github.com/milvus-io/milvus/issues/44527
https://github.com/milvus-io/milvus/issues/44537
https://github.com/milvus-io/milvus/issues/44538
https://github.com/milvus-io/milvus/issues/44585
https://github.com/milvus-io/milvus/issues/44622

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-04 16:51:33 +08:00
sparknack
40b5e6b134
fix: avoid potential race conditions when updating the executor (#45230)
issue: #43040

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-04 14:25:33 +08:00
cai.zhang
617891b436
fix: Skip create tmp dir for growing R-Tree index (#45256)
issue: #45181

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-04 13:01:32 +08:00
Spade A
cd0b36c39e
feat: impl StructArray -- support diskann index (#45223)
issue: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-04 11:57:33 +08:00
zhagnlu
653e95aaad
fix: fix bug for shredding json when empty json but not null (#45221)
#45157

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-11-04 11:11:33 +08:00
cai.zhang
01cf5c9341
enhance: Add log to debug index task (#45198)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-03 20:01:34 +08:00
cai.zhang
ed8ba4a28c
enhance: Make GeometryCache an optional configuration (#45192)
issue: #45187

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-03 19:59:32 +08:00
Spade A
ae03dee116
feat: implement ngram tokenizer with token_chars and custom_token_chars (#45040)
issue: https://github.com/milvus-io/milvus/issues/45039

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-03 18:09:33 +08:00
Jingsong Yin
e25ee08566
fix: fix LoadMetrics bool type error (#45209)
#44584

Signed-off-by: thekingking <1677273255@qq.com>
2025-11-01 01:19:32 +08:00
Jingsong Yin
0cc79772e7
enhance: Extend SkipIndex with IN/Match support and BloomFilter (#44581)
issue: #44584

---------

Signed-off-by: thekingking <1677273255@qq.com>
2025-10-31 22:39:32 +08:00
congqixia
22098c1785
fix: add null check for packed_writer_ in JsonStatsParquetWriter::Close() (#45158)
Related to #45157

Fix a bug where DataNode panics when building json stats index throws an
exception before the writer is initialized. The destructor would call
Close() on an uninitialized packed_writer_ pointer, causing a null
pointer dereference.

Changes:
- Add null check for packed_writer_ before calling Flush() and Close()
- Prevents null pointer dereference in edge cases
- Ignore close status as this is a cleanup operation

This ensures safe cleanup even when initialization fails due to
exceptions.
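
The guard can be sketched in Python as below. `StatsWriter` and its `inner` attribute are hypothetical stand-ins for `JsonStatsParquetWriter` and `packed_writer_`; the point is only the null check before flushing and closing.

```python
class StatsWriter:
    """Hypothetical writer whose inner writer may never be initialized."""

    def __init__(self):
        self.inner = None  # stays None if setup fails before open()

    def open(self, inner):
        self.inner = inner

    def close(self):
        if self.inner is None:
            # Initialization never completed, so there is nothing to flush.
            return
        try:
            self.inner.flush()
            self.inner.close()
        except Exception:
            # Cleanup path: the close status is deliberately ignored.
            pass
        finally:
            self.inner = None
```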

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-30 17:40:09 +08:00
cqy123456
35d8213a00
fix: fail to mmap emb_list_meta in embedding list (#45127)
issue: https://github.com/milvus-io/milvus/issues/44965

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-10-30 11:00:09 +08:00
aoiasd
ad9a0cae48
enhance: add global analyzer options (#44684)
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options so that Milvus params no longer have to be
merged into the user's analyzer params.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-28 14:52:10 +08:00
congqixia
fd0ef09e97
fix: Handle all-null data in StringIndexSort to prevent load timeout (#45100)
Related to #45081

StringIndexSort now properly handles collections with all-null string
fields by:
- Removing the error thrown when unique_count is 0 in ParseBinaryData
- Adding alignment and padding support in mmap serialization (similar to
ScalarIndexSort)
- Separating data_size_ from mmap_size_ to correctly parse data without
reading padding

This fixes load collection timeout failures when all string field data
is null, particularly affecting STL_SORT and TRIE index types.
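
To illustrate why the payload size and the mapped size are tracked separately, here is a small Python sketch. The 8-byte alignment and the function names are assumptions made for the example, not values taken from the commit.

```python
def pad_to_alignment(payload, alignment=8):
    """Return (buffer, data_size, mmap_size) with zero padding appended."""
    data_size = len(payload)
    # Round the payload size up to the next multiple of `alignment`.
    mmap_size = (data_size + alignment - 1) // alignment * alignment
    return payload + b"\x00" * (mmap_size - data_size), data_size, mmap_size


def parse(buffer, data_size):
    # Read only the real payload; bytes past data_size are padding.
    return buffer[:data_size]
```

An empty payload (the all-null case) yields zero for both sizes and parses to an empty result instead of raising an error.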

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-27 18:04:09 +08:00
congqixia
36a887b38b
enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061)
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.

Changes:

C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API

Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation

Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions

Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements

This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling

Related to #45060

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-27 15:28:12 +08:00
congqixia
7c627260f3
enhance: Optimize ScalarIndexSort bitmap initialization for range queries (#45085)
Optimize bitmap initialization in ScalarIndexSort range queries by using
an adaptive strategy based on result density. When more than 50% of
elements match the range condition, initialize the bitmap with all true
values and clear the non-matching elements. Otherwise, use the original
approach of initializing with false and setting the matching elements.
Also defer bitmap allocation until after the early return checks to avoid
unnecessary memory allocation.

This optimization reduces bit operations for queries that match most
rows while maintaining the same performance for queries that match few rows.
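
The adaptive strategy can be sketched in Python as follows. This is an illustration under assumptions: the index is modeled as a list of `(value, row_id)` pairs sorted by value, the range is inclusive on both ends, and a list of booleans stands in for the bitmap.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_pairs, low, high):
    """Return a per-row match bitmap for values in [low, high]."""
    n = len(sorted_pairs)
    values = [v for v, _ in sorted_pairs]
    begin = bisect_left(values, low)
    end = bisect_right(values, high)
    if begin >= end:
        # Nothing matches, so skip the density logic entirely.
        return [False] * n
    if (end - begin) * 2 > n:
        # Dense result: start from all true and clear the non-matching rows.
        bitmap = [True] * n
        for _, row in sorted_pairs[:begin]:
            bitmap[row] = False
        for _, row in sorted_pairs[end:]:
            bitmap[row] = False
    else:
        # Sparse result: start from all false and set the matching rows.
        bitmap = [False] * n
        for _, row in sorted_pairs[begin:end]:
            bitmap[row] = True
    return bitmap
```

Either branch touches at most half of the rows after initialization, which is where the saving in bit operations comes from.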

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-27 10:08:06 +08:00
Buqian Zheng
c284e8c4a8
enhance: some minor code cleanup, prepare for scalar benchmark (#45008)
issue: https://github.com/milvus-io/milvus/issues/44452

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-10-24 14:22:05 +08:00
congqixia
199f6d936e
fix: Update milvus-storage to fix duplicate AWS SDK initialization (#45062)
Update milvus-storage version from aa189ad to e5f5b4c to include the fix
for duplicate AWS SDK initialization that was causing init conflicts.

This update removes the redundant arrow::fs::InitializeS3() call that
was resulting in duplicate Aws::InitAPI() initialization. The duplicate
initialization was causing AWS SDK to ignore custom configurations,
particularly affecting GCP Workload Identity authentication.

Changes in milvus-storage e5f5b4c:
- Remove redundant arrow::fs::InitializeS3() call
- Keep only the extended S3 initialization with custom AWS SDK options
- Ensure GCP IAM authentication via custom HTTP client factory works
correctly

Relates to #44745
Reference: milvus-io/milvus-storage#285

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-24 11:32:05 +08:00
Buqian Zheng
22995cea3f
fix: Remove debug logging from JsonFlatIndex (#44807)
issue: https://github.com/milvus-io/milvus/issues/44452

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>
2025-10-23 16:08:06 +08:00
Bingyi Sun
52270701ce
feat: use namespace skip index when search (#44888)
issue: #44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-23 12:04:04 +08:00
Spade A
6077178553
enhance: enable STL_SORT to support VARCHAR (#44401)
issue: https://github.com/milvus-io/milvus/issues/44399

This PR implements STL_SORT for VARCHAR data type for both RAM and MMAP
mode.
The general idea is to deduplicate field values and maintain a
posting list for each unique value.

The serialization format of the index is:
```
[unique_count][string_offsets][string_data][post_list_offsets][post_list_data][magic_code]
string_offsets: array of offsets into string_data section
string_data: str_len1, str1, str_len2, str2, ...
post_list_offsets: array of offsets into post_list_data section
post_list_data: post_list_len1, row_id1, row_id2, ..., post_list_len2, row_id1, row_id2, ...
```
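
A rough Python sketch of writing this layout is shown below. The field widths (little-endian 32-bit integers), the byte-based offsets, and the `MAGIC` constant are assumptions made for the example; the commit message does not specify them.

```python
import struct
from collections import defaultdict

MAGIC = 0x5354534F  # hypothetical magic code, not the real value


def serialize(values):
    """Serialize strings (row id = list position) into the layout above."""
    # Deduplicate values and build a posting list of row ids per value.
    postings = defaultdict(list)
    for row_id, value in enumerate(values):
        postings[value].append(row_id)
    uniques = sorted(postings)

    string_offsets, string_data = [], b""
    post_offsets, post_data = [], b""
    for s in uniques:
        raw = s.encode("utf-8")
        string_offsets.append(len(string_data))
        string_data += struct.pack("<I", len(raw)) + raw
        rows = postings[s]
        post_offsets.append(len(post_data))
        post_data += struct.pack(f"<I{len(rows)}I", len(rows), *rows)

    n = len(uniques)
    return (
        struct.pack("<I", n)
        + struct.pack(f"<{n}I", *string_offsets)
        + string_data
        + struct.pack(f"<{n}I", *post_offsets)
        + post_data
        + struct.pack("<I", MAGIC)
    )
```

Keeping the unique strings sorted is what allows lookups by binary search over `string_offsets`.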

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-10-23 11:00:05 +08:00
cai.zhang
3d11ba06ef
fix: Double check to avoid iter has been erased by other thread (#45013)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-21 23:36:04 +08:00
zhagnlu
730308b1eb
fix: fix not equal not include None (#44959)
#44816

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-10-21 17:08:03 +08:00
cai.zhang
b23d75a032
fix: Fix bug for gis function to filter geometry (#44966)
issue: #44961 

This PR fixes 3 geometry related bugs:
1. Implement `ToString` interface for GisFunctionFilter.
2. Ignore GisFunctionFilter `MoveCursor` for growing segment.
3. Don't skip null geometries when building the R-Tree index; they should
be recorded in null_offsets.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-21 09:52:04 +08:00
cai.zhang
a35a3b7c69
fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-20 23:32:03 +08:00
zhagnlu
05df48fbe4
fix: remove duplicated '/' in jsonstats path (#44939)
#44950

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-10-20 14:06:03 +08:00
Zhen Ye
f98d02b3e1
fix: use short debug string to avoid newline in debug logs (#44925)
issue: #44924

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-20 10:16:03 +08:00
Bingyi Sun
3ddf9154ab
fix: Fix exists expr for json flat index (#44910)
issue: https://github.com/milvus-io/milvus/issues/44915

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-19 19:46:07 +08:00