11376 Commits

Author SHA1 Message Date
congqixia
f8c972a102
fix: update EnableDynamicField and SchemaVersion during collection modification (#45615)
Related to #45614

This commit fixes a bug where certain collection attributes were not
properly updated during collection modification, causing metadata errors
after cluster restart and collection reload failures.

When altering a collection, the `EnableDynamicField` and `SchemaVersion`
attributes were not being persisted to the catalog. This caused
inconsistencies between the in-memory collection metadata and the
persisted state, leading to:
- Dynamic field validation failures after restart
- Collection loading errors
- Metadata state mismatches

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-18 10:05:39 +08:00
wei liu
7aed88113c
enhance: Deduplicate primary keys in upsert request batch (#45249)
issue: #44320

This change adds deduplication logic to handle duplicate primary keys
within a single upsert batch, keeping the last occurrence of each
primary key.

Key changes:
- Add DeduplicateFieldData function to remove duplicate PKs from field
data, supporting both Int64 and VarChar primary keys
- Refactor fillFieldPropertiesBySchema into two separate functions:
validateFieldDataColumns for validation and fillFieldPropertiesOnly for
property filling, improving code clarity and reusability
- Integrate deduplication logic in upsertTask.PreExecute to
automatically deduplicate data before processing
- Add comprehensive unit tests for deduplication with various PK types
(Int64, VarChar) and field types (scalar, vector)
- Add Python integration tests to verify end-to-end behavior

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-17 21:35:40 +08:00
congqixia
e9506f1d64
fix: Handle default values correctly during compaction for added fields (#45572)
Related to #45543

When a field with a default value is added to a collection, the default
value becomes null after compaction instead of retaining the expected
default value.

**Root Cause**
The `appendValueAt` function in `internal/storage/arrow_util.go`
incorrectly checked if the entire arrow.Array was nil before handling
default values. This meant that default values were only applied when
the array itself was nil, not when individual field values were null
(which is the correct condition).

**Changes**
1. **Early nil check**: Added a guard at the function entry to detect
nil arrow.Array and return an error immediately, as this is an
unexpected condition that should not occur during normal operation.

2. **Refactored default value handling**: Removed the per-type nil array
checks and moved default value logic to handle individual null values
within the array (when `IsNull(idx)` returns true).

3. **Applied to all types**: Updated the logic consistently across all
builder types:
   - BooleanBuilder
   - Int8Builder, Int16Builder, Int32Builder, Int64Builder
   - Float32Builder
   - StringBuilder
   - BinaryBuilder (added default value support for internal $meta json)
   - ListBuilder (removed unnecessary nil check)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-17 19:03:38 +08:00
aoiasd
96d0e780ac
fix: segcore collection schema update not concurrent safe. (#45337)
relate: https://github.com/milvus-io/milvus/issues/45345

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-14 17:51:37 +08:00
Zhen Ye
40e2042728
enhance: add more metrics for DDL framework (#45558)
issue: #43897

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-14 15:19:37 +08:00
congqixia
0a208d7224
enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488)
Related to #45060

Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).

**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
  - Separates indexed fields from non-indexed fields
  - Loads indexes concurrently using thread pools
  - Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`

**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-14 11:21:37 +08:00
Spade A
0454cdaab3
fix: remove validateFieldName in dropIndex (#45460)
issue: https://github.com/milvus-io/milvus/issues/45459

This check is unnecessary when dropping index.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-14 10:17:37 +08:00
Xiaofan
1c69c7fa17
enhance: Upgrade etcd to 3.5.23 (#44666)
related to #44614
Fixes the issue where embedded etcd is not affected by the quota config.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-14 09:47:38 +08:00
cai.zhang
cc07be3c30
fix: Ignore compaction task when from segment is not healthy (#45534)
issue: #45533

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-13 23:07:39 +08:00
junjiejiangjjj
102481e53f
feat: Support add_function/alter_function/drop_function (#44895)
https://github.com/milvus-io/milvus/issues/44053

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-11-13 20:53:39 +08:00
Gao
09a3195867
enhance: support max_connections config for remote storage (#45225)
related: https://github.com/milvus-io/milvus/issues/45344

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-13 15:37:38 +08:00
Spade A
929dc65882
fix: fix index compatibility after upgrade (#45373)
issue: https://github.com/milvus-io/milvus/issues/45380

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-13 12:59:38 +08:00
junjiejiangjjj
50f198e346
feat: Support zilliz models (#45168)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-11-13 12:55:37 +08:00
groot
e48fe7f820
fix: Fix bulkimport bug for Struct field (#45474)
issue: https://github.com/milvus-io/milvus/issues/45006

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-11-13 11:31:41 +08:00
Xiaofan
a9895bb904
enhance: add robust handle etcd servercrash (#45304)
related to #45303
Fixes the issue where the Milvus pod may restart when the etcd pod starts.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-13 10:23:36 +08:00
Chun Han
406fa7b694
fix: failed to get raw data for hybrid index(#45318) (#45411)
related: #45318

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-13 10:17:37 +08:00
Zhen Ye
b7fb8ed38c
fix: use the right resource key lock for ddl and use new ddl in transfer replica (#45506)
issue: #45452

- Alias and rename DDL should use a database-level exclusive lock.
- An alias cannot be used as the resource key of a lock; use the
collection name instead.
- Transfer replica should use the WAL-based framework.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-12 19:01:38 +08:00
yihao.dai
cabc47ce01
fix: Fix channel not available error and release collection blocking (#45428)
1. Ensure replica creation is idempotent.
2. Prevent currentTarget update when replica is missing.
3. Move the wait-for-release logic into the DDL framework's callback,
and add a timeout to prevent it from blocking the DDL callback
indefinitely.

issue: https://github.com/milvus-io/milvus/issues/45301,
https://github.com/milvus-io/milvus/issues/45274,
https://github.com/milvus-io/milvus/issues/45295

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-11-12 18:55:37 +08:00
XuanYang-cn
28d0755aaa
fix: Set schema properties before broadcast alter collection (#45502)
Previously, schema properties were not set before the alter-collection
broadcast, which left them empty in the datacoord caches, so compaction
and indexing could not read properties from the schema.

See also: #45053, #45159

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-12 18:11:41 +08:00
Zhen Ye
8b01af55b9
fix: remove collection meta when drop partition (#45493)
issue: #45476

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-11 23:39:36 +08:00
cai.zhang
216c576da2
fix: Retain collection early to prevent it from being released before query completion (#45413)
issue: #45314

This PR only ensures that no panic occurs. However, we still need to
provide protection for the delegator handling ongoing query tasks.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 20:29:37 +08:00
cai.zhang
d0d908e51d
fix: Fix target segment marked dropped for save stats result twice (#45478)
issue: #45477

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 17:19:38 +08:00
sparknack
9d75d0393e
enhance: some optimization of scalar field fetching in tiered storage scenarios (#45360)
issue: #43611

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-11 17:17:41 +08:00
sijie-ni-0214
77dc512b3b
fix: alter collection with alias failed (#45447)
issue: #45397

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-11-11 16:05:36 +08:00
Zhen Ye
4797bb6ab2
fix: wrong update timetick of collection meta info (#45461)
issue: #45403, #45463

- fix the Nightly E2E failures.
- fix the wrong update timetick of altering collection to fix the
related load failure.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-11 16:01:36 +08:00
cai.zhang
e3c1673191
fix: Fix filter geometry for growing with mmap (#45464)
issue: #45450

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 15:39:36 +08:00
Chun Han
69f3aab229
feat: milvus support huawei cloud iam verification(#45298) (#45457)
related: #45298

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-11 14:41:41 +08:00
congqixia
382b1d7de6
fix: correct field data offset calculation in rerank functions for bulk search (#45444)
Related to #45338

When using bulk vector search in hybrid search with rerank functions,
the output field values for different queries were all equal to the
values returned by the first query, instead of the correct values
belonging to each document ID. The document IDs were correct, but the
entity field values were wrong.

In rerank functions (RRF, weighted, decay, model), when processing
multiple queries in a batch, the `idLocations` stored only the relative
offset within each result set (`idx`), not accounting for the absolute
position within the entire batch. This caused `FillFieldData` to
retrieve field data from the wrong positions, always using offsets
relative to the first query.

This fix ensures that when processing bulk searches with rerank
functions, each result correctly retrieves its corresponding field data
based on the absolute offset within the entire batch, resolving the
issue where all queries returned the first query's field values.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-11 14:39:41 +08:00
XuanYang-cn
dcf490663c
fix: store database event if the key is invalid (#45348)
See also: #45136, #45124

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-11 10:55:36 +08:00
congqixia
8d1ea751a6
fix: Support JSON default values in FillFieldData (#45455)
Related to #45445

Previously, FillFieldData for JSON fields would assert and fail when a
default_value was provided, blocking index creation for JSON fields with
default values (including dynamic fields like $meta).

This change enables JSON default value support by:
- Removing the assertion that blocked default values
- Parsing bytes_data into Json objects when default_value is present
- Properly filling data_ array and setting valid_data_ bitset to true
- Maintaining null behavior when no default_value is provided

Impact:
- Fixes index creation failure for JSON fields with default values
- Resolves upgrade issues from 2.5 to 2.6.5 where dynamic fields with
default values couldn't be indexed
- Index builds that were stuck in InProgress state can now complete

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-11 10:35:36 +08:00
Spade A
6f4abab6c8
fix: nextFieldID does not consider STRUCT (#45437)
issue: https://github.com/milvus-io/milvus/issues/45362

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-11 10:31:36 +08:00
zhenshan.cao
45907747e2
feat: Add /livez for Liveness Probes (#45454)
issue: https://github.com/milvus-io/milvus/issues/45443

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-11 09:51:15 +08:00
Gao
e9a875f7ac
enhance: override index_type while creating segment index (#45416)
issue: #44752

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-11 07:27:36 +08:00
congqixia
0e1de0073a
enhance: Update tantivy-binding with cargo build result (#45458)
Related to #44988

This PR commits the newly updated tantivy-binding.h generated by the
cargo build, which should pass the format check.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-10 18:09:36 +08:00
XuanYang-cn
897ac983c8
feat: Add new config and enable to dynamic update configs (#45170)
This PR changes the config layout according to the latest design and
adds two external credential configs for AWS KMS.

See also: #45169

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-10 14:43:35 +08:00
aoiasd
e82bf0e54f
enhance: fix typo of analyzer params (#45299)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-10 14:35:35 +08:00
sparknack
f815f57b82
enhance: check both eviction and warmup when estimate segment loading size (#45222)
issue: #44857

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-10 14:15:36 +08:00
aoiasd
a38a0deb43
enhance: prevent panic by adding null pointer check when clearing InsertRecord _pk2offset_ (#45281)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-10 11:37:35 +08:00
Chun Han
87b466fd83
fix: Group value is nil(#45418) (#45422)
related: #45418

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-08 10:29:33 +08:00
Xiaofan
7aa0ca5d4e
enhance: Clean unused conan dependency (#45366)
fix #45365

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-07 17:07:34 +08:00
Buqian Zheng
515a939edf
enhance: remove obsolete code (#45307)
issue: #44452

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-11-07 16:07:35 +08:00
Amit Kumar
388d56fdc7
enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)
See https://github.com/milvus-io/milvus/issues/44593 for the
background.

This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant,
so that PR can be closed. The review comments on the original
implementation suggested a better alternative approach, which this PR
implements.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.

Motivation
- Provide a way to require an expression to match a minimum number of
tokens in lexical search.
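
A rough sketch of the intended semantics, assuming whitespace tokenization and case-insensitive comparison. Both are simplifications chosen for illustration; the real feature relies on the configured analyzer and Tantivy boolean queries. `matchesAtLeast` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAtLeast reports whether at least minShouldMatch distinct query
// tokens occur in doc.
func matchesAtLeast(doc, query string, minShouldMatch int) bool {
	docTokens := make(map[string]struct{})
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		docTokens[t] = struct{}{}
	}
	seen := make(map[string]struct{})
	matched := 0
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if _, dup := seen[t]; dup {
			continue // count each distinct query token once
		}
		seen[t] = struct{}{}
		if _, ok := docTokens[t]; ok {
			matched++
		}
	}
	return matched >= minShouldMatch
}

func main() {
	doc := "vector database for AI"
	query := "vector search database"
	for _, m := range []int{1, 2, 3} {
		fmt.Println(m, matchesAtLeast(doc, query, m))
	}
}
```

Here two of the three query tokens occur in the document, so thresholds of 1 and 2 match while 3 does not.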

What changed
- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and
    `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*`
    (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match`
    integer; propagate as an extra value on the `TextMatch` expression so
    downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify `text_match(..., minimum_should_match=...)`
    appears in the generated DSL and is accepted by client code:
    `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client
    testcase suite: `tests/go_client/testcases/full_text_search_test.go`
    (exercise min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage:
    `client/milvusclient/read_example_test.go` (example name conforms to
    godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to
    `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
    that construct boolean queries with minimum required clauses.

Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed
(internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating the ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked Search
    request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test:
    `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3
    and a large min).
  - C++ unit tests for index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$` (client
    package)
  - Go testcases:
    `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch`
    (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions (the
    change touches core index code).
  - Rust binding tests:
    `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test`
    (if developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
2025-11-07 16:07:11 +08:00
aoiasd
6102f001a9
enhance: skip check source id (#45377)
relate:https://github.com/milvus-io/milvus/issues/45381

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-07 15:19:34 +08:00
yihao.dai
2fad5b34f7
fix: Fix data race in replicate stream client (#45346)
issue: https://github.com/milvus-io/milvus/issues/44123

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-11-07 10:17:33 +08:00
cai.zhang
7527ddf50f
enhance: [test] Move R-Tree index tests into the implementation package (#45355)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-07 10:03:33 +08:00
cai.zhang
b8f9384a85
fix: Skip building text index for newly added columns (#45316)
issue: #45315

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-06 19:47:35 +08:00
XuanYang-cn
2dd2c96eb1
fix: Accidentally ignored sealed segments in L0 Compaction (#45340)
When there are no growing segments in the collection, L0 Compaction
tries to select all L0 segments that hit all L1/L2 segments.

However, if a Sealed Segment is still being flushed in DataNode at the
same time that L0 Compaction selects the matching L1/L2 segments, L0
Compaction ignores this segment because it is not in "FlushState". This
is wrong and causes deletes to be missed on the Sealed Segment.

The quick solution here is to fail the L0 compaction task once a Sealed
segment is selected.

See also: #45339

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-06 16:53:38 +08:00
XuanYang-cn
623a9e5156
fix: Accurate size estimation for sliced arrow arrays in compaction (#45294)
Sliced arrow arrays "incorrectly" returned the original array's size via
SizeInBytes(), causing inaccurate memory estimates during compaction.

This resulted in segments closing prematurely in mergeSplit mode -
expected 500MB compactions produced 4x100+MB segments instead.

Fixed by calculating actual byte size of sliced arrays, ensuring proper
segment sizing and more accurate memory usage tracking.

See also: #45293

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-06 14:57:34 +08:00
congqixia
e284733399
fix: Move FinishLoad before text index creation to ensure raw data availability (#45334)
Related to #45333

Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.

The solution is to move the FinishLoad() call to execute after raw data
loading but before text index loading. This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-06 14:49:34 +08:00
zhagnlu
59c64bee07
fix: not use json_shredding for json path is null (#45310)
#45284

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-11-06 11:43:33 +08:00