11577 Commits

Author SHA1 Message Date
aoiasd
55feb7ded8
feat: set related resource ids in collection schema (#46423)
Support creating an analyzer with file resource info, and return the
used file resource IDs when validating an analyzer.
Save the related resource IDs in the collection schema.
relate: https://github.com/milvus-io/milvus/issues/43687

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: analyzer file-resource resolution is deterministic and
traceable by threading a FileResourcePathHelper (collecting used
resource IDs in a HashSet) through all tokenizer/analyzer construction
and validation paths; validate_analyzer(params, extra_info) returns the
collected Vec<i64>, which is propagated through C/Rust/Go layers to
callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 →
querypb.ValidateAnalyzerResponse.ResourceIds →
CollectionSchema.FileResourceIds).

- Logic removed/simplified: ad‑hoc, scattered resource-path lookups and
per-filter file helpers (e.g., read_synonyms_file and other inline
file-reading logic) were consolidated into ResourceInfo +
FileResourcePathHelper and a centralized get_resource_path(helper, ...)
API; filter/tokenizer builder APIs now accept &mut
FileResourcePathHelper so all file path resolution and ID collection use
the same path and bookkeeping logic (redundant duplicated lookups
removed).

- Why no data loss or behavior regression: changes are additive and
default-preserving — existing call sites pass extra_info = "" so
analyzer creation/validation behavior and error paths remain unchanged;
new Collection.FileResourceIds is populated from resp.ResourceIds in
validateSchema and round‑tripped through marshal/unmarshal
(model.Collection ↔ schemapb.CollectionSchema) so schema persistence
uses the new list without overwriting other schema fields; proto change
adds a repeated field (resource_ids) which is wire‑compatible (older
clients ignore extra field). Concrete code paths: analyzer creation
still uses create_analyzer (now with extra_info ""), tokenizer
validation still returns errors as before but now also returns IDs via
CValidateResult/RustResult, and rootcoord.validateSchema assigns
resp.ResourceIds → schema.FileResourceIds.

- New capability added: end‑to‑end discovery, return, and persistence of
file resource IDs used by analyzers — validate flows now return resource
IDs and the system stores them in collection schema (affects tantivy
analyzer binding, canalyzer C bindings, internal/util analyzer APIs,
querynode ValidateAnalyzer response, and rootcoord/create_collection
flow).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-26 22:49:19 +08:00
yihao.dai
512884524b
enhance: Maintain compatibility with the legacy FlushAll (#46564)
issue: https://github.com/milvus-io/milvus/issues/45919

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: FlushAll verification must accept both per-channel
FlushAllTss (new schema) and the legacy single FlushAllTs;
GetFlushAllState chooses the verification path based on which field is
present and treats a channel as flushed only if its channel checkpoint
timestamp >= the applicable threshold (per-channel timestamp or legacy
FlushAllTs).
- Logic removed/simplified: The previous mixed/ambiguous checks were
split into two focused
routines—verifyFlushAllStateByChannelFlushAllTs(logger, channel,
flushAllTss) and verifyFlushAllStateByLegacyFlushAllTs(logger, channel,
flushAllTs)—and GetFlushAllState now selects one path. This centralizes
compatibility logic, eliminates interleaved/duplicated checks, and
retains the outer-loop short-circuiting on the first unflushed channel.
- Why this does NOT cause data loss or regressions: Changes only affect
read-only verification paths (GetFlushAllState/GetFlushState) that
compare in-memory channel checkpoints (meta.GetChannelCheckpoint) to
provided thresholds; no writes to checkpoints or persisted state occur
and FlushAll enqueue/wait behavior is unchanged. Unit tests were added
to cover legacy FlushAllTs behavior and the new FlushAllMsgs→FlushAllTs
extraction, exercising both code paths.
- Enhancement scope and location: Adds backward-compatible support and
concrete FlushAllTs extraction from streaming FlushAllMsgs in Proxy
(internal/proxy/task_flush_all_streaming.go) and compatibility verifiers
in DataCoord (internal/datacoord/services.go), plus corresponding tests
(internal/datacoord/services_test.go, internal/proxy/*_test.go).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
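
The path selection described above can be sketched roughly as follows. This is a minimal illustration, not the DataCoord code: `channelFlushed`, `allFlushed`, and the map-based inputs are hypothetical names, and the handling of a channel that has no per-channel threshold is an assumption.

```go
package main

import "fmt"

// channelFlushed reports whether a channel checkpoint has reached the
// applicable FlushAll threshold. perChannelTss models the new per-channel
// timestamps; legacyTs models the legacy single FlushAllTs.
// All names are illustrative, not the actual DataCoord API.
func channelFlushed(channel string, checkpointTs uint64, perChannelTss map[string]uint64, legacyTs uint64) bool {
	if len(perChannelTss) > 0 {
		ts, ok := perChannelTss[channel]
		if !ok {
			// Assumption: a channel without a threshold has nothing to wait for.
			return true
		}
		return checkpointTs >= ts
	}
	// Legacy path: a single timestamp applies to every channel.
	return checkpointTs >= legacyTs
}

// allFlushed returns false as soon as one unflushed channel is found.
func allFlushed(checkpoints map[string]uint64, perChannelTss map[string]uint64, legacyTs uint64) bool {
	for ch, cp := range checkpoints {
		if !channelFlushed(ch, cp, perChannelTss, legacyTs) {
			return false
		}
	}
	return true
}

func main() {
	cps := map[string]uint64{"ch-1": 100, "ch-2": 90}
	// New schema: ch-2 has not reached its own threshold yet.
	fmt.Println(allFlushed(cps, map[string]uint64{"ch-1": 100, "ch-2": 95}, 0))
	// Legacy schema: both checkpoints are at or past the single threshold.
	fmt.Println(allFlushed(cps, nil, 90))
}
```

Keeping the two comparisons behind one selector mirrors the idea of choosing exactly one verification path per request rather than interleaving the checks.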

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-26 18:59:20 +08:00
cai.zhang
8d12bfb436
fix: Restore the compaction task correctly to ensure it can be properly cleaned up (#46577)
issue: #46576 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: During meta load, only tasks that are truly
terminal-cleaned (states cleaned or unknown) should be dropped; all
other non-terminal tasks (including timeout and completed) must be
restored so the inspector can reattach them to executing/cleaning queues
and finish their cleanup lifecycle.
- Removed/simplified logic: loadMeta no longer uses the broad
isCompactionTaskFinished predicate (which treated timeout, completed,
cleaned, unknown as terminal). It now uses the new
isCompactionTaskCleaned predicate that only treats cleaned/unknown as
terminal. This removes the redundant exclusion of timeout/completed
tasks and simplifies the guard to drop only cleaned/unknown tasks.
- Bug fix (root cause & exact change): Fixes issue #46576 — the previous
isCompactionTaskFinished caused timeout/completed tasks to be skipped
during meta load and thus not passed into restoreTask(). The PR adds
isCompactionTaskCleaned and replaces the finished check so timeout and
completed tasks are included in restoreTask() and re-attached to the
inspector’s existing executing/cleaning queues.
- No data loss or regression: Tasks in cleaned/unknown remain dropped
(isCompactionTaskCleaned still returns true for cleaned/unknown).
Non-terminal timeout/completed tasks now follow the same restoreTask()
control path used previously for restored tasks — they are enqueued into
the inspector’s queue/executing/cleaning flows rather than being
discarded. No exported signatures changed and all restored tasks flow
into existing handlers, avoiding behavior regression or data loss.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
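
The difference between the two predicates can be illustrated with a small sketch. The state names and the `restorable` helper are hypothetical stand-ins; only the sets of states treated as terminal follow the description above.

```go
package main

import "fmt"

// taskState is a hypothetical stand-in for the compaction task state enum.
type taskState string

const (
	executing taskState = "executing"
	timeout   taskState = "timeout"
	completed taskState = "completed"
	cleaned   taskState = "cleaned"
	unknown   taskState = "unknown"
)

// isFinished models the broad predicate as described above:
// timeout, completed, cleaned, and unknown all count as terminal.
func isFinished(s taskState) bool {
	switch s {
	case timeout, completed, cleaned, unknown:
		return true
	}
	return false
}

// isCleaned models the narrower predicate: only cleaned and unknown are terminal.
func isCleaned(s taskState) bool {
	return s == cleaned || s == unknown
}

// restorable returns the states that survive meta load for a given drop predicate.
func restorable(states []taskState, drop func(taskState) bool) []taskState {
	var kept []taskState
	for _, s := range states {
		if !drop(s) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	all := []taskState{executing, timeout, completed, cleaned, unknown}
	fmt.Println("broad predicate keeps: ", restorable(all, isFinished))
	fmt.Println("narrow predicate keeps:", restorable(all, isCleaned))
}
```

With the narrow predicate, timeout and completed tasks are kept and can be handed to the restore step, which is the behavior the fix relies on.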

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-26 18:57:19 +08:00
yihao.dai
e0fd091d41
fix: Fix replicate lag when server is idle (#46574)
issue: https://github.com/milvus-io/milvus/issues/46116

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the metric CDCLastReplicatedTimeTick must reflect the
most recent time-tick when replication has effectively processed all
pending messages (including idle periods), so reported replicate lag =
confirmed WAL tick − last replicated tick can reach zero when the server
is idle.

- Exact fix (bug): addresses issue #46116 by ensuring the
last-replicated metric is updated when the server is idle. Concretely, a
new ReplicateMetrics.UpdateLastReplicatedTimeTick(ts uint64) was added
and called from OnConfirmed (OnConfirmed now delegates to
UpdateLastReplicatedTimeTick(msg.TimeTick())), and from Replicate’s
self-controlled-message path when the pending queue is empty — so the
code records the time tick before returning ErrReplicateIgnored.

- Logic simplified / removed: direct, ad-hoc metric writes in
OnConfirmed were replaced by a single UpdateLastReplicatedTimeTick
helper on the metrics implementation. The scattered manual set of
CDCLastReplicatedTimeTick is consolidated into one method, removing
redundant direct metric manipulations and centralizing timestamp
conversion (tsoutil.PhysicalTimeSeconds).

- No data loss / no behavior regression: this change only updates
monitoring metrics and does not alter replication control flow or
message processing. Replicate still returns ErrReplicateIgnored for
self-controlled messages and does not change message persistence or
acknowledgement paths; OnConfirmed continues to be invoked on confirmed
messages but now delegates metric recording to the new method. Therefore
no replication state, message ordering, or persistence semantics are
modified.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-26 18:13:19 +08:00
Zhen Ye
2c2cbe89c2
fix: flush log when os exit (#46608)
issue: #45640

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-26 14:25:18 +08:00
congqixia
ef6d9c25c2
fix: check final result only in LeaderCacheObserver flaky test (#46601)
Related to #46600

The test previously checked if all 3 collection IDs were batched
together in a single InvalidateShardLeaderCache call. This caused
flakiness because the observer may split events across multiple calls.

Fix by accumulating all collection IDs across multiple calls and
verifying that eventually all expected IDs (1, 2, 3) are processed.
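
A minimal sketch of the accumulation pattern, assuming a plain mutex-protected set in place of the ConcurrentSet used by the test. `idSet` and its methods are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// idSet accumulates collection IDs seen across any number of calls.
type idSet struct {
	mu  sync.Mutex
	ids map[int64]struct{}
}

func newIDSet() *idSet { return &idSet{ids: make(map[int64]struct{})} }

// add records every ID from one (possibly partial) invalidation call.
func (s *idSet) add(ids ...int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range ids {
		s.ids[id] = struct{}{}
	}
}

// containsAll reports whether every expected ID has been seen so far.
func (s *idSet) containsAll(ids ...int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range ids {
		if _, ok := s.ids[id]; !ok {
			return false
		}
	}
	return true
}

func main() {
	seen := newIDSet()
	// Simulate the observer splitting the three IDs across two calls.
	seen.add(1, 2)
	fmt.Println(seen.containsAll(1, 2, 3))
	seen.add(3)
	fmt.Println(seen.containsAll(1, 2, 3))
}
```

In a test, `containsAll` would be polled until it returns true or a deadline passes, so the assertion no longer depends on how the IDs are batched.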

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test asserts that all registered collection IDs
{1,2,3} are eventually processed by InvalidateShardLeaderCache across
any number of calls — i.e., the observer must invalidate every
registered collection ID, not necessarily in a single batched RPC (fixes
flaky assumption from issue #46600).
- Logic removed/simplified: the strict expectation that all three IDs
arrive in one InvalidateShardLeaderCache call was replaced by
accumulating IDs into a ConcurrentSet (collectionIDs.Upsert in the mock)
and asserting eventual containment of 1,2,3. This removes the brittle
per-call batching assertion and uses set-based accumulation: the mock
calls Upsert, and the final Eventually check calls
collectionIDs.Contain(...).
- Why this is safe (no data loss or behavior regression): only test
assertions changed — production code (LeaderCacheObserver calling
InvalidateShardLeaderCache) is unchanged. The mock intercepts
InvalidateShardLeaderCache and accumulates req.GetCollectionIDs(); the
test still verifies single-ID handling via the existing len==1 &&
lo.Contains(... ) check (first mock block) and verifies that all IDs
were invalidated over time in the batch scenario (second mock block). No
production code paths were modified, so invalidation behavior and RPC
usage remain identical.
- Bug-fix note: this is a targeted test-only fix for issue #46600 — it
tolerates legitimate splitting of events across multiple
InvalidateShardLeaderCache invocations by aggregating IDs across calls
in the test mock, eliminating flakiness without altering runtime
behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-26 10:17:19 +08:00
sijie-ni-0214
fc45905ee0
enhance: Optimize QuotaCenter CPU usage (#46388)
issue: https://github.com/milvus-io/milvus/issues/46387

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-26 10:09:19 +08:00
congqixia
6f94d8c41a
fix: Handle legacy binlog format (v1) in segment load diff computation (#46598)
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).

Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to time out.

Changes:
- Add fallback to use field_id as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling
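
The fallback can be sketched as a single normalization step. The change itself lives in the C++ segment loading code; this Go version is only an illustration, and `childGroups` is a hypothetical name.

```go
package main

import "fmt"

// childGroups returns the child group IDs to enumerate for one field binlog.
// Legacy (v1) binlogs carry no child_fields, so the field ID itself is used
// as the single group ID (group_id == field_id).
func childGroups(fieldID int64, childFields []int64) []int64 {
	if len(childFields) == 0 {
		return []int64{fieldID}
	}
	return childFields
}

func main() {
	// Legacy format: empty child_fields falls back to the field ID.
	fmt.Println(childGroups(101, nil))
	// Newer format: child_fields are used unchanged.
	fmt.Println(childGroups(0, []int64{100, 101, 102}))
}
```

Applying the same normalization to both the current and the new load info keeps the add and drop sides of the diff symmetric.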

Related to #46594

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-25 23:33:19 +08:00
zhenshan.cao
85486df8c9
fix: failed to check invalid timestamptz default value (#46546)
Also support space separator and offset in TIMESTAMPTZ
issue: https://github.com/milvus-io/milvus/issues/46376
https://github.com/milvus-io/milvus/issues/46365

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-12-25 15:59:05 +08:00
congqixia
6e07c3fee8
fix: remove EnableStorageV2 override in TestProxy (#46594) (#46596)
Related to #46594

Remove the temporary config override that forced EnableStorageV2 to
false in TestProxy. This override caused test failures with the new load
logic, as segments could not be loaded with v1 format.

This PR is a quick fix to get the unit tests back to normal.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-25 15:35:18 +08:00
cai.zhang
b2fa3dd0ae
fix: Disable VCS to allow pkg tests to run (#46501)
### **Description**
- Add `-buildvcs=false` flag to Go test commands in coverage script

- Increase default session TTL from 10s to 15s

- Update SessionTTL parameter default value from 30 to 15


Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
Co-authored-by: czs007 <zhenshan.cao@zilliz.com>
2025-12-25 14:31:19 +08:00
Buqian Zheng
6ac66e38d1
enhance: STL_SORT to support LIKE operator (#46534)
issue: https://github.com/milvus-io/milvus/issues/44399

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Enhanced pattern matching for string indexes with support for prefix,
postfix, inner, and regex-based matching operations.
* Optimized pattern matching performance through prefix-based filtering
and range-based lookups.

* **Tests**
* Added comprehensive test coverage for pattern matching functionality
across multiple index implementations.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-24 19:45:20 +08:00
zhagnlu
9ba0c4e501
fix: add JSON stats version because of previous change #46130 (#46467)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-24 19:17:18 +08:00
congqixia
6452d146af
enhance: move jemalloc_stats from pkg to internal/util/segcore (#46560)
Related to #46133

Move jemalloc_stats.go and its test file from pkg/util/hardware to
internal/util/segcore. This is a more appropriate location because:
- jemalloc_stats depends on milvus_core C++ library via cgo
- The pkg directory should remain independent of internal C++
dependencies
- segcore is the natural home for core memory allocator utilities

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved internal code organization by reorganizing memory statistics
collection infrastructure for better maintainability and modularity. No
impact on end-user functionality or behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-24 19:03:18 +08:00
aoiasd
7c714b0035
enhance: disallow the file resource interface before release (#46362)
relate: https://github.com/milvus-io/milvus/issues/43687

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* File resource operations (add, remove, list) are now unavailable and
return a not-implemented response.
* **Tests**
* Tests updated to expect error responses for those file resource
operations and removed some previous coordination-path assertions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-24 17:23:19 +08:00
Zhen Ye
1cae4a6194
enhance: support new server label rule for milvus and MILVUS_SERVER_LABEL_RESOURCE_GROUP (#46401)
issue: #46400

- add new server label rule.
- add `MILVUS_SERVER_LABEL_RESOURCE_GROUP` to determine the resource
group of querynode.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Automatic creation of resource groups when nodes with resource-group
labels join.
* Expanded server-label system supporting role-specific and global
labels.

* **Bug Fixes**
* Node acceptance now enforces resource-group name compatibility,
preventing cross-group assignment.

* **Refactor**
* Node handling flows updated to use richer node information for
assignment and validation.

* **Tests**
* Added tests validating resource-group labeling and node acceptance
behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-24 14:23:18 +08:00
yihao.dai
5b97cb70a0
enhance: Support delaying scanner startup (#46369)
Introduce a ScannerStartupDelay configuration to enable WAL write-only
recovery, allowing fence messages to be persisted during
primary–secondary switchover when the StreamingNode is trapped in crash
loops.

issue: https://github.com/milvus-io/milvus/issues/46368

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a configurable WAL scanner pause/resume and a consumer request
flag to optionally ignore pause signals.

* **Metrics**
* Added a scanner pause gauge and pause-duration tracking for WAL
scanning.

* **Tests**
* Added coverage for pause-consumption behavior and cleanup in stream
client tests.

* **Chores**
* Consolidated flush-all logging into a single field and added a helper
for bulk message conversion.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-24 11:53:19 +08:00
wei liu
b907f9e7a8
fix: unify ro node handling to avoid balance channel task stuck (#46440)
issue: #46393

RO node can be created from two sources: stopping a QueryNode or replica
node transfer (e.g., suspend node). Before this fix, there were two
defects and one constraint that caused a deadlock:

Defects:
1. LeaderChecker does not sync segment distribution to RO nodes
2. Scheduler only cancels tasks on stopping nodes, not RO nodes

Constraint:
- Balance channel task blocks waiting for new delegator to become
serviceable (via sync segment) before executing release action

Deadlock scenario:
When target node becomes RO node (but not stopping) during balance
channel execution, the task gets stuck because:
- Cannot sync segment to RO node (defect 1) -> task blocks
- Task is not cancelled since node is not stopping (defect 2)

PR #45949 attempted to fix defect 1 but was not successful.

This PR unifies RO node handling by:
- LeaderChecker: only sync segment distribution to RW nodes
- Scheduler: cancel task when target node becomes RO node
- Simplify checkStale logic with unified node state checking

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-24 10:31:19 +08:00
congqixia
48f8b3b585
enhance: Unify segment Load and Reopen through diff-based loading (#46536)
Related to #46358

Refactor segment loading to use a unified diff-based approach for both
initial Load and Reopen operations:

- Extract ApplyLoadDiff from Reopen to share loading logic
- Add GetLoadDiff to compute diff from empty state for initial load
- Change column_groups_to_load from map to vector<pair> to preserve
order
- Add validation for empty index file paths in diff computation
- Add comprehensive unit tests for GetLoadDiff

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Performance**
* Improved segment loading efficiency through incremental updates,
reducing memory overhead and enhancing performance during data updates.

* **Tests**
  * Expanded test coverage for load operation scenarios.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-24 10:19:22 +08:00
marcelo-cjl
3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer and
querynode, and adds comprehensive validation, search reduce, and field
data handling for nullable vectors with sparse storage.

Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
  helper to validate nullable vector field alignment using valid data
  count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
  nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
  index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
  idxComputers for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
  handling
- Update msg_pack.go with nullable vector field data processing

QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
  translation

Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
  indexing

Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
  logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
  parameter
- Update funcutil.go with nullable vector support functions
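
The logical-to-physical index translation mentioned above can be sketched as a prefix count over a validity bitmap. This assumes that only valid (non-null) vectors are physically stored; `idxComputer` and `compute` are hypothetical names and do not reproduce the FieldDataIdxComputer API.

```go
package main

import "fmt"

// idxComputer maps a logical row offset to the offset of its vector in
// compact storage, where null rows occupy no slot.
type idxComputer struct {
	valid  []bool
	prefix []int // prefix[i] = number of valid rows before logical row i
}

func newIdxComputer(valid []bool) *idxComputer {
	prefix := make([]int, len(valid))
	count := 0
	for i, v := range valid {
		prefix[i] = count
		if v {
			count++
		}
	}
	return &idxComputer{valid: valid, prefix: prefix}
}

// compute returns the physical offset for a logical row. The second result
// is false when the row is null or out of range.
func (c *idxComputer) compute(logical int) (int, bool) {
	if logical < 0 || logical >= len(c.valid) || !c.valid[logical] {
		return 0, false
	}
	return c.prefix[logical], true
}

func main() {
	c := newIdxComputer([]bool{true, false, true, true})
	for i := 0; i < 4; i++ {
		p, ok := c.compute(i)
		fmt.Println("logical", i, "physical", p, "valid", ok)
	}
}
```

Precomputing the prefix counts once makes each lookup constant time, which matters when many result rows are translated during search reduce.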

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
Buqian Zheng
e379b1f0f4
enhance: moved query optimization to proxy, added various optimizations (#45526)
issue: https://github.com/milvus-io/milvus/issues/45525

See the added README.md for the list of optimizations.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators.

* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-24 00:39:19 +08:00
cai.zhang
7fca6e759f
enhance: Execute text indexes for multiple fields concurrently (#46279)
issue: #46274 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Performance Improvements**
* Field-level text index creation and JSON-key statistics now run
concurrently, reducing overall indexing time and speeding task
completion.

* **Observability Enhancements**
* Per-task and per-field logging expanded with richer context and
per-phase elapsed-time reporting for improved monitoring and
diagnostics.

* **Refactor**
* Node slot handling simplified to compute slot counts on demand instead
of storing them.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-23 21:05:18 +08:00
cai.zhang
0943713481
fix: Skip Finished tasks when recovery with compatibility (#46515)
### **User description**
issue: #46466


___

### **PR Type**
Bug fix


___

### **Description**
- Extract finished task state check into reusable helper function

- Skip finished tasks during compaction recovery to prevent reprocessing

- Add backward compatibility check for pre-allocated segment IDs


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["Compaction Task States"] -->|"Check with helper"| B["isCompactionTaskFinished()"]
  B -->|"Used in"| C["compactionInspector.loadMeta()"]
  B -->|"Used in"| D["compactionTaskMeta.reloadFromKV()"]
  C -->|"Skip finished tasks"| E["Recovery Process"]
  D -->|"Backward compatibility"| E
```



<details><summary><h3>File Walkthrough</h3></summary>

<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>compaction_util.go</strong><dd><code>Add
isCompactionTaskFinished helper function</code></dd></summary>
<hr>

internal/datacoord/compaction_util.go

<ul><li>Added new helper function
<code>isCompactionTaskFinished()</code> to check if a <br>compaction
task is in a terminal state<br> <li> Function checks for failed,
timeout, completed, cleaned, or unknown <br>states<br> <li> Centralizes
task state validation logic for reuse across multiple
<br>components</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46515/files#diff-8f2cb8d0fef37617202c5a2290ad2bdbf2df5b5983604b5b505bc73a65c7eb43">+8/-0</a></td>

</tr>
</table></td></tr><tr><td><strong>Bug fix</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>compaction_inspector.go</strong><dd><code>Refactor to
use finished task helper function</code></dd></summary>
<hr>

internal/datacoord/compaction_inspector.go

<ul><li>Replaced inline state checks with call to
<code>isCompactionTaskFinished()</code> <br>helper<br> <li> Simplifies
code by removing repetitive state comparison logic<br> <li> Maintains
same behavior of skipping finished tasks during recovery</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46515/files#diff-1c884001f2e84de177fea22b584f3de70a6e73695dbffa34031be9890d17da6d">+1/-5</a></td>

</tr>

<tr>
  <td>
    <details>
<summary><strong>compaction_task_meta.go</strong><dd><code>Add finished
task check for backward compatibility</code></dd></summary>
<hr>

internal/datacoord/compaction_task_meta.go

<ul><li>Added check to skip finished tasks before processing
pre-allocated <br>segment IDs<br> <li> Ensures backward compatibility
for tasks without pre-allocated segment <br>IDs<br> <li> Prevents
marking already-finished tasks as failed during reload</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46515/files#diff-0dae7214c4c79ddf5106bd51d375b5fb2f41239d5d433798afa90708e443eca8">+1/-1</a></td>

</tr>
</table></td></tr></tbody></table>

</details>

___



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved detection of finished compaction tasks to reduce false
failures.
* Prevented finished tasks with missing pre-allocations from being
incorrectly marked as failed.
* Simplified abandonment logic for completed/timeout/cleaned tasks to
reduce erroneous retries and noisy logs.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-23 18:09:18 +08:00
Buqian Zheng
db9afe9756
enhance: update tantivy (#46521)
issue: #46520

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-23 16:57:19 +08:00
aoiasd
0203aefad1
enhance: add concurrency pool for analyzer (#46185)
relate: https://github.com/milvus-io/milvus/issues/42589

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## New Features
- Added `concurrency_per_cpu_core` configuration parameter for the
analyzer component, enabling customizable per-CPU concurrency tuning
(default: 8).

## Tests
- Added test coverage for batch analysis operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-23 16:01:18 +08:00
sparknack
0a2f8d4f63
enhance: map multi row groups into one cache cell (#46249)
issue: #45486

Introduce row group batching to reduce cache cell granularity and
improve memory & disk efficiency. Previously, each parquet row group
mapped 1:1 to a cache cell. Now, up to `kRowGroupsPerCell` (4) row
groups are merged into one cell. This reduces the number of cache cells
(and associated overhead) by ~4x while maintaining the same data
granularity for loading.
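
The batching arithmetic can be sketched as follows. This assumes consecutive row groups are packed into cells in order; the real translators are C++ and may treat file boundaries differently, and the function names here are hypothetical.

```go
package main

import "fmt"

// rowGroupsPerCell mirrors the kRowGroupsPerCell value (4) named above.
const rowGroupsPerCell = 4

// numCells returns how many cache cells are needed for the given row groups.
func numCells(totalRowGroups int) int {
	return (totalRowGroups + rowGroupsPerCell - 1) / rowGroupsPerCell
}

// cellOf returns the cell that holds a given row group.
func cellOf(rowGroup int) int {
	return rowGroup / rowGroupsPerCell
}

// rowGroupRange returns the half-open range [start, end) of row groups in a
// cell. The last cell may hold fewer than rowGroupsPerCell row groups. The
// third result is false when the cell index is out of range.
func rowGroupRange(cell, totalRowGroups int) (int, int, bool) {
	if cell < 0 || cell >= numCells(totalRowGroups) {
		return 0, 0, false
	}
	start := cell * rowGroupsPerCell
	end := start + rowGroupsPerCell
	if end > totalRowGroups {
		end = totalRowGroups
	}
	return start, end, true
}

func main() {
	total := 10
	fmt.Println("cells:", numCells(total))
	for c := 0; c < numCells(total); c++ {
		start, end, _ := rowGroupRange(c, total)
		fmt.Printf("cell %d holds row groups [%d, %d)\n", c, start, end)
	}
}
```

With 10 row groups this yields 3 cells instead of 10, which is where the roughly 4x reduction in cell count comes from.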

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Switched to cell-based grouping that merges multiple row groups for
more efficient multi-file aggregation and reads.
* Chunk loading now combines multiple source batches/tables per cell and
better supports mmap-backed storage.

* **New Features**
* Exposed helpers to query row-group ranges and global row-group offsets
for diagnostics and testing.
* Translators now accept chunk-type and mmap/load hints to control
on-disk vs in-memory behavior.

* **Bug Fixes**
* Improved bounds checks and clearer error messages for out-of-range
cell requests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-12-23 14:57:18 +08:00
congqixia
d3b15ac136
enhance: support pk isolation optional field data loading from manifest for index build (#46480)
### **User description**
Related to #44956

Add manifest-based data loading path for optional fields in
`cache_opt_field_memory_v2`. When a manifest file is provided in the
config, the function now retrieves field data directly from the manifest
using `GetFieldDatasFromManifest` instead of reading from segment insert
files. This enables storage v2 compatibility for building indexes with
optional fields.


___

### **PR Type**
Enhancement


___

### **Description**
- Add manifest-based data loading for optional fields in index building

- Support storage v2 compatibility via `GetFieldDatasFromManifest`
function

- Enable PK isolation optional field handling without segment insert
files


___

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-23 14:55:21 +08:00
Buqian Zheng
674ac8a006
enhance: fix IsMmapSupported for stl sort (#46472)
issue: https://github.com/milvus-io/milvus/issues/44399

This PR also adds `ByteSize()` methods for scalar indexes. They are
currently not used in Milvus code, but are used in the scalar benchmark
and may be used by cachinglayer in the future.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Improved and standardized memory-size computation and caching across
index types so reported index footprints are more accurate and
consistent.

* **Chores**
* Ensured byte-size metrics are refreshed immediately after index
build/load operations to keep memory accounting in sync with runtime
state.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-23 13:27:18 +08:00
XuanYang-cn
99b53316e5
enhance: Set latestDeletePos from L0 segments to bound L1 selection (#46436)
This commit refines L0 compaction to ensure data consistency by properly
setting the delete position boundary for L1 segment selection.

Key Changes:
1. L0 View Trigger Sets latestDeletePos for L1 Selection
2. Filter L0 Segments by Growing Segment Position in policy, not in
views
3. Renamed LevelZeroSegmentsView to LevelZeroCompactionView
4. Renamed fields for semantic clarity:
   * segments -> l0Segments
   * earliestGrowingSegmentPos -> latestDeletePos
5. Update Default Compaction Prioritizer to level

See also: #46434

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-12-23 11:55:19 +08:00
cai.zhang
5911cb44e0
enhance: Estimate index task slot using field size instead of segment size (#46275)
issue: #45186

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-23 11:23:22 +08:00
yihao.dai
5e525eb3bf
enhance: Retry reads from object storage on rate limit error (#46455)
This PR improves the robustness of object storage operations by retrying
explicit throttling errors (e.g. HTTP 429, SlowDown, ServerBusy).
These errors commonly occur under high concurrency and are typically
recoverable with bounded retries.

issue: https://github.com/milvus-io/milvus/issues/44772

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Configurable retry support for reads from object storage and improved
mapping of transient/rate-limit errors.
* Added a retryable reader wrapper used by CSV/JSON/Parquet/Numpy import
paths.

* **Configuration**
  * New parameter to control storage read retry attempts.

* **Tests**
* Expanded unit tests covering error mapping and retry behaviors across
storage backends.
* Standardized mock readers and test initialization to simplify test
setups.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-23 11:03:18 +08:00
foxspy
ab03521588
fix: fix chunk iterator merge order (#46461)
issue: #46349 
When using brute-force search, the iterator results from multiple chunks
are merged. The merge order must follow the metric type, because the
metric determines whether a smaller or a larger score ranks higher.
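
The ordering concern can be illustrated with a small top-k merge. This is a sort-based simplification with hypothetical names (`hit`, `mergeChunks`), not the iterator merge in the Milvus code; it only shows that the comparison direction has to flip between distance metrics such as L2 and similarity metrics such as inner product.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is one search result from a chunk.
type hit struct {
	ID    int64
	Score float32
}

// mergeChunks merges per-chunk results and keeps the best k.
// largerIsBetter is true for similarity metrics (e.g. inner product)
// and false for distance metrics (e.g. L2).
func mergeChunks(chunks [][]hit, k int, largerIsBetter bool) []hit {
	if k < 0 {
		k = 0
	}
	var all []hit
	for _, c := range chunks {
		all = append(all, c...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		if largerIsBetter {
			return all[i].Score > all[j].Score
		}
		return all[i].Score < all[j].Score
	})
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	chunks := [][]hit{
		{{ID: 1, Score: 0.5}, {ID: 2, Score: 0.9}},
		{{ID: 3, Score: 0.1}},
	}
	fmt.Println(mergeChunks(chunks, 2, false)) // distance metric: ascending
	fmt.Println(mergeChunks(chunks, 2, true))  // similarity metric: descending
}
```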

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-12-23 10:33:17 +08:00
Buqian Zheng
1a7ca339a5
feat: expose the Go expr parser to C++ and embed into libmilvus-core.so (#45703)
Generate a library that wraps the Go expr parser and embed it into
libmilvus-core.so.

issue: https://github.com/milvus-io/milvus/issues/45702

see `internal/core/src/plan/milvus_plan_parser.h` for the exposed
interface

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Introduced C++ API for plan parsing with schema registration and
expression parsing capabilities.
* Plan parser now available as shared libraries instead of a standalone
binary tool.

* **Refactor**
* Reorganized build system to produce shared library artifacts instead
of executable binaries.
* Build outputs relocated to standardized library and include
directories.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-22 23:59:18 +08:00
cai.zhang
21b0e5ca9d
enhance: Don't seal segments when only alter collection properties (#46488)
### **PR Type**
Enhancement


___

### **Description**
- Only flush and fence segments for schema-changing alter collection
messages

- Skip segment sealing for collection property-only alterations

- Add conditional check using messageutil.IsSchemaChange utility
function


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["Alter Collection Message"] --> B{"Is Schema Change?"}
  B -->|Yes| C["Flush and Fence Segments"]
  B -->|No| D["Skip Segment Operations"]
  C --> E["Set Flushed Segment IDs"]
  D --> E
  E --> F["Append Operation"]
```
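
The flow in the diagram can be sketched as follows. The types `alterHeader` and `shard` and the `SchemaChanged` flag are hypothetical stand-ins; the actual check in the PR is `messageutil.IsSchemaChange(header)`, whose signature is not shown here.

```go
package main

import "fmt"

// alterHeader is a hypothetical stand-in for the alter-collection message header.
type alterHeader struct {
	SchemaChanged bool // assumed to be what the schema-change check reports
}

// shard tracks the growing segments of one collection shard.
type shard struct {
	growing []int64
}

// flushAndFence seals all growing segments and returns their IDs.
func (s *shard) flushAndFence() []int64 {
	flushed := s.growing
	s.growing = nil
	return flushed
}

// handleAlterCollection seals segments only when the schema changes.
// Property-only alterations leave the growing segments untouched.
func handleAlterCollection(s *shard, h alterHeader) []int64 {
	var flushed []int64
	if h.SchemaChanged {
		flushed = s.flushAndFence()
	}
	return flushed
}

func main() {
	s := &shard{growing: []int64{101, 102}}
	fmt.Println(handleAlterCollection(s, alterHeader{SchemaChanged: false}), s.growing)
	fmt.Println(handleAlterCollection(s, alterHeader{SchemaChanged: true}), s.growing)
}
```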



<details><summary><h3>File Walkthrough</h3></summary>

<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>shard_interceptor.go</strong><dd><code>Conditional
segment sealing based on schema changes</code>&nbsp; &nbsp; &nbsp;
&nbsp; &nbsp; &nbsp; </dd></summary>
<hr>


internal/streamingnode/server/wal/interceptors/shard/shard_interceptor.go

<ul><li>Added import for <code>messageutil</code> package to access
schema change detection <br>utility<br> <li> Modified
<code>handleAlterCollection</code> to conditionally flush and fence
<br>segments only for schema-changing messages<br> <li> Wrapped segment
flushing logic in <code>if
</code><br><code>messageutil.IsSchemaChange(header)</code> check<br>
<li> Skips unnecessary segment sealing when only collection properties
are <br>altered</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46488/files#diff-c1acf785e5b530e59137b21584cf567ccd9aeeb613fb3684294b439289e80beb">+9/-3</a>&nbsp;
&nbsp; &nbsp; </td>

</tr>
</table></td></tr></tbody></table>

</details>

___



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Optimized collection schema alteration to conditionally perform
segment allocation operations only when schema changes are detected,
reducing unnecessary overhead in unmodified collection scenarios.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-22 20:55:19 +08:00
Zhen Ye
2edc9ee236
enhance: support milvus version when coordinator startup (#46456)
issue: #46451

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Session versioning added to validate coordinator compatibility during
registration and active takeover.

* **Changes**
* Active–standby flow simplified: standby-to-active activation now
always enabled and initialized unconditionally.
* Registration uses version-aware transactions to ensure version
consistency during takeover.
  * Startup/health startup path streamlined.

* **Tests**
* Added version-key integration test; removed test for disabling
active-standby.
  * Updated flush test to assert rate-limiter errors occur.

* **Chores**
  * Removed centralized connection manager and its test suite.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-22 20:29:18 +08:00
aoiasd
5e28f45c5a
enhance: change highlight query keyword to highlight_query (#46360)
Instead of `queries`.
relate: https://github.com/milvus-io/milvus/issues/42589

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-22 11:43:18 +08:00
sijie-ni-0214
89a002e12a
fix: truncate_collection status check and add database interceptor su… (#46430)
issue: https://github.com/milvus-io/milvus/issues/46166

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-21 19:19:17 +08:00
yihao.dai
32809c1053
fix: Remove stale proxy clients on rewatch etcd (#46398)
AddProxyClients now removes clients not in the new snapshot before
adding new ones. This ensures proper cleanup when ProxyWatcher re-watches
etcd.
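
A minimal sketch of reconciling a client set against a fresh snapshot: drop what is no longer present, then add what is new. The `proxyClients` type, its map shape, and `applySnapshot` are assumptions for illustration; the real `AddProxyClients` signature is not shown in the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// proxyClients holds the currently known clients.
// The map shape (node ID -> address) is a hypothetical simplification.
type proxyClients struct {
	clients map[int64]string
}

// applySnapshot removes clients missing from snapshot, then adds new ones.
// It returns the removed and added IDs in ascending order.
func (p *proxyClients) applySnapshot(snapshot map[int64]string) (removed, added []int64) {
	if p.clients == nil {
		p.clients = make(map[int64]string)
	}
	for id := range p.clients {
		if _, ok := snapshot[id]; !ok {
			delete(p.clients, id) // stale: not in the new snapshot
			removed = append(removed, id)
		}
	}
	for id, addr := range snapshot {
		if _, ok := p.clients[id]; !ok {
			p.clients[id] = addr
			added = append(added, id)
		}
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	sort.Slice(added, func(i, j int) bool { return added[i] < added[j] })
	return removed, added
}

func main() {
	p := &proxyClients{clients: map[int64]string{1: "a:1", 2: "b:1"}}
	removed, added := p.applySnapshot(map[int64]string{2: "b:1", 3: "c:1"})
	fmt.Println(removed, added, len(p.clients))
}
```

Without the removal pass, a client that disappeared while the watch was down would stay in the set after the re-watch.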

issue: https://github.com/milvus-io/milvus/issues/46397

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-21 19:11:16 +08:00
yihao.dai
d03b9cc052
enhance: Align the monitoring of last_replicated_time_tick with wal_last_confirm_time_tick (#46469)
issue: https://github.com/milvus-io/milvus/issues/46116

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-21 19:03:17 +08:00
tinswzy
9345caa135
fix: call truncate when checkpoint is persisted (#46382)
issue: #44434

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-12-21 19:01:17 +08:00
congqixia
11c027ad81
fix: [Loon] pass mmap directory path to ManifestGroupTranslator (#46471)
Related to #44956

When loading column groups with mmap enabled, the
ManifestGroupTranslator needs the mmap directory path to properly handle
memory-mapped data loading. This change retrieves the root path from
LocalChunkManagerSingleton and passes it to the translator during
construction.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-20 12:25:17 +08:00
congqixia
6a15a08060
fix: set ManifestPath in GetRecoveryInfoV2 response (#46470)
Add ManifestPath field to SegmentInfo in GetRecoveryInfoV2 response,
enabling QueryCoord to detect manifest path changes and trigger segment
reopen for storage v2 incremental updates.

Related to #46394

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-19 22:21:19 +08:00
XuanYang-cn
0507db2015
feat: Add force merge (#45556)
See also: #46043

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-12-19 18:03:18 +08:00
Spade A
ab9bec0a6d
fix: some fixes for ngram index (#46405)
issue: https://github.com/milvus-io/milvus/issues/42053

The split literals in `match` execution should be combined with `and`
semantics rather than `or`.
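
The difference can be shown with a small filter, assuming the literals come from splitting a match pattern on its wildcards. This uses plain substring checks in place of the ngram index, and `matchAll` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// matchAll returns the indexes of rows that contain every literal.
// A row that contains only some of the literals is not a candidate,
// which is the `and` semantics described in the commit.
func matchAll(rows []string, literals []string) []int {
	var out []int
	for i, row := range rows {
		ok := true
		for _, lit := range literals {
			if !strings.Contains(row, lit) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	rows := []string{"abcd", "abxx", "xxcd"}
	fmt.Println(matchAll(rows, []string{"ab", "cd"}))
}
```

Under `or` semantics all three rows would be returned as candidates, although only the first one contains both literals.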

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-19 16:13:19 +08:00
Spade A
ad8aba7cb4
feat: impl ComputePhraseMatchSlop for compute min slop for phrase match query (#45892)
issue: https://github.com/milvus-io/milvus/issues/45890

ComputePhraseMatchSlop accepts three params:
1. A string: the query text
2. A list of strings: the data texts
3. Analyzer params

The slop is calculated for the query text against each data text in the
context of phrase match, where both are tokenized by the tokenizer built
from the analyzer params.

Two arrays are returned:
1. is_match: whether the phrase match can succeed
2. slop: the corresponding slop if the phrase match can succeed, or -1
if it cannot.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-19 16:03:18 +08:00
congqixia
0425336635
fix: [skip e2e] resolve flaky TestKeyLockDispatcher unit test (#46454)
Related to #46453

The test was flaky because Submit() returns a Future and executes
asynchronously. The test was setting sig=true immediately after Submit()
returned, but the task's Run() might not have completed yet, causing
mock expectation failures.

Fix by calling future.Await() to wait for task execution to complete
before signaling. Also remove dead commented code.
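
The race and its fix can be sketched with a tiny future type. `submit` and `future` here are hypothetical stand-ins for the dispatcher under test; the point is only that state written by the task is safe to rely on after `Await()` returns.

```go
package main

import "fmt"

// future signals completion of an asynchronously executed task.
type future struct {
	done chan struct{}
}

// submit runs task in a new goroutine and returns immediately.
func submit(task func()) *future {
	f := &future{done: make(chan struct{})}
	go func() {
		defer close(f.done) // closed only after task has finished
		task()
	}()
	return f
}

// Await blocks until the task has finished.
func (f *future) Await() {
	<-f.done
}

func main() {
	ran := false
	f := submit(func() { ran = true })
	// Reading ran here, before Await, would race with the task.
	f.Await()
	fmt.Println(ran) // the task's write is visible after Await
}
```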

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-19 14:07:19 +08:00
junjiejiangjjj
617a77b0bd
enhance: Add embedding model and schema field type checks (#46421)
https://github.com/milvus-io/milvus/issues/46415

- Add output type validation when creating functions
- Fix improper error handling in bulk insert tasks

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-12-19 11:05:19 +08:00
aoiasd
7e4f87e351
fix: Init analyzer at delegator for all field with enable analyzer (#46361)
To support text match highlight
relate: https://github.com/milvus-io/milvus/issues/46308

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-19 10:23:18 +08:00
congqixia
bf838eea5d
enhance: set dynamic field as nullable with default empty JSON (#46419)
Set the auto-appended dynamic field to be nullable with a default value
of empty JSON object `{}`. This allows collections with dynamic schema
to handle rows that don't have any dynamic fields more gracefully,
avoiding potential null reference issues when the dynamic field is not
explicitly set during insert.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-18 15:15:17 +08:00
congqixia
1414065860
feat: query coord support segment reopen when manifest path changes (#46394)
Related to #46358

Add segment reopen mechanism in QueryCoord to handle segment data
updates when the manifest path changes. This enables QueryNode to reload
segment data without full segment reload, supporting storage v2
incremental updates.

Changes:
- Add ActionTypeReopen action type and LoadScope_Reopen in protobuf
- Track ManifestPath in segment distribution metadata
- Add CheckSegmentDataReady utility to verify segment data matches
target
- Extend getSealedSegmentDiff to detect segments needing reopen
- Create segment reopen tasks when manifest path differs from target
- Block target update until segment data is ready
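
The reopen detection listed above can be sketched as a comparison between target and loaded segments. The `segment` type and both function names are hypothetical; `dataReady` only mirrors the idea behind `CheckSegmentDataReady`, not its real signature.

```go
package main

import "fmt"

// segment pairs a segment ID with the manifest path its data was read from.
type segment struct {
	ID           int64
	ManifestPath string
}

// index builds a lookup from segment ID to manifest path.
func index(segs []segment) map[int64]string {
	m := make(map[int64]string, len(segs))
	for _, s := range segs {
		m[s.ID] = s.ManifestPath
	}
	return m
}

// segmentsToReopen returns IDs of segments that are loaded but whose
// manifest path differs from the target.
func segmentsToReopen(target, loaded []segment) []int64 {
	current := index(loaded)
	var reopen []int64
	for _, t := range target {
		if path, ok := current[t.ID]; ok && path != t.ManifestPath {
			reopen = append(reopen, t.ID)
		}
	}
	return reopen
}

// dataReady reports whether every target segment is loaded with the
// target's manifest path; a target update would wait until this holds.
func dataReady(target, loaded []segment) bool {
	current := index(loaded)
	for _, t := range target {
		if path, ok := current[t.ID]; !ok || path != t.ManifestPath {
			return false
		}
	}
	return true
}

func main() {
	target := []segment{{ID: 1, ManifestPath: "m/v2"}, {ID: 2, ManifestPath: "m/v1"}}
	loaded := []segment{{ID: 1, ManifestPath: "m/v1"}, {ID: 2, ManifestPath: "m/v1"}}
	fmt.Println(segmentsToReopen(target, loaded), dataReady(target, loaded))
}
```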

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-17 22:15:16 +08:00