23779 Commits

Author SHA1 Message Date
zhagnlu
635bead131
fix:fix incorrect rootPath for local storage mode (#46692)
#45959

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-31 12:23:22 +08:00
Buqian Zheng
724598d231
fix: handle mixed int64/float types in BinaryRangeExpr for JSON fields (#46681)
test: add unit tests for mixed int64/float types in BinaryRangeExpr

When processing binary range expressions (e.g., `x > 499 && x <= 512.0`)
on JSON/dynamic fields with expression templates, the lower and upper
bounds could have different numeric types (int64 vs float64). This
caused an assertion failure in GetValueFromProto when the template type
didn't match the actual proto value type.

Fixes:
1. Go side (fill_expression_value.go): Normalize numeric types for JSON
fields - if either bound is float and the other is int, convert the int
to float.

2. C++ side (BinaryRangeExpr.cpp):
   - Check both lower_val and upper_val types when dispatching
   - Use double template when either bound is float
- Use GetValueWithCastNumber instead of GetValueFromProto to safely
handle int64->double conversion

issue: #46588

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: JSON field binary-range expressions must present
numeric bounds to the evaluator with a consistent numeric type; if
either bound is floating-point, both bounds must be treated as double to
avoid proto-type mismatches during template instantiation.
- Bug fix (issue #46588 & concrete change): mixed int64/float bounds
could dispatch the wrong template (e.g.,
ExecRangeVisitorImplForJson<int64_t>) and trigger assertions in
GetValueFromProto. Fixes: (1) Go parser (FillBinaryRangeExpressionValue
in fill_expression_value.go) normalizes mixed JSON numeric bounds by
promoting the int bound to float; (2) C++ evaluator
(PhyBinaryRangeFilterExpr::Eval in BinaryRangeExpr.cpp) inspects both
lower_type and upper_type, sets use_double when either is float, selects
ExecRangeVisitorImplForJson<double> for mixed numeric cases, and
replaces GetValueFromProto with GetValueWithCastNumber so int64→double
conversions are handled safely.
- Removed / simplified logic: the previous evaluator branched on only
the lower bound's proto type and had separate index/non-index handling
for int64 vs float; that per-bound branching is replaced by unified
numeric handling (convert to double when needed) and a single numeric
path for index use — eliminating redundant, error-prone branches that
assumed homogeneous bound types.
- No data loss or regression: changes only promote int→double for
JSON-range comparisons when the other bound is float; integer-only and
float-only paths remain unchanged. Promotion uses IEEE double (C++
double and Go float64) and only affects template dispatch and
value-extraction paths; GetValueWithCastNumber safely converts int64 to
double and index/non-index code paths both normalize consistently,
preserving semantics for comparisons and avoiding assertion failures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-31 11:52:24 +08:00
junjiejiangjjj
1100d8f7e2
feat: Add semantic highlight (#46189)
https://github.com/milvus-io/milvus/issues/42589

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Semantic Highlighting Feature

**Core Invariant**: Semantic highlighting operates on a per-field basis
with independent text processing through an external Zilliz highlight
provider. The implementation maintains field ID to field name mapping
and correlates highlight results back to original field outputs.

**What is Added**: This PR introduces semantic highlighting capability
for search results alongside the existing lexical highlighting. The
feature consists of:
- New `SemanticHighlight` orchestrator that validates queries/input
fields against collection schema, instantiates a Zilliz-based provider,
and batches text processing across multiple queries
- New `SemanticHighlighter` proxy wrapper implementing the `Highlighter`
interface for search pipeline integration
- New `semanticHighlightOperator` that processes search results by
delegating per-field text processing to the provider and attaching
correlated `HighlightResult` data to search outputs
- New gRPC service definition (`HighlightService`) and
`ZillizClient.Highlight()` method for external provider communication

**No Data Loss or Regression**: The change is purely additive without
modifying existing logic:
- Lexical highlighting path remains unchanged (separate switch case in
`createHighlightTask`)
- New `HighlightResults` field is only populated when semantic
highlighting is explicitly requested via `HighlightType_Semantic` enum
value
- Gracefully handles missing fields by returning explicit errors rather
than silent failures
- Pipeline operator integration follows existing patterns and only
processes when semantic highlighter is instantiated

**Why This Design**: Semantic highlighting is routed through the same
pipeline operator pattern as lexical highlighting, ensuring consistent
integration into search workflows. The per-field model allows flexible
highlighting across different text columns and batch processing ensures
efficient handling of multiple queries with configurable provider
constraints.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-12-31 11:41:22 +08:00
foxspy
49939f5f2b
fix: enable diskann option by default (#46584)
issue: #46481 


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: DiskANN requires OS support for asynchronous I/O
(AIO); the Makefile now encodes this by defaulting disk_index=OFF on
Darwin (macOS) and disk_index=ON on other OSes where AIO is available.
- Simplified logic: the build-time default was inverted for non-Darwin
platforms so DiskANN is enabled by default; redundant conditional
handling that previously forced OFF everywhere has been removed in favor
of an OS-based default while preserving the single manual override
variable.
- No data loss or regression: this is a compile-time change only — it
toggles inclusion of DiskANN code paths at build time and does not
modify runtime persistence or existing index files. macOS builds still
skip AIO-dependent DiskANN code paths, and Linux/other builds merely
compile support by default; no migration or runtime data-path changes
are introduced.
- Backward compatibility / fix for issue #46481: addresses the reported
need to enable DiskANN by default (issue #46481) while keeping explicit
disk_index overrides intact for CI and developer workflows, so existing
build scripts and deployments that pass disk_index continue to behave
unchanged.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-12-31 11:07:22 +08:00
yanliang567
15ce8aedd8
test: Add some tests for group by search support json and dynamic field (#46630)
related issue: #46616


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: these tests assume the v2 group-by search
implementation (TestMilvusClientV2Base and pymilvus v2 APIs such as
AnnSearchRequest/WeightedRanker) is functionally correct; the PR extends
coverage to validate group-by semantics when using JSON fields and
dynamic fields (see
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
— TestGroupSearch.setup_class and parametrized group_by_field cases).
- Logic removed/simplified: legacy v1 test scaffolding and duplicated
parametrized fixtures/test permutations were consolidated into
v2-focused suites (TestGroupSearch now inherits TestMilvusClientV2Base;
old TestGroupSearch/TestcaseBase patterns and large blocks in
test_mix_scenes were removed) to avoid redundant fixture permutations
and duplicate assertions while reusing shared helpers in common_func
(e.g., gen_scalar_field, gen_row_data_by_schema) and common_type
constants.
- Why this does NOT introduce data loss or behavior regression: only
test code, test helpers, and test imports were changed — no
production/server code altered. Test helper changes are
backward-compatible (gen_scalar_field forces primary key nullable=False
and only affects test data generation paths in
tests/python_client/common/common_func.py; get_field_dtype_by_field_name
now accepts schema dicts/ORM schemas and is used only by tests to choose
vector generation) and collection creation/insertion in tests use the
same CollectionSchema/FieldSchema paths, so production
storage/serialization logic is untouched.
- New capability (test addition): adds v2 test coverage for group-by
search over JSON and dynamic fields plus related scenarios — pagination,
strict/non-strict group_size, min/max group constraints, multi-field
group-bys and binary vector cases — implemented in
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
to address issue #46616.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
2025-12-31 11:03:21 +08:00
zhagnlu
031acf5711
enhance: convert jsonstats translator to bson_index translator (#45036)
issue: #42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-31 10:39:21 +08:00
nico
e75ad275aa
test: update tets cases (#46699)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Pull Request Summary: Test Case Updates for API Behavior Changes

**Core Invariant**: These test case updates reflect backend API
improvements to error messaging and schema information returned by
collection operations. The changes maintain backward compatibility—no
public signatures change, and all modifications are test expectation
updates.

**Updated Error Messages for Better Diagnostics**:
- `test_add_field_feature.py`: Updated expected error when adding a
vector field without dimension specification from a generic "not support
to add vector field" to the more descriptive "vector field must have
dimension specified, field name = {field_name}: invalid parameter". This
change is non-breaking for clients that only check error codes; it
improves developer experience with clearer error context.

**Schema Information Extension**:
- `test_milvus_client_collection.py`: Added `enable_namespace: False` to
the expected `describe_collection()` output. This is a new boolean field
in the collection metadata that defaults to False, representing an
opt-in feature. Existing code querying describe_collection continues to
work; the new field is simply an additional property in the response
dictionary.

**Dynamic Error Message Construction**:
- `test_milvus_client_search_invalid.py`: Replaced hardcoded error
message with conditional logic that generates the appropriate error
based on input state (None vectors vs invalid vector data). This
prevents test brittle failure if multiple error conditions exist, and
correctly validates the API's behavior handles both "missing data" and
"malformed data" cases distinctly.

**No Regression Risk**: All changes update test expectations to match
improved backend behavior. The changes are additive (new field in
schema) or clarifying (better error messages), with no modifications to
existing response structures or behavior for valid inputs.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: nico <cheng.yuan@zilliz.com>
2025-12-31 10:17:22 +08:00
cai.zhang
b13aac5164
fix: Include fieldID in raw data cleanup to prevent delete other fields (#46688)
issue: #46687 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: raw-data cleanup must be scoped to (segment_id,
field_id) so deleting temporary raw files for one field never removes
raw files for other fields in the same segment (prevents cross-field
deletion during index builds).
- Root cause and fix (bug): VectorDiskIndex::Build() and
BuildWithDataset() called RemoveDir on the segment-level path; this
removed rawdata/{segment_id}/. The fix changes both calls to remove
storage::GenFieldRawDataPathPrefix(local_chunk_manager, segment_id,
field_id) instead, limiting cleanup to rawdata/{segment_id}_{field_id}/
(field-scoped).
- Logic removed/simplified: the old helper GetSegmentRawDataPathPrefix
was removed and callers were switched to GenFieldRawDataPathPrefix;
cleanup logic is simplified from segment-level to field-level path
generation and removal, eliminating redundant broad deletions.
- Why this does NOT cause data loss or regress behavior: the change
narrows RemoveDir() to the exact field path used when caching raw data
and offsets earlier in Build (offsets_path and CacheRawDataToDisk
produce field-scoped local paths). Build still writes/reads offsets and
raw data from GenFieldRawDataPathPrefix(...) and then removes that same
prefix after successful index.Build(); therefore only temporary files
for the built field are deleted and other fields’ raw files under the
same segment are preserved. This fixes issue #46687 by preventing
accidental deletion of other fields’ raw data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-30 21:13:21 +08:00
aoiasd
26c86ec221
enhance: fix unstable unit test (#46626)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: TestWait must deterministically verify that
FixedSizeAllocator.Wait() is notified when virtual resources are
released, so an allocation blocked due to exhausted virtual capacity
eventually succeeds after explicit deallocations.
- Removed/simplified logic: replaced the previous flaky pattern that
spawned 100 concurrent goroutines performing Reallocate with an explicit
channel-synchronized release goroutine that performs 100 sequential
negative Reallocate calls only after the test blocks on allocation. This
eliminates timing-dependent concurrency and the nondeterministic i-based
assertion.
- Why no data loss or behavior regression: only the test changed —
allocator implementation (Allocate/Reallocate/Release/Wait/notify) and
public APIs are unchanged. The test now exercises the same code paths
(Allocate fails, Wait blocks on cond, Reallocate/Release call notify),
but in a deterministic order, so the allocator semantics and resource
accounting (used, allocs map) remain unaffected.
- Change type and intent: Enhancement/refactor of unit test stability —
it tightens test synchronization to remove race-dependent assertions and
ensure the Wait/notify mechanism is reliably exercised without modifying
allocator logic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-30 17:05:21 +08:00
Spade A
1a6f3c4305
enhance: batch processing for ngram (#46648)
issue: https://github.com/milvus-io/milvus/issues/42053
Process ngram in batch rather than all by once.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Batch Processing for N-gram Queries

**Core Invariant:** All data iteration is now driven by `batch_size_` as
the fundamental unit; for sealed chunked segments processing string/JSON
data, processing is strictly stateless to allow specialized batched
algorithms.

**Simplified Logic:**
- Removed the `process_all_chunks` boolean flag from
`ProcessMultipleChunksCommon` (renamed to
`ProcessDataChunksForMultipleChunk`) as it was redundant—all iteration
paths now converge on the same `batch_size_`-driven chunking strategy
with unified data size clamping (`std::min(chunk_size, batch_size_ -
processed_size)`).
- Eliminated wrapper delegation methods
(`ProcessDataChunksForMultipleChunk` and
`ProcessAllChunksForMultipleChunk` old wrappers) that pointed to a
single common implementation with a conditional flag.

**No Data Loss or Behavior Regression:**
- The new `ProcessAllDataChunkBatched<T>` is an additional stateless
public path (requires sealed + chunked segments, type constraints:
`std::string_view|Json|ArrayView`) that iterates all `num_data_chunk_`
chunks in `batch_size_` granularity without mutating cursor state
(`current_data_chunk_`, `current_data_chunk_pos_`), ensuring
deterministic re-entrant processing.
- Existing cursor-based APIs (`ProcessDataChunksForMultipleChunk`,
`ProcessChunkForSealedSeg`) remain unchanged for standard expression
evaluation—no segment state is corrupted.
- N-gram query execution now routes through
`ExecuteQueryWithPredicate<T, Predicate>(literal, segment, predicate,
need_post_filter)` which forwards generic predicates and delegates to
`segment->ProcessAllDataChunkBatched<T>(execute_batch, res)` for
post-filtering, avoiding per-chunk single-pass traversal.

**Enhancement:** Generic predicate template `template <typename T,
typename Predicate>` with perfect forwarding (`Predicate&& predicate`)
replaces the fixed `std::function<bool(const T&)>` signature,
eliminating function wrapper overhead for n-gram matcher closures and
enabling efficient batch processing callbacks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-30 16:57:22 +08:00
Feilong Hou
a1721bb47b
test: add more test case on partial update (#46628)
Issue: #46627  
add one more test case to cover duplicate pk partial update

 On branch feature/partial-update
 Changes to be committed:
	modified:   milvus_client/test_milvus_client_partial_update.py

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: upserts with partial_update=True consolidate records
by primary key (PK) rather than creating duplicate rows; this test
verifies the partial-update upsert path preserves PK identity and merge
semantics.
- Change: adds test
test_milvus_client_partial_update_duplicate_pk_partial_update which
inserts duplicate-PK batches, then calls client.upsert(...,
partial_update=True) on a subset of fields and asserts final row count
equals default_nb, exercising the partial-update code path (upsert →
partial update handling → query) not previously covered.
- No production logic removed/simplified: this PR only adds test
coverage (no code paths removed or altered); nothing in production code
is changed or simplified by the PR.
- No data loss or regression introduced: the test validates concrete
code paths — upsert with partial_update True followed by
query(out_fields/with_vec, pk checks) — and asserts deduplication
(2×default_nb → default_nb). Because the PR only adds assertions against
existing behavior and does not modify runtime logic, it cannot cause
data loss or behavioral regressions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
2025-12-30 16:01:26 +08:00
Feilong Hou
69a2d202b0
test: cover more timesamptz e2e (#46575)
Issue: #46424  
test:add_collection_field(invalid_default_value)
       hybrid_search(NOT supported_
simplify some test cases using one single collection to save time.
       query with different time shift and timezone settings

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: TIMESTAMPTZ values are treated as absolute instants
(timezone-preserving). Tests assume conversions between stored instants
and display timezones/time-shifts are deterministic and reversible; the
PR validates queries/reads across different timezone and time-shift
settings against that invariant.

- Removed/simplified logic: duplicated per-test create/insert/teardown
flows and several isolated timestamptz unit cases (edge_case, Feb_29,
partial_update, standalone query) were consolidated into a module-scoped
fixture that creates a single COLLECTION_NAME, inserts ROWS, and handles
teardown. This removes redundant setup/teardown code and repeated
scaffolding while preserving the same API exercise points
(create_collection, insert, query, alter_collection_properties,
alter_database_properties, describe_collection, describe_database).

- No data loss or behavior regression: only test code was reorganized
and new assertions exercise the same production APIs and code paths used
previously (create_collection → insert → query / alter_properties →
describe). The fixture inserts the same ROWS and tests still
convert/compare timestamptz values via cf.convert_timestamptz and query
check routines; the new invalid-default-value test only asserts error
handling when adding a TIMESTAMPTZ field with an invalid default and
does not mutate persisted data or change production logic.

- PR type (Enhancement/Test): expands and reorganizes E2E test coverage
for TIMESTAMPTZ—centralizes collection setup to reduce runtime and
flakiness, adds explicit coverage for invalid-default-value behavior,
and increases timezone/time-shift query scenarios without altering
product behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
2025-12-30 15:59:21 +08:00
zhagnlu
83ab90af93
enhance: modify bson thirdparty lib compile mode (#45406)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-30 15:41:21 +08:00
yihao.dai
b18ebd9468
enhance: Remove legacy cdc/replication (#46603)
issue: https://github.com/milvus-io/milvus/issues/44123

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: legacy in-cluster CDC/replication plumbing
(ReplicateMsg types, ReplicateID-based guards and flags) is obsolete —
the system relies on standard msgstream positions, subPos/end-ts
semantics and timetick ordering as the single source of truth for
message ordering and skipping, so replication-specific
channels/types/guards can be removed safely.

- Removed/simplified logic (what and why): removed replication feature
flags and params (ReplicateMsgChannel, TTMsgEnabled,
CollectionReplicateEnable), ReplicateMsg type and its tests, ReplicateID
constants/helpers and MergeProperties hooks, ReplicateConfig and its
propagation (streamPipeline, StreamConfig, dispatcher, target),
replicate-aware dispatcher/pipeline branches, and replicate-mode
pre-checks/timestamp-allocation in proxy tasks — these implemented a
redundant alternate “replicate-mode” pathway that duplicated
position/end-ts and timetick logic.

- Why this does NOT cause data loss or regression (concrete code paths):
no persistence or core write paths were removed — proxy PreExecute flows
(internal/proxy/task_*.go) still perform the same schema/ID/size
validations and then follow the normal non-replicate execution path;
dispatcher and pipeline continue to use position/subPos and
pullback/end-ts in Seek/grouping (pkg/mq/msgdispatcher/dispatcher.go,
internal/util/pipeline/stream_pipeline.go), so skipping and ordering
behavior remains unchanged; timetick emission in rootcoord
(sendMinDdlTsAsTt) is now ungated (no silent suppression), preserving or
increasing timetick delivery rather than removing it.

- PR type and net effect: Enhancement/Refactor — removes deprecated
replication API surface (types, helpers, config, tests) and replication
branches, simplifies public APIs and constructor signatures, and reduces
surface area for future maintenance while keeping DML/DDL persistence,
ordering, and seek semantics intact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-30 14:53:21 +08:00
yihao.dai
b7761d67a3
enhance: Enhance logs for proxy and rootcoord meta table (#46652)
issue: https://github.com/milvus-io/milvus/issues/46651

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Enhancement: Add Context-Aware Logging for Proxy and RootCoord Meta
Table Operations

**Core Invariant**: All changes maintain existing cache behavior and
state transition logic by purely enhancing observability through
context-aware logging without modifying control flow, return values, or
data structures.

**Logic Simplified Without Regression**:
- Removed internal helper method `getFullCollectionInfo` from MetaCache
by inlining its logic directly into GetCollectionInfo, eliminating an
unnecessary abstraction layer while preserving the exact same
cache-hit/miss and fetch-or-update paths
- This consolidation has no impact on behavior because the helper was
only called from one location and the inlined logic executes identically

**Enhanced Logging for Observability (No Behavior Changes)**:
- Added context-aware logging (log.Ctx(ctx)) to cache miss scenarios and
timestamp comparisons in proxy MetaCache, enabling request tracing
without altering cache lookup logic
- Expanded RootCoord MetaTable's internal helper method signatures to
propagate context for contextual logging across collection lifecycle
events (begin truncate, update state, remove names/aliases, delete from
collections map), while keeping all call sites and state transitions
unchanged
- Enhanced DescribeCollection logging in proxy to capture request scope
(role, database, collection IDs, timestamp) and response schema at
operation boundaries

**Type**: Enhancement focused on improved observability. All
modifications are strictly additive logging; no data structures, caching
strategies, or core logic paths were altered.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-30 14:41:20 +08:00
zhuwenxing
e3a85be435
test: replace parquet with jsonl for EventRecords and RequestRecords in checker (#46671)
/kind improvement

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: tests' persistence of EventRecords and RequestRecords
must be append-safe under concurrent writers; this PR replaces Parquet
with JSONL and uses per-file locks and explicit buffer flushes to
guarantee atomic, append-safe writes (EventRecords uses event_lock +
append per line; RequestRecords buffers under request_lock and flushes
to file when threshold or on sink()).

- Logic removed/simplified and rationale: DataFrame-based parquet
append/read logic (pyarrow/fastparquet) and implicit parquet buffering
were removed in favor of simple line-oriented JSON writes and explicit
buffer management. The complex Parquet append/merge paths were redundant
because parquet append under concurrent test-writer patterns caused
corruption; JSONL removes the append-mode complexity and the
parquet-specific buffering/serialization code.

- Why no data loss or behavior regression (concrete code paths):
EventRecords.insert writes a complete JSON object per event under
event_lock to /tmp/ci_logs/event_records_*.jsonl and get_records_df
reads every JSON line under the same lock (or returns an empty DataFrame
with the same schema on FileNotFound/Error), preserving all fields
event_name/event_status/event_ts. RequestRecords.insert appends to an
in-memory buffer under request_lock and triggers _flush_buffer() when
len(buffer) >= 100; _flush_buffer() writes each buffered JSON line to
/tmp/ci_logs/request_records_*.jsonl and clears the buffer; sink() calls
_flush_buffer() under request_lock before get_records_df() reads the
file — ensuring all buffered records are persisted before reads. Both
read paths handle FileNotFoundError and exceptions by returning empty
DataFrames with identical column schemas, so external callers see the
same API and no silent record loss.

- Enhancement summary (concrete): Replaces flaky Parquet append/read
with JSONL + explicit locking and deterministic flush semantics,
removing the root cause of parquet append corruption in tests while
keeping the original DataFrame-based analysis consumers unchanged
(get_records_df returns equivalent schemas).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2025-12-30 14:13:21 +08:00
zhuwenxing
f4f5e0f4dc
test: add highlighter in checker (#46289)
/kind improvement

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
  * Updated test suite dependencies to pymilvus 2.7.0rc91.
  * Enhanced text highlighting validation in test checkers.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2025-12-30 14:11:26 +08:00
congqixia
b4682b7352
fix: use LoadDeltaData instead of Delete for L0 growing forward (#46657)
Related to #46660
Replace segment.Delete() with segment.LoadDeltaData() when forwarding L0
deletions to growing segments. LoadDeltaData is the more appropriate API
for bulk loading delta data compared to individual Delete calls.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
• Core invariant: forwarding L0 deletions to growing segments must use
the bulk-delta API (storage.DeltaData + segment.LoadDeltaData) because
LoadDeltaData preserves paired primary keys and timestamps as a single
atomic delta payload; segment.Delete was intended for per-delete RPCs
and not for loading L0 delta payloads.

• Logic removed/simplified: addL0GrowingBF() no longer calls
segment.Delete for buffered L0 keys. Instead the buffered callback
builds a storage.DeltaData via storage.NewDeltaDataWithData(pks, tss)
and calls segment.LoadDeltaData(ctx, dd). This eliminates the previous
per-batch Delete call path and centralizes forwarding as a single
delta-load operation.

• Why this does not cause data loss or regression: the new path supplies
identical PK+timestamp pairs to the segment via DeltaData; LoadDeltaData
applies the same delete semantics but accepts batched delta payloads.
The change is limited to the L0→growing Bloom-Filter forward path
(addL0GrowingBF/addL0ForGrowingLoad), leaving sealed-segment deletes,
streaming direct forwarding, and remote-load policies unchanged. Also,
the prior code would fail on L0Segment.Delete (L0 segments prohibit
Delete), so switching to LoadDeltaData prevents lost-forwarding caused
by unsupported Delete calls.

• Category: Enhancement / Refactor — replaces inappropriate per-delete
calls with the correct bulk delta-load API, simplifying error handling
around NewDeltaDataWithData and ensuring API contract correctness for
L0→growing forwarding.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-30 14:05:21 +08:00
jac
898e6d6e94
fix: [master]update ngram test error codes to match actual server responses (#46675)
- Updates e2e test error code and message after pymilvus update

see #46677

Signed-off-by: silas.jiang <silas.jiang@zilliz.com>
Co-authored-by: silas.jiang <silas.jiang@zilliz.com>
2025-12-30 13:07:20 +08:00
aoiasd
90809d1d86
fix: highlight with multi analyzer failed (#46527)
relate: https://github.com/milvus-io/milvus/issues/46498

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: text fields configured with multi_analyzer_params must
include a "by_field" string that names another field containing per-row
analyzer choices; schemaInfo.GetMultiAnalyzerNameFieldID caches and
returns the dependent field ID (or 0 if none) and relies on that mapping
to make per-row analyzer names available to the highlighter.
- What changed / simplified: the highlighter is now schema-aware —
addTaskWithSearchText accepts *schemaInfo and uses
GetMultiAnalyzerNameFieldID to resolve the analyzer-name field;
resolution and caching moved into schemaInfo.multiAnalyzerFieldMap
(meta_cache.go), eliminating ad-hoc/typeutil-only lookups and duplicated
logic; GetMultiAnalyzerParams now gates on EnableAnalyzer(),
centralizing analyzer enablement checks.
- Why this fixes the bug (root cause): fixes #46498 — previously the
highlighter failed when the analyzer-by-field was not in output_fields.
The change (1) populates task.AnalyzerNames (defaulting missing names to
"default") when multi-analyzer is configured and (2) appends the
analyzer-name field ID to LexicalHighlighter.extraFields so FieldIDs
includes it; the operator then requests the analyzer-name column at
search time, ensuring per-row analyzer selection is available for
highlighting.
- No data-loss or regression: when no multi-analyzer is configured
GetMultiAnalyzerNameFieldID returns 0 and behavior is unchanged; the
patch only adds the analyzer-name field to requested output IDs (no
mutation of stored data). Error handling on malformed params is
preserved (errors are returned instead of silently changing data), and
single-analyzer behavior remains untouched.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-30 11:55:21 +08:00
jiamingli-maker
ebe82db4fe
test: Add HNSW_PQ test cases and update HNSW_SQ (#46604)
/kind improvement

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: test infrastructure treats insertion granularity as
orthogonal to data semantics—bulk generation
gen_row_data_by_schema(nb=2000, start=0, random_pk=False) yields the
same sequential PKs and vector payloads as prior multi-batch inserts, so
tests relying on collection lifecycle, flush, index build, load and
search behave identically.
- What changed / simplified: added a full HNSW_PQ parameterized test
suite (tests/python_client/testcases/indexes/idx_hnsw_pq.py and
test_hnsw_pq.py) and simplified HNSW_SQ test insertion by replacing
looped per-batch generation+insert with a single bulk
gen_row_data_by_schema(...) + insert. The per-batch PK sequencing and
repeated vector generation were redundant for correctness and were
removed to reduce complexity.
- Why this does NOT cause data loss or behavior regression: the
post-insert code paths remain unchanged—tests still call client.flush(),
create_index(...), util.wait_for_index_ready(), collection.load(), and
perform searches that assert describe_index and search outputs. Because
start=0 and random_pk=False reproduce identical sequential PKs (0..1999)
and the same vectors, index creation and search validation operate on
identical data and index parameters, preserving previous assertions and
outcomes.
- New capability: comprehensive HNSW_PQ coverage (build params: M,
efConstruction, m, nbits, refine, refine_type; search params: ef,
refine_k) across vector types (FLOAT_VECTOR, FLOAT16_VECTOR,
BFLOAT16_VECTOR, INT8_VECTOR) and metrics (L2, IP, COSINE), implemented
as data-driven tests to validate success and failure/error messages for
boundary, type-mismatch and inter-parameter constraints.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: zilliz <jiaming.li@zilliz.com>
2025-12-30 10:07:21 +08:00
lif
63e191a347
fix: correct typo Seperate to Separate in build script (#46632)
issue: #46636

## Summary
- Fix spelling error in comment: `Seperate` -> `Separate`
- Location: `build/build_image_gpu.sh` line 38

## Test Plan
- [x] Comment-only change, no functional impact

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## PR Summary: Typo Correction in Build Script Comment

• **Core Assumption**: This change relies on the assumption that
documentation and comments should reflect correct spelling to maintain
code quality and readability for maintainers. The comment is purely
informational describing the conditional logic below it.

• **What Changed**: A single spelling correction in a comment on line 38
of `build/build_image_gpu.sh`, changing "Seperate" to "Separate". No
code logic, control flow, or build behavior is altered—this is a
comment-only edit.

• **No Regression**: This change introduces zero behavioral or
functional impact because the modified content is a comment that does
not execute. The conditional logic immediately following (lines 39-42)
and the docker build command remain completely unchanged. Build output,
image creation, dependency installation, and all runtime behavior are
identical before and after this change.

• **Rationale**: Correcting obvious spelling errors in comments improves
code maintainability and reduces potential confusion for developers
reading the build script, while incurring zero risk to the system.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-30 09:35:21 +08:00
Chun Han
da732ec04d
enhance: change credential provider to singleton(#46649) (#46653)
related: #46649

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: STS IAM credential providers for Aliyun, Tencent
Cloud, and Huawei Cloud are global, stateless resources that must be
instantiated once and reused across all ChunkManager instances
(singleton), rather than created per-manager.
- Logic removed/simplified: Removed per-instance Aws::MakeShared
instantiation of STSAssumeRoleWebIdentityCredentialsProvider inside
Aliyun/Tencent/Huawei ChunkManager constructors and replaced them with
public static Get...CredentialsProvider() methods that return a
thread-safe, lazily-initialized shared_ptr singleton (static local
variable). This eliminates duplicate provider construction and
header/signal dependency usages tied to per-constructor instantiation.
- Why this does NOT introduce data loss or behavior regression:
Credential acquisition and usage paths are unchanged — callers still
call provider->GetAWSCredentials() and use the returned AWSCredentials
to construct Aws::S3::S3Client. The singleton returns the same provider
object but the provider is stateless with respect to per-manager data
(it only reads environment/platform credentials and produces
AWSCredentials). C++11+ static local initialization provides atomic,
thread-safe construction, so first-access semantics and validation
checks (AssertInfo on access key/secret/token) remain intact.
- PR type (Enhancement/Refactor): Improves credential management by
centralizing provider lifecycle, removing redundant allocations and
header dependencies, and enforcing a single shared provider per cloud
vendor where IAM is used.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-12-29 20:35:21 +08:00
Buqian Zheng
dc7c92d398
fix: scalar bench builds on its own, removing related target from milvus (#46658)
issue: https://github.com/milvus-io/milvus/issues/44452

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

**Scalar Bench Decoupled from Milvus Build System**

- **Core assumption**: Scalar-bench is now managed as an independent
build artifact outside the milvus repository, eliminating the need for
conditional compilation integration within milvus's Makefile and
CMakeLists.txt.

- **Build infrastructure simplified**: Removed `scalar-bench` and
`scalar-bench-ui` targets from Makefile and deleted the entire
`ENABLE_SCALAR_BENCH` conditional block in
`internal/core/unittest/CMakeLists.txt` (which handled FetchContent,
cache variables, and subdirectory integration)—this eliminates optional,
redundant build-time coupling that is no longer necessary.

- **No regression introduced**: The removal only affects optional
build-time integration paths. Core C++ builds continue functioning as
before, and unit tests remain unaffected since `ENABLE_SCALAR_BENCH` was
always optional (not a required dependency); the newly added
`plan-parser-so` dependency on core build targets appears to be a
separate, required component.

- **Decoupling benefit**: Scalar-benchmark can now evolve and release on
its own schedule independent of milvus release cycles, while maintaining
clean separation of concerns between the two projects.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-29 20:13:21 +08:00
Bingyi Sun
f9827392bb
enhance: implement external collection update task with source change detection (#45905)
issue: #45881 
Add persistent task management for external collections with automatic
detection of external_source and external_spec changes. When source
changes, the system aborts running tasks and creates new ones, ensuring
only one active task per collection. Tasks validate their source on
completion to prevent superseded tasks from committing results.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: at most one active UpdateExternalCollection task
exists per collection — tasks are serialized by collectionID
(collection-level locking) and any change to external_source or
external_spec aborts superseded tasks and causes a new task creation
(externalCollectionManager + external_collection_task_meta
collection-based locks enforce this).
- What was simplified/removed: per-task fine-grained locking and
concurrent multi-task acceptance per collection were replaced by
collection-level synchronization (external_collection_task_meta.go) and
a single persistent task lifecycle in DataCoord/Index task code;
redundant double-concurrent update paths were removed by checking
existing task presence in AddTask/LoadOrStore and aborting/overwriting
via Drop/Cancel flows.
- Why this does NOT cause data loss or regress behavior: task state
transitions and commit are validated against the current external
source/spec before applying changes — UpdateStateWithMeta and SetJobInfo
verify task metadata and persist via catalog only under matching
collection-state; DataNode externalCollectionManager persists task
results to in-memory manager and exposes Query/Drop flows (services.go)
without modifying existing segment data unless a task successfully
finishes and SetJobInfo atomically updates segments via meta/catalog
calls, preventing superseded tasks from committing stale results.
- New capability added: end-to-end external collection update workflow —
DataCoord Index task + Cluster RPC helpers + DataNode external task
runner and ExternalCollectionManager enable creating, querying,
cancelling, and applying external collection updates
(fragment-to-segment balancing, kept/updated segment handling, allocator
integration); accompanying unit tests cover success, failure,
cancellation, allocator errors, and balancing logic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-12-29 19:53:21 +08:00
wei liu
293838bb67
enhance: add delegator catching up streaming data state tracking (#46551)
issue: #46550
- Add CatchUpStreamingDataTsLag parameter to control tolerable lag
  threshold for delegator to be considered caught up
- Add catchingUpStreamingData field in delegator to track whether
  delegator has caught up with streaming data
- Add catching_up_streaming_data field in LeaderViewStatus proto
- Check catching up status in CheckDelegatorDataReady, return not
  ready when delegator is still catching up streaming data
- Add unit tests for the new functionality

When tsafe lag exceeds the threshold, the distribution will not be
considered serviceable, preventing queries from timing out in waitTSafe.
This is useful when streaming message queue consumption is slow.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: a delegator must not be considered serviceable while
its tsafe lags behind the latest committed timestamp beyond a
configurable tolerance; a delegator is "caught-up" only when
(latestTsafe - delegator.GetTSafe()) < CatchUpStreamingDataTsLag
(configured by queryNode.delegator.catchUpStreamingDataTsLag, default
1s).
- New capability and where it takes effect: adds streaming-catchup
tracking to QueryNode/QueryCoord — an atomic catchingUpStreamingData
flag on shardDelegator (internal/querynodev2/delegator/delegator.go), a
new param CatchUpStreamingDataTsLag
(pkg/util/paramtable/component_param.go), and a
LeaderViewStatus.catching_up_streaming_data field in the proto
(pkg/proto/query_coord.proto). The flag is exposed in
GetDataDistribution (internal/querynodev2/services.go) and used by
QueryCoord readiness checks
(internal/querycoordv2/utils/util.go::CheckDelegatorDataReady) to reject
leaders that are still catching up.
- What logic is simplified/added (not removed): instead of relying
solely on segment distribution/worker heartbeats, the PR adds an
explicit readiness gate that returns "not available" when the delegator
reports catching-up-streaming-data. This is strictly additive — no
existing checks are removed; the new precondition runs before segment
availability validation to prevent premature routing to slow-consuming
delegators.
- Why this does NOT cause data loss or regress behavior: the change only
controls serviceability visibility and routing — it never drops or
mutates data. Concretely: shardDelegator starts with
catchingUpStreamingData=true and flips to false in UpdateTSafe once the
sampled lag falls below the configured threshold
(internal/querynodev2/delegator/delegator.go::UpdateTSafe). QueryCoord
will short-circuit in CheckDelegatorDataReady when
leader.Status.GetCatchingUpStreamingData() is true
(internal/querycoordv2/utils/util.go), returning a channel-not-available
error before any segment checks; when the flag clears, existing
segment-distribution checks (same code paths) resume. Tests added cover
both catching-up and caught-up paths
(internal/querynodev2/delegator/delegator_test.go,
internal/querycoordv2/utils/util_test.go,
internal/querynodev2/services_test.go), demonstrating convergence
without changed data flows or deletion of data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-29 17:15:21 +08:00
XuanYang-cn
4230a5beaa
enhance: revert mmap configs (#46581)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-12-29 15:09:21 +08:00
Chun Han
f087b7432e
fix: increase expiry time for huawei cloud(#46296) (#46298)
related: #46296

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: expiration comparisons use
Aws::Utils::DateTime::Now().count() which returns milliseconds; any
expiration grace period must be expressed in milliseconds and compared
via (GetExpiration() - Now()).count() in ExpiresSoon() (Huawei and
Tencent providers).
- Root cause and fix: the grace period constant was authored as 7200
(seconds) but used against millisecond counts, causing premature
refreshes. The PR changes
STS_CREDENTIAL_PROVIDER_EXPIRATION_GRACE_PERIOD to 180 * 1000 (180000
ms) in HuaweiCloudCredentialsProvider.cpp and
TencentCloudCredentialsProvider.cpp to align units and stop unnecessary
refreshes.
- Removed/replaced redundant/incorrect behavior: the PR does not add new
control flow but corrects unit mismatch and simplifies logging/STS
request handling — HuaweiCloudSTSClient now explicitly requests a
7200-second token by adding "token": {"duration_seconds": 7200} to the
JSON body and uses JsonValue(...).View() for parsing; Huawei logging
level raised from TRACE to DEBUG and now logs expiration_count_diff_ms
for clarity. These changes remove ambiguity about requested token
lifetime and improve diagnostic output.
- No data loss or regression: credential contents and assignment are
unchanged — Reload()/RefreshIfExpired()/ExpiresSoon() still populate
m_credentials from STS responses and return them via
GetAWSCredentials(); only the grace-period unit and the Huawei STS
request body/parsing/logging were adjusted. Code paths affected are
ExpiresSoon()/RefreshIfExpired()/Reload() in both providers and
HuaweiCloudSTSCredentialsClient::callHuaweiCloudSTS; since credentials
are still read from the same response fields (access, secret,
securitytoken, expires_at) and assigned to result.creds, there is no
data loss or altered persistence/authorization semantics beyond aligning
requested token duration and correct refresh timing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-12-29 14:05:20 +08:00
junjiejiangjjj
c85f7d5d84
fix: Update milvus-proto (#46561)
https://github.com/milvus-io/milvus/issues/46543

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-12-29 11:29:22 +08:00
zhenshan.cao
1399d955fc
enhance: optimize timestamptz comparison without interval (#46619)
issue: https://github.com/milvus-io/milvus/issues/46618

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

• **Core Invariant**: TIMESTAMPTZ values are internally stored as int64
Unix microseconds. Simple comparisons without intervals can safely use
native int64 range evaluation (`ExecRangeVisitorImpl<int64_t>`) and
`UnaryRangeExpr` to leverage index-based scans, since the underlying
data type and comparison semantics remain unchanged.

• **Logic Optimization**: The parser now branches on interval presence.
When `ctx.GetOp1() == nil` (no interval), it returns a lightweight
`UnaryRangeExpr` for fast indexed range scans. When an interval exists,
it falls back to the heavier `TimestamptzArithCompareExpr` for
arithmetic evaluation. This eliminates redundant ISO interval parsing
and type conversions for the common case of interval-free comparisons.

• **No Regression**: The `UnaryRangeExpr` path preserves exact
comparison semantics by treating TIMESTAMPTZ as int64 directly, matching
the storage format. For reverse comparisons (e.g., `'2025-01-01' >
column`), operator reversal correctly normalizes to column-centric form
(`column < '2025-01-01'`), maintaining logical equivalence.
Interval-based comparisons continue through the unchanged
`TimestamptzArithCompareExpr` path.

• **Coverage**: Both forward (column left of operator) and reverse
(column right of operator) comparison syntaxes are handled with explicit
branching logic, ensuring the optimization applies uniformly across
comparison patterns.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-12-29 11:05:20 +08:00
Spade A
0114bd1dc9
feat: support match operator family (#46518)
issue: https://github.com/milvus-io/milvus/issues/46517
ref: https://github.com/milvus-io/milvus/issues/42148

This PR supports match operator family with struct array and brute force
search only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: match operators only target struct-array element-level
predicates and assume callers provide a correct row_start so element
indices form a contiguous range; IArrayOffsets implementations convert
row-level bitmaps/rows (starting at row_start) into element-level
bitmaps or a contiguous element-offset vector used by brute-force
evaluation.

- New capability added: end-to-end support for MATCH_* semantics
(match_any, match_all, match_least, match_most, match_exact) — parser
(grammar + proto), planner (ParseMatchExprs), expr model
(expr::MatchExpr), compilation (Expr→PhyMatchFilterExpr), execution
(PhyMatchFilterExpr::Eval uses element offsets/bitmaps), and unit tests
(MatchExprTest + parser tests). Implementation currently works for
struct-array inputs and uses brute-force element counting via
RowBitsetToElementOffsets/RowBitsetToElementBitset.

- Logic removed or simplified and why: removed the ad-hoc
DocBitsetToElementOffsets helper and consolidated offset/bitset
derivation into IArrayOffsets::RowBitsetToElementOffsets and a
row_start-aware RowBitsetToElementBitset, and removed EvalCtx overloads
that embedded ExprSet (now EvalCtx(exec_ctx, offset_input)). This
centralizes array-layout logic in ArrayOffsets and removes duplicated
offset conversion and EvalCtx variants that were redundant for
element-level evaluation.

- No data loss / no behavior regression: persistent formats are
unchanged (no proto storage or on-disk layout changed); callers were
updated to supply row_start and now route through the centralized
ArrayOffsets APIs which still use the authoritative
row_to_element_start_ mapping, preserving exact element index mappings.
Eval logic changes are limited to in-memory plumbing (how
offsets/bitmaps are produced and how EvalCtx is constructed); expression
evaluation still invokes exprs_->Eval where needed, so existing behavior
and stored data remain intact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-12-29 11:03:26 +08:00
wei liu
0d70d2b98c
enhance: simplify seek position selection in WatchDmChannels (#46567)
issue: #46566
Remove the complex comparison logic between seekPosition and
deleteCheckpoint. Use seekPosition directly since:

- L0 segments are loaded before consuming message stream, which contain
delete records from [deleteCheckpoint, L0.endPosition]
- DataCoord ensures seekPosition is based on channel checkpoint, updated
after data (including deletes) is flushed
- L0 segments should cover up to seekPosition, avoiding data loss
- This eliminates redundant message consumption when seekPosition >
deleteCheckpoint

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: L0 segments are loaded before consuming the DM channel
stream and contain delete records for range [deleteCheckpoint,
L0.endPosition]; DataCoord guarantees channel.GetSeekPosition() is
derived from the channel checkpoint after data (including deletes) is
flushed, so L0 segments collectively cover up to that seekPosition.
- Change made: removed the prior branching that built a synthetic seek
position from deleteCheckpoint vs. channel checkpoint and instead always
calls channel.GetSeekPosition() (used directly in ConsumeMsgStream).
Added an informational log comparing seekPosition and deleteCheckpoint.
- Why the removed logic was redundant: deleteCheckpoint represented the
smallest start position of L0 segments and was used to avoid
re-consuming delete messages already present in loaded L0 segments.
Because L0 segments already include deletes up to the channel checkpoint
and DataCoord updates the channel checkpoint after flush, using
deleteCheckpoint to alter the seek introduces duplicate consumption
without benefit.
- Why this is safe (no data loss/regression): L0 segments are guaranteed
to be loaded before consumption, so deletes present in L0 cover the
range up to channel.GetSeekPosition(); delete records earlier than
deleteCheckpoint have been compacted to L1 and can be evicted from the
delete buffer. The code path still calls ConsumeMsgStream with the
channel seek position, preserving original consumption/error handling,
so no messages are skipped and no additional delete application occurs
beyond what L0/L1 already cover.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-29 10:53:22 +08:00
marcelo-cjl
3c2cf2c066
feat: Add nullable vector support in import utility layer (#46142)
related: #45993 

Add nullable vector support in import utility layer
    
Key changes:

ImportV2 util:
- Add nullable vector types (FloatVector, Float16Vector, BFloat16Vector,
BinaryVector, SparseFloatVector, Int8Vector) to
AppendNullableDefaultFieldsData()
- Add tests for nullable vector field data appending

CSV/JSON/Numpy readers:
- Add nullPercent parameter to test data generation for better null
coverage
- Mark vector fields as nullable in test schemas
- Add test cases for nullable vector field parsing
- Refactor tests to use loop-based approach with 0%, 50%, 100% null
percentages

Parquet field reader:
- Add ReadNullableBinaryData() for nullable
BinaryVector/Float16Vector/BFloat16Vector
- Add ReadNullableFloatVectorData() for nullable FloatVector
- Add ReadNullableSparseFloatVectorData() for nullable SparseFloatVector
- Add ReadNullableInt8VectorData() for nullable Int8Vector
- Add ReadNullableStructData() for generic nullable struct data
- Update Next() to use nullable read methods when field is nullable
- Add null data validation for non-nullable fields

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: import must preserve per-row alignment and validity
for every field — nullable vector fields are expected to be encoded with
per-row validity masks and all readers/writers must emit arrays aligned
to original input rows (null entries represented explicitly).
- New feature & scope: adds end-to-end nullable-vector support in the
import utility layer — AppendNullableDefaultFieldsData in
internal/datanode/importv2/util.go now appends nil placeholders for
nullable vectors (FloatVector, Float16Vector, BFloat16Vector,
BinaryVector, SparseFloatVector, Int8Vector); parquet reader
(internal/util/importutilv2/parquet/field_reader.go) adds
ReadNullableBinaryData, ReadNullableFloatVectorData,
ReadNullableSparseFloatVectorData, ReadNullableInt8VectorData,
ReadNullableStructData and routes nullable branches to these helpers;
CSV/JSON/Numpy readers and test utilities updated to generate and
validate 0/50/100% null scenarios and mark vector fields as nullable in
test schemas.
- Logic removed / simplified: eliminates ad-hoc "parameter-invalid"
rejections for nullable vectors inside FieldReader.Next by centralizing
nullable handling into ReadNullable* helpers and shared validators
(getArrayDataNullable,
checkNullableVectorAlignWithDim/checkNullableVectorAligned), simplifying
control flow and removing scattered special-case checks.
- No data loss / no regression (concrete code paths): nulls are
preserved end-to-end — AppendNullableDefaultFieldsData explicitly
inserts nil entries per null row (datanode import append path);
ReadNullable*Data helpers return both data and []bool validity masks so
callers in field_reader.go and downstream readers receive exact per-row
validity; testutil.BuildSparseVectorData was extended to accept
validData so sparse vectors are materialized only for valid rows while
null rows are represented as missing. These concrete paths ensure null
rows are represented rather than dropped, preventing data loss or
behavioral regression.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-29 10:51:21 +08:00
Buqian Zheng
dce44f2a20
test: reduce test time for TestSparsePlaceholderGroupSize (#46637)
issue: #44452

## Summary
Reduce test combinations in `TestSparsePlaceholderGroupSize` to decrease
test execution time:
- `nqs`: from `[1, 10, 100, 1000, 10000]` to `[1, 100, 10000]`
- `averageNNZs`: from `[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
2048]` to `[1, 4, 16, 64, 256, 1024]`

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## TestSparsePlaceholderGroupSize Test Reduction

**Core Invariant:** The sparse vector NNZ estimation algorithm
(`EstimateSparseVectorNNZFromPlaceholderGroup`) must maintain accuracy
within bounded error thresholds—individual cases < 10% error and no more
than 2% of cases exceeding 5% error—across representative parameter
ranges.

**Test Coverage Optimized, Not Removed:** Test combinations reduced from
60 to 18 by pruning redundant parameter points while retaining critical
coverage: nqs now tests [1, 100, 10000] (min, mid, max) and averageNNZs
tests [1, 4, 16, 64, 256, 1024] (exponential spacing). Variant
generation logic (powers of 2 scaling) remains unchanged, ensuring error
scenarios are still exercised.

**No Behavioral Regression:** The algorithm under test is untouched;
only test case frequency decreases. The same assertions validate error
bounds are satisfied—individual assertions (`assert.Less(errorRatio,
10.0)`) and statistical assertions (`assert.Less(largeErrorRatio, 2.0)`)
remain identical, confirming that estimation quality is still verified.

**Why Safe:** Exponential spacing of removed parameters (e.g., nqs: 10,
1000 removed; averageNNZs: 2, 8, 32, 128, 512, 2048 removed) addresses
diminishing returns—intermediate values provide no new error scenarios
beyond what surrounding powers-of-2 values expose, while keeping test
execution time proportional to coverage value gained.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-29 09:51:21 +08:00
sijie-ni-0214
0a54c93227
fix: etcd RPC size limit exceeded when dropping collection (#46414)
issue: https://github.com/milvus-io/milvus/issues/46410

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: etcd metadata and in-memory Segment/TextIndex records
must store only compact filenames for text-index files; full object keys
are deterministically reconstructed at use-sites from a stable root +
common.TextIndexPath + IDs via metautil.BuildTextLogPaths.

- Bug & fix (issue #46410): the etcd RPC size overflow was caused by
persisting full upload keys in segment/TextIndex metadata. Fix: at
upload/creation sites (internal/datanode/compactor/sort_compaction.go
and internal/datanode/index/task_stats.go) store only filenames using
metautil.ExtractTextLogFilenames; at consumption/use sites
(internal/datacoord/garbage_collector.go,
internal/querynodev2/segments/segment.go, and other GC/loader code)
reconstruct full paths with metautil.BuildTextLogPaths before accessing
object storage.

- Simplified/removed logic: removed the redundant practice of carrying
full object keys through metadata and in-memory structures; callers now
persist compact filenames and perform on-demand path reconstruction.
This eliminates large payloads in etcd and reduces memory pressure while
preserving the same runtime control flow and error handling.

- No data loss / no regression: filename extraction is a deterministic
suffix operation (metautil.ExtractTextLogFilenames) and reloadFromKV
performs backward compatibility (internal/datacoord/meta.go converts
existing full-path entries to filenames before caching). All read paths
reconstruct full paths at runtime (garbage_collector.getTextLogs,
LocalSegment.LoadTextIndex, GC/loader), so no files are modified/deleted
and access semantics remain identical.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-28 15:31:19 +08:00
congqixia
6c8e11da4f
feat: [GoSDK] add QueryIterator support for Go client (#46633)
Related to #31293

Implement QueryIterator for the Go SDK to enable efficient iteration
over large query result sets using PK-based pagination.

Key changes:
- Add QueryIterator interface and implementation with PK-based
pagination
- Support Int64 and VarChar primary key types for pagination filtering
- Add QueryIteratorOption with batchSize, limit, filter, outputFields
config
- Fix ResultSet.Slice to handle Query results without IDs/Scores
- Add comprehensive unit tests and integration tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the iterator requires the collection primary key (PK)
to be present in outputFields so PK-based pagination and accurate row
counting work. The constructor enforces this by appending the PK to
outputFields when absent, and all pagination (lastPK tracking, PK-range
filters) and ResultCount calculations depend on that guaranteed PK
column.

- New capability: adds a public QueryIterator API (Client.QueryIterator,
QueryIterator interface, QueryIteratorOption) that issues server-side
Query RPCs in configurable batches and implements PK-based pagination
supporting Int64 and VarChar PKs, with options for batchSize, limit,
filter, outputFields and an upfront first-batch validation to fail fast
on invalid params.

- Removed/simplified logic: ResultSet.Slice no longer assumes IDs and
Scores are always present — it branches on presence of IDs (use IDs
length when non-nil; otherwise derive row count from Fields[0]) and
guards Scores slicing. This eliminates redundant/unsafe assumptions and
centralizes correct row-count logic based on actual returned fields.

- No data loss or behavior regression: pagination composes the user
filter with a PK-range filter and always requests the PK field, so
lastPK is extracted from a real column and fetchNextBatch only advances
when rows are returned; EOF is returned only when the server returns no
rows or iterator limit is reached. ResultSet.Slice guards prevent panics
for queries that lack IDs/Scores; Query RPC → ResultSet.Fields remains
the authoritative path for row data, so rows are not dropped and
existing query behavior is preserved.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-27 01:43:20 +08:00
aoiasd
55feb7ded8
feat: set related resource ids in collection schema (#46423)
Support crate analyzer with file resource info, and return used file
resource ids when validate analyzer.
Save the related resource ids in collection schema.
relate: https://github.com/milvus-io/milvus/issues/43687

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: analyzer file-resource resolution is deterministic and
traceable by threading a FileResourcePathHelper (collecting used
resource IDs in a HashSet) through all tokenizer/analyzer construction
and validation paths; validate_analyzer(params, extra_info) returns the
collected Vec<i64) which is propagated through C/Rust/Go layers to
callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 →
querypb.ValidateAnalyzerResponse.ResourceIds →
CollectionSchema.FileResourceIds).

- Logic removed/simplified: ad‑hoc, scattered resource-path lookups and
per-filter file helpers (e.g., read_synonyms_file and other inline
file-reading logic) were consolidated into ResourceInfo +
FileResourcePathHelper and a centralized get_resource_path(helper, ...)
API; filter/tokenizer builder APIs now accept &mut
FileResourcePathHelper so all file path resolution and ID collection use
the same path and bookkeeping logic (redundant duplicated lookups
removed).

- Why no data loss or behavior regression: changes are additive and
default-preserving — existing call sites pass extra_info = "" so
analyzer creation/validation behavior and error paths remain unchanged;
new Collection.FileResourceIds is populated from resp.ResourceIds in
validateSchema and round‑tripped through marshal/unmarshal
(model.Collection ↔ schemapb.CollectionSchema) so schema persistence
uses the new list without overwriting other schema fields; proto change
adds a repeated field (resource_ids) which is wire‑compatible (older
clients ignore extra field). Concrete code paths: analyzer creation
still uses create_analyzer (now with extra_info ""), tokenizer
validation still returns errors as before but now also returns IDs via
CValidateResult/RustResult, and rootcoord.validateSchema assigns
resp.ResourceIds → schema.FileResourceIds.

- New capability added: end‑to‑end discovery, return, and persistence of
file resource IDs used by analyzers — validate flows now return resource
IDs and the system stores them in collection schema (affects tantivy
analyzer binding, canalyzer C bindings, internal/util analyzer APIs,
querynode ValidateAnalyzer response, and rootcoord/create_collection
flow).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-26 22:49:19 +08:00
yihao.dai
512884524b
enhance: Maintain compatibility with the legacy FlushAll (#46564)
issue: https://github.com/milvus-io/milvus/issues/45919

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: FlushAll verification must accept both per-channel
FlushAllTss (new schema) and the legacy single FlushAllTs;
GetFlushAllState chooses the verification path based on which field is
present and treats a channel as flushed only if its channel checkpoint
timestamp >= the applicable threshold (per-channel timestamp or legacy
FlushAllTs).
- Logic removed/simplified: The previous mixed/ambiguous checks were
split into two focused
routines—verifyFlushAllStateByChannelFlushAllTs(logger, channel,
flushAllTss) and verifyFlushAllStateByLegacyFlushAllTs(logger, channel,
flushAllTs)—and GetFlushAllState now selects one path. This centralizes
compatibility logic, eliminates interleaved/duplicated checks, and
retains the outer-loop short-circuiting on the first unflushed channel.
- Why this does NOT cause data loss or regressions: Changes only affect
read-only verification paths (GetFlushAllState/GetFlushState) that
compare in-memory channel checkpoints (meta.GetChannelCheckpoint) to
provided thresholds; no writes to checkpoints or persisted state occur
and FlushAll enqueue/wait behavior is unchanged. Unit tests were added
to cover legacy FlushAllTs behavior and the new FlushAllMsgs→FlushAllTs
extraction, exercising both code paths.
- Enhancement scope and location: Adds backward-compatible support and
concrete FlushAllTs extraction from streaming FlushAllMsgs in Proxy
(internal/proxy/task_flush_all_streaming.go) and compatibility verifiers
in DataCoord (internal/datacoord/services.go), plus corresponding tests
(internal/datacoord/services_test.go, internal/proxy/*_test.go).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-26 18:59:20 +08:00
cai.zhang
8d12bfb436
fix: Restore the compaction task correctly to ensure it can be properly cleaned up (#46577)
issue: #46576 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: During meta load, only tasks that are truly
terminal-cleaned (states cleaned or unknown) should be dropped; all
other non-terminal tasks (including timeout and completed) must be
restored so the inspector can reattach them to executing/cleaning queues
and finish their cleanup lifecycle.
- Removed/simplified logic: loadMeta no longer uses the broad
isCompactionTaskFinished predicate (which treated timeout, completed,
cleaned, unknown as terminal). It now uses the new
isCompactionTaskCleaned predicate that only treats cleaned/unknown as
terminal. This removes the redundant exclusion of timeout/completed
tasks and simplifies the guard to drop only cleaned/unknown tasks.
- Bug fix (root cause & exact change): Fixes issue #46576 — the previous
isCompactionTaskFinished caused timeout/completed tasks to be skipped
during meta load and thus not passed into restoreTask(). The PR adds
isCompactionTaskCleaned and replaces the finished check so timeout and
completed tasks are included in restoreTask() and re-attached to the
inspector’s existing executing/cleaning queues.
- No data loss or regression: Tasks in cleaned/unknown remain dropped
(isCompactionTaskCleaned still returns true for cleaned/unknown).
Non-terminal timeout/completed tasks now follow the same restoreTask()
control path used previously for restored tasks — they are enqueued into
the inspector’s queue/executing/cleaning flows rather than being
discarded. No exported signatures changed and all restored tasks flow
into existing handlers, avoiding behavior regression or data loss.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-26 18:57:19 +08:00
yihao.dai
e0fd091d41
fix: Fix replicate lag when server is idle (#46574)
issue: https://github.com/milvus-io/milvus/issues/46116

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the metric CDCLastReplicatedTimeTick must reflect the
most recent time-tick when replication has effectively processed all
pending messages (including idle periods), so reported replicate lag =
confirmed WAL tick − last replicated tick can reach zero when the server
is idle.

- Exact fix (bug): addresses issue #46116 by ensuring the
last-replicated metric is updated when the server is idle. Concretely, a
new ReplicateMetrics.UpdateLastReplicatedTimeTick(ts uint64) was added
and called from OnConfirmed (OnConfirmed now delegates to
UpdateLastReplicatedTimeTick(msg.TimeTick())), and from Replicate’s
self-controlled-message path when the pending queue is empty — so the
code records the time tick before returning ErrReplicateIgnored.

- Logic simplified / removed: direct, ad-hoc metric writes in
OnConfirmed were replaced by a single UpdateLastReplicatedTimeTick
helper on the metrics implementation. The scattered manual set of
CDCLastReplicatedTimeTick is consolidated into one method, removing
redundant direct metric manipulations and centralizing timestamp
conversion (tsoutil.PhysicalTimeSeconds).

- No data loss / no behavior regression: this change only updates
monitoring metrics and does not alter replication control flow or
message processing. Replicate still returns ErrReplicateIgnored for
self-controlled messages and does not change message persistence or
acknowledgement paths; OnConfirmed continues to be invoked on confirmed
messages but now delegates metric recording to the new method. Therefore
no replication state, message ordering, or persistence semantics are
modified.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-26 18:13:19 +08:00
nico
db3f065a61
test: test case json_contains_any is not stable (#46506)
### **User description**
issue: #46367


___

### **PR Type**
Bug fix, Tests


___

### **Description**
- Fix unstable test case by adjusting float precision

- Change listMix float value from 1.1 to 1.111

- Improves test stability for json_contains_any query


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["Test Data Generation"] -- "Adjust float precision" --> B["listMix field value"]
  B -- "1.1 to 1.111" --> C["Improved test stability"]
```



<details><summary><h3>File Walkthrough</h3></summary>

<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>test_milvus_client_query.py</strong><dd><code>Adjust
float precision in test data</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary>
<hr>

tests/python_client/milvus_client/test_milvus_client_query.py

<ul><li>Modified test data generation in
<br><code>test_milvus_client_query_expr_all_datatype_json_contains_all</code>
method<br> <li> Changed <code>listMix</code> field float value from
<code>1.1</code> to <code>1.111</code> for improved <br>precision<br>
<li> Addresses test instability issue by adjusting floating-point test
data</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46506/files#diff-d6fe357e4678415bc62596b802571043fa571c7d1b8e841aa43124437dd2f739">+1/-1</a>&nbsp;
&nbsp; &nbsp; </td>

</tr>
</table></td></tr></tbody></table>

</details>

___



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test assumes stable float equality/containment
behavior for JSON-typed fields when generating test rows; small changes
in stored float precision can make json_contains_any assertions flaky.
- Exact fix for the bug (refs #46367): in
tests/python_client/milvus_client/test_milvus_client_query.py, the test
data value for the second element of the "listMix" JSON field was
adjusted from i * 1.1 to i * 1.111 in
test_milvus_client_query_expr_all_datatype_json_contains_all to increase
numeric precision and remove instability in json_contains_any
assertions.
- Logic removed/simplified: no production logic was changed or removed —
only a one-line test-data change. There is no control-flow or
algorithmic simplification because the test’s intent and checks remain
identical; the change removes the redundant dependence on a borderline
float value that caused flakiness.
- No data loss or behavior regression: this change only updates
test-generated input
(test_milvus_client_query_expr_all_datatype_json_contains_all) and does
not touch any library or runtime code paths. Production code paths
(query parsing/execution, JSON handling) are unchanged, so no persisted
data, API behavior, or client logic is affected.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: nico <cheng.yuan@zilliz.com>
2025-12-26 15:13:19 +08:00
wei liu
b54b9e0dfa
enhance: add channel balanced primary key generator for integration tests (#46431)
issue: #46352

- Add GenerateChannelBalancedPrimaryKeys function supporting Int64 and
VarChar PK types with even distribution across channels
- Add GenerateBalancedInt64PKs and GenerateBalancedVarCharPKs helper
functions using murmur3 and crc32 hash algorithms respectively
- Add PrimaryKeyConfig struct to configure PK generation in
InsertAndFlush
- Update InsertAndFlush to use channel balanced PKs instead of random
hash keys for better test coverage
- Add comprehensive unit tests including end-to-end validation with
typeutil.HashPK2Channels to verify exact channel distribution

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Enhanced integration tests with configurable primary-key sequencing
across insert batches.
* Added utilities to generate channel-balanced primary keys for integer
and string types.
* Expanded test coverage validating balanced distribution, uniqueness,
continuation, and large-scale behavior across multiple channels.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-26 14:49:19 +08:00
Zhen Ye
2c2cbe89c2
fix: flush log when os exit (#46608)
issue: #45640

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-26 14:25:18 +08:00
congqixia
ef6d9c25c2
fix: check final result only in LeaderCacheObserver flaky test (#46601)
Related to #46600

The test previously checked if all 3 collection IDs were batched
together in a single InvalidateShardLeaderCache call. This caused
flakiness because the observer may split events across multiple calls.

Fix by accumulating all collection IDs across multiple calls and
verifying that eventually all expected IDs (1, 2, 3) are processed.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test asserts that all registered collection IDs
{1,2,3} are eventually processed by InvalidateShardLeaderCache across
any number of calls — i.e., the observer must invalidate every
registered collection ID, not necessarily in a single batched RPC (fixes
flaky assumption from issue #46600).
- Logic removed/simplified: the strict expectation that all three IDs
arrive in one InvalidateShardLeaderCache call was replaced by
accumulating IDs into a ConcurrentSet (collectionIDs.Upsert in the mock)
and asserting eventual containment of 1,2,3. This removes the brittle
per-call batching assertion and uses a set-based accumulation (lines
where the mock calls Upsert and final Eventually checks
collectionIDs.Contain(...)).
- Why this is safe (no data loss or behavior regression): only test
assertions changed — production code (LeaderCacheObserver calling
InvalidateShardLeaderCache) is unchanged. The mock intercepts
InvalidateShardLeaderCache and accumulates req.GetCollectionIDs(); the
test still verifies single-ID handling via the existing len==1 &&
lo.Contains(... ) check (first mock block) and verifies that all IDs
were invalidated over time in the batch scenario (second mock block). No
production code paths were modified, so invalidation behavior and RPC
usage remain identical.
- Bug-fix note: this is a targeted test-only fix for issue #46600 — it
tolerates legitimate splitting of events across multiple
InvalidateShardLeaderCache invocations by aggregating IDs across calls
in the test mock, eliminating flakiness without altering runtime
behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-26 10:17:19 +08:00
sijie-ni-0214
fc45905ee0
enhance: Optimize QuotaCenter CPU usage (#46388)
issue: https://github.com/milvus-io/milvus/issues/46387

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-26 10:09:19 +08:00
congqixia
6f94d8c41a
fix: Handle legacy binlog format (v1) in segment load diff computation (#46598)
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).

Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to timeout.

Changes:
- Add fallback to use fieldid as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling

Related to #46594

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-25 23:33:19 +08:00
zhenshan.cao
85486df8c9
fix: failed to check invalid timestamptz default value (#46546)
Also support space separator and offset in TIMESTAMPTZ
issue: https://github.com/milvus-io/milvus/issues/46376
https://github.com/milvus-io/milvus/issues/46365

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-12-25 15:59:05 +08:00
congqixia
6e07c3fee8
fix: remove EnableStorageV2 override in TestProxy (#46594) (#46596)
Related to #46594

Remove the temporary config override that forced EnableStorageV2 to
false in TestProxy. This override caused test failures with the new load
logic, as segments could not be loaded with v1 format.

This PR is a quick fix to make ut back to normal

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-25 15:35:18 +08:00
cai.zhang
b2fa3dd0ae
fix: Disable VCS to allow pkg tests to run (#46501)
### **Description**
- Add `-buildvcs=false` flag to Go test commands in coverage script

- Increase default session TTL from 10s to 15s

- Update SessionTTL parameter default value from 30 to 15


Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
Co-authored-by: czs007 <zhenshan.cao@zilliz.com>
2025-12-25 14:31:19 +08:00
Buqian Zheng
6ac66e38d1
enhance: STL_SORT to support LIKE operator (#46534)
issue: https://github.com/milvus-io/milvus/issues/44399

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Enhanced pattern matching for string indexes with support for prefix,
postfix, inner, and regex-based matching operations.
* Optimized pattern matching performance through prefix-based filtering
and range-based lookups.

* **Tests**
* Added comprehensive test coverage for pattern matching functionality
across multiple index implementations.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-24 19:45:20 +08:00