If QueryNode loads multiple sealed segments with BM25 enabled, BM25
stats registration into IDFOracle could stop after the first segment due
to an early-terminating ConcurrentMap.Range callback. This change:
- Register BM25 stats for all sealed segments by continuing iteration
(return true) during sealed-segment load
- Prevent repeated warnings like "idf oracle lack some sealed segment"
- Ensure IDF/BM25 statistics are not silently incomplete (improving BM25
ranking correctness)
issue: #46725
Core invariant: for any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle, so
SyncDistribution can always find the sealed segments present in the
distribution snapshot.
Bug fix: ConcurrentMap.Range respects the callback's boolean return;
returning false stops iteration. The sealed BM25 stats registration
callback previously returned false, which could register only the first
sealed segment and leave the rest missing, causing IDFOracle to warn
"idf oracle lack some sealed segment" and potentially compute IDF from
incomplete stats. Fixed by returning true to continue iterating and
registering all segments.
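The fix in sketch form. The oracle type and stats values below are toy
stand-ins, and the standard library's sync.Map is used because its Range
has the same stop-on-false contract as ConcurrentMap.Range:
```
package main

import (
	"fmt"
	"sync"
)

// idfOracle is an illustrative stand-in that records which sealed
// segments have registered their BM25 stats.
type idfOracle struct{ registered []int64 }

func (o *idfOracle) Register(segmentID int64, stats string) {
	o.registered = append(o.registered, segmentID)
}

func main() {
	var bm25Stats sync.Map // Range stops when the callback returns false
	bm25Stats.Store(int64(1), "stats-1")
	bm25Stats.Store(int64(2), "stats-2")
	bm25Stats.Store(int64(3), "stats-3")

	oracle := &idfOracle{}
	bm25Stats.Range(func(key, value any) bool {
		oracle.Register(key.(int64), value.(string))
		return true // the bug: returning false here stopped after one segment
	})
	fmt.Printf("registered %d sealed segments\n", len(oracle.registered))
}
```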
No behavior regression: the change only affects the sealed-segment BM25
stats registration loop; it does not alter segment loading, distribution
snapshot generation, or non-BM25 codepaths. For collections without BM25
(or when BM25 stats are nil), behavior remains unchanged.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: For any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle so
SyncDistribution can discover them in distribution snapshots.
- Bug fix (links to #46725): The BM25 stats registration callback used
with bm25Stats.Range() in loadStreamDelete() returned false, which
prematurely stopped iteration after the first sealed segment and left
subsequent sealed segments unregistered. The fix changes the callback to
return true so the Range loop completes and registers BM25 stats for all
sealed segments.
- Logic simplified/removed: The early-return (false) in the
ConcurrentMap.Range callback that aborted further registrations has been
removed (replaced by returning true). That early abort was redundant and
incorrect because registration must proceed for every entry; allowing
Range to continue restores the intended one-to-many registration
behavior.
- No data loss or regression: The change is narrowly scoped to the
sealed-segment BM25 stats registration loop in
internal/querynodev2/delegator/delegator_data.go and does not modify
segment loading, distribution snapshot generation, growing-segment
handling, or non-BM25 codepaths. Returning true only permits full
iteration and registration; it does not delete or alter existing data
structures or load state, so IDF/BM25 statistics become complete without
changing other behaviors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: thangTang <tangtianhang099@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/46678
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: Text index log keys are canonicalized at KV
(serialization) boundaries — etcd stores compressed filename-only
entries, while in-memory and runtime consumers must receive full
object-storage keys so Datanode/QueryNode can load text indexes
directly.
- Logic removed/simplified: ad-hoc reconstruction of full text-log paths
scattered across components (garbage_collector.getTextLogs,
querynodev2.LoadTextIndex, compactor/index task code) was removed;
consumers now use TextIndexStats.Files as-provided (full keys). Path
compression/decompression was centralized into KV marshal/unmarshal
utilities (metautil.ExtractTextLogFilenames in marshalSegmentInfo and
metautil.BuildTextLogPaths in kv_catalog.listSegments), eliminating
redundant, inconsistent prefix-rebuilding logic that broke during
rolling upgrades.
- Why this does NOT cause data loss or regressions: before persist,
marshalSegmentInfo compresses TextStatsLogs.Files to filenames
(metautil.ExtractTextLogFilenames) so stored KV remains compact; on
load, kv_catalog.listSegments calls metautil.BuildTextLogPaths to
restore full paths and includes compatibility logic that leaves
already-full keys unchanged. Thus every persisted filename is
recoverable to a valid full key and consumers receive correct full paths
(see marshalSegmentInfo → KV write path and kv_catalog.listSegments →
reload path), preventing dropped or malformed keys.
- Bug fix (refs #46678): resolves text-log loading failures during
cluster upgrades by centralizing path handling at KV encode/decode and
removing per-component path reconstruction — the immediate fix is
changing consumers to read TextIndexStats.Files directly and relying on
marshal/unmarshal to perform compression/expansion, preventing
mixed-format failures during rolling upgrades.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
issue: #45640
- After async logging was introduced, C logs and Go logs have no
ordering guarantee, and the C log format is not consistent with the Go
log format; so we disable glog's own output and forward log records to
the Go side, where they are handled by the async zap logger.
- Use CGO to capture all C-side logging and guarantee ordering between C
logs and Go logs.
- Also fix the metric name and add a new metric to count log output.
- TODO: after woodpecker uses the milvus logger, we can add a bigger
buffer for logging.
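A hedged sketch of the Go side of that bridge. The exported name
goZapLogExt and the C.GoStringN copying follow the notes below; the
exact parameter list and severity mapping are assumptions:
```
package logging

/*
#include <stddef.h>
*/
import "C"

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// glogSeverityToZap maps glog severities (0=INFO, 1=WARNING, 2=ERROR,
// 3=FATAL) onto zap levels.
func glogSeverityToZap(sev C.int) zapcore.Level {
	switch sev {
	case 1:
		return zapcore.WarnLevel
	case 2:
		return zapcore.ErrorLevel
	case 3:
		return zapcore.FatalLevel
	default:
		return zapcore.InfoLevel
	}
}

// goZapLogExt is invoked by the C++ GoZapSink for each glog record. It
// copies the exact buffers with C.GoStringN and writes through the global
// zap logger, so C and Go logs share one async pipeline and stay ordered.
//
//export goZapLogExt
func goZapLogExt(sev C.int, file *C.char, fileLen C.int, line C.int, msg *C.char, msgLen C.int) {
	zap.L().Log(glogSeverityToZap(sev), C.GoStringN(msg, msgLen),
		zap.String("file", C.GoStringN(file, fileLen)),
		zap.Int("line", int(line)))
}
```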
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: all C (glog) and Go logs must be routed through the
same zap async pipeline so ordering and formatting are preserved; this
PR ensures every glog emission is captured and forwarded to zap before
any async buffering diverges the outputs.
- Logic removed/simplified: direct glog outputs and hard
stdout/stderr/log_dir settings are disabled (configs/glog.conf and flags
in internal/core/src/config/ConfigKnowhere.cpp) because they are
redundant once a single zap sink handles all logs; logging metrics were
simplified from per-length/volatile gauges to totalized counters
(pkg/metrics/logging_metrics.go & pkg/log/*), removing duplicate
length-tracking and making accounting consistent.
- No data loss or behavior regression (concrete code paths): Google
logging now adds a GoZapSink (internal/core/src/common/logging_c.h,
logging_c.cpp) that calls the exported CGO bridge goZapLogExt
(internal/util/cgo/logging/logging.go). Go side uses
C.GoStringN/C.GoString to capture full message and file, maps glog
severities to zapcore levels, preserves caller info, and writes via the
existing zap async core (same write path used by Go logs). The C++
send() trims glog's trailing newline and forwards exact buffers/lengths,
so message content, file, line, and severity are preserved and
serialized through the same async writer—no log entries are dropped or
reordered relative to Go logs.
- Capability added (where it takes effect): a CGO bridge that forwards
glog into zap—new Go-exported function goZapLogExt
(internal/util/cgo/logging/logging.go), a GoZapSink in C++ that forwards
glog sends (internal/core/src/common/logging_c.h/.cpp), and blank
imports of the cgo initializer across multiple packages (various
internal/* files) to ensure the bridge is registered early so all C logs
are captured.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45841
- C++ logging packed multiple log lines into one debug entry; remove the
"\n\t" separators.
- Remove some logs that provide no value.
- Rate-limit some noisy logs, such as those in ChannelDistManager.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: logging is purely observational — this PR only
reduces, consolidates, or reformats diagnostic output (removing
per-item/noise logs, consolidating batched logs, and converting
multi-line log strings) while preserving all control flow, return
values, and state mutations across affected code paths.
- Removed / simplified logic: deleted low-value per-operation debug/info
logs (e.g., ListIndexes, GetRecoveryInfo, GcConfirm,
push-to-reorder-buffer, several streaming/wal/debug traces), replaced
per-item inline logs with single batched deferred logs in
querynodev2/delegator (logExcludeInfo) and CleanInvalid, changed C++
PlanNode ToString() multi-line output to compact single-line bracketed
format (removed "\n\t"), and added thresholded interceptor logging
(InterceptorMetrics.ShouldBeLogged) and message-type-driven log levels
to avoid verbose entries.
- Why this does NOT cause data loss or behavioral regression: no
function signatures, branching, state updates, persistence calls, or
return values were changed — examples: ListIndexes still returns the
same Status/IndexInfos; GcConfirm still constructs and returns
resp.GetGcFinished(); Insert and CleanInvalid still perform the same
insert/removal operations (only their per-item logging was aggregated);
PlanNode ToString changes only affect emitted debug strings. All error
handling and control flow paths remain intact.
- Enhancement intent: reduce log volume and improve signal-to-noise for
debugging by removing redundant, noisy logs and emitting concise,
rate-/threshold-limited summaries while preserving necessary diagnostics
and original program behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #46669
When partial result is enabled (PartialResultRequiredDataRatio < 1.0),
the Serviceable() method would return true even if syncedByCoord is
false (by bypassing the viewReady check). However, PinReadableSegments uses
GetLoadedRatio() == 1.0 to decide whether to filter segments by target
version.
This causes a problem: when loadedRatio == 1.0 but syncedByCoord ==
false, segments are filtered by an incorrect target version, resulting
in an empty segment list during search.
This change:
- Replace GetLoadedRatio() == 1.0 with Serviceable() check to ensure
target version filtering only happens after coord sync completes
- Remove partial result bypass in Serviceable() to keep the check
consistent
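A compact sketch of the tightened checks; the struct below is a toy
stand-in for the delegator's distribution state:
```
package main

import "fmt"

// distribution is an illustrative stand-in; field and method names
// follow the description above, not the actual struct layout.
type distribution struct {
	loadedRatio   float64
	syncedByCoord bool
}

func (d *distribution) GetLoadedRatio() float64 { return d.loadedRatio }

// Serviceable now requires BOTH full load and coord sync; the former
// partial-result bypass of syncedByCoord is removed.
func (d *distribution) Serviceable() bool {
	return d.loadedRatio >= 1.0 && d.syncedByCoord
}

// PinReadableSegments picks the filtering strategy from Serviceable()
// instead of GetLoadedRatio() == 1.0, so target-version filtering only
// happens after coord sync completes.
func (d *distribution) PinReadableSegments(requiredRatio float64) string {
	switch {
	case d.Serviceable():
		return "filter by current target version (full result)"
	case d.GetLoadedRatio() >= requiredRatio:
		return "filter by query view segment lists (partial result)"
	default:
		return "not serviceable"
	}
}

func main() {
	// Fully loaded but not yet synced by coord: previously this took the
	// target-version path with a stale version and returned no segments.
	d := &distribution{loadedRatio: 1.0, syncedByCoord: false}
	fmt.Println(d.PinReadableSegments(0.8)) // now: partial-result path
}
```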
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Bug Fix Summary
**Core Invariant**: `Serviceable()` must enforce a strict requirement
that both data loading AND coordinator synchronization are complete
before allowing full search operations. This prevents using stale or
uninitialized target versions.
**Logic Removed/Simplified**:
- Removed the partial-result bypass from `Serviceable()` that previously
allowed it to return `true` even when `syncedByCoord == false`
- Replaced `GetLoadedRatio() == 1.0` checks in `PinReadableSegments`
with `Serviceable()` calls to ensure target-version filtering only
occurs after coord sync completes
- Simplified the serviceability condition from parameterized
partial-result logic to a direct conjunction: `loadedRatio >= 1.0 AND
syncedByCoord == true`
**No Data Loss or Regression**: The change is safe because:
- When `Serviceable()` returns `true` (both loadedRatio ≥ 1.0 AND
syncedByCoord == true), segments are filtered by the current valid target
version—this is the full-result path
- When `Serviceable()` returns `false` but `loadedRatio >=
requiredLoadRatio` (partial result case), segments are filtered against
the query view's segment lists rather than target version, ensuring
non-empty results as validated by
`TestPinReadableSegments_PartialResultNotEmpty`
- The test explicitly demonstrates that even with `loadedRatio == 1.0`
and `syncedByCoord == false`, calling `PinReadableSegments(0.8,
partition)` returns segments (partial result) instead of an empty list,
which was the bug root cause
**Root Cause Fix**: Previously, segments could be filtered with
`unreadableTargetVersion` when `loadedRatio == 1.0` but the querycoord
hadn't yet synced the target, causing empty segment lists. Now the sync
state is checked before deciding the filtering strategy, preventing this
race condition.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/44123
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: legacy in-cluster CDC/replication plumbing
(ReplicateMsg types, ReplicateID-based guards and flags) is obsolete —
the system relies on standard msgstream positions, subPos/end-ts
semantics and timetick ordering as the single source of truth for
message ordering and skipping, so replication-specific
channels/types/guards can be removed safely.
- Removed/simplified logic (what and why): removed replication feature
flags and params (ReplicateMsgChannel, TTMsgEnabled,
CollectionReplicateEnable), ReplicateMsg type and its tests, ReplicateID
constants/helpers and MergeProperties hooks, ReplicateConfig and its
propagation (streamPipeline, StreamConfig, dispatcher, target),
replicate-aware dispatcher/pipeline branches, and replicate-mode
pre-checks/timestamp-allocation in proxy tasks — these implemented a
redundant alternate “replicate-mode” pathway that duplicated
position/end-ts and timetick logic.
- Why this does NOT cause data loss or regression (concrete code paths):
no persistence or core write paths were removed — proxy PreExecute flows
(internal/proxy/task_*.go) still perform the same schema/ID/size
validations and then follow the normal non-replicate execution path;
dispatcher and pipeline continue to use position/subPos and
pullback/end-ts in Seek/grouping (pkg/mq/msgdispatcher/dispatcher.go,
internal/util/pipeline/stream_pipeline.go), so skipping and ordering
behavior remains unchanged; timetick emission in rootcoord
(sendMinDdlTsAsTt) is now ungated (no silent suppression), preserving or
increasing timetick delivery rather than removing it.
- PR type and net effect: Enhancement/Refactor — removes deprecated
replication API surface (types, helpers, config, tests) and replication
branches, simplifies public APIs and constructor signatures, and reduces
surface area for future maintenance while keeping DML/DDL persistence,
ordering, and seek semantics intact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #46660
Replace segment.Delete() with segment.LoadDeltaData() when forwarding L0
deletions to growing segments. LoadDeltaData is the more appropriate API
for bulk loading delta data compared to individual Delete calls.
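Roughly, with the constructor and loader signatures assumed from the
description (all types here are toy stand-ins for the internal/storage
and segments types):
```
package main

import (
	"context"
	"errors"
	"fmt"
)

// DeltaData pairs primary keys with delete timestamps as one payload.
type DeltaData struct {
	Pks []int64
	Tss []uint64
}

func NewDeltaDataWithData(pks []int64, tss []uint64) (*DeltaData, error) {
	if len(pks) != len(tss) {
		return nil, errors.New("pk/timestamp length mismatch")
	}
	return &DeltaData{Pks: pks, Tss: tss}, nil
}

type Segment interface {
	LoadDeltaData(ctx context.Context, dd *DeltaData) error
}

type growingSegment struct{ applied int }

func (g *growingSegment) LoadDeltaData(ctx context.Context, dd *DeltaData) error {
	g.applied += len(dd.Pks) // one bulk apply instead of per-key Delete
	return nil
}

// forwardL0ToGrowing mirrors the fixed addL0GrowingBF path: build a single
// delta payload and bulk-load it into the growing segment.
func forwardL0ToGrowing(ctx context.Context, seg Segment, pks []int64, tss []uint64) error {
	dd, err := NewDeltaDataWithData(pks, tss)
	if err != nil {
		return err
	}
	return seg.LoadDeltaData(ctx, dd)
}

func main() {
	seg := &growingSegment{}
	_ = forwardL0ToGrowing(context.Background(), seg, []int64{1, 2}, []uint64{10, 11})
	fmt.Println("applied deletes:", seg.applied)
}
```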
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
• Core invariant: forwarding L0 deletions to growing segments must use
the bulk-delta API (storage.DeltaData + segment.LoadDeltaData) because
LoadDeltaData preserves paired primary keys and timestamps as a single
atomic delta payload; segment.Delete was intended for per-delete RPCs
and not for loading L0 delta payloads.
• Logic removed/simplified: addL0GrowingBF() no longer calls
segment.Delete for buffered L0 keys. Instead the buffered callback
builds a storage.DeltaData via storage.NewDeltaDataWithData(pks, tss)
and calls segment.LoadDeltaData(ctx, dd). This eliminates the previous
per-batch Delete call path and centralizes forwarding as a single
delta-load operation.
• Why this does not cause data loss or regression: the new path supplies
identical PK+timestamp pairs to the segment via DeltaData; LoadDeltaData
applies the same delete semantics but accepts batched delta payloads.
The change is limited to the L0→growing Bloom-Filter forward path
(addL0GrowingBF/addL0ForGrowingLoad), leaving sealed-segment deletes,
streaming direct forwarding, and remote-load policies unchanged. Also,
the prior code would fail on L0Segment.Delete (L0 segments prohibit
Delete), so switching to LoadDeltaData prevents lost-forwarding caused
by unsupported Delete calls.
• Category: Enhancement / Refactor — replaces inappropriate per-delete
calls with the correct bulk delta-load API, simplifying error handling
around NewDeltaDataWithData and ensuring API contract correctness for
L0→growing forwarding.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #46550
- Add CatchUpStreamingDataTsLag parameter to control tolerable lag
threshold for delegator to be considered caught up
- Add catchingUpStreamingData field in delegator to track whether
delegator has caught up with streaming data
- Add catching_up_streaming_data field in LeaderViewStatus proto
- Check catching up status in CheckDelegatorDataReady, return not
ready when delegator is still catching up streaming data
- Add unit tests for the new functionality
When tsafe lag exceeds the threshold, the distribution will not be
considered serviceable, preventing queries from timing out in waitTSafe.
This is useful when streaming message queue consumption is slow.
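A minimal sketch of the gate, with timestamps simplified to plain
physical milliseconds; names follow the description, but the real code
paths differ in detail:
```
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// catchUpStreamingDataTsLag mirrors the new param
// queryNode.delegator.catchUpStreamingDataTsLag (default 1s).
const catchUpStreamingDataTsLag = time.Second

type shardDelegator struct {
	tsafe                   atomic.Uint64 // physical milliseconds, simplified
	catchingUpStreamingData atomic.Bool   // true when the delegator is created
}

// UpdateTSafe advances tsafe and clears the catching-up flag once the lag
// behind the latest committed timestamp falls below the tolerance.
func (sd *shardDelegator) UpdateTSafe(ts, latestTs uint64) {
	sd.tsafe.Store(ts)
	if lag := time.Duration(latestTs-ts) * time.Millisecond; lag < catchUpStreamingDataTsLag {
		sd.catchingUpStreamingData.Store(false)
	}
}

// DataReady models CheckDelegatorDataReady's new precondition: a delegator
// still catching up on streaming data is reported as not ready.
func (sd *shardDelegator) DataReady() bool {
	return !sd.catchingUpStreamingData.Load()
}

func main() {
	sd := &shardDelegator{}
	sd.catchingUpStreamingData.Store(true)
	sd.UpdateTSafe(1_000, 5_000) // 4s behind: still catching up
	fmt.Println(sd.DataReady())  // false
	sd.UpdateTSafe(4_500, 5_000) // 0.5s behind: caught up
	fmt.Println(sd.DataReady())  // true
}
```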
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: a delegator must not be considered serviceable while
its tsafe lags behind the latest committed timestamp beyond a
configurable tolerance; a delegator is "caught-up" only when
(latestTsafe - delegator.GetTSafe()) < CatchUpStreamingDataTsLag
(configured by queryNode.delegator.catchUpStreamingDataTsLag, default
1s).
- New capability and where it takes effect: adds streaming-catchup
tracking to QueryNode/QueryCoord — an atomic catchingUpStreamingData
flag on shardDelegator (internal/querynodev2/delegator/delegator.go), a
new param CatchUpStreamingDataTsLag
(pkg/util/paramtable/component_param.go), and a
LeaderViewStatus.catching_up_streaming_data field in the proto
(pkg/proto/query_coord.proto). The flag is exposed in
GetDataDistribution (internal/querynodev2/services.go) and used by
QueryCoord readiness checks
(internal/querycoordv2/utils/util.go::CheckDelegatorDataReady) to reject
leaders that are still catching up.
- What logic is simplified/added (not removed): instead of relying
solely on segment distribution/worker heartbeats, the PR adds an
explicit readiness gate that returns "not available" when the delegator
reports catching-up-streaming-data. This is strictly additive — no
existing checks are removed; the new precondition runs before segment
availability validation to prevent premature routing to slow-consuming
delegators.
- Why this does NOT cause data loss or regress behavior: the change only
controls serviceability visibility and routing — it never drops or
mutates data. Concretely: shardDelegator starts with
catchingUpStreamingData=true and flips to false in UpdateTSafe once the
sampled lag falls below the configured threshold
(internal/querynodev2/delegator/delegator.go::UpdateTSafe). QueryCoord
will short-circuit in CheckDelegatorDataReady when
leader.Status.GetCatchingUpStreamingData() is true
(internal/querycoordv2/utils/util.go), returning a channel-not-available
error before any segment checks; when the flag clears, existing
segment-distribution checks (same code paths) resume. Tests added cover
both catching-up and caught-up paths
(internal/querynodev2/delegator/delegator_test.go,
internal/querycoordv2/utils/util_test.go,
internal/querynodev2/services_test.go), demonstrating convergence
without changed data flows or deletion of data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #46566
Remove the complex comparison logic between seekPosition and
deleteCheckpoint. Use seekPosition directly since:
- L0 segments are loaded before consuming message stream, which contain
delete records from [deleteCheckpoint, L0.endPosition]
- DataCoord ensures seekPosition is based on channel checkpoint, updated
after data (including deletes) is flushed
- L0 segments should cover up to seekPosition, avoiding data loss
- This eliminates redundant message consumption when seekPosition >
deleteCheckpoint
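In sketch form (toy types; the real code passes the position from
channel.GetSeekPosition into ConsumeMsgStream):
```
package main

import "fmt"

// Position is a toy stand-in for a msgstream position.
type Position struct{ Timestamp uint64 }

type channelInfo struct{ checkpoint Position }

// GetSeekPosition returns the DataCoord-maintained channel checkpoint,
// which is updated only after data (including deletes) is flushed.
func (c *channelInfo) GetSeekPosition() Position { return c.checkpoint }

func consume(ch *channelInfo, deleteCheckpoint Position) {
	// Before: branching built a synthetic seek position from
	// deleteCheckpoint vs. the channel checkpoint.
	// After: always seek from the channel checkpoint; loaded L0 segments
	// already cover deletes up to this position.
	seek := ch.GetSeekPosition()
	fmt.Printf("seek=%d deleteCheckpoint=%d\n", seek.Timestamp, deleteCheckpoint.Timestamp)
	// ConsumeMsgStream(seek) ...
}

func main() {
	consume(&channelInfo{checkpoint: Position{Timestamp: 120}}, Position{Timestamp: 100})
}
```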
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: L0 segments are loaded before consuming the DM channel
stream and contain delete records for range [deleteCheckpoint,
L0.endPosition]; DataCoord guarantees channel.GetSeekPosition() is
derived from the channel checkpoint after data (including deletes) is
flushed, so L0 segments collectively cover up to that seekPosition.
- Change made: removed the prior branching that built a synthetic seek
position from deleteCheckpoint vs. channel checkpoint and instead always
calls channel.GetSeekPosition() (used directly in ConsumeMsgStream).
Added an informational log comparing seekPosition and deleteCheckpoint.
- Why the removed logic was redundant: deleteCheckpoint represented the
smallest start position of L0 segments and was used to avoid
re-consuming delete messages already present in loaded L0 segments.
Because L0 segments already include deletes up to the channel checkpoint
and DataCoord updates the channel checkpoint after flush, using
deleteCheckpoint to alter the seek introduces duplicate consumption
without benefit.
- Why this is safe (no data loss/regression): L0 segments are guaranteed
to be loaded before consumption, so deletes present in L0 cover the
range up to channel.GetSeekPosition(); delete records earlier than
deleteCheckpoint have been compacted to L1 and can be evicted from the
delete buffer. The code path still calls ConsumeMsgStream with the
channel seek position, preserving original consumption/error handling,
so no messages are skipped and no additional delete application occurs
beyond what L0/L1 already cover.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/46410
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: etcd metadata and in-memory Segment/TextIndex records
must store only compact filenames for text-index files; full object keys
are deterministically reconstructed at use-sites from a stable root +
common.TextIndexPath + IDs via metautil.BuildTextLogPaths.
- Bug & fix (issue #46410): the etcd RPC size overflow was caused by
persisting full upload keys in segment/TextIndex metadata. Fix: at
upload/creation sites (internal/datanode/compactor/sort_compaction.go
and internal/datanode/index/task_stats.go) store only filenames using
metautil.ExtractTextLogFilenames; at consumption/use sites
(internal/datacoord/garbage_collector.go,
internal/querynodev2/segments/segment.go, and other GC/loader code)
reconstruct full paths with metautil.BuildTextLogPaths before accessing
object storage.
- Simplified/removed logic: removed the redundant practice of carrying
full object keys through metadata and in-memory structures; callers now
persist compact filenames and perform on-demand path reconstruction.
This eliminates large payloads in etcd and reduces memory pressure while
preserving the same runtime control flow and error handling.
- No data loss / no regression: filename extraction is a deterministic
suffix operation (metautil.ExtractTextLogFilenames) and reloadFromKV
performs backward compatibility (internal/datacoord/meta.go converts
existing full-path entries to filenames before caching). All read paths
reconstruct full paths at runtime (garbage_collector.getTextLogs,
LocalSegment.LoadTextIndex, GC/loader), so no files are modified/deleted
and access semantics remain identical.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
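A hedged sketch of the two KV-boundary helpers described above; the real
metautil functions take richer ID arguments, so the signatures below are
illustrative:
```
package main

import (
	"fmt"
	"path"
)

// extractTextLogFilenames mimics the compression step performed in
// marshalSegmentInfo: keep only the filename suffix before persisting.
func extractTextLogFilenames(fullKeys []string) []string {
	names := make([]string, 0, len(fullKeys))
	for _, k := range fullKeys {
		names = append(names, path.Base(k))
	}
	return names
}

// buildTextLogPaths mimics the expansion step in kv_catalog.listSegments:
// rebuild full object keys from a stable root plus IDs. Entries that are
// already full keys are left unchanged for rolling-upgrade compatibility.
func buildTextLogPaths(root string, ids []int64, names []string) []string {
	prefix := path.Join(root, "text_log") // common.TextIndexPath analogue
	for _, id := range ids {
		prefix = path.Join(prefix, fmt.Sprint(id))
	}
	keys := make([]string, 0, len(names))
	for _, n := range names {
		if path.Dir(n) != "." { // already a full key: keep as-is
			keys = append(keys, n)
			continue
		}
		keys = append(keys, path.Join(prefix, n))
	}
	return keys
}

func main() {
	full := []string{"files/text_log/1/2/3/100/5/part_0"}
	names := extractTextLogFilenames(full) // ["part_0"] persisted to etcd
	fmt.Println(buildTextLogPaths("files", []int64{1, 2, 3, 100, 5}, names))
}
```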
Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
Support creating analyzers with file resource info, and return the used
file resource IDs when validating an analyzer.
Save the related resource IDs in the collection schema.
relate: https://github.com/milvus-io/milvus/issues/43687
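A toy sketch of the wiring: validation returns the used resource IDs,
and validateSchema stores them on the schema (field names modeled on the
description, not the real protos):
```
package main

import "fmt"

// Toy stand-ins for querypb.ValidateAnalyzerResponse and
// schemapb.CollectionSchema; only the fields discussed here are modeled.
type ValidateAnalyzerResponse struct{ ResourceIds []int64 }

type CollectionSchema struct {
	Name            string
	FileResourceIds []int64
}

// applyAnalyzerValidation mirrors the rootcoord.validateSchema step:
// resource IDs collected while validating the analyzer are saved on the
// collection schema so the files can be tracked with the collection.
func applyAnalyzerValidation(schema *CollectionSchema, resp *ValidateAnalyzerResponse) {
	schema.FileResourceIds = resp.ResourceIds
}

func main() {
	schema := &CollectionSchema{Name: "docs"}
	applyAnalyzerValidation(schema, &ValidateAnalyzerResponse{ResourceIds: []int64{7, 9}})
	fmt.Println(schema.FileResourceIds)
}
```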
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: analyzer file-resource resolution is deterministic and
traceable by threading a FileResourcePathHelper (collecting used
resource IDs in a HashSet) through all tokenizer/analyzer construction
and validation paths; validate_analyzer(params, extra_info) returns the
collected Vec<i64> which is propagated through C/Rust/Go layers to
callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 →
querypb.ValidateAnalyzerResponse.ResourceIds →
CollectionSchema.FileResourceIds).
- Logic removed/simplified: ad‑hoc, scattered resource-path lookups and
per-filter file helpers (e.g., read_synonyms_file and other inline
file-reading logic) were consolidated into ResourceInfo +
FileResourcePathHelper and a centralized get_resource_path(helper, ...)
API; filter/tokenizer builder APIs now accept &mut
FileResourcePathHelper so all file path resolution and ID collection use
the same path and bookkeeping logic (redundant duplicated lookups
removed).
- Why no data loss or behavior regression: changes are additive and
default-preserving — existing call sites pass extra_info = "" so
analyzer creation/validation behavior and error paths remain unchanged;
new Collection.FileResourceIds is populated from resp.ResourceIds in
validateSchema and round‑tripped through marshal/unmarshal
(model.Collection ↔ schemapb.CollectionSchema) so schema persistence
uses the new list without overwriting other schema fields; proto change
adds a repeated field (resource_ids) which is wire‑compatible (older
clients ignore extra field). Concrete code paths: analyzer creation
still uses create_analyzer (now with extra_info ""), tokenizer
validation still returns errors as before but now also returns IDs via
CValidateResult/RustResult, and rootcoord.validateSchema assigns
resp.ResourceIds → schema.FileResourceIds.
- New capability added: end‑to‑end discovery, return, and persistence of
file resource IDs used by analyzers — validate flows now return resource
IDs and the system stores them in collection schema (affects tantivy
analyzer binding, canalyzer C bindings, internal/util analyzer APIs,
querynode ValidateAnalyzer response, and rootcoord/create_collection
flow).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Related to #46133
Move jemalloc_stats.go and its test file from pkg/util/hardware to
internal/util/segcore. This is a more appropriate location because:
- jemalloc_stats depends on milvus_core C++ library via cgo
- The pkg directory should remain independent of internal C++
dependencies
- segcore is the natural home for core memory allocator utilities
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Improved internal code organization by reorganizing memory statistics
collection infrastructure for better maintainability and modularity. No
impact on end-user functionality or behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer and
querynode, and adds comprehensive validation, search reduce, and field
data handling for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
- Update funcutil.go with nullable vector support functions
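One plausible shape for the logical-to-physical translation that
FieldDataIdxComputer performs; the real struct works across multiple
field datas at once, so this single-field version is only illustrative:
```
package main

import "fmt"

// FieldDataIdxComputer maps a logical row offset to the physical offset
// inside sparse-stored nullable vector data by counting valid rows.
type FieldDataIdxComputer struct {
	validPrefix []int // validPrefix[i] = number of valid rows in [0, i)
}

func NewFieldDataIdxComputer(validData []bool) *FieldDataIdxComputer {
	prefix := make([]int, len(validData)+1)
	for i, v := range validData {
		prefix[i+1] = prefix[i]
		if v {
			prefix[i+1]++
		}
	}
	return &FieldDataIdxComputer{validPrefix: prefix}
}

// Compute returns the physical index for logical row i, or -1 when the
// row is null and therefore has no physical storage.
func (c *FieldDataIdxComputer) Compute(i int) int {
	if c.validPrefix[i+1] == c.validPrefix[i] {
		return -1
	}
	return c.validPrefix[i]
}

func main() {
	c := NewFieldDataIdxComputer([]bool{true, false, true, true})
	fmt.Println(c.Compute(0), c.Compute(1), c.Compute(2)) // 0 -1 1
}
```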
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45890
ComputePhraseMatchSlop accepts three params:
1. A string: the query text
2. Some strings: the data texts
3. Analyzer params
Slop will be calculated for the query text against each data text in the
context of phrase match, where both are tokenized with a tokenizer built
from the analyzer params.
Two arrays will be returned:
1. is_match: whether the phrase match can succeed
2. slop: the related slop if the phrase match can succeed, or -1 if it
cannot.
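A toy model of the return contract (names illustrative, not the actual
API surface):
```
package main

import "fmt"

// PhraseMatchSlopResult models the described contract: parallel arrays,
// one entry per data text.
type PhraseMatchSlopResult struct {
	IsMatch []bool  // IsMatch[i]: whether the phrase can match data text i
	Slop    []int32 // Slop[i]: minimal slop when matched, otherwise -1
}

func main() {
	// Example outcome for query "quick fox" against two data texts.
	res := PhraseMatchSlopResult{
		IsMatch: []bool{true, false},
		Slop:    []int32{1, -1}, // "quick brown fox" needs slop 1; no match -> -1
	}
	fmt.Println(res.IsMatch, res.Slop)
}
```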
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #46358
This PR implements segment reopening functionality on query nodes,
enabling the application of data or schema changes to already-loaded
segments without requiring a full reload.
### Core (C++)
**New SegmentLoadInfo class**
(`internal/core/src/segcore/SegmentLoadInfo.h/cpp`):
- Encapsulates segment load configuration with structured access
- Implements `ComputeDiff()` to calculate differences between old and
new load states
- Tracks indexes, binlogs, and column groups that need to be loaded or
dropped
- Provides `ConvertFieldIndexInfoToLoadIndexInfo()` for index loading
**ChunkedSegmentSealedImpl modifications**:
- Added `Reopen(const SegmentLoadInfo&)` method to apply incremental
changes based on computed diff
- Refactored `LoadColumnGroups()` and `LoadColumnGroup()` to support
selective loading via field ID map
- Extracted `LoadBatchIndexes()` and `LoadBatchFieldData()` for reusable
batch loading logic
- Added `LoadManifest()` for manifest-based loading path
- Updated all methods to use `SegmentLoadInfo` wrapper instead of direct
proto access
**SegmentGrowingImpl modifications**:
- Added `Reopen()` stub method for interface compliance
**C API additions** (`segment_c.h/cpp`):
- Added `ReopenSegment()` function exposing reopen to Go layer
### Go Side
**QueryNode handlers** (`internal/querynodev2/`):
- Added `HandleReopen()` in handlers.go
- Added `ReopenSegments()` RPC in services.go
**Segment interface** (`internal/querynodev2/segments/`):
- Extended `Segment` interface with `Reopen()` method
- Implemented `Reopen()` in LocalSegment
- Added `Reopen()` to segment loader
**Segcore wrapper** (`internal/util/segcore/`):
- Added `Reopen()` method in segment.go
- Added `ReopenSegmentRequest` in requests.go
### Proto
- Added new fields to support reopen in `query_coord.proto`
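An illustrative Go-side surface for the new capability; the real
SegmentLoadInfo message is far richer than the placeholder here:
```
package segments

import "context"

// SegmentLoadInfo is a placeholder for the real load-info message; the
// actual type carries indexes, binlogs, and column groups.
type SegmentLoadInfo struct {
	IndexInfos   []int64
	ColumnGroups []int64
}

type Segment interface {
	ID() int64
	// Reopen applies data or schema changes computed by ComputeDiff on the
	// C++ side (via the ReopenSegment C API) to an already-loaded segment,
	// avoiding a full reload.
	Reopen(ctx context.Context, loadInfo *SegmentLoadInfo) error
}
```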
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
Pass ManifestPath field to SegmentLoadInfo when loading growing segments
in loadGrowingSegments function. This ensures storage v2 can properly
locate segment data via manifest path, consistent with other segment
loading paths.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.
This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.
Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:
element_filter(structFieldName, $[subFieldName] == 3)
where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.
It can be combined with existing filter expressions, for example:
"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"
A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...
filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #46217
The test was failing intermittently because it didn't wait for the
pipeline to finish processing messages before exiting. The test sent a
message to the pipeline and immediately returned, causing the deferred
Close() to execute before ProcessInsert, ProcessDelete, and UpdateTSafe
could be called.
Fix by:
- Moving message construction before mock expectations setup
- Adding a done channel to synchronize on UpdateTSafe completion
- Waiting for the signal before test exits
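The synchronization pattern in sketch form (names illustrative): the
mocked UpdateTSafe closes a done channel, and the test blocks on it, so
the deferred Close() cannot run first:
```
package pipeline

import (
	"testing"
	"time"
)

// waitForUpdateTSafe blocks until the mock UpdateTSafe signals done (or a
// timeout fires), guaranteeing ProcessInsert, ProcessDelete, and
// UpdateTSafe all ran before the test returns.
func waitForUpdateTSafe(t *testing.T, done <-chan struct{}) {
	select {
	case <-done:
		// pipeline finished processing the message
	case <-time.After(10 * time.Second):
		t.Fatal("pipeline did not reach UpdateTSafe before test exit")
	}
}
```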
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
Support using file resources in sync mode.
Automatically download or remove local file resources when the user adds
or removes a file resource.
Sync file resources to a node when a new node session is found.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: #40513
For a querynode that returns a resource-exhausted error, add a penalty
duration to it and suspend loading new resources onto it until the
penalty duration expires.
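A minimal sketch of such penalty bookkeeping, assuming illustrative
names:
```
package main

import (
	"fmt"
	"sync"
	"time"
)

// penaltyTracker suspends load dispatch to a querynode that reported
// resource exhaustion until its penalty expires.
type penaltyTracker struct {
	mu      sync.Mutex
	until   map[int64]time.Time
	penalty time.Duration
}

func newPenaltyTracker(penalty time.Duration) *penaltyTracker {
	return &penaltyTracker{until: make(map[int64]time.Time), penalty: penalty}
}

// Punish records a resource-exhausted response from the node.
func (p *penaltyTracker) Punish(nodeID int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.until[nodeID] = time.Now().Add(p.penalty)
}

// CanLoad reports whether new loads may be dispatched to the node.
func (p *penaltyTracker) CanLoad(nodeID int64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return time.Now().After(p.until[nodeID])
}

func main() {
	pt := newPenaltyTracker(30 * time.Second)
	pt.Punish(101)
	fmt.Println(pt.CanLoad(101), pt.CanLoad(102)) // false true
}
```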
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Previously, search with highlight only supported using BM25 search text
as the highlight target.
This PR adds support for highlighting with user-defined queries.
relate: https://github.com/milvus-io/milvus/issues/42589
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
After segments gained self-management capabilities for loading, the
index information from the initial load was not being preserved in the
Go-side segment metadata. This caused QueryCoord to repeatedly dispatch
load index tasks, which would fail in segcore since the indexes were
already loaded.
**Root Cause:**
The segment's `fieldIndexes` map was not being populated with index
metadata after calling `FinishLoad`, leading to a mismatch between the
Go-side metadata and segcore's internal state.
**Solution:**
After successfully loading a sealed segment, iterate through
`loadInfo.IndexInfos` and insert each index entry into the segment's
`fieldIndexes` map. This ensures the Go-side metadata stays in sync with
segcore and prevents redundant load index operations.
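In sketch form, with toy types standing in for the proto and segment
types:
```
package main

import "fmt"

// Toy stand-ins for the proto index info and the segment's metadata map.
type FieldIndexInfo struct {
	FieldID int64
	IndexID int64
}

type segment struct {
	fieldIndexes map[int64]*FieldIndexInfo
}

// populateFieldIndexes mirrors the fix: after FinishLoad succeeds for a
// sealed segment, every entry of loadInfo.IndexInfos is recorded in the
// Go-side fieldIndexes map so QueryCoord sees the indexes as loaded.
func (s *segment) populateFieldIndexes(indexInfos []*FieldIndexInfo) {
	for _, info := range indexInfos {
		s.fieldIndexes[info.FieldID] = info
	}
}

func main() {
	seg := &segment{fieldIndexes: make(map[int64]*FieldIndexInfo)}
	seg.populateFieldIndexes([]*FieldIndexInfo{{FieldID: 100, IndexID: 1}})
	fmt.Println(len(seg.fieldIndexes)) // Go metadata now matches segcore
}
```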
Fixes #45802
Related to #45060
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45718
Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.
Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log
This reduces log volume by 99.99% for large-scale operations.
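The pattern applied at these call sites, sketched with zap:
```
package main

import "go.uber.org/zap"

// logSyncDistribution logs the segment count, never the full ID array,
// which at 200k segments multiplied log volume into the TB range.
func logSyncDistribution(log *zap.Logger, segmentIDs []int64) {
	log.Info("sync distribution", zap.Int("segmentNum", len(segmentIDs)))
}

func main() {
	logger, _ := zap.NewDevelopment()
	logSyncDistribution(logger, []int64{1, 2, 3})
}
```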
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #45060
Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).
**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
- Separates indexed fields from non-indexed fields
- Loads indexes concurrently using thread pools
- Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`
**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45314
This PR only ensures that no panic occurs. However, we still need to
provide protection for the delegator handling ongoing query tasks.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Related to #45333
Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.
Solution is to move the FinishLoad() call to execute after raw data
loading but before text index loading. This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Currently, RAM usage estimation for the AiSAQ index type is not being
calculated correctly.
The AiSAQ index type consumes less RAM while loading an index than
DISKANN does, and the query node module is missing a RAM usage
estimation implementation for the AiSAQ index type.
We suggest that the AiSAQ RAM usage estimation be calculated as follows:
UsedDiskMemoryRatioAisaq = 1024 (in contrast to UsedDiskMemoryRatio,
which is 4)
neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq
neededDiskSize = indexInfo.IndexSize
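The suggested estimation as a runnable sketch:
```
package main

import "fmt"

// Ratios from the proposal above: AiSAQ keeps roughly 1/1024 of the
// index size in RAM, versus 1/4 for DISKANN.
const (
	UsedDiskMemoryRatio      = 4
	UsedDiskMemoryRatioAisaq = 1024
)

// estimateAisaqResource returns the suggested RAM and disk requirements
// for loading an AiSAQ index of the given size in bytes.
func estimateAisaqResource(indexSize int64) (neededMemSize, neededDiskSize int64) {
	return indexSize / UsedDiskMemoryRatioAisaq, indexSize
}

func main() {
	mem, disk := estimateAisaqResource(8 << 30) // an 8 GiB index
	fmt.Printf("mem=%d MiB disk=%d GiB\n", mem>>20, disk>>30)
}
```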
Reported issue is #45247
---------
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: friedl <lior.friedman@kioxia.com>
Co-authored-by: friedl <lior.friedman@kioxia.com>
Related to #43028
Initialize the schema version field when creating a new collection
instance in QueryNode. The schema version is extracted from loadMetaInfo
and assigned to the collection, ensuring proper schema version tracking
and consistency across the distributed system.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43897
- Alter collection/database is now implemented by the WAL-based DDL
framework.
- AlterCollection/AlterDatabase are now supported in the WAL.
- Alter operations can now be synced by the new CDC.
- Refactor some UTs for alter DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44509
Fix a bug where QueryNodeNumEntities metrics were not updated for
collections with zero segments, causing stale metrics when all segments
are flushed or compacted.
The previous implementation used separate loops: one to update size
metrics for all collections, and another to update num entities metrics
only for collections present in the grouped segments map. Collections
with no segments were skipped in the second loop, leaving their
NumEntities metrics stale.
Changes:
- Consolidate size and num entities metric updates into single loop
- Iterate over all collections instead of grouped segments
- Get collection metadata from manager instead of segment instances
- Correctly set NumEntities to 0 for collections with no segments
- Apply the same fix to both growing and sealed segment processing
- Add nil check for collection metadata before processing
This ensures all collection metrics are updated consistently, even when
segment count drops to zero.
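A toy model of the consolidated loop (the real code sets Prometheus
gauges rather than returning a map):
```
package main

import "fmt"

// updateNumEntities iterates all known collections, not only those that
// still have segments, so NumEntities drops to 0 when the last segment
// is flushed or compacted away.
func updateNumEntities(collections []int64, rowsBySegment map[int64][]int64) map[int64]int64 {
	numEntities := make(map[int64]int64, len(collections))
	for _, coll := range collections {
		var rows int64
		for _, r := range rowsBySegment[coll] { // may be absent: rows stays 0
			rows += r
		}
		numEntities[coll] = rows // previously skipped when no segments existed
	}
	return numEntities
}

func main() {
	fmt.Println(updateNumEntities([]int64{1, 2}, map[int64][]int64{1: {100, 50}}))
	// collection 2 is reported as 0 instead of keeping a stale value
}
```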
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options, avoiding the need to merge some Milvus
params into the user's analyzer params.
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.
Changes:
C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API
Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation
Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions
Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements
This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling
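An illustrative helper in the spirit of the new converters, showing the
nil-safety the tests exercise (the real converters map concrete proto
messages, so the types here are toys):
```
package main

import "fmt"

// convertSlice preserves nil vs. empty slices and keeps nil elements in
// place, matching the edge cases covered by the unit tests.
func convertSlice[I, O any](in []*I, conv func(*I) *O) []*O {
	if in == nil {
		return nil
	}
	out := make([]*O, 0, len(in))
	for _, item := range in {
		if item == nil {
			out = append(out, nil)
			continue
		}
		out = append(out, conv(item))
	}
	return out
}

type dataBinlog struct{ LogPath string }
type segcoreBinlog struct{ LogPath string }

func main() {
	in := []*dataBinlog{{LogPath: "binlog/1"}, nil}
	out := convertSlice(in, func(b *dataBinlog) *segcoreBinlog {
		return &segcoreBinlog{LogPath: b.LogPath}
	})
	fmt.Println(len(out), out[1] == nil) // 2 true
}
```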
Related to #45060
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run temporary analyzers and analyzer validation on the
proxy, but the proxy should not be a computation-heavy node. This PR
moves all analyzer calculations to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>