issue: #41435
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added segment resource tracking and automatic memory/disk accounting
    during inserts, deletes, loads and reopen.
  * Exposed a configuration to set interim index memory expansion rate.
  * Added explicit loaded-resource charge/refund operations and Bloom
    filter resource lifecycle management.
* **Bug Fixes**
  * Ensured consistent memory-size vs. row-count calculations across
    segment operations.
  * Improved resource refunding and cleanup when segments are released or
    closed.
* **Tests**
  * Added comprehensive resource-tracking and concurrency tests, plus
    Bloom filter accounting tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
related: https://github.com/milvus-io/milvus/issues/46442
core changes:
- Add config (default: false) to disable /expr endpoint by default
- On Proxy nodes, require root user authentication via HTTP Basic Auth
when enabled
- On non-Proxy nodes, keep original auth parameter behavior for backward
compatibility
- Add HasRegistered() and AuthBypass to expr package for node type
detection
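The gating rules above can be sketched as HTTP middleware. This is a minimal illustration, not the actual Milvus handler: `exprGate`, its fields, and `statusFor` are hypothetical names, the root password is assumed to be resolved elsewhere, and the non-Proxy "original auth parameter" path is not modeled.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// exprGate holds hypothetical settings; the field names are illustrative,
// not real Milvus configuration keys.
type exprGate struct {
	enabled      bool   // endpoint stays disabled unless explicitly enabled
	isProxy      bool   // Proxy nodes require root Basic Auth
	rootPassword string // assumed to be resolved elsewhere
}

// wrap rejects every request while the endpoint is disabled and, on Proxy
// nodes, requires HTTP Basic Auth as the root user before calling next.
func (g exprGate) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.enabled {
			http.Error(w, "expr endpoint is disabled", http.StatusForbidden)
			return
		}
		if g.isProxy {
			user, pass, ok := r.BasicAuth()
			match := subtle.ConstantTimeCompare([]byte(pass), []byte(g.rootPassword)) == 1
			if !ok || user != "root" || !match {
				w.Header().Set("WWW-Authenticate", `Basic realm="expr"`)
				http.Error(w, "root authentication required", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the gate and returns the status code.
// An empty user means no Authorization header is sent.
func statusFor(g exprGate, user, pass string) int {
	h := g.wrap(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/expr", nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	proxy := exprGate{enabled: true, isProxy: true, rootPassword: "secret"}
	fmt.Println(statusFor(exprGate{}, "root", "secret")) // disabled by default
	fmt.Println(statusFor(proxy, "", ""))                // missing credentials
	fmt.Println(statusFor(proxy, "root", "secret"))      // root credentials
}
```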
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Add a thread pool to load BM25 stats and save them to local disk during
loading, reducing peak memory usage.
relate: https://github.com/milvus-io/milvus/issues/41424
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: sealed-segment BM25 statistics are immutable and may
be persisted to disk independently from growing-segment stats; IDF state
is reconstructed by combining persisted sealed stats (preloaded from
disk) with in-memory growing stats at runtime (preloadSealed +
RegisterGrowing).
- Capability added: asynchronous BM25 loading via a dedicated
BM25LoadPool (config key common.threadCoreCoefficient.bm25Load and param
BM25LoadThreadCoreCoefficient) — delegator.loadBM25Stats is executed on
the pool to load sealed BM25 stats and call idfOracle.RegisterSealed,
reducing peak memory during segment load.
- Logic removed/simplified and why: the previous single Register(segID,
stats, state) + background local-writer/spool loop was split into
RegisterGrowing (in-memory) and RegisterSealed (sealed + on-disk
persistence) and the localloop removed; RegisterSealed writes sealed
stats directly (ToLocal) and uses singleflight to deduplicate,
eliminating redundant spooling and lifecycle complexity while clarifying
sealed vs growing flows.
- Why this does NOT introduce data loss or behavior regression: sealed
stats are still written and reloaded (RegisterSealed persists to
dirPath; preloadSealed merges disk and memory on first load), growing
segments continue to register in-memory via RegisterGrowing,
loadSegments now defers sealed BM25 handling to loadBM25Stats but still
registers sealed candidates after load, and tests were updated to
reflect RegisterSealed/RegisterGrowing usage—so
serialization/deserialization, preload semantics, and test coverage
preserve existing persisted state and runtime behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Related to #46617
Bump milvus-storage version from 839a8e5 to 2ab2904, which introduces
subtree filesystem support. This allows removing manual bucket name
concatenation logic across the codebase as the storage layer now handles
path prefixing internally.
Changes:
- Remove bucket name prefix logic in datanode, querynode, and storage
layers
- Simplify FileManager::GetRemoteIndexFilePrefixV2()
- Rename CColumnGroups API to CColumnSplits to align with upstream
- Update DiskFileManagerTest paths for new directory structure
- Add FFI packed reader/writer unit tests
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
issue: #41435
- fix: avoid double destruction with placement new
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary
**Bug Fix: milvus-common dependency update to address double destruction
with placement new**
- **Root Cause Resolution:** Updates the milvus-common library
dependency from commit 4d7781d to fbe5cf7 to fix a double destruction
issue that occurs when placement new is used for object construction.
The upstream fix ensures proper lifecycle management of objects
constructed with placement new semantics, preventing accidental
re-destruction of objects allocated in pre-allocated memory regions.
- **No Logic Changes in Milvus Core:** This PR contains only a
CMakeLists.txt version bump in
`internal/core/thirdparty/milvus-common/CMakeLists.txt`; no Milvus
codebase logic is modified, removed, or simplified. The fix is entirely
contained within the milvus-common library dependency fetched during the
build process.
- **Data Integrity and Behavior Preservation:** No behavior regression
or data loss is introduced. The change is a pure dependency update to
pull in an upstream bug fix. All memory management and object lifecycle
handling improvements are confined to the milvus-common library,
remaining transparent to Milvus core operations.
- **Issue Resolution:** Addresses issue #41435 by integrating the
corrected milvus-common version that prevents double destruction bugs
occurring in code paths that use placement new for memory-efficient
object construction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
issue: #46779
related: #45993
- Return a clear error when an `is null`/`is not null` filter is used on
vector fields
- Return a clear error when searching by IDs with all null vectors
- Fix nq mismatch when searching by IDs with mixed null/valid vectors
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
issue: #46895
related: #45993
Add validation in addCollectionFieldTask to prevent adding vector fields
when the maximum vector field limit is already reached.
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
issue: #46856
If the delegator is no recovering, the tslag may be huge, so the quota
center may see the huge lag and forbid writing.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44358
The panic "concurrent map iteration and map write" was introduced in PR
#44361. It occurred when QueryCopySegment RPC iterated segmentResults
while copySingleSegment was updating it concurrently.
- Deep copy segmentResults in Clone() to avoid shared map reference
- Return map copy in GetSegmentResults() to prevent iteration conflict
- Update tests to get task from manager after Update() operations
This fix follows the same deep-copy pattern used in ImportTask and
L0ImportTask.
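A minimal sketch of the deep-copy pattern described above. The `copyTask` and `segmentResult` types are hypothetical stand-ins, not the real task structures; the point is only that readers receive their own map and a clone shares no map reference with the original.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentResult and copyTask are hypothetical stand-ins for the real types.
type segmentResult struct{ rows int64 }

type copyTask struct {
	mu             sync.RWMutex
	segmentResults map[int64]*segmentResult
}

// SetResult records the result for one segment under the write lock.
func (t *copyTask) SetResult(segmentID, rows int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.segmentResults[segmentID] = &segmentResult{rows: rows}
}

// GetSegmentResults returns a copy of the map and its values, so callers
// can iterate without racing against concurrent writers.
func (t *copyTask) GetSegmentResults() map[int64]*segmentResult {
	t.mu.RLock()
	defer t.mu.RUnlock()
	out := make(map[int64]*segmentResult, len(t.segmentResults))
	for id, r := range t.segmentResults {
		c := *r
		out[id] = &c
	}
	return out
}

// Clone deep-copies the results so the clone and the original never share
// the same underlying map.
func (t *copyTask) Clone() *copyTask {
	return &copyTask{segmentResults: t.GetSegmentResults()}
}

func main() {
	task := &copyTask{segmentResults: map[int64]*segmentResult{}}
	task.SetResult(1, 100)
	clone := task.Clone()
	task.SetResult(2, 200) // does not affect the clone
	fmt.Println(len(task.GetSegmentResults()), len(clone.GetSegmentResults()))
}
```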
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #42937
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: callers must explicitly close output streams (call
Close()) instead of relying on RemoteOutputStream's destructor to
perform closure.
- Logic removed/simplified: RemoteOutputStream's destructor no longer
closes or asserts on the underlying arrow::io::OutputStream; an explicit
public Close() method was added and closure responsibility moved to that
code path.
- Why this is safe (no data loss/regression): callers now invoke Close()
before reading or destroying streams (e.g.,
DiskFileManagerTest::ReadAndWriteWithStream calls os->Close() before
opening the input stream). Write paths remain unchanged
(RemoteOutputStream::Write -> output_stream_->Write), and Close invokes
output_stream_->Close() with status assertion, ensuring
flush/confirmation via the same API and preserving data integrity;
removing destructor-side asserts prevents unexpected failures during
object destruction without changing write/close semantics.
- Chore: updated third-party pins — internal/core/thirdparty/knowhere
CMakeLists.txt: KNOWHERE_VERSION -> a59816e;
internal/core/thirdparty/milvus-common CMakeLists.txt:
MILVUS-COMMON-VERSION -> b6629f7.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
issue: #46762
Copy the fieldIndexes map while holding the read lock to prevent data
race. The original code released the lock before iterating over the map,
which could cause concurrent access issues.
Affected methods:
- GetSegmentIndexState
- GetIndexedSegments
- IsUnIndexedSegment
- GetSegmentIndexedFields
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Fixes: #46840
The test "failure_returns_partial_file_list" had 3 binlog entries but
only mocked 2 Copy calls, causing flaky behavior. Remove the unmocked
third binlog to make the test deterministic.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #46818
When a collection has autoID enabled and `allow_insert_auto_id` property
set to true, the RESTful v2 insert API was incorrectly rejecting
requests that included the primary key field. This fix adds proper
checking of the `allow_insert_auto_id` flag in the `anyToColumns`
function.
Changes:
- Read `allow_insert_auto_id` property from collection schema
- Skip PK field only when autoID is enabled AND allow_insert_auto_id is
false
- Allow PK field in insert request when allow_insert_auto_id is true
- Filter out empty PK column when autoID is enabled and user didn't
provide PK
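The decision table above is small enough to sketch directly. The helper names below are hypothetical and do not mirror the real `anyToColumns` signature; they only encode the two rules: skip the PK field when autoID is on and `allow_insert_auto_id` is off, and drop an empty PK column when autoID is on and the user supplied no PK.

```go
package main

import "fmt"

// skipPrimaryKey reports whether a PK field in the request should be
// ignored: only when autoID is enabled and allow_insert_auto_id is false.
func skipPrimaryKey(autoID, allowInsertAutoID bool) bool {
	return autoID && !allowInsertAutoID
}

// dropEmptyPK removes the PK column when autoID is enabled and the user
// supplied no PK values, so the server can generate them.
func dropEmptyPK(columns map[string][]int64, pkName string, autoID bool) {
	if autoID && len(columns[pkName]) == 0 {
		delete(columns, pkName)
	}
}

func main() {
	fmt.Println(skipPrimaryKey(true, false)) // autoID only: PK is skipped
	fmt.Println(skipPrimaryKey(true, true))  // user-supplied PK is allowed

	columns := map[string][]int64{"id": {}, "age": {30, 41}}
	dropEmptyPK(columns, "id", true)
	fmt.Println(len(columns))
}
```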
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
Before reading field data from loon manifest, check if the field exists
in column groups first. If the field does not exist, return an empty
result instead of proceeding with the read operation.
This is a workaround until loon natively supports returning null for
non-existent fields.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/46718
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Enhancement: Optimize Jieba and Lindera Analyzer Clone
**Core Invariant**: JiebaTokenizer and LinderaTokenizer must be
efficiently cloneable without lifetime constraints to support analyzer
composition in multi-language detection chains.
**What Logic Was Improved**:
- **JiebaTokenizer**: Replaced `Cow<'a, Jieba>` with
`Arc<jieba_rs::Jieba>` and removed the `<'a>` lifetime parameter. The
global JIEBA instance now wraps in Arc, enabling `#[derive(Clone)]` on
the struct. This eliminates lifetime management complexity while
maintaining zero-copy sharing via atomic reference counting.
- **LinderaTokenizer**: Introduced public `LinderaSegmenter` struct
encapsulating dictionary and mode state, and implemented explicit
`Clone` that properly duplicates the segmenter (cloning Arc-wrapped
dictionary), applies `box_clone()` to each boxed token filter, and
clones the token buffer. Previously, Clone was either unavailable or
incompletely handled trait objects.
**Why Previous Implementation Was Limiting**:
- The `Cow::Borrowed` pattern for JiebaTokenizer created explicit
lifetime dependencies that prevented straightforward `#[derive(Clone)]`.
Switching to Arc eliminates borrow checker constraints while providing
the same reference semantics for immutable shared state.
- LinderaTokenizer's token filters are boxed trait objects, which cannot
be auto-derived. Manual Clone implementation with `box_clone()` calls
correctly handles polymorphic filter duplication.
**No Data Loss or Behavior Regression**:
- Arc cloning is semantically equivalent to `Cow::Borrowed` for
read-only access; both efficiently share the underlying Jieba instance
and Dictionary without data duplication.
- The explicit Clone preserves all tokenizer state: segmenter (with
shared Arc dictionary), all token filters (via individual box_clone),
and the token buffer used during tokenization.
- Token stream behavior unchanged—segmentation and filter application
order remain identical.
- New benchmarks (`bench_jieba_tokenizer_clone`,
`bench_lindera_tokenizer_clone`) measure and validate clone performance
for both tokenizers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: #46540
Empty timeticks are only used to synchronize the clock between different
components in Milvus, so they can be ignored once LSN/MVCC semantics are
achieved for timeticks. Currently, some components still need the empty
timetick to trigger operations such as flush/tsafe, so for now we only
slow empty timeticks down to one every 5 seconds.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: with LSN/MVCC semantics consumers only need (a) the
first timetick that advances the latest-required-MVCC to unblock
MVCC-dependent waits and (b) occasional periodic timeticks (~≤5s) for
clock synchronization—therefore frequent non-persisted empty timeticks
can be suppressed without breaking MVCC correctness.
- Logic removed/simplified: per-message dispatch/consumption of frequent
non-persisted empty timeticks is suppressed — an MVCC-aware filter
emptyTimeTickSlowdowner (internal/util/pipeline/consuming_slowdown.go)
short-circuits frequent empty timeticks in the stream pipeline
(internal/util/pipeline/stream_pipeline.go), and the WAL flusher
rate-limits non-persisted timetick dispatch to one emission per ~5s
(internal/streamingnode/server/flusher/flusherimpl/wal_flusher.go); the
delegator exposes GetLatestRequiredMVCCTimeTick to drive the filter
(internal/querynodev2/delegator/delegator.go).
- Why this does NOT introduce data loss or regressions: the slowdowner
always refreshes latestRequiredMVCCTimeTick via
GetLatestRequiredMVCCTimeTick and (1) never filters timeticks <
latestRequiredMVCCTimeTick (so existing tsafe/flush waits stay
unblocked) and (2) always lets the first timetick ≥
latestRequiredMVCCTimeTick pass to notify pending MVCC waits;
separately, WAL flusher suppression applies only to non-persisted
timeticks and still emits when the 5s threshold elapses, preserving
periodic clock-sync messages used by flush/tsafe.
- Enhancement summary (where it takes effect): adds
GetLatestRequiredMVCCTimeTick on ShardDelegator and
LastestMVCCTimeTickGetter, wires emptyTimeTickSlowdowner into
NewPipelineWithStream (internal/util/pipeline), and adds WAL flusher
rate-limiting + metrics
(internal/streamingnode/server/flusher/flusherimpl/wal_flusher.go,
pkg/metrics) to reduce CPU/dispatch overhead while keeping MVCC
correctness and periodic synchronization.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
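The filtering rule described above can be sketched as follows. This is an assumption-laden illustration, not the code in `consuming_slowdown.go`: the `emptyTickFilter` type, its fields, and the injected clock are hypothetical, and only the three stated rules are modeled (pass timeticks below the latest required MVCC timetick, pass the first timetick at or above it, otherwise pass at most one per interval).

```go
package main

import (
	"fmt"
	"time"
)

// emptyTickFilter is a hypothetical model of the slowdown rule.
type emptyTickFilter struct {
	latestRequired func() uint64 // assumed getter for the latest required MVCC timetick
	interval       time.Duration // minimum gap between otherwise-suppressed timeticks
	notifiedFor    uint64        // required value already satisfied by a passed timetick
	lastPass       time.Time     // when a timetick last passed
}

// shouldDrop reports whether an empty timetick can be suppressed.
func (f *emptyTickFilter) shouldDrop(tt uint64, now time.Time) bool {
	required := f.latestRequired()
	switch {
	case tt < required:
		// Waits are still pending below the required timetick: never filter.
	case required > f.notifiedFor:
		// First timetick reaching the required value: let it notify waiters.
		f.notifiedFor = required
	case now.Sub(f.lastPass) >= f.interval:
		// Periodic clock-sync emission.
	default:
		return true
	}
	f.lastPass = now
	return false
}

// demo replays a fixed sequence of timeticks and returns the drop decisions.
func demo() []bool {
	required := uint64(10)
	f := &emptyTickFilter{
		latestRequired: func() uint64 { return required },
		interval:       5 * time.Second,
	}
	start := time.Unix(1000, 0)
	at := func(ms int) time.Time { return start.Add(time.Duration(ms) * time.Millisecond) }

	decisions := []bool{
		f.shouldDrop(5, at(0)),     // below required
		f.shouldDrop(10, at(1000)), // first to reach required
		f.shouldDrop(11, at(2000)), // too soon after the last pass
		f.shouldDrop(12, at(7000)), // interval elapsed
	}
	required = 20 // a new MVCC wait arrives
	decisions = append(decisions,
		f.shouldDrop(15, at(8000)), // below the new required value
		f.shouldDrop(20, at(8500)), // first to reach the new required value
		f.shouldDrop(21, at(9000)), // too soon again
	)
	return decisions
}

func main() {
	fmt.Println(demo())
}
```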
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #46737
PR #46440 refactored checkStale to use action.Node() for all action
types, which breaks LeaderAction stale checking. For LeaderAction,
Node() returns the worker node where segment resides, but the task is
executed on the leader node (delegator).
When syncing segments from RO worker nodes to a RW delegator, using
action.Node() incorrectly marks the task as stale because the worker is
RO, even though the leader is RW and the task should proceed.
This fix:
- Uses leaderID instead of Node() for LeaderAction stale checking
- Adds detailed comments explaining the distinction
- Adds unit tests covering the RO worker with RW leader scenario
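The distinction can be illustrated with a deliberately simplified model. The `action` struct and the read-only lookup below are hypothetical; the real `checkStale` considers more conditions. The sketch only shows why the node being checked must be the leader for a LeaderAction.

```go
package main

import "fmt"

// action is a hypothetical, reduced view of a task action.
type action struct {
	isLeaderAction bool
	node           int64 // worker node where the segment resides
	leaderID       int64 // delegator that executes a leader action
}

// executingNode returns the node whose state decides staleness: the leader
// for a leader action, the target node otherwise.
func executingNode(a action) int64 {
	if a.isLeaderAction {
		return a.leaderID
	}
	return a.node
}

// isStale treats a task as stale when its executing node is read-only (RO).
func isStale(a action, readOnly map[int64]bool) bool {
	return readOnly[executingNode(a)]
}

func main() {
	readOnly := map[int64]bool{1: true} // worker 1 is RO, leader 2 is RW
	sync := action{isLeaderAction: true, node: 1, leaderID: 2}
	fmt.Println(isStale(sync, readOnly)) // the RW leader runs the task
}
```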
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/46743
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary
**Core Invariant:** Sort compaction tasks must not be created
concurrently for the same segment. The system relies on atomic
check-and-set semantics to prevent duplicate task creation.
**What Logic is Improved:** The code now guards sort compaction task
creation with an explicit `CheckAndSetSegmentsCompacting` check before
calling `createSortCompactionTask`. Previously, tasks could be attempted
for segments already undergoing compaction, triggering warning logs that
incorrectly suggested task creation failures. The fix skips task
creation when a segment is already compacting, avoiding these misleading
warnings entirely.
**Why No Data Loss or Regression:**
- The `CheckAndSetSegmentsCompacting` method atomically checks whether a
segment is already being compacted and only proceeds if it's not; this
is the correct guard pattern for preventing concurrent compactions
- When a segment is already compacting (`isCompacting == true`), the
code correctly increments the done counter and skips to the next
segment, which is the intended behavior (no wasted task creation
attempts)
- The function signature change to `createSortCompactionTask` adds only
an internal parameter (the current task context for logging); no public
APIs are affected
- Logging refactoring maintains semantic equivalence while providing
task-scoped context
**Concrete Fix:** The misleading warning during sort compaction is
eliminated by preventing task creation attempts for already-compacting
segments through the mutex-protected `CheckAndSetSegmentsCompacting`
guard, rather than attempting creation and failing downstream.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
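A minimal sketch of a mutex-protected check-and-set guard of the kind described above. `compactionState` and `checkAndSet` are hypothetical names and the signature is an assumption; the sketch only demonstrates that when many goroutines race on the same segment, exactly one passes the guard and would go on to create a task.

```go
package main

import (
	"fmt"
	"sync"
)

// compactionState is a hypothetical registry of segments being compacted.
type compactionState struct {
	mu         sync.Mutex
	compacting map[int64]bool
}

// checkAndSet marks the segment as compacting and returns true, or returns
// false without changes if the segment is already compacting.
func (s *compactionState) checkAndSet(segmentID int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.compacting[segmentID] {
		return false
	}
	s.compacting[segmentID] = true
	return true
}

// tasksCreated lets n goroutines race to create a sort compaction task for
// the same segment and returns how many passed the guard.
func tasksCreated(n int) int {
	state := &compactionState{compacting: map[int64]bool{}}
	var wg sync.WaitGroup
	var mu sync.Mutex
	created := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if state.checkAndSet(42) {
				mu.Lock()
				created++ // stands in for creating the task
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return created
}

func main() {
	fmt.Println(tasksCreated(50))
}
```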
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Add support for DataNode compaction using file resources in ref mode.
SortCompaction and StatsJobs will build text indexes, which may use file
resources.
relate: https://github.com/milvus-io/milvus/issues/43687
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: file resources (analyzer binaries/metadata) are only
fetched, downloaded and used when the node is configured in Ref mode
(fileresource.IsRefMode via CommonCfg.QNFileResourceMode /
DNFileResourceMode); Sync now carries a version and managers track
per-resource versions/resource IDs so newer resource sets win and older
entries are pruned (RefManager/SynchManager resource maps).
- Logic removed / simplified: component-specific FileResourceMode flags
and an indirection through a long-lived BinlogIO wrapper were
consolidated — file-resource mode moved to CommonCfg, Sync/Download APIs
became version- and context-aware, and compaction/index tasks accept a
ChunkManager directly (binlog IO wrapper creation inlined). This
eliminates duplicated config checks and wrapper indirection while
preserving the same chunk/IO semantics.
- Why no data loss or behavior regression: all file-resource code paths
are gated by the configured mode (default remains "sync"); when not in
ref-mode or when no resources exist, compaction and stats flows follow
existing code paths unchanged. Versioned Sync + resourceID maps ensure
newly synced sets replace older ones and RefManager prunes stale files;
GetFileResources returns an error if requested IDs are missing (prevents
silent use of wrong resources). Analyzer naming/parameter changes add
analyzer_extra_info but default-callers pass "" so existing analyzers
and index contents remain unchanged.
- New capability: DataNode compaction and StatsJobs can now build text
indexes using external file resources in Ref mode — DataCoord exposes
GetFileResources and populates CompactionPlan.file_resources;
SortCompaction/StatsTask download resources via fileresource.Manager,
produce an analyzer_extra_info JSON (storage + resource->id map) via
analyzer.BuildExtraResourceInfo, and propagate analyzer_extra_info into
BuildIndexInfo so the tantivy bindings can load custom analyzers during
text index creation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
related: #36380
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: aggregation is centralized and schema-aware — all
aggregate functions are created via the exec Aggregate registry
(milvus::exec::Aggregate) and validated by ValidateAggFieldType, use a
single in-memory accumulator layout (Accumulator/RowContainer) and
grouping primitives (GroupingSet, HashTable, VectorHasher), ensuring
consistent typing, null semantics and offsets across planner → exec →
reducer conversion paths (toAggregateInfo, Aggregate::create,
GroupingSet, AggResult converters).
- Removed / simplified logic: removed ad‑hoc count/group-by and reducer
code (CountNode/PhyCountNode, GroupByNode/PhyGroupByNode, cntReducer and
its tests) and consolidated into a unified AggregationNode →
PhyAggregationNode + GroupingSet + HashTable execution path and
centralized reducers (MilvusAggReducer, InternalAggReducer,
SegcoreAggReducer). AVG now implemented compositionally (SUM + COUNT)
rather than a bespoke operator, eliminating duplicate implementations.
- Why this does NOT cause data loss or regressions: existing data-access
and serialization paths are preserved and explicitly validated —
bulk_subscript / bulk_script_field_data and FieldData creation are used
for output materialization; converters (InternalResult2AggResult ↔
AggResult2internalResult, SegcoreResults2AggResult ↔
AggResult2segcoreResult) enforce shape/type/row-count validation; proxy
and plan-level checks (MatchAggregationExpression,
translateOutputFields, ValidateAggFieldType, translateGroupByFieldIds)
reject unsupported inputs (ARRAY/JSON, unsupported datatypes) early.
Empty-result generation and explicit error returns guard against silent
corruption.
- New capability and scope: end-to-end GROUP BY and aggregation support
added across the stack — proto (plan.proto, RetrieveRequest fields
group_by_field_ids/aggregates), planner nodes (AggregationNode,
ProjectNode, SearchGroupByNode), exec operators (PhyAggregationNode,
PhyProjectNode) and aggregation core (Aggregate implementations:
Sum/Count/Min/Max, SimpleNumericAggregate, RowContainer, GroupingSet,
HashTable) plus proxy/querynode reducers and tests — enabling grouped
and global aggregation (sum, count, min, max, avg via sum+count) with
schema-aware validation and reduction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
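To illustrate why AVG is composed from SUM and COUNT rather than reduced directly, here is a small sketch of merging per-segment partial results by group key. The `partialAgg` type and `reduce` function are hypothetical and unrelated to the actual reducer types named above.

```go
package main

import "fmt"

// partialAgg is a hypothetical per-segment partial result for one group.
type partialAgg struct {
	sum   float64
	count int64
}

// reduce merges per-segment partials by group key. SUM and COUNT merge by
// addition, which is why AVG is carried as the pair rather than as a value.
func reduce(segments ...map[string]partialAgg) map[string]partialAgg {
	merged := map[string]partialAgg{}
	for _, seg := range segments {
		for key, p := range seg {
			m := merged[key]
			m.sum += p.sum
			m.count += p.count
			merged[key] = m
		}
	}
	return merged
}

// avg derives AVG from SUM and COUNT; ok is false for an empty group.
func (p partialAgg) avg() (value float64, ok bool) {
	if p.count == 0 {
		return 0, false
	}
	return p.sum / float64(p.count), true
}

func main() {
	segA := map[string]partialAgg{"red": {sum: 10, count: 4}}
	segB := map[string]partialAgg{"red": {sum: 20, count: 6}, "blue": {sum: 7, count: 2}}
	merged := reduce(segA, segB)
	red, _ := merged["red"].avg()
	blue, _ := merged["blue"].avg()
	fmt.Println(red, blue)
}
```

Averaging the two per-segment averages for "red" (2.5 and about 3.33) would give a different, incorrect answer; merging sums and counts first avoids that.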
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Related to #46595
Remove the EnableStorageV2 config option and enforce StorageV2 format
across all write paths including compaction, import, write buffer, and
streaming segment allocation. V1 format write tests are now skipped as
writing V1 format is no longer supported.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #46090
This change introduces a global node blacklist mechanism to immediately
cut off query traffic to failed delegators across all concurrent
requests.
Key features:
- Introduce ChannelBlacklist to track failed delegator nodes per channel
- When a query fails, the node is immediately blacklisted and excluded
from ALL subsequent requests (not just retries within the same request)
- Blacklisted nodes are automatically excluded during node selection
- Entries expire after configurable duration (default 30s) to allow
automatic recovery when nodes become healthy again
- Background cleanup loop removes expired entries periodically
- Add proxy.replicaBlacklistDuration and
proxy.replicaBlacklistCleanupInterval configuration parameters
- Blacklist can be disabled by setting duration to 0
Before this change:
- Failed nodes were only excluded within the same request's retry loop
- Concurrent requests would still attempt to query the failed node
- Each request had to experience its own failure before avoiding the
node
After this change:
- Once a node fails, it is immediately excluded from all requests
- New requests arriving during the blacklist period will skip the failed
node without experiencing any failure
- This significantly reduces latency spikes during node failures
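A minimal sketch of a per-channel blacklist with expiring entries, assuming a simple map-based layout. The type, method names, and the injected `now` argument are hypothetical and chosen for determinism; the real `ChannelBlacklist` may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// channelBlacklist is a hypothetical sketch: channel -> node ID -> expiry.
type channelBlacklist struct {
	mu       sync.Mutex
	duration time.Duration // 0 disables the blacklist
	entries  map[string]map[int64]time.Time
}

func newChannelBlacklist(d time.Duration) *channelBlacklist {
	return &channelBlacklist{duration: d, entries: map[string]map[int64]time.Time{}}
}

// Add blacklists a node for one channel after a failed query.
func (b *channelBlacklist) Add(channel string, node int64, now time.Time) {
	if b.duration == 0 {
		return
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.entries[channel] == nil {
		b.entries[channel] = map[int64]time.Time{}
	}
	b.entries[channel][node] = now.Add(b.duration)
}

// Filter returns the candidates not currently blacklisted for the channel.
func (b *channelBlacklist) Filter(channel string, candidates []int64, now time.Time) []int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	var healthy []int64
	for _, node := range candidates {
		if expiry, ok := b.entries[channel][node]; ok && now.Before(expiry) {
			continue
		}
		healthy = append(healthy, node)
	}
	return healthy
}

// Cleanup drops expired entries; a background loop would call it periodically.
func (b *channelBlacklist) Cleanup(now time.Time) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for channel, nodes := range b.entries {
		for node, expiry := range nodes {
			if !now.Before(expiry) {
				delete(nodes, node)
			}
		}
		if len(nodes) == 0 {
			delete(b.entries, channel)
		}
	}
}

// demo blacklists node 2, then selects nodes during and after the 30s window.
func demo() (during, after []int64) {
	start := time.Unix(0, 0)
	bl := newChannelBlacklist(30 * time.Second)
	bl.Add("ch-1", 2, start)
	during = bl.Filter("ch-1", []int64{1, 2, 3}, start.Add(10*time.Second))
	bl.Cleanup(start.Add(31 * time.Second))
	after = bl.Filter("ch-1", []int64{1, 2, 3}, start.Add(31*time.Second))
	return during, after
}

func main() {
	during, after := demo()
	fmt.Println(during, after)
}
```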
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/46713
/kind bug
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: a Limiter's hasUpdated flag controls whether its
non-Infinite limit is exported to proxies via toRequestLimiter(); only
limiters with hasUpdated==true and non-Inf rates produce rate updates
delivered to proxies (pkg/util/ratelimitutil/limiter.go: HasUpdated /
toRequestLimiter behavior unchanged).
- Exact bug and fix (issue #46713): collection-level limiters created
from configured collection/partition/database properties were
constructed with correct limits but left hasUpdated==false, so they were
skipped by the existing !HasUpdated() check and never sent to proxies.
Fix: add Limiter.SetHasUpdated(updated bool) and call a new
updateLimiterHasUpdated helper immediately after creating limiter nodes
during initialization/reset (internal/rootcoord/quota_center.go) to mark
non-Inf newly-created limiters as updated so they are included in
toRequestLimiter exports.
- Logic simplified / redundancy removed: initialization now explicitly
sets limiter initialization state (hasUpdated) for newly-created
non-Infinite limiters instead of relying on implicit later side-effects
to toggle the flag; this removes the implicit gap between creation and
the expectation that a configured limiter should be published.
- No data-loss or behavior regression: the change only mutates the
in-memory hasUpdated flag for freshly created limiter instances
(pkg/util/ratelimitutil/limiter.go: SetHasUpdated) and sets it in the
limiter initialization path (internal/rootcoord/quota_center.go). It
does not alter token accounting (advance, AllowN, Cancel), rate
computation, SetLimit semantics, persistence, or proxy filtering
logic—only ensures intended collection-level rates are delivered to
proxies—so no persisted data or runtime rate behavior is removed or
degraded.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
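The export rule and the fix can be sketched in isolation. The `limiter` struct, `exportLimits`, and `markConfigured` below are hypothetical simplifications (only `SetHasUpdated` is a name taken from the description above); they show how a configured limiter stays invisible to proxies until its `hasUpdated` flag is set.

```go
package main

import (
	"fmt"
	"math"
)

// limiter is a hypothetical, reduced limiter: a rate and the hasUpdated flag.
type limiter struct {
	limit      float64
	hasUpdated bool
}

// SetHasUpdated sets the flag that controls whether the limit is exported.
func (l *limiter) SetHasUpdated(updated bool) { l.hasUpdated = updated }

// exportLimits mimics the described export rule: only limiters that are
// marked updated and have a non-infinite limit are sent to proxies.
func exportLimits(limiters map[string]*limiter) map[string]float64 {
	out := map[string]float64{}
	for name, l := range limiters {
		if !l.hasUpdated || math.IsInf(l.limit, 1) {
			continue
		}
		out[name] = l.limit
	}
	return out
}

// markConfigured flags freshly created non-infinite limiters as updated,
// which is the step the fix adds after limiter creation.
func markConfigured(limiters map[string]*limiter) {
	for _, l := range limiters {
		if !math.IsInf(l.limit, 1) {
			l.SetHasUpdated(true)
		}
	}
}

// demo returns the exported limits before and after the marking step.
func demo() (before, after map[string]float64) {
	limiters := map[string]*limiter{
		"collection-a": {limit: 100},
		"collection-b": {limit: math.Inf(1)},
	}
	before = exportLimits(limiters)
	markConfigured(limiters)
	after = exportLimits(limiters)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(len(before), after)
}
```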
Signed-off-by: guanghuihuang <guanghuihuang@didiglobal.com>
Co-authored-by: guanghuihuang <guanghuihuang@didiglobal.com>
issue: #44358
Implement a complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.
Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata
Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation
Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.
- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.
- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.
- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Add configurable MAP_POPULATE flag support for mmap operations to reduce
page faults and improve first read performance.
Key changes:
- Add `queryNode.mmap.populate` config (default: true) to control
MAP_POPULATE flag usage
- Add `mmap_populate` parameter to MmapChunkTarget, ChunkTranslator,
GroupChunkTranslator, and ManifestGroupTranslator
- Apply MAP_POPULATE to both MmapChunkTarget and MemChunkTarget
- Propagate mmap_populate setting through chunk creation pipeline
When enabled, MAP_POPULATE pre-faults the mapped pages into memory,
eliminating page faults during subsequent access and improving query
performance for the first read operations.
issue: #46760
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
If QueryNode loads multiple sealed segments with BM25 enabled, BM25
stats registration into IDFOracle could stop after the first segment due
to an early-terminating ConcurrentMap.Range callback. This change:
- Register BM25 stats for all sealed segments by continuing iteration
(return true) during sealed-segment load
- Prevent repeated warnings like "idf oracle lack some sealed segment"
- Ensure IDF/BM25 statistics are not silently incomplete (improving BM25
ranking correctness)
issue: #46725
Core invariant: for any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle, so
SyncDistribution can always find the sealed segments present in the
distribution snapshot.
Bug fix: ConcurrentMap.Range respects the callback’s boolean return;
returning false stops iteration. The sealed BM25 stats registration
callback previously returned false, which could register only the first
sealed segment and leave the rest missing—causing IDFOracle to warn idf
oracle lack some sealed segment and potentially compute IDF from
incomplete stats. Fixed by returning true to continue iterating and
registering all segments.
No behavior regression: the change only affects the sealed-segment BM25
stats registration loop; it does not alter segment loading, distribution
snapshot generation, or non-BM25 codepaths. For collections without BM25
(or when BM25 stats are nil), behavior remains unchanged.
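The early-termination behavior is easy to reproduce with the standard library's `sync.Map`, whose `Range` also stops as soon as the callback returns false. The sketch below uses `sync.Map` as a stand-in for Milvus's ConcurrentMap; `newStats` and `registerSealed` are hypothetical helpers, and the counter stands in for the actual registration call.

```go
package main

import (
	"fmt"
	"sync"
)

// newStats builds a map of segment ID -> placeholder stats for the demo.
func newStats(segmentIDs ...int64) *sync.Map {
	m := &sync.Map{}
	for _, id := range segmentIDs {
		m.Store(id, struct{}{})
	}
	return m
}

// registerSealed visits the stats and returns how many segments were
// registered. keepGoing is the value the callback returns after each entry.
func registerSealed(stats *sync.Map, keepGoing bool) int {
	registered := 0
	stats.Range(func(_, _ any) bool {
		registered++ // stands in for registering one segment's stats
		return keepGoing
	})
	return registered
}

func main() {
	fmt.Println(registerSealed(newStats(1, 2, 3), false)) // buggy callback
	fmt.Println(registerSealed(newStats(1, 2, 3), true))  // fixed callback
}
```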
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: For any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle so
SyncDistribution can discover them in distribution snapshots.
- Bug fix (links to #46725): The BM25 stats registration callback used
with bm25Stats.Range() in loadStreamDelete() returned false, which
prematurely stopped iteration after the first sealed segment and left
subsequent sealed segments unregistered. The fix changes the callback to
return true so the Range loop completes and registers BM25 stats for all
sealed segments.
- Logic simplified/removed: The early-return (false) in the
ConcurrentMap.Range callback that aborted further registrations has been
removed (replaced by returning true). That early abort was redundant and
incorrect because registration must proceed for every entry; allowing
Range to continue restores the intended one-to-many registration
behavior.
- No data loss or regression: The change is narrowly scoped to the
sealed-segment BM25 stats registration loop in
internal/querynodev2/delegator/delegator_data.go and does not modify
segment loading, distribution snapshot generation, growing-segment
handling, or non-BM25 codepaths. Returning true only permits full
iteration and registration; it does not delete or alter existing data
structures or load state, so IDF/BM25 statistics become complete without
changing other behaviors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: thangTang <tangtianhang099@gmail.com>
issue: #46635
## Summary
- Fix spelling error in constant name: `CredentialSeperator` ->
`CredentialSeparator`
- Updated all usages across the codebase to use the correct spelling
## Changes
- `pkg/util/constant.go`: Renamed the constant
- `pkg/util/contextutil/context_util.go`: Updated usage
- `pkg/util/contextutil/context_util_test.go`: Updated usage
- `internal/proxy/authentication_interceptor.go`: Updated usage
- `internal/proxy/util.go`: Updated usage
- `internal/proxy/util_test.go`: Updated usage
- `internal/proxy/trace_log_interceptor_test.go`: Updated usage
- `internal/proxy/accesslog/info/util.go`: Updated usage
- `internal/distributed/proxy/service.go`: Updated usage
- `internal/distributed/proxy/httpserver/utils.go`: Updated usage
## Test Plan
- [x] All references updated consistently
- [x] No functional changes - only constant name spelling correction
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the separator character for credentials remains ":"
everywhere — only the exported identifier was renamed from
CredentialSeperator → CredentialSeparator; the constant value and
split/join semantics are unchanged.
- Change (bug fix): corrected the misspelled exported constant in
pkg/util/constant.go and updated all references across the codebase
(parsing, token construction, header handling and tests) to use the new
identifier; this is an identifier rename that removes an inconsistent
symbol and prevents compile-time/reference errors.
- Logic simplified/redundant work removed: no runtime logic was removed;
the simplification is purely maintenance-focused — eliminating a
misspelled exported name that could cause developers to introduce
duplicate or incorrect constants.
- No data loss or behavior regression: runtime code paths are unchanged
— e.g., GetAuthInfoFromContext, ParseUsernamePassword,
AuthenticationInterceptor, proxy service token construction and
access-log extraction still use ":" to split/join credentials; updated
and added unit tests (parsing and metadata extraction) exercise these
paths and validate identical semantics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
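
To illustrate the split/join semantics that stay unchanged by the rename, here is a small sketch. Only the constant name and its value `":"` come from the notes above; `joinCredential` and `splitCredential` are hypothetical helpers and do not reproduce the real `ParseUsernamePassword` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// CredentialSeparator mirrors the renamed constant; its value is unchanged.
const CredentialSeparator = ":"

// joinCredential builds a "user:password" token (hypothetical helper).
func joinCredential(user, password string) string {
	return user + CredentialSeparator + password
}

// splitCredential splits at the first separator only, so in this sketch a
// password may itself contain ":" (hypothetical helper).
func splitCredential(token string) (user, password string, ok bool) {
	return strings.Cut(token, CredentialSeparator)
}

func main() {
	token := joinCredential("root", "Mil:vus")
	user, password, ok := splitCredential(token)
	fmt.Println(token, user, password, ok)
}
```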
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: lif <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
issue: #46740
- Add duplicate ID check before query execution (fail fast)
- Auto infer anns_field when only one vector field exists in schema
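
The two checks above could look roughly like the following sketch. The `field` type and both function names are hypothetical, and the behavior for zero or multiple vector fields (returning an error) is an assumption rather than something stated in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// field is a stripped-down, hypothetical schema field.
type field struct {
	Name     string
	IsVector bool
}

// checkDuplicateIDs fails fast on the first repeated ID, before any query runs.
func checkDuplicateIDs(ids []int64) error {
	seen := make(map[int64]struct{}, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			return fmt.Errorf("duplicate id %d in request", id)
		}
		seen[id] = struct{}{}
	}
	return nil
}

// inferAnnsField keeps an explicit anns_field and otherwise picks the vector
// field only when the schema has exactly one.
func inferAnnsField(requested string, schema []field) (string, error) {
	if requested != "" {
		return requested, nil
	}
	var vectors []string
	for _, f := range schema {
		if f.IsVector {
			vectors = append(vectors, f.Name)
		}
	}
	if len(vectors) == 1 {
		return vectors[0], nil
	}
	return "", errors.New("anns_field must be set explicitly")
}

func main() {
	fmt.Println(checkDuplicateIDs([]int64{1, 2, 1}))
	schema := []field{{Name: "id"}, {Name: "embedding", IsVector: true}}
	fmt.Println(inferAnnsField("", schema))
}
```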
Signed-off-by: Li Liu <li.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/46678
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: Text index log keys are canonicalized at KV
(serialization) boundaries — etcd stores compressed filename-only
entries, while in-memory and runtime consumers must receive full
object-storage keys so Datanode/QueryNode can load text indexes
directly.
- Logic removed/simplified: ad-hoc reconstruction of full text-log paths
scattered across components (garbage_collector.getTextLogs,
querynodev2.LoadTextIndex, compactor/index task code) was removed;
consumers now use TextIndexStats.Files as-provided (full keys). Path
compression/decompression was centralized into KV marshal/unmarshal
utilities (metautil.ExtractTextLogFilenames in marshalSegmentInfo and
metautil.BuildTextLogPaths in kv_catalog.listSegments), eliminating
redundant, inconsistent prefix-rebuilding logic that broke during
rolling upgrades.
- Why this does NOT cause data loss or regressions: before persist,
marshalSegmentInfo compresses TextStatsLogs.Files to filenames
(metautil.ExtractTextLogFilenames) so stored KV remains compact; on
load, kv_catalog.listSegments calls metautil.BuildTextLogPaths to
restore full paths and includes compatibility logic that leaves
already-full keys unchanged. Thus every persisted filename is
recoverable to a valid full key and consumers receive correct full paths
(see marshalSegmentInfo → KV write path and kv_catalog.listSegments →
reload path), preventing dropped or malformed keys.
- Bug fix (refs #46678): resolves text-log loading failures during
cluster upgrades by centralizing path handling at KV encode/decode and
removing per-component path reconstruction — the immediate fix is
changing consumers to read TextIndexStats.Files directly and relying on
marshal/unmarshal to perform compression/expansion, preventing
mixed-format failures during rolling upgrades.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
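
The compress-on-write, expand-on-read round trip can be sketched as below. `compress` and `expand` are hypothetical stand-ins for `metautil.ExtractTextLogFilenames` and `metautil.BuildTextLogPaths`, whose real signatures are not shown here; the key prefix and the "contains a slash means already full" rule are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// compress keeps only the last path element of each key; this is the form
// that would be persisted in the KV store.
func compress(files []string) []string {
	out := make([]string, 0, len(files))
	for _, f := range files {
		out = append(out, path.Base(f))
	}
	return out
}

// expand rebuilds full keys under prefix. Entries that already look like
// full keys are left untouched, so metadata written in the old format
// still loads.
func expand(prefix string, files []string) []string {
	out := make([]string, 0, len(files))
	for _, f := range files {
		if strings.Contains(f, "/") {
			out = append(out, f)
			continue
		}
		out = append(out, path.Join(prefix, f))
	}
	return out
}

func main() {
	prefix := "files/text_log/100/7" // hypothetical key prefix
	full := []string{prefix + "/file_0", prefix + "/file_1"}
	stored := compress(full)
	fmt.Println(stored)
	fmt.Println(expand(prefix, stored))
	fmt.Println(expand(prefix, full)) // already-full keys pass through
}
```

Because expansion is idempotent in this sketch, readers see full keys whether the stored entry was written before or after the upgrade.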
---------
Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
issue: #46033
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Entity-Level TTL Field Support
### Core Invariant and Design
This PR introduces **per-entity TTL (time-to-live) expiration** via a
dedicated TIMESTAMPTZ field as a fine-grained alternative to
collection-level TTL. The key invariant is **mutual exclusivity**:
collection-level TTL and entity-level TTL field cannot coexist on the
same collection. Validation is enforced at the proxy layer during
collection creation/alteration (`validateTTL()` prevents both being set
simultaneously).
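
A minimal sketch of the mutual-exclusivity check, assuming both settings arrive as collection properties. The function name `validateTTLProps`, the error value, and the collection-level key `collection.ttl.seconds` are assumptions for illustration; only `ttl_field` is named in this summary.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	collectionTTLKey = "collection.ttl.seconds" // assumed collection-level TTL key
	ttlFieldKey      = "ttl_field"              // entity-level TTL property
)

var errTTLConflict = errors.New("collection-level TTL and ttl_field cannot both be set")

// validateTTLProps rejects property sets that configure both TTL mechanisms.
func validateTTLProps(props map[string]string) error {
	_, hasCollectionTTL := props[collectionTTLKey]
	_, hasTTLField := props[ttlFieldKey]
	if hasCollectionTTL && hasTTLField {
		return errTTLConflict
	}
	return nil
}

func main() {
	fmt.Println(validateTTLProps(map[string]string{ttlFieldKey: "expire_at"}))
	fmt.Println(validateTTLProps(map[string]string{
		collectionTTLKey: "3600",
		ttlFieldKey:      "expire_at",
	}))
}
```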
### What Is Removed and Why
- **Global `EntityExpirationTTL` parameter** removed from config
(`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This
was the only mechanism for collection-level expiration. The removal is
safe because:
  - The collection-level TTL path (`isEntityExpired(ts)` check) remains
  intact in the codebase for backward compatibility
  - TTL field check (`isEntityExpiredByTTLField()`) is a secondary path
  invoked only when a TTL field is configured
  - Existing deployments using collection TTL can continue without
  modification
The global parameter was removed because entity-level TTL makes a
separate global setting redundant, and the PR chooses one mechanism per
collection rather than layering both.
### No Data Loss or Behavior Regression
**TTL filtering logic is additive and safe:**
1. **Collection-level TTL unaffected**: The `isEntityExpired(ts)` check
still applies when no TTL field is configured; callers of
`EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when
no field exists, causing `isEntityExpiredByTTLField()` to return false
immediately
2. **Null/invalid TTL values treated safely**: Rows with null TTL or TTL
≤ 0 are marked as "never expire" (using sentinel value
`int64(^uint64(0) >> 1)`) and are preserved across compactions;
percentile calculations only include positive TTL values
3. **Query-time filtering automatic**: TTL filtering is transparently
added to expression compilation via `AddTTLFieldFilterExpressions()`,
which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the
filter pipeline. Entities with null TTL always pass the filter
4. **Compaction triggering granular**: Percentile-based expiration (20%,
40%, 60%, 80%, 100%) allows configurable compaction thresholds via
`SingleCompactionRatioThreshold`, preventing premature data deletion
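
The sentinel and the early-return behavior from points 1 and 2 can be sketched as follows. `normalizeTTL` and `expiredByTTLField` are hypothetical names, not the real `isEntityExpiredByTTLField()`; treating a value equal to the current time as expired is an assumption derived from the `ttl_field > current_time` filter in point 3.

```go
package main

import (
	"fmt"
	"math"
)

// neverExpire is the sentinel quoted above; it equals math.MaxInt64.
const neverExpire = int64(^uint64(0) >> 1)

// normalizeTTL maps a null (nil) or non-positive TTL to the sentinel.
func normalizeTTL(ttl *int64) int64 {
	if ttl == nil || *ttl <= 0 {
		return neverExpire
	}
	return *ttl
}

// expiredByTTLField reports whether a row has expired. A negative expireAt
// stands for "no TTL field configured" and never expires anything.
func expiredByTTLField(expireAt, now int64) bool {
	if expireAt < 0 {
		return false
	}
	return expireAt <= now
}

func main() {
	fmt.Println(neverExpire == math.MaxInt64) // true
	ttl := int64(50)
	fmt.Println(expiredByTTLField(normalizeTTL(&ttl), 100)) // true
	fmt.Println(expiredByTTLField(normalizeTTL(nil), 100))  // false
	fmt.Println(expiredByTTLField(-1, 100))                 // false
}
```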
### Capability Added: Per-Entity Expiration with Data Distribution Awareness
Users can now set the collection property `ttl_field` to the name of a
TIMESTAMPTZ schema field. During data writes, TTL values are collected per
segment and percentile quantiles (5-value array) are computed and stored
in segment metadata. At query time, the TTL field is automatically
filtered. At compaction time, segment-level percentiles drive
expiration-based compaction decisions, enabling intelligent compaction
of segments where a configurable fraction of data has expired (e.g.,
compact when 40% of rows are expired, controlled by threshold ratio).
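
One way to picture the percentile-driven decision is the sketch below. It assumes nearest-rank percentiles at 20/40/60/80/100% over positive TTL values and estimates the expired fraction in 20% steps; the actual quantile method and the function names (`ttlPercentiles`, `expiredFraction`, `shouldCompact`) are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// ttlPercentiles returns the 20/40/60/80/100th percentiles (nearest rank)
// of the positive TTL values, or nil when there are none.
func ttlPercentiles(ttls []int64) []int64 {
	positive := make([]int64, 0, len(ttls))
	for _, t := range ttls {
		if t > 0 {
			positive = append(positive, t)
		}
	}
	if len(positive) == 0 {
		return nil
	}
	sort.Slice(positive, func(i, j int) bool { return positive[i] < positive[j] })
	out := make([]int64, 5)
	for i := range out {
		rank := ((i+1)*len(positive) + 4) / 5 // ceil((i+1)/5 * n), integer math
		out[i] = positive[rank-1]
	}
	return out
}

// expiredFraction estimates the share of TTL-bearing rows expired at now.
func expiredFraction(percentiles []int64, now int64) float64 {
	count := 0
	for _, p := range percentiles {
		if p <= now {
			count++
		}
	}
	return float64(count) / 5
}

// shouldCompact compares the estimate with a configured ratio threshold.
func shouldCompact(percentiles []int64, now int64, threshold float64) bool {
	return len(percentiles) == 5 && expiredFraction(percentiles, now) >= threshold
}

func main() {
	p := ttlPercentiles([]int64{10, 20, 30, 40, 50, 0, -1})
	fmt.Println(p)                        // [10 20 30 40 50]
	fmt.Println(shouldCompact(p, 25, 0.4)) // true
	fmt.Println(shouldCompact(p, 15, 0.4)) // false
}
```

Storing only five values per segment keeps the metadata small while still letting the compaction trigger reason about how much of a segment has expired.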
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>