255 Commits

Author SHA1 Message Date
wei liu
975c91df16
feat: Add comprehensive snapshot functionality for collections (#44361)
issue: #44358

Implement complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.

Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata

Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation

Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms
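
For illustration, a minimal Go sketch of driving these operations from a client follows; the `SnapshotClient` interface and method names are hypothetical stand-ins for the SDK surface described above (create, list, describe, drop), not the actual client API.

```
// Hypothetical sketch only: the interface and method names below are
// illustrative stand-ins for the snapshot SDK surface, not the real API.
package snapshotdemo

import (
	"context"
	"fmt"
)

type SnapshotInfo struct {
	Name       string
	Collection string
	CreatedAt  int64
}

type SnapshotClient interface {
	CreateSnapshot(ctx context.Context, collection, name string) error
	ListSnapshots(ctx context.Context, collection string) ([]SnapshotInfo, error)
	DescribeSnapshot(ctx context.Context, name string) (SnapshotInfo, error)
	DropSnapshot(ctx context.Context, name string) error
}

// lifecycleDemo walks one snapshot through create -> list -> describe -> drop.
func lifecycleDemo(ctx context.Context, c SnapshotClient) error {
	if err := c.CreateSnapshot(ctx, "my_collection", "snap_001"); err != nil {
		return err
	}
	infos, err := c.ListSnapshots(ctx, "my_collection")
	if err != nil {
		return err
	}
	fmt.Printf("collection has %d snapshot(s)\n", len(infos))
	if _, err := c.DescribeSnapshot(ctx, "snap_001"); err != nil {
		return err
	}
	// Drop triggers server-side cleanup of metadata and object-storage files.
	return c.DropSnapshot(ctx, "snap_001")
}
```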

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.

- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.

- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.

- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-01-06 10:15:24 +08:00
tianhang
4f8a8dd7ef
fix: Register all sealed segments’ BM25 stats in IDFOracle (#46726)
If QueryNode loads multiple sealed segments with BM25 enabled, BM25
stats registration into IDFOracle could stop after the first segment due
to an early-terminating ConcurrentMap.Range callback. This change:
- Register BM25 stats for all sealed segments by continuing iteration
(return true) during sealed-segment load
- Prevent repeated warnings like "idf oracle lack some sealed segment"
- Ensure IDF/BM25 statistics are not silently incomplete (improving BM25
ranking correctness)

issue: #46725
Core invariant: for any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle, so
SyncDistribution can always find the sealed segments present in the
distribution snapshot.

Bug fix: ConcurrentMap.Range respects the callback's boolean return;
returning false stops iteration. The sealed BM25 stats registration
callback previously returned false, which could register only the first
sealed segment and leave the rest missing, causing IDFOracle to warn
"idf oracle lack some sealed segment" and potentially compute IDF from
incomplete stats. Fixed by returning true to continue iterating and
registering all segments.

No behavior regression: the change only affects the sealed-segment BM25
stats registration loop; it does not alter segment loading, distribution
snapshot generation, or non-BM25 codepaths. For collections without BM25
(or when BM25 stats are nil), behavior remains unchanged.
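
As a self-contained illustration of the Range semantics involved, the following Go snippet uses sync.Map (whose Range callback likewise stops iterating when false is returned) as a stand-in for the internal ConcurrentMap; it is a sketch of the pattern, not the delegator code itself.

```
package main

import (
	"fmt"
	"sync"
)

func main() {
	// sync.Map stands in for the ConcurrentMap holding per-segment BM25 stats.
	var bm25Stats sync.Map
	bm25Stats.Store(int64(1), "stats-of-segment-1")
	bm25Stats.Store(int64(2), "stats-of-segment-2")

	registered := 0
	bm25Stats.Range(func(segmentID, stats any) bool {
		// Register stats into the IDF oracle (simplified to a counter here).
		registered++
		// Returning false here would stop after the first segment, which was
		// the bug; returning true lets every sealed segment get registered.
		return true
	})
	fmt.Println("registered segments:", registered) // prints 2
}
```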

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: For any BM25-enabled collection, every loaded sealed
segment with available BM25 stats must be registered into IDFOracle so
SyncDistribution can discover them in distribution snapshots.

- Bug fix (links to #46725): The BM25 stats registration callback used
with bm25Stats.Range() in loadStreamDelete() returned false, which
prematurely stopped iteration after the first sealed segment and left
subsequent sealed segments unregistered. The fix changes the callback to
return true so the Range loop completes and registers BM25 stats for all
sealed segments.

- Logic simplified/removed: The early-return (false) in the
ConcurrentMap.Range callback that aborted further registrations has been
removed (replaced by returning true). That early abort was redundant and
incorrect because registration must proceed for every entry; allowing
Range to continue restores the intended one-to-many registration
behavior.

- No data loss or regression: The change is narrowly scoped to the
sealed-segment BM25 stats registration loop in
internal/querynodev2/delegator/delegator_data.go and does not modify
segment loading, distribution snapshot generation, growing-segment
handling, or non-BM25 codepaths. Returning true only permits full
iteration and registration; it does not delete or alter existing data
structures or load state, so IDF/BM25 statistics become complete without
changing other behaviors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: thangTang <tangtianhang099@gmail.com>
2026-01-05 17:41:24 +08:00
Zhen Ye
ca8740c7c0
fix: remove redundant log (#46695)
issue: #45841

- The C++ logger packed multiple log lines into one debug entry; remove the "\n\t".
- Remove some logs that carry no useful information.
- Reduce the frequency of some logs, such as those in ChannelDistManager.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: logging is purely observational — this PR only
reduces, consolidates, or reformats diagnostic output (removing
per-item/noise logs, consolidating batched logs, and converting
multi-line log strings) while preserving all control flow, return
values, and state mutations across affected code paths.

- Removed / simplified logic: deleted low-value per-operation debug/info
logs (e.g., ListIndexes, GetRecoveryInfo, GcConfirm,
push-to-reorder-buffer, several streaming/wal/debug traces), replaced
per-item inline logs with single batched deferred logs in
querynodev2/delegator (logExcludeInfo) and CleanInvalid, changed C++
PlanNode ToString() multi-line output to compact single-line bracketed
format (removed "\n\t"), and added thresholded interceptor logging
(InterceptorMetrics.ShouldBeLogged) and message-type-driven log levels
to avoid verbose entries.

- Why this does NOT cause data loss or behavioral regression: no
function signatures, branching, state updates, persistence calls, or
return values were changed — examples: ListIndexes still returns the
same Status/IndexInfos; GcConfirm still constructs and returns
resp.GetGcFinished(); Insert and CleanInvalid still perform the same
insert/removal operations (only their per-item logging was aggregated);
PlanNode ToString changes only affect emitted debug strings. All error
handling and control flow paths remain intact.

- Enhancement intent: reduce log volume and improve signal-to-noise for
debugging by removing redundant, noisy logs and emitting concise,
rate-/threshold-limited summaries while preserving necessary diagnostics
and original program behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-31 15:35:21 +08:00
wei liu
c2677967ad
fix: prevent empty segment list when partial result is enabled (#46670)
issue: #46669
When partial result is enabled (PartialResultRequiredDataRatio < 1.0),
the Serviceable() method would return true even if syncedByCoord is
false (by bypassing viewReady check). However, PinReadableSegments uses
GetLoadedRatio() == 1.0 to decide whether to filter segments by target
version.

This causes a problem: when loadedRatio == 1.0 but syncedByCoord ==
false, segments are filtered by an incorrect target version, resulting
in an empty segment list during search.

This change:
- Replace GetLoadedRatio() == 1.0 with Serviceable() check to ensure
target version filtering only happens after coord sync completes
- Remove partial result bypass in Serviceable() to keep the check
consistent
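
A condensed Go sketch of the resulting decision logic is shown below; the struct and method names are simplified stand-ins for the delegator's query view, reflecting the rule described in this change (target-version filtering only when fully loaded and coord-synced, query-view filtering otherwise).

```
package delegator

// queryView is a simplified stand-in for the delegator's distribution view.
type queryView struct {
	loadedRatio   float64
	syncedByCoord bool
}

// Serviceable now requires both full data load and coordinator sync.
func (v *queryView) Serviceable() bool {
	return v.loadedRatio >= 1.0 && v.syncedByCoord
}

// segmentFilterMode mirrors the choice made in PinReadableSegments.
func (v *queryView) segmentFilterMode(requiredLoadRatio float64) string {
	switch {
	case v.Serviceable():
		// Safe to filter by target version only after coord sync completes.
		return "target-version"
	case v.loadedRatio >= requiredLoadRatio:
		// Partial result: filter against the query view's own segment lists.
		return "query-view-segments"
	default:
		return "not-serviceable"
	}
}
```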

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Bug Fix Summary

**Core Invariant**: `Serviceable()` must enforce a strict requirement
that both data loading AND coordinator synchronization are complete
before allowing full search operations. This prevents using stale or
uninitialized target versions.

**Logic Removed/Simplified**: 
- Removed the partial-result bypass from `Serviceable()` that previously
allowed it to return `true` even when `syncedByCoord == false`
- Replaced `GetLoadedRatio() == 1.0` checks in `PinReadableSegments`
with `Serviceable()` calls to ensure target-version filtering only
occurs after coord sync completes
- Simplified the serviceability condition from parameterized
partial-result logic to a direct conjunction: `loadedRatio >= 1.0 AND
syncedByCoord == true`

**No Data Loss or Regression**: The change is safe because:
- When `Serviceable()` returns `true` (both loadedRatio ≥ 1.0 AND
syncedByCoord == true), segments are filtered by the current valid target
version—this is the full-result path
- When `Serviceable()` returns `false` but `loadedRatio >=
requiredLoadRatio` (partial result case), segments are filtered against
the query view's segment lists rather than target version, ensuring
non-empty results as validated by
`TestPinReadableSegments_PartialResultNotEmpty`
- The test explicitly demonstrates that even with `loadedRatio == 1.0`
and `syncedByCoord == false`, calling `PinReadableSegments(0.8,
partition)` returns segments (partial result) instead of an empty list,
which was the bug root cause

**Root Cause Fix**: Previously, segments could be filtered with
`unreadableTargetVersion` when `loadedRatio == 1.0` but the querycoord
hadn't yet synced the target, causing empty segment lists. Now the sync
state is checked before deciding the filtering strategy, preventing this
race condition.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-31 15:03:22 +08:00
congqixia
b4682b7352
fix: use LoadDeltaData instead of Delete for L0 growing forward (#46657)
Related to #46660
Replace segment.Delete() with segment.LoadDeltaData() when forwarding L0
deletions to growing segments. LoadDeltaData is the more appropriate API
for bulk loading delta data compared to individual Delete calls.
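
The sketch below shows the shape of the new forwarding path in Go; DeltaData and Segment are simplified local stand-ins for the storage.DeltaData type and segment interface named in this change, so signatures in the real code may differ.

```
package forward

import "context"

// DeltaData is a simplified stand-in for storage.DeltaData: paired primary
// keys and delete timestamps carried as one batched payload.
type DeltaData struct {
	Pks []int64
	Tss []uint64
}

// Segment is reduced to the single method relevant to this change.
type Segment interface {
	LoadDeltaData(ctx context.Context, dd *DeltaData) error
}

// forwardL0ToGrowing builds one delta payload and loads it in a single call,
// replacing the previous per-key Delete path for L0 -> growing forwarding.
func forwardL0ToGrowing(ctx context.Context, seg Segment, pks []int64, tss []uint64) error {
	dd := &DeltaData{Pks: pks, Tss: tss}
	return seg.LoadDeltaData(ctx, dd)
}
```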

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
• Core invariant: forwarding L0 deletions to growing segments must use
the bulk-delta API (storage.DeltaData + segment.LoadDeltaData) because
LoadDeltaData preserves paired primary keys and timestamps as a single
atomic delta payload; segment.Delete was intended for per-delete RPCs
and not for loading L0 delta payloads.

• Logic removed/simplified: addL0GrowingBF() no longer calls
segment.Delete for buffered L0 keys. Instead the buffered callback
builds a storage.DeltaData via storage.NewDeltaDataWithData(pks, tss)
and calls segment.LoadDeltaData(ctx, dd). This eliminates the previous
per-batch Delete call path and centralizes forwarding as a single
delta-load operation.

• Why this does not cause data loss or regression: the new path supplies
identical PK+timestamp pairs to the segment via DeltaData; LoadDeltaData
applies the same delete semantics but accepts batched delta payloads.
The change is limited to the L0→growing Bloom-Filter forward path
(addL0GrowingBF/addL0ForGrowingLoad), leaving sealed-segment deletes,
streaming direct forwarding, and remote-load policies unchanged. Also,
the prior code would fail on L0Segment.Delete (L0 segments prohibit
Delete), so switching to LoadDeltaData prevents lost-forwarding caused
by unsupported Delete calls.

• Category: Enhancement / Refactor — replaces inappropriate per-delete
calls with the correct bulk delta-load API, simplifying error handling
around NewDeltaDataWithData and ensuring API contract correctness for
L0→growing forwarding.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-30 14:05:21 +08:00
wei liu
293838bb67
enhance: add delegator catching up streaming data state tracking (#46551)
issue: #46550
- Add CatchUpStreamingDataTsLag parameter to control tolerable lag
  threshold for delegator to be considered caught up
- Add catchingUpStreamingData field in delegator to track whether
  delegator has caught up with streaming data
- Add catching_up_streaming_data field in LeaderViewStatus proto
- Check catching up status in CheckDelegatorDataReady, return not
  ready when delegator is still catching up streaming data
- Add unit tests for the new functionality

When tsafe lag exceeds the threshold, the distribution will not be
considered serviceable, preventing queries from timing out in waitTSafe.
This is useful when streaming message queue consumption is slow.
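
A minimal Go sketch of the catch-up gate follows; it treats timestamps as plain time values with simplified field names, so it only illustrates the threshold check, not the actual hybrid-timestamp handling in the delegator.

```
package delegator

import (
	"sync/atomic"
	"time"
)

// shardDelegator is reduced to the single flag added by this change.
type shardDelegator struct {
	catchingUpStreamingData atomic.Bool
}

// onTSafeUpdate clears the catching-up flag once the observed lag between the
// latest timestamp and the delegator tsafe falls below the tolerance
// (CatchUpStreamingDataTsLag, default 1s).
func (sd *shardDelegator) onTSafeUpdate(latest, tsafe time.Time, tolerance time.Duration) {
	if latest.Sub(tsafe) < tolerance {
		sd.catchingUpStreamingData.Store(false)
	}
}

// dataReady is the readiness gate consulted by CheckDelegatorDataReady:
// a delegator still catching up is reported as not ready.
func (sd *shardDelegator) dataReady() bool {
	return !sd.catchingUpStreamingData.Load()
}
```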

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: a delegator must not be considered serviceable while
its tsafe lags behind the latest committed timestamp beyond a
configurable tolerance; a delegator is "caught-up" only when
(latestTsafe - delegator.GetTSafe()) < CatchUpStreamingDataTsLag
(configured by queryNode.delegator.catchUpStreamingDataTsLag, default
1s).
- New capability and where it takes effect: adds streaming-catchup
tracking to QueryNode/QueryCoord — an atomic catchingUpStreamingData
flag on shardDelegator (internal/querynodev2/delegator/delegator.go), a
new param CatchUpStreamingDataTsLag
(pkg/util/paramtable/component_param.go), and a
LeaderViewStatus.catching_up_streaming_data field in the proto
(pkg/proto/query_coord.proto). The flag is exposed in
GetDataDistribution (internal/querynodev2/services.go) and used by
QueryCoord readiness checks
(internal/querycoordv2/utils/util.go::CheckDelegatorDataReady) to reject
leaders that are still catching up.
- What logic is simplified/added (not removed): instead of relying
solely on segment distribution/worker heartbeats, the PR adds an
explicit readiness gate that returns "not available" when the delegator
reports catching-up-streaming-data. This is strictly additive — no
existing checks are removed; the new precondition runs before segment
availability validation to prevent premature routing to slow-consuming
delegators.
- Why this does NOT cause data loss or regress behavior: the change only
controls serviceability visibility and routing — it never drops or
mutates data. Concretely: shardDelegator starts with
catchingUpStreamingData=true and flips to false in UpdateTSafe once the
sampled lag falls below the configured threshold
(internal/querynodev2/delegator/delegator.go::UpdateTSafe). QueryCoord
will short-circuit in CheckDelegatorDataReady when
leader.Status.GetCatchingUpStreamingData() is true
(internal/querycoordv2/utils/util.go), returning a channel-not-available
error before any segment checks; when the flag clears, existing
segment-distribution checks (same code paths) resume. Tests added cover
both catching-up and caught-up paths
(internal/querynodev2/delegator/delegator_test.go,
internal/querycoordv2/utils/util_test.go,
internal/querynodev2/services_test.go), demonstrating convergence
without changed data flows or deletion of data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-29 17:15:21 +08:00
aoiasd
7e4f87e351
fix: Init analyzer at delegator for all fields with analyzer enabled (#46361)
To support text match highlight
relate: https://github.com/milvus-io/milvus/issues/46308

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-19 10:23:18 +08:00
aoiasd
7d19c40e3c
feat: support search highlight with queries (#45736)
Previously, search with highlight only supported using BM25 search text
as the highlight target.
This PR adds support for highlighting with user-defined queries.
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-01 10:17:09 +08:00
aoiasd
5efb0cedc8
feat: support use fragment config for highlight (#45099)
relate: https://github.com/milvus-io/milvus/issues/42589

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-24 17:07:06 +08:00
wei liu
3fbee154f6
enhance: Remove large segment ID arrays from QueryNode logs (#45719)
issue: #45718

Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.

Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log

This reduces log volume by 99.99% for large-scale operations.
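
As a small Go illustration of the change in logging style (using go.uber.org/zap directly as a stand-in for the Milvus log package):

```
package main

import "go.uber.org/zap"

// logSegmentLoad shows the before/after shape of the logger fields: the full
// ID array is dropped and only the count is kept for observability.
func logSegmentLoad(logger *zap.Logger, requestSegments []int64) {
	// Before (removed): logging every ID grows linearly with segment count.
	// logger.Info("load segments", zap.Int64s("requestSegments", requestSegments))

	// After: a single integer field regardless of scale.
	logger.Info("load segments", zap.Int("requestSegmentCount", len(requestSegments)))
}

func main() {
	logger, _ := zap.NewDevelopment()
	defer logger.Sync()
	logSegmentLoad(logger, []int64{1, 2, 3})
}
```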

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-20 17:18:14 +08:00
aoiasd
947c8855f3
feat: support search bm25 with highlight (#44923)
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-18 16:09:39 +08:00
aoiasd
ac82bad0b3
enhance: optimize idf oracle sync logic (#44628)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-20 15:42:08 +08:00
aoiasd
754997ac2b
enhance: update some annotations (#44769)
relate: https://github.com/milvus-io/milvus/issues/43114

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-17 16:22:02 +08:00
wei liu
33d1e7de83
fix: Replace incorrect log import with milvus v2 log package (#44731)
issue: #44730
Fix the issue where logs were not outputting as expected due to
incorrect log package imports across multiple components.

Changes include:
- Add golangci-lint rule to forbid github.com/pingcap/log usage
- Replace github.com/pingcap/log with
github.com/milvus-io/milvus/pkg/v2/log

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-10 20:27:57 +08:00
aoiasd
78ee76f018
enhance: support preload sealed segment bm25 stats and optimize bm25 stats serialize (#44279)
relate: https://github.com/milvus-io/milvus/issues/41424

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-29 16:35:05 +08:00
jiaqizho
338ed2fed4
enhance: Introduce sparse filter in query (#44347)
issue: #44373

The current commit implements sparse filtering in query tasks using the
statistical information (Bloom filter/MinMax) of the Primary Key (PK).

The statistical information of the PK is bound to the segment during the
segment loading phase. A new filter has been added to the segment filter
to enable the sparse filtering functionality.
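
A rough Go sketch of this kind of PK-statistics filter is shown below; the BloomFilter interface and min/max fields are illustrative assumptions rather than the actual segcore types, and serve only to show how a segment is skipped when it cannot contain the queried PK.

```
package pkfilter

// BloomFilter is an illustrative interface: MayContain never returns a false
// negative, so a false result means the PK is definitely absent.
type BloomFilter interface {
	MayContain(pk int64) bool
}

// segmentPKStats holds the per-segment statistics bound at load time.
type segmentPKStats struct {
	bf       BloomFilter
	min, max int64
}

// mayMatch reports whether the segment can possibly contain the queried PK;
// segments for which this returns false are filtered out before execution.
func (s *segmentPKStats) mayMatch(pk int64) bool {
	if pk < s.min || pk > s.max {
		return false // outside the MinMax range
	}
	return s.bf.MayContain(pk)
}
```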

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2025-09-23 09:58:09 +08:00
aoiasd
9add663a08
fix: idf oracle use wrong dir (#44266)
relate: https://github.com/milvus-io/milvus/issues/44264

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-10 14:41:56 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading (#43846)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
Chun Han
da156981c6
feat: milvus supports posix-compatible mode (milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
Tianx
c0d62268ac
feat: add timestamptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
> https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x] M1 Create collection with timestamptz field
> * [x] M2 Insert timestamptz field data
> * [x] M3 Retrieve timestamptz field data
> * [x] M4 Implement handoff

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
congqixia
de3e5c285b
enhance: Add downgrade tsafe switch param item (#43874)
Related to #43873

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-15 12:31:43 +08:00
wei liu
715b5153b8
enhance: Improve delegator serviceable check logic in PinReadableSegments (#43768)
issue: #43767
- Enhance serviceable check logic to properly handle full vs partial
result requirements
- For full result (requiredLoadRatio >= 1.0): check
queryView.Serviceable()
- For partial result (requiredLoadRatio < 1.0): check load ratio
satisfaction
- Add comprehensive unit tests covering all serviceable check scenarios

This enhancement ensures delegator correctly validates serviceability
based on the requested result completeness, improving reliability of
query operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-07 12:13:40 +08:00
Zhen Ye
5551d99425
enhance: remove old arch non-streaming arch code (#43651)
issue: #41609

- remove all dml dead code at proxy
- remove dead code at l0_write_buffer
- remove msgstream dependency at proxy
- remove timetick reporter from proxy
- remove replicate stream implementation

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:41:40 +08:00
sparknack
bdd65871ea
enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323)
issue: #41435 

After introducing the caching layer's lazy loading and eviction
mechanisms, most parts of a segment won't be loaded into memory or disk
immediately, even if the segment is marked as LOADED. This means
physical resource usage may be very low. However, we still need to
reserve enough resources for the segments marked as LOADED. Thus, the
logic of resource usage estimation during segment loading, which is
currently based on physical resource usage only, should be changed.

To address this issue, we introduced the concept of logical resource
usage in this patch. This can be thought of as the base reserved
resource for each LOADED segment.

A segment’s logical resource usage is derived from its final evictable
and inevictable resource usage and calculated as follows:

```
SLR = SFPIER + evitable_cache_ratio * SFPER
```

It is equivalent to

```
SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER
```

`SLR`: The logical resource usage of a segment.
`SFPIER`: The final physical inevictable resource usage of a segment.
`SFPER`: The final physical evictable resource usage of a segment.
`evitable_cache_ratio`: The ratio of a segment's evictable resources
that can be cached locally. The higher the ratio, the more physical
memory is reserved for evictable memory.

When loading a segment, two types of resource usage are taken into
account.

First is the estimated maximum physical resource usage:

```
PPR = HPR + CPR + SMPR - SFPER
```

`PPR`: The predicted physical resource usage after the current segment
is allowed to load.
`HPR`: The physical resource usage obtained from hardware information.  
`CPR`: The total physical resource usage of segments that have been
committed but not yet loaded. When a new segment is allowed to load,
`CPR' = CPR + (SMR - SER)`. When one of the committed segments is
loaded, `CPR' = CPR - (SMR - SER)`.
`SMPR`: The maximum physical resource usage of the current segment.
`SFPER`: The final physical evictable resource usage of the current
segment.

Second is the estimated logical resource usage, this check is only valid
when eviction is enabled:

```
PLR = LLR + CLR + SLR
```

`PLR`: The predicted logical resource usage after the current segment is
allowed to load.
`LLR`: The total logical resource usage of all loaded segments. When a
new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`.
`CLR`: The total logical resource usage of segments that have been
committed but not yet loaded. When a new segment is allowed to load,
`CLR' = CLR + SLR`. When one of the committed segments is loaded, `CLR'
= CLR - SLR`.
`SLR`: The logical resource usage of the current segment.

Only when `PPR < PRL && PLR < PRL` (`PRL`: the physical resource limit
of the querynode) is the segment allowed to load.
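
A direct transcription of these formulas into Go, as a sketch only (variable names follow the abbreviations above; the real QueryNode code is organized differently):

```
package loadestimate

// segmentUsage carries the per-segment quantities defined above.
type segmentUsage struct {
	SFPIER float64 // final physical inevictable resource usage
	SFPER  float64 // final physical evictable resource usage
	SMPR   float64 // maximum physical resource usage while loading
}

// logicalUsage computes SLR = SFPIER + evitable_cache_ratio * SFPER.
func logicalUsage(s segmentUsage, evitableCacheRatio float64) float64 {
	return s.SFPIER + evitableCacheRatio*s.SFPER
}

// allowLoad applies both admission checks:
//   PPR = HPR + CPR + SMPR - SFPER   (physical)
//   PLR = LLR + CLR + SLR            (logical, only meaningful with eviction on)
// and admits the segment only when both stay below the resource limit PRL.
func allowLoad(s segmentUsage, hpr, cpr, llr, clr, prl, evitableCacheRatio float64) bool {
	ppr := hpr + cpr + s.SMPR - s.SFPER
	plr := llr + clr + logicalUsage(s, evitableCacheRatio)
	return ppr < prl && plr < prl
}
```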

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-01 21:31:37 +08:00
wei liu
7b8bf6393b
enhance: Improve partial result evaluation with row count based strategy (#43361)
issue: #43360
Enhance the partial result evaluation mechanism in delegator to use row
count based data ratio instead of simple segment count ratio for better
accuracy.

Key improvements:
- Introduce PartialResultEvaluator interface for flexible evaluation
strategy
- Implement NewRowCountBasedEvaluator using sealed segment row count
data
- Replace segment count based ratio with row count based data ratio
calculation
- Update PinReadableSegments to return sealedRowCount information
- Modify executeSubTasks to use configurable evaluator for partial
result decisions
- Add comprehensive unit tests for the new row count based evaluation
logic

This change provides more accurate partial result evaluation by
considering the actual data volume rather than just segment quantity,
leading to better query performance and consistency when some segments
are unavailable.
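
A minimal Go sketch in the spirit of the evaluator described above; the exact shape of the PartialResultEvaluator interface in the codebase may differ, so this is only an assumption-based illustration of the row-count ratio check.

```
package delegator

// PartialResultEvaluator decides whether the available data is enough to
// serve a (possibly partial) result. This function shape is illustrative.
type PartialResultEvaluator func(availableRows, totalRows int64, requiredRatio float64) bool

// NewRowCountBasedEvaluator accepts a partial result when the reachable
// sealed-segment row count covers at least the required data ratio, rather
// than comparing segment counts.
func NewRowCountBasedEvaluator() PartialResultEvaluator {
	return func(availableRows, totalRows int64, requiredRatio float64) bool {
		if totalRows <= 0 {
			return true
		}
		return float64(availableRows)/float64(totalRows) >= requiredRatio
	}
}
```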

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-28 10:18:55 +08:00
wei liu
990a25e51a
fix: Prevent delete records loss during slow segment loading [QueryNodeV2] (#43527)
issue: #42884
Fixes an issue where delete records for a segment are lost from the
delete buffer if `load segment` execution on the delegator is too slow,
causing `syncTargetVersion` or other cleanup operations to clear them
prematurely.

Changes include:
- Introduced `Pin` and `Unpin` methods in `DeleteBuffer` interface and
its implementations (`doubleCacheBuffer`, `listDeleteBuffer`).
- Added a `pinnedTimestamps` map to track timestamps protected from
cleanup by specific segments.
- Modified `LoadSegments` in `shardDelegator` to `Pin` relevant segment
delete records before loading and `Unpin` them afterwards.
- Added `isPinned` check in `UnRegister` and `TryDiscard` methods of
`listDeleteBuffer` to skip cleanup if corresponding timestamps are
pinned.
- Added comprehensive unit tests for `Pin`, `Unpin`, and `isPinned`
functionality, covering basic, multiple pins, concurrent, and edge
cases.

This ensures the integrity of delete records by preventing their
premature removal from the delete buffer during segment loading.
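
The sketch below shows the intended usage pattern in Go; the DeleteBuffer interface is trimmed down to the methods relevant to this fix, and the load callback stands in for the actual segment loading work.

```
package delegator

import "context"

// DeleteBuffer is reduced to the pieces added or touched by this fix.
type DeleteBuffer interface {
	Pin(ts uint64, segmentID int64)
	Unpin(ts uint64, segmentID int64)
	TryDiscard(ts uint64) // implementations skip timestamps that are still pinned
}

// loadSegmentWithPin keeps the segment's delete records alive while a
// (possibly slow) load runs, then releases the protection afterwards.
func loadSegmentWithPin(ctx context.Context, buf DeleteBuffer, segmentID int64, checkpointTs uint64,
	load func(ctx context.Context) error,
) error {
	buf.Pin(checkpointTs, segmentID)
	defer buf.Unpin(checkpointTs, segmentID)
	return load(ctx)
}
```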

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-24 01:00:54 +08:00
Zhen Ye
3aacd179f7
fix: balance channel before balance segment when upgrading (#43346)
issue: #43117, #42966, #43373

- also fix channel balance possibly not working in 2.6.
- fix error being lost on the delete path
- add mvcc into s/q log
- change the log level for TestCoordDownSearch

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-17 20:16:52 +08:00
wei liu
039564199c
fix: Prevent duplicate segment results in count queries (#43173)
issue: #41570
Fix issue where growing and sealed segments could be searched
simultaneously, causing inflated count(*) results. This was caused by
logic introduced in PR #42009 that made sealed segments readable before
target version advancement.

Changes include:
- Fix conditional filtering logic in PinReadableSegments to prevent
sealed segments from becoming readable prematurely
- Use target version filter for full results (ratio=1.0) to ensure
sealed segments only become readable after target advancement
- Use query view segment list filter for partial results (ratio<1.0) to
maintain backward compatibility
- Simplify target version setting in AddDistributions to prevent
premature segment readability
- Add logging for redundant growing segments during sync
- Add comprehensive unit tests covering the duplicate segment scenario

This fix ensures count(*) queries return accurate results by preventing
the same segment from being counted in both growing and sealed states.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-14 11:10:49 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
Chun Han
07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
aoiasd
97b1c3ed96
enhance: add warn log if some segment's bm25 stats are missing (#43111)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-09 23:22:47 +08:00
aoiasd
54cc0b60f2
fix: dropped segment in excluded segment use wrong excluded ts (#43115)
causing some excluded growing data to be inserted again
relate: https://github.com/milvus-io/milvus/issues/43114

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-08 18:04:46 +08:00
congqixia
7bc7b18ed5
fix: [AddField] Prevent concurrent load during UpdateSchema (#43043)
Related to #43028

This PR:
- Add mutex prevent concurrent load segment & schema change
- Add schema version field in load meta
- Update schema in PutOrRef if schema version is larger

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 17:38:44 +08:00
wei liu
c381bf3e41
enhance: add logs for count(*) (#43001)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-01 19:36:43 +08:00
wei liu
396120ade5
enhance: Improve delegator serviceable check with coordinator sync state (#42975)
issue: #42404
Add syncedByCoord field to ensure delegator only becomes serviceable
after coordinator sync, preventing unreliable service state when memory
is insufficient.

Issue: When memory is low, delegator may become serviceable before
current target is ready, but segments can be released at any time,
making the serviceable state unreliable.

Changes include:
- Add syncedByCoord field to track coordinator sync status
- Update Serviceable() to require both data readiness and coord sync
- Set syncedByCoord=true in SyncTargetVersion
- Add comprehensive test coverage

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-01 10:00:43 +08:00
aoiasd
e2566c0e92
enhance: bm25 stats local cache use local storage path (#42923)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-25 13:44:46 +08:00
wei liu
bf5fde1431
fix: Prevent delegator unserviceable due to shard leader change (#42689)
issue: #42098 #42404
Fix critical issue where concurrent balance segment and balance channel
operations cause delegator view inconsistency. When shard leader
switches between load and release phases of segment balance, it results
in loading segments on old delegator but releasing on new delegator,
making the new delegator unserviceable.

The root cause is that balance segment modifies delegator views, and if
these modifications happen on different delegators due to leader change,
it corrupts the delegator state and affects query availability.

Changes include:
- Add shardLeaderID field to SegmentTask to track delegator for load
- Record shard leader ID during segment loading in move operations
- Skip release if shard leader changed from the one used for loading
- Add comprehensive unit tests for leader change scenarios

This ensures balance segment operations are atomic on a single delegator,
preventing view corruption and maintaining delegator serviceability.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-06-19 12:10:38 +08:00
wei liu
679930bb93
enhance: refine delegator state checking error msg (#42673)
issue: #42661
Add NotStopped() and IsWorking() methods to shardDelegator for better
state management and error handling.

Changes include:
- Add instance state checking methods with proper error messages
- Replace lifetime package calls with delegator instance methods
- Add comprehensive unit tests for state transitions and error cases
- Improve error reporting with channel name for better debugging

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-06-17 10:40:38 +08:00
aoiasd
13330bd466
fix: add concurrency and close protect for bm25 function (#42597)
relate: https://github.com/milvus-io/milvus/issues/42576

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-10 11:36:34 +08:00
cai.zhang
5566a85bcc
enhance: Add proxy task queue metrics (#42156)
issue: #42155

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-04 11:26:32 +08:00
Zhen Ye
4bad293655
enhance: make upgrading from 2.5.x less down time (#42082)
issue: #40532

- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in the cluster have not upgraded to
2.6
- allow opening a read-write wal after all nodes in the cluster have
upgraded to 2.6

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:02:29 +08:00
aoiasd
3a74044149
fix: hybrid search sub request not set analyzer name (#41896)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-29 14:56:28 +08:00
aoiasd
2ae4d80120
enhance: support run analyzer by loaded collection field (#42113)
relate: https://github.com/milvus-io/milvus/issues/42094

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-29 10:54:30 +08:00
Xianhui Lin
da30e1e4df
fix: pass the ttl duration in the search request for ttl filter (#42122)
fix: pass the TTL duration in the search request for TTL filter
issue: https://github.com/milvus-io/milvus/issues/41959

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-05-28 11:08:29 +08:00
wei liu
54619eaa2c
feat: Implement partial result support on node down (#42009)
issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:

- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support partial result required
data ratio
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic

These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 00:12:28 +08:00
aoiasd
0fafb706ba
enhance: add segment bm25 stats local cache (#41775)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-26 18:44:27 +08:00
Zhen Ye
38c804fb01
fix: more stable recovery graceful closing and stable unittest (#42013)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-23 17:52:26 +08:00
wei liu
78010262f0
enhance: Optimize shard serviceable mechanism (#41937)
issue: https://github.com/milvus-io/milvus/issues/41690
- Merge leader view and channel management into ChannelDistManager,
allowing a channel to have multiple delegators.
- Improve shard leader switching to ensure a single replica only has one
shard leader per channel. The shard leader handles all resource loading
and query requests.
- Refine the serviceable mechanism: after QC completes loading, sync the
query view to the delegator. The delegator then determines its
serviceable status based on the query view.
- When a delegator encounters forwarding query or deletion failures,
mark the corresponding segment as offline and transition it to an
unserviceable state.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-22 11:38:24 +08:00
congqixia
186a01eef4
fix: [AddField] Broadcast update schema even when there is no segment (#41780)
Related to #41744

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-13 16:02:55 +08:00