Cherry-pick from master
pr: #44870
Related to #44819
This fix addresses an issue (#44819) where the offset parameter did not
work correctly during searches when multiple results had identical
scores. The problem occurred because results with equal scores were not
consistently ordered, leading to unpredictable pagination behavior.
The solution adds a new sorting step (SortEqualScoresByPks) in the
reduce phase that sorts results with identical scores by their primary
keys in ascending order. This ensures deterministic ordering and enables
proper offset functionality.
Changes:
- Add SortEqualScoresByPks() to sort results with equal scores by PK
- Add SortEqualScoresOneNQ() to handle per-query sorting logic
- Invoke sorting step after FillPrimaryKey() in Reduce() workflow
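
The tie-breaking idea can be illustrated with a minimal Go sketch. It is not the actual reduce-phase code: the `hit` type and `sortEqualScoresByPK` helper are hypothetical, and the sketch assumes results are already ordered by score with integer primary keys.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical (score, primary key) pair for one search result.
type hit struct {
	Score float32
	PK    int64
}

// sortEqualScoresByPK assumes hits are already ordered by score and
// reorders each run of identical scores by ascending primary key.
func sortEqualScoresByPK(hits []hit) {
	for start := 0; start < len(hits); {
		end := start + 1
		for end < len(hits) && hits[end].Score == hits[start].Score {
			end++
		}
		run := hits[start:end]
		sort.Slice(run, func(i, j int) bool { return run[i].PK < run[j].PK })
		start = end
	}
}

func main() {
	hits := []hit{{0.9, 7}, {0.8, 42}, {0.8, 3}, {0.8, 15}, {0.5, 1}}
	sortEqualScoresByPK(hits)
	// With a deterministic order, the same offset always skips the same rows.
	offset, limit := 2, 2
	fmt.Println(hits[offset : offset+limit])
}
```

Because ties are broken by primary key, repeating the same search with the same offset returns the same page.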
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43858
pr: #44834
Fix the issue introduced in PR #43992 where deactivating the balance
checker also disabled stopping balance operations, which should still run.
Changes:
- Move IsActive() check after stopping balance logic
- Only skip normal balance when checker is inactive
- Allow stopping balance to proceed regardless of checker state
This ensures stopping balance can execute even when the balance checker
is deactivated.
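
A rough Go sketch of the reordered check follows. The `checker` type and the task names are hypothetical placeholders, not the QueryCoord implementation; the point is only where the active flag is consulted.

```go
package main

import "fmt"

// checker is a hypothetical stand-in for the balance checker.
type checker struct {
	active bool
}

// check returns the kinds of balance work generated in one round.
func (c *checker) check(hasStoppingNodes bool) []string {
	var tasks []string
	// Stopping balance is evaluated first and ignores the active flag,
	// so nodes that are shutting down can still be drained.
	if hasStoppingNodes {
		tasks = append(tasks, "stopping-balance")
	}
	// Only normal balance is gated by the active flag.
	if !c.active {
		return tasks
	}
	return append(tasks, "normal-balance")
}

func main() {
	c := &checker{active: false}
	fmt.Println(c.check(true))  // stopping balance still runs
	fmt.Println(c.check(false)) // nothing to do while inactive
}
```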
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43828
pr: #43829, #43909
Implement robust rewatch mechanism to handle etcd connection failures
and node reconnection scenarios in DataCoord and QueryCoord, along with
heartbeat lag monitoring capabilities.
Changes include:
- Implement rewatchDataNodes/rewatchQueryNodes callbacks for etcd
reconnection scenarios
- Add idempotent rewatchNodes method to handle etcd session recovery
gracefully
- Add QueryCoordLastHeartbeatTimeStamp metric for monitoring node
heartbeat lag
- Clean up heartbeat metrics when nodes go down to prevent metric leaks
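
One way to picture an idempotent rewatch is as a reconciliation between the locally known nodes and the sessions currently registered. The Go sketch below assumes exactly that; the `nodeManager` type, its fields, and the return values are hypothetical and do not mirror the coordinator code.

```go
package main

import "fmt"

// nodeManager is a hypothetical holder of known nodes and their
// last-heartbeat values (standing in for per-node metrics).
type nodeManager struct {
	nodes      map[int64]bool
	heartbeats map[int64]int64
}

// rewatchNodes reconciles local state with the live session list.
// Calling it again with the same sessions changes nothing.
func (m *nodeManager) rewatchNodes(sessions []int64) (added, removed []int64) {
	alive := make(map[int64]bool, len(sessions))
	for _, id := range sessions {
		alive[id] = true
		if !m.nodes[id] {
			m.nodes[id] = true
			added = append(added, id)
		}
	}
	for id := range m.nodes {
		if !alive[id] {
			delete(m.nodes, id)
			// Drop the heartbeat entry too, so stale metrics do not leak.
			delete(m.heartbeats, id)
			removed = append(removed, id)
		}
	}
	return added, removed
}

func main() {
	m := &nodeManager{
		nodes:      map[int64]bool{1: true, 2: true},
		heartbeats: map[int64]int64{1: 100, 2: 101},
	}
	fmt.Println(m.rewatchNodes([]int64{2, 3}))
	fmt.Println(m.rewatchNodes([]int64{2, 3})) // second call is a no-op
}
```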
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Zhen Ye <chyezh@outlook.com>
Cherry-pick from master
pr: #44789
This commit addresses issue #44788 where the
`datacoord_stored_binlog_size` metric could become inaccurate when
multiple concurrent `GetMetrics` calls arrived at DataCoord.
### Problem
The original implementation called `Reset()` followed by `Add()`
operations on Prometheus metrics within the `GetQuotaInfo()` method.
When multiple goroutines invoked this method concurrently, race
conditions occurred:
- Goroutine 1: Reset() → Add(value1)
- Goroutine 2: Reset() → Add(value2)
- Result: Metrics could be reset multiple times and values added in an
interleaved fashion, leading to inaccurate and inflated metric values
### Solution
Changed the approach from `Reset() + Add()` to aggregating metric values
in local maps first, then using `Set()` to update metrics atomically:
1. Collect segment size data into local maps:
- `storedBinlogSize`: tracks size per collection per segment state
- `binlogFileSize`: tracks total file count per collection
- `coll2DbName`: maps collection IDs to database names
2. After aggregation is complete, use `Set()` (instead of `Add()`) to
update metrics in a single operation per label combination
This ensures that concurrent `GetMetrics` calls don't interfere with
each other, as each invocation works with its own local state and only
updates the final metric value atomically.
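
The aggregate-then-`Set()` pattern can be sketched in Go as below. The `gauge` type is a minimal stand-in written for this example, not the Prometheus client API, and `segment`, `report`, and `reportConcurrently` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// gauge is a minimal stand-in for a labelled gauge (not the Prometheus API).
type gauge struct {
	mu     sync.Mutex
	values map[string]float64
}

func (g *gauge) Set(label string, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[label] = v
}

func (g *gauge) Get(label string) float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.values[label]
}

// segment is a hypothetical per-segment size record.
type segment struct {
	Collection string
	Size       float64
}

// report aggregates into a local map first, then publishes each total
// with a single Set per label, so concurrent callers cannot inflate it.
func report(g *gauge, segments []segment) {
	local := make(map[string]float64)
	for _, s := range segments {
		local[s.Collection] += s.Size
	}
	for label, total := range local {
		g.Set(label, total)
	}
}

// reportConcurrently runs report from n goroutines at once.
func reportConcurrently(g *gauge, segments []segment, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			report(g, segments)
		}()
	}
	wg.Wait()
}

func main() {
	g := &gauge{values: map[string]float64{}}
	segs := []segment{{"coll_a", 10}, {"coll_a", 20}, {"coll_b", 5}}
	reportConcurrently(g, segs, 8)
	fmt.Println(g.Get("coll_a"), g.Get("coll_b"))
}
```

Since every caller computes the full total locally, the published value is the same no matter how the goroutines interleave.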
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #44723
Related to #36672
Add an access log field that displays the value length for search/query
requests, which may help developers debug related issues.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44699
Fix the issue where L0 segment checking logic incorrectly identifies L0
segments as missing when they exist on multiple delegators during
channel balance process, which blocks sealed segment loading and target
progression.
Changes include:
- Replace GetLatestShardLeaderByFilter with GetByFilter to check all
delegators instead of only the latest leader
- Iterate through all delegator views to identify which ones lack the L0
segment
The original logic only checked the latest shard leader, causing false
positive detection of missing L0 segments when they actually exist on
other delegators in the same channel during balance operations. This led
to continuous generation of duplicate L0 segment load tasks, preventing
normal sealed segment loading flow.
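
The difference between checking only the latest leader and checking every delegator can be shown with a small Go sketch. The `delegatorView` type and `delegatorsMissingL0` helper are hypothetical and greatly simplified compared to the real distribution views.

```go
package main

import "fmt"

// delegatorView is a hypothetical, simplified view of one delegator:
// the node it runs on and the segment IDs it currently holds.
type delegatorView struct {
	NodeID   int64
	Segments map[int64]bool
}

// delegatorsMissingL0 walks every delegator view of a channel and returns
// the nodes that do not hold the given L0 segment yet.
func delegatorsMissingL0(views []delegatorView, segmentID int64) []int64 {
	var missing []int64
	for _, v := range views {
		if !v.Segments[segmentID] {
			missing = append(missing, v.NodeID)
		}
	}
	return missing
}

func main() {
	// During channel balance, two delegators can serve the same channel.
	views := []delegatorView{
		{NodeID: 1, Segments: map[int64]bool{100: true}},
		{NodeID: 2, Segments: map[int64]bool{}},
	}
	// Only node 2 needs a load task; node 1 already has the segment.
	fmt.Println(delegatorsMissingL0(views, 100))
}
```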
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43427
master pr: #44606
The GISFunction asserts that segment_offsets is not nullptr. When size
is 0, segment_offsets is nullptr, so the loop is skipped in that case.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Cherry-pick from master
pr: #44706
Related to #44703
This PR:
- Add `SetActualConsistencyLevel` to the `info.AccessInfo` interface,
along with the related util method that processes it
- Make `$consistency_level` return the actual value if set
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43858
pr: #43992
Refactor the balance checker implementation to use priority queues for
managing collection balance operations, improving processing efficiency
and order control.
Changes include:
- Export priority queue interfaces (Item, BaseItem, PriorityQueue)
- Replace collection round-robin with priority-based queue system
- Add BalanceCheckCollectionMaxCount configuration parameter
- Optimize balance task generation with batch processing limits
- Refactor processBalanceQueue method for different strategies
- Enhance test coverage with comprehensive unit tests
The new priority queue system processes collections based on row count
or collection ID order, providing better control over balance operation
priorities and resource utilization.
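
As an illustration of the queue-based approach, here is a Go sketch built on the standard `container/heap` package. The `collectionItem`, `byRowCount`, `newQueue`, and `nextBatch` names are hypothetical, and ordering larger collections first (with collection ID as tie-breaker) is an assumption made for the example; `maxCount` plays the role of a per-round limit such as BalanceCheckCollectionMaxCount.

```go
package main

import (
	"container/heap"
	"fmt"
)

// collectionItem is a hypothetical queue entry for one collection.
type collectionItem struct {
	ID       int64
	RowCount int64
}

// byRowCount implements heap.Interface. Assumed ordering: larger row
// count first, with the smaller collection ID winning ties.
type byRowCount []collectionItem

func (q byRowCount) Len() int      { return len(q) }
func (q byRowCount) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q byRowCount) Less(i, j int) bool {
	if q[i].RowCount != q[j].RowCount {
		return q[i].RowCount > q[j].RowCount
	}
	return q[i].ID < q[j].ID
}
func (q *byRowCount) Push(x any) { *q = append(*q, x.(collectionItem)) }
func (q *byRowCount) Pop() any {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

// newQueue builds a heap from the given items.
func newQueue(items ...collectionItem) *byRowCount {
	q := byRowCount(items)
	heap.Init(&q)
	return &q
}

// nextBatch pops at most maxCount collections for one balance round.
func nextBatch(q *byRowCount, maxCount int) []int64 {
	var ids []int64
	for q.Len() > 0 && len(ids) < maxCount {
		ids = append(ids, heap.Pop(q).(collectionItem).ID)
	}
	return ids
}

func main() {
	q := newQueue(
		collectionItem{ID: 1, RowCount: 10},
		collectionItem{ID: 2, RowCount: 500},
		collectionItem{ID: 3, RowCount: 500},
		collectionItem{ID: 4, RowCount: 50},
	)
	fmt.Println(nextBatch(q, 3))
}
```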
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44156
pr: #44234
Enhance FlushAll functionality to support targeting specific collections
within databases instead of only database-level flushing.
Changes include:
- Add FlushAllTarget message in data_coord.proto for granular targeting
- Support collection-specific flush operations within databases
- Maintain backward compatibility with deprecated db_name field
This enhancement allows users to flush specific collections without
affecting other collections in the same database, providing more precise
control over data persistence operations.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43427
pr: #37417
Support R-Tree index for geometry datatype.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>