issue: #42176
pr: #42177
Remove the mutual exclusion constraints between channel and segment
balance tasks to allow them to run concurrently.
Changes include:
- Remove permitBalanceChannel() and permitBalanceSegment() methods from
RoundRobinBalancer
- Update ChannelLevelScoreBalancer, MultiTargetBalancer,
RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint
checks
- Allow segment balance tasks to proceed even when channel balance tasks
are running
- Update test cases to reflect new behavior where balance tasks no
longer block each other
- Improve error handling in task executor by preferring serviceable
shard leaders for segment release operations
- Add fallback logic to find latest shard leader when serviceable leader
is not available
This change improves the efficiency of load balancing by removing
unnecessary coordination overhead between different types of balance
operations.
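The leader preference for segment release can be sketched as below. This is
a minimal illustration, not the task executor's code: the LeaderView type,
its fields, and pickLeaderForRelease are hypothetical names, and choosing
the newest leader among the serviceable ones is an assumption.

```go
package main

// LeaderView is a hypothetical, simplified stand-in for a shard leader view.
type LeaderView struct {
	NodeID      int64
	Version     int64 // assumption: a larger version means a more recent leader
	Serviceable bool
}

// pickLeaderForRelease prefers the newest serviceable leader and falls back
// to the newest leader overall when no serviceable leader is available.
// The boolean result is false only when views is empty.
func pickLeaderForRelease(views []LeaderView) (LeaderView, bool) {
	var serviceable, latest LeaderView
	foundServiceable, foundLatest := false, false
	for _, v := range views {
		if !foundLatest || v.Version > latest.Version {
			latest, foundLatest = v, true
		}
		if v.Serviceable && (!foundServiceable || v.Version > serviceable.Version) {
			serviceable, foundServiceable = v, true
		}
	}
	if foundServiceable {
		return serviceable, true
	}
	return latest, foundLatest
}
```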
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41874
pr: #41875
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker
This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.
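A rough sketch of how a batch size and the multi-collection switch could
interact. The collectBatch function and its parameters are hypothetical;
only the idea of a configurable batch size and the
enableBalanceOnMultipleCollections switch come from this change.

```go
package main

// collectBatch gathers balance tasks collection by collection.
// perCollection holds the tasks each collection would generate, in the
// order the collections are visited. When multi is false, the loop stops
// after the first collection that produced tasks; when multi is true, it
// keeps going until batchSize tasks have been gathered.
func collectBatch(perCollection [][]string, batchSize int, multi bool) []string {
	if batchSize <= 0 {
		return nil
	}
	batch := make([]string, 0, batchSize)
	for _, tasks := range perCollection {
		for _, t := range tasks {
			if len(batch) >= batchSize {
				return batch
			}
			batch = append(batch, t)
		}
		if len(batch) > 0 && !multi {
			return batch
		}
	}
	return batch
}
```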
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41194
pr: #41195
- Refactor hasUnbalancedCollection flag handling to function scope
- Ensure tracking sets are cleared when no balance is needed
- Add deferred cleanup for both normal/stopping balance paths
- Add unit tests for collection tracking scenarios
The changes ensure tracking sets (normalBalanceCollectionsCurrentRound
and stoppingBalanceCollectionsCurrentRound) are properly cleared when:
- All collections in current round are balanced
- Balance checks return early due to unready targets
- Balance feature flags are disabled
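The deferred cleanup pattern can be sketched as follows. This is a
simplified illustration: the balanceChecker type, nextCollection, and its
parameters are hypothetical, and only the tracking set name and the
function-scoped hasUnbalancedCollection flag are taken from the description
above.

```go
package main

// balanceChecker is a hypothetical, trimmed-down checker that keeps only
// the per-round tracking set for normal balance.
type balanceChecker struct {
	normalBalanceCollectionsCurrentRound map[int64]struct{}
}

func newBalanceChecker() *balanceChecker {
	return &balanceChecker{normalBalanceCollectionsCurrentRound: map[int64]struct{}{}}
}

// nextCollection returns the next untracked collection that needs balance.
// The deferred function resets the tracking set on every path that ends
// without an unbalanced collection, including the early return taken when
// balancing is disabled.
func (b *balanceChecker) nextCollection(candidates []int64, enabled bool, needsBalance func(int64) bool) (int64, bool) {
	hasUnbalancedCollection := false
	defer func() {
		if !hasUnbalancedCollection {
			b.normalBalanceCollectionsCurrentRound = map[int64]struct{}{}
		}
	}()
	if !enabled {
		return 0, false
	}
	for _, id := range candidates {
		if _, seen := b.normalBalanceCollectionsCurrentRound[id]; seen {
			continue // already checked in this round
		}
		b.normalBalanceCollectionsCurrentRound[id] = struct{}{}
		if needsBalance(id) {
			hasUnbalancedCollection = true
			return id, true
		}
	}
	return 0, false
}
```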
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37651
pr: #40297
This PR balances the collection with the largest row count first.
Otherwise, data from small collections may be migrated to newly onboarded
nodes, only to be moved out again once the large collection is balanced,
which causes unnecessary load.
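The ordering can be illustrated with a short sketch. The collectionStat
type and orderForBalance are hypothetical names used only for this example;
how the checker actually obtains row counts is not shown here.

```go
package main

import "sort"

// collectionStat is a hypothetical pair of collection ID and row count.
type collectionStat struct {
	ID       int64
	RowCount int64
}

// orderForBalance returns collection IDs sorted by row count, largest
// first. The input slice is copied so the caller's order is preserved, and
// the stable sort keeps the input order for collections with equal counts.
func orderForBalance(stats []collectionStat) []int64 {
	sorted := append([]collectionStat(nil), stats...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].RowCount > sorted[j].RowCount
	})
	ids := make([]int64, 0, len(sorted))
	for _, s := range sorted {
		ids = append(ids, s.ID)
	}
	return ids
}
```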
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35528
pr: #36750
This PR includes the JSON path index PR and some related PRs:
1. update tantivy version #39253
2. json path index #36750
3. fall back to brute force #40076
4. term filter #40140
5. bug fix #40336
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
issue: #40052
pr: #40183
The task delta cache relies on taskID being unique: it calls incDeltaCache
at AddTask and decDeltaCache at RemoveTask. However, the taskID allocator
is not atomic, so two tasks can end up with the same taskID. In that case
incDeltaCache is called twice but decDeltaCache only once, which causes a
delta cache leak.
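One way to guarantee unique IDs is an atomic counter. The sketch below is
illustrative only: taskIDAllocator and allocConcurrently are hypothetical
names, and this is not necessarily how the scheduler allocates IDs.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// taskIDAllocator hands out unique, increasing IDs. atomic.AddInt64 makes
// each increment indivisible, so concurrent callers never get the same ID.
type taskIDAllocator struct {
	next int64
}

func (a *taskIDAllocator) Alloc() int64 {
	return atomic.AddInt64(&a.next, 1)
}

// allocConcurrently is a usage helper: it allocates IDs from several
// goroutines and returns how many distinct IDs were handed out.
func allocConcurrently(a *taskIDAllocator, workers, perWorker int) int {
	var mu sync.Mutex
	seen := make(map[int64]struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				id := a.Alloc()
				mu.Lock()
				seen[id] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}
```

With unique IDs, every incDeltaCache call at AddTask is matched by exactly
one decDeltaCache call at RemoveTask.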
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #39841
Related to #39840
Previously, the target could be updated asynchronously. This PR makes
removing a collection from the target observer block until all related
tasks in the dispatchers are removed, which prevents metrics from being
updated after the collection is released.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #39680
pr: #39701
If compaction or GC happens, loading a collection may get stuck due to
SegmentNotFound. We should trigger UpdateNextTarget to get a new data
view for the load operation.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39681
pr: #39702
This PR maintains the workload effect in the action instead of computing
it from the target, which may cause a leak if the target changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36621, #39417
pr: #39456
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #38970
pr: #38971
The stopping balance for channels still uses the row_count_based policy,
which may cause channel imbalance in the multi-collection case.
This PR implements a score-based stopping balance channel policy.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39001
pr: #39000
Background:
Segment Load Version: Each segment load request assigns a timestamp as
its version. When multiple copies of a segment are loaded on different
QueryNodes, the leader checker uses this version to identify the latest
copy and updates the routing table in the leader view to point to it.
Delegator Router Version: When a delegator builds a route to a QueryNode
that has loaded a segment, it also records the segment's version.
Router Table Update Logic: If the leader checker detects that the
version of a segment in the routing table does not match the version in
the worker, it updates the routing table to point to the QueryNode with
the latest version. Additionally, it updates the segment's load version
in the QueryNode during this process.
Issue:
When a channel is undergoing load balancing, the leader checker may sync
the routing table to a new delegator. This sync operation modifies the
segment's load version, which invalidates the routing in the old
delegator. Subsequently, the leader checker updates the routing table in
the old delegator, breaking the routing in the new delegator. This cycle
continues, causing repeated updates and inconsistencies.
Fix:
This PR introduces two changes to address the issue:
1. Use NodeID to verify whether the delegator's routing table needs an
update, avoiding unnecessary modifications.
2. Ensure compatibility by using the latest segment's load version as
the version recorded in the routing table.
These changes resolve the cyclic updates and prevent the leader checker
from generating excessive duplicate tasks, ensuring routing stability
across delegators during load balancing.
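The NodeID-based check can be sketched as below. The segmentDist type and
needsRouteUpdate are hypothetical names for illustration; the sketch shows
only the decision rule described in the fix, not the leader checker itself.

```go
package main

// segmentDist is a hypothetical record of one loaded copy of a segment.
type segmentDist struct {
	NodeID  int64
	Version int64 // load version; assumption: larger means loaded later
}

// needsRouteUpdate finds the copy with the latest load version and reports
// whether the delegator's route must change. The decision compares NodeIDs
// only, so a version change on the node that is already routed to does not
// trigger another update. The returned copy carries the latest load
// version, which is the version to record in the routing table.
func needsRouteUpdate(routedNodeID int64, copies []segmentDist) (segmentDist, bool) {
	if len(copies) == 0 {
		return segmentDist{}, false
	}
	latest := copies[0]
	for _, c := range copies[1:] {
		if c.Version > latest.Version {
			latest = c
		}
	}
	return latest, routedNodeID != latest.NodeID
}
```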
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38715
pr: #38770
- Milvus currently uses the serialized (compressed) index size to
estimate the resources needed for loading.
- Add a new field, MemSize (the size before compression), to the index
for resource estimation.
---------
Signed-off-by: chyezh <chyezh@outlook.com>