issue: #42098#42404
pr: #42689
Fix critical issue where concurrent balance segment and balance channel
operations cause delegator view inconsistency. When shard leader
switches between load and release phases of segment balance, it results
in loading segments on old delegator but releasing on new delegator,
making the new delegator unserviceable.
The root cause is that balance segment modifies delegator views, and if
these modifications happen on different delegators due to leader change,
it corrupts the delegator state and affects query availability.
Changes include:
- Add shardLeaderID field to SegmentTask to track delegator for load
- Record shard leader ID during segment loading in move operations
- Skip release if shard leader changed from the one used for loading
- Add comprehensive unit tests for leader change scenarios
This ensures balance segment operations are atomic on single delegator,
preventing view corruption and maintaining delegator serviceability.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
pr: https://github.com/milvus-io/milvus/pull/42612
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Assign the import task to the worker with the most available slots, even
if availableSlots < requiredSlots. This ensures tasks won’t be blocked
indefinitely.
issue: https://github.com/milvus-io/milvus/issues/41981
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This pr fixs double assign in channels, recurr of dropped channel, amend
extra channel meta, and refresh incorrect state with DN by the following
edits:
1. Loose the lock in advanceStandbys to avoid concurrent assignment.
2. Trasfer ToWatch to ToRelease if DN returns ErrChannelRedupulicate.
3. Remove dup channel when recovering
See also: #41876
pr: #41837
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes real auto-generated IDs to
determine
the target vchannel. This mismatch can lead to inaccurate row
distribution estimation
in such corner cases:
- Importing 1 row into 2 vchannels:
• Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
• Import: real autoID (e.g., 457975852966809057) hashes to v1
→ actual result: v0 = 0, v1 = 1
To resolve such corner case, we now allocate at least one segment for
each vchannel
when autoID is enabled, ensuring all vchannels are prepared to receive
data even
if no rows are estimated for them.
issue: https://github.com/milvus-io/milvus/issues/41759
pr: https://github.com/milvus-io/milvus/pull/41771
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
pr: #40394
- Use CounterVec to calculate sum of increase during a time period.
- Use entries number instead of binlog size
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy sides, which enhances cluster stability.
2. Each health check will assess all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode
Signed-off-by: jaime <yun.zhang@zilliz.com>
Related to #35415
In rolling upgrade, legacy proxy may dispatch load request wit empty
load field list. The upgraded querycoord may report error by mistake
that load field list is changed.
This PR:
- Auto field empty load field list with all user field ids
- Refine the error messag when load field list updates
- Refine load job unit test with service cases
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>