Related to #42702
This patch adds wait logic for `CheckerController` and a nil check for
the channel checker to avoid panics during the server/testcase stop
procedure.
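As a rough illustration of the pattern (not the actual `CheckerController`
code), a stop routine can guard against a never-started checker and wait
for its goroutine to exit. The `controller` and `checker` types below are
hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"sync"
)

// checker is a hypothetical stand-in for a background checker.
type checker struct {
	closeCh chan struct{}
}

// controller is a hypothetical, simplified controller used only for this sketch.
type controller struct {
	wg       sync.WaitGroup
	stopOnce sync.Once
	channel  *checker // stays nil if Start was never called
}

// Start launches one background goroutine that runs until Stop is called.
func (c *controller) Start() {
	c.channel = &checker{closeCh: make(chan struct{})}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		<-c.channel.closeCh
	}()
}

// Stop is safe to call before Start and more than once.
func (c *controller) Stop() {
	c.stopOnce.Do(func() {
		if c.channel != nil { // nil check: Start may never have run
			close(c.channel.closeCh)
		}
		c.wg.Wait() // wait for background goroutines to exit
	})
}

func main() {
	var idle controller
	idle.Stop() // no panic even though Start was never called

	var running controller
	running.Start()
	running.Stop()
	running.Stop() // second call is a no-op
	fmt.Println("stopped cleanly")
}
```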
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #42165
Implement dynamic task execution capacity calculation based on QueryNode
CPU core count instead of static configuration for better resource
utilization.
Changes include:
- Add CpuCoreNum() method and WithCpuCoreNum() option to NodeInfo
- Implement GetTaskExecutionCap() for dynamic capacity calculation
- Add QueryNodeTaskParallelismFactor parameter for tuning
- Update proto definition to include cpu_core_num field
- Add unit tests for new functionality
This allows QueryCoord to automatically adjust task parallelism based on
actual hardware resources.
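A minimal sketch of the idea, assuming the capacity is the core count
multiplied by the parallelism factor, with a fallback to a static value
when the core count is unknown. The helper name, its signature, and the
fallback are assumptions, not the actual `GetTaskExecutionCap()`:

```go
package main

import "fmt"

// taskExecutionCap is a hypothetical helper: it derives a per-node task
// capacity from the CPU core count and a tuning factor, and falls back to
// a static capacity when either input is missing or invalid.
func taskExecutionCap(cpuCoreNum, parallelismFactor, staticCap int) int {
	if cpuCoreNum <= 0 || parallelismFactor <= 0 {
		return staticCap
	}
	return cpuCoreNum * parallelismFactor
}

func main() {
	fmt.Println(taskExecutionCap(8, 2, 256)) // node reporting 8 cores
	fmt.Println(taskExecutionCap(0, 2, 256)) // core count not reported
}
```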
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #42098 #42404
related to: #42009 #41937
Implement new method to handle partition removal from next target
without directly modifying current target.
Changes include:
- Add RemovePartitionFromNextTarget method and deprecate RemovePartition
- Update target_observer to use new method for ReleasePartition
operations
- Add unit tests and mock methods for new functionality
This ensures that all changes to the next target propagate to the
delegator's query view.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #42492
- Consider the old RO query node (not the streaming node) when balancing
channels.
- A querynode graceful stop can complete if only L0 segments remain.
Signed-off-by: chyezh <chyezh@outlook.com>
Remove the 'goccy/go-json' library, which was inadvertently introduced,
and revert to using the standard internal JSON handling.
Changes include:
- Removed dependency on 'github.com/goccy/go-json' in go.mod and go.sum.
- Replaced import of 'goccy/go-json' with 'internal/json' in
'internal/querycoordv2/task/scheduler.go'.
This correction ensures the project continues to use the intended JSON
processing libraries and avoids unnecessary external dependencies.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #42176
Remove the mutual exclusion constraints between channel and segment
balance tasks to allow them to run concurrently.
Changes include:
- Remove permitBalanceChannel() and permitBalanceSegment() methods from
RoundRobinBalancer
- Update ChannelLevelScoreBalancer, MultiTargetBalancer,
RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint
checks
- Allow segment balance tasks to proceed even when channel balance tasks
are running
- Update test cases to reflect new behavior where balance tasks no
longer block each other
This change improves the efficiency of load balancing by removing
unnecessary coordination overhead between different types of balance
operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:
- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support partial result required
data ratio
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic
These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.
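The required data ratio can be pictured as a simple threshold check. This
is a sketch under assumptions: the function name, the row-based
accounting, and the comparison rule are illustrative and not taken from
the delegator code:

```go
package main

import "fmt"

// allowPartialResult is a hypothetical check: a search may return a partial
// result only if the accessible share of the data meets the required ratio.
func allowPartialResult(accessibleRows, totalRows int64, requiredRatio float64) bool {
	if totalRows <= 0 {
		return true // nothing is missing when there is no data at all
	}
	return float64(accessibleRows) >= requiredRatio*float64(totalRows)
}

func main() {
	fmt.Println(allowPartialResult(90, 100, 0.8)) // most data still reachable
	fmt.Println(allowPartialResult(50, 100, 0.8)) // too much data unavailable
}
```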
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/41690
- Merge leader view and channel management into ChannelDistManager,
allowing a channel to have multiple delegators.
- Improve shard leader switching to ensure a single replica only has one
shard leader per channel. The shard leader handles all resource loading
and query requests.
- Refine the serviceable mechanism: after QC completes loading, sync the
query view to the delegator. The delegator then determines its
serviceable status based on the query view.
- When a delegator encounters forwarding query or deletion failures,
mark the corresponding segment as offline and transition it to an
unserviceable state.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41874
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker
This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.
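A simplified sketch of batching across collections. The `balancePlan`
type, `collectPlans`, and its parameters are hypothetical; the real
balance_checker is more involved:

```go
package main

import "fmt"

// balancePlan is a hypothetical, minimal balance task descriptor.
type balancePlan struct {
	CollectionID int64
	SegmentID    int64
}

// collectPlans gathers plans collection by collection until batchSize is
// reached. When multi is false it stops after the first collection that
// produced any plan, mimicking a one-collection-per-trigger mode.
func collectPlans(collections []int64, plansFor func(int64) []balancePlan, batchSize int, multi bool) []balancePlan {
	var out []balancePlan
	for _, id := range collections {
		plans := plansFor(id)
		if len(plans) == 0 {
			continue
		}
		for _, p := range plans {
			if len(out) >= batchSize {
				return out
			}
			out = append(out, p)
		}
		if !multi {
			break
		}
	}
	return out
}

func main() {
	plansFor := func(id int64) []balancePlan {
		if id == 2 {
			return nil // already balanced
		}
		return []balancePlan{
			{CollectionID: id, SegmentID: id * 10},
			{CollectionID: id, SegmentID: id*10 + 1},
		}
	}
	fmt.Println(len(collectPlans([]int64{1, 2, 3}, plansFor, 3, true)))
	fmt.Println(len(collectPlans([]int64{1, 2, 3}, plansFor, 3, false)))
}
```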
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Merge RootCoord, DataCoord, and QueryCoord into MixCoord.
Merge their sessions into a single Session.
issue: https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: #41194
- Refactor hasUnbalancedCollection flag handling to function scope
- Ensure tracking sets are cleared when no balance is needed
- Add deferred cleanup for both normal/stopping balance paths
- Add unit tests for collection tracking scenarios
The changes ensure tracking sets (normalBalanceCollectionsCurrentRound
and stoppingBalanceCollectionsCurrentRound) are properly cleared when:
- All collections in current round are balanced
- Balance checks return early due to unready targets
- Balance feature flags are disabled
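The deferred cleanup can be sketched as follows. This is an illustrative
reduction with a single tracking set; `balanceChecker`, `pick`, and
`targetReady` are hypothetical names:

```go
package main

import "fmt"

// balanceChecker is a hypothetical, reduced checker with one tracking set.
type balanceChecker struct {
	enabled      bool
	currentRound map[int64]struct{}
}

// pick returns the next collection to balance in this round. A function-scoped
// flag plus a deferred cleanup guarantees the tracking set is cleared on every
// path that finds nothing to balance, including early returns.
func (b *balanceChecker) pick(candidates []int64, targetReady func(int64) bool) (int64, bool) {
	hasUnbalanced := false
	defer func() {
		if !hasUnbalanced {
			b.currentRound = make(map[int64]struct{})
		}
	}()

	if !b.enabled {
		return 0, false // feature disabled: early return still clears the set
	}
	for _, id := range candidates {
		if _, seen := b.currentRound[id]; seen {
			continue
		}
		if !targetReady(id) {
			continue
		}
		b.currentRound[id] = struct{}{}
		hasUnbalanced = true
		return id, true
	}
	return 0, false
}

func main() {
	b := &balanceChecker{enabled: true, currentRound: map[int64]struct{}{}}
	ready := func(int64) bool { return true }
	for i := 0; i < 3; i++ {
		id, ok := b.pick([]int64{1, 2}, ready)
		fmt.Println(id, ok, len(b.currentRound))
	}
}
```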
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39937
Before PR #39552, whenever a segment was missing in either the `current
target` or the `next target`, we would trigger `load segment` to recover
the delegator. However, restoring only the missing segments in the `next
target` is sufficient to advance the target and complete the recovery
process.
In PR #39552, we removed the scheduling of L0 segments along with this
unnecessary `load segment` logic. However, this exposed a new issue: if
the `current target` still has missing segments and there is a flaw in
the `checkDelegatorDataReady` logic, it could block the recovery of a
delegator that contains `offline segments`.
Since `offline segments` are cleaned up asynchronously in this scenario,
this PR removes their blocking effect on delegator recovery, ensuring a
smoother failure recovery process.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
After this PR is merged, insert, upsert, build index, query, and search
are supported on the added field.
These operations are available on the added field only after the add
field request completes, which is a synchronous operation.
Compaction will be supported in the next PR.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #37651
This PR balances the collection with the largest row count first. This
avoids temporarily migrating small-table data to new nodes during their
onboarding, only for it to be moved out again after the large table is
balanced, which would cause unnecessary load.
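The ordering itself amounts to a descending sort by row count. A minimal
sketch, where `orderByRowCount` and the row-count map are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// orderByRowCount returns a copy of ids sorted by row count, largest first.
// The stable sort keeps the original order for collections with equal counts.
func orderByRowCount(ids []int64, rowCount map[int64]int64) []int64 {
	sorted := append([]int64(nil), ids...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return rowCount[sorted[i]] > rowCount[sorted[j]]
	})
	return sorted
}

func main() {
	counts := map[int64]int64{1: 10, 2: 1000, 3: 100}
	fmt.Println(orderByRowCount([]int64{1, 2, 3}, counts))
}
```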
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #40292
related to #39552
- Fix incorrect delete checkpoint usage in SyncDistribution
- Change checkpoint parameter from action.GetCheckpoint() to
action.GetDeleteCP() in SyncTargetVersion call
- This resolves the issue where delete buffer data was being cleaned
prematurely due to wrong checkpoint reference
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #40052
The task delta cache relies on each taskID being unique: it calls
incDeltaCache in AddTask and decDeltaCache in RemoveTask. However, the
taskID allocator is not atomic, so two tasks can end up with the same
taskID. In that case incDeltaCache is called twice but decDeltaCache
only once, which leaks the delta cache.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39551
This PR removes querycoord's scheduling of L0 segments:
- Only load L0 segments when watching a channel
- Only release L0 segments when releasing a channel or syncing data
distribution
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
https://github.com/milvus-io/milvus/issues/35528
This PR adds JSON index support for JSON and dynamic fields. For now,
the index only supports unary queries such as 'a["b"] > 1'. More filter
types will be supported later.
basic usage:
```
collection.create_index("json_field", {"index_type": "INVERTED",
"params": {"json_cast_type": DataType.STRING, "json_path":
'json_field["a"]["b"]'}})
```
This index has the following limitations:
1. If a record does not have the json path you specify, it will be
ignored and there will not be an error.
2. If a value of the json path fails to be cast to the type you specify,
it will be ignored and there will not be an error.
3. A specific json path can have only one json index.
4. If you try to create more than one JSON index on a single JSON field,
the SDK (pymilvus<=2.4.7) may return immediately because of its internal
implementation. This will be fixed in a later version.
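Limits 1 and 2 can be illustrated with a small sketch that collects the
values an index on a string path would see. It assumes a strict cast in
which only JSON strings qualify; the real cast rules may differ, and
`lookup` and `indexableStrings` are hypothetical helpers:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup walks a decoded JSON document along path and reports whether the
// full path exists.
func lookup(doc interface{}, path []string) (interface{}, bool) {
	cur := doc
	for _, key := range path {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// indexableStrings returns the string values found at path. Records that
// lack the path or hold a non-string value are skipped silently.
func indexableStrings(records []string, path []string) []string {
	var out []string
	for _, raw := range records {
		var doc interface{}
		if err := json.Unmarshal([]byte(raw), &doc); err != nil {
			continue
		}
		v, ok := lookup(doc, path)
		if !ok {
			continue // limit 1: path missing, record ignored
		}
		s, ok := v.(string)
		if !ok {
			continue // limit 2: cast fails, record ignored
		}
		out = append(out, s)
	}
	return out
}

func main() {
	records := []string{`{"a":{"b":"x"}}`, `{"a":{}}`, `{"a":{"b":1}}`}
	fmt.Println(indexableStrings(records, []string{"a", "b"}))
}
```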
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
Related to #39840
Previously, the target could be updated asynchronously. This PR makes
removing a collection from the target observer block until all related
tasks in the dispatchers are removed, preventing metrics from being
updated after the collection is released.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38399
- Make a timetick-commit-based write ahead buffer at write side.
- Add a switchable scanner at read side to transfer the state between
catchup and tailing read
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #39680
If compaction or GC happens, loading a collection may get stuck due to
SegmentNotFound. In that case, we should trigger UpdateNextTarget to get
a new data view and continue the loading operation.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39681
This PR maintains the workload effect in the action instead of computing
it from the target, which may cause a leak if the target changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38399
- Add a pchannel level checkpoint for flush processing
- Refactor the recovery of flushers of wal
- Create a shared WAL scanner first, then build multiple datasyncservice
instances on it
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #36621 #39417
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #38399
- Embed the query node into streaming node to make delegator available
at streaming node.
- The embedded query node has a special server label
`QUERYNODE_STREAMING-EMBEDDED`.
- Change the balance strategy to assign channels to streaming nodes
whenever possible.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/37630
Reduce the frequency of metric updates to avoid holding the mutex for
long periods.
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
When there are many segment tasks in the querycoord scheduler, the
traversal in `GetSegmentTaskDelta` checks becomes time-consuming. This
PR adds caching for segment deltas.
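The caching idea can be sketched as a counter map that is adjusted when
tasks are added or removed, so lookups no longer traverse every task.
The `deltaCache` type and its key layout are assumptions for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// deltaKey identifies a cached delta by node and collection (hypothetical layout).
type deltaKey struct {
	nodeID       int64
	collectionID int64
}

// deltaCache keeps a running segment delta per key so reads are O(1).
type deltaCache struct {
	mu     sync.RWMutex
	deltas map[deltaKey]int
}

func newDeltaCache() *deltaCache {
	return &deltaCache{deltas: make(map[deltaKey]int)}
}

// Add applies delta when a task is added; pass the negated value on removal.
func (c *deltaCache) Add(nodeID, collectionID int64, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := deltaKey{nodeID, collectionID}
	c.deltas[k] += delta
	if c.deltas[k] == 0 {
		delete(c.deltas, k) // drop empty entries so the map does not grow forever
	}
}

// Get returns the cached delta without scanning any task list.
func (c *deltaCache) Get(nodeID, collectionID int64) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.deltas[deltaKey{nodeID, collectionID}]
}

func main() {
	cache := newDeltaCache()
	cache.Add(1, 100, 2)  // two load tasks added for node 1
	cache.Add(1, 100, -1) // one of them finished and was removed
	fmt.Println(cache.Get(1, 100))
}
```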
issue: https://github.com/milvus-io/milvus/issues/37630
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
Read metadata such as segments, binlogs, and partitions concurrently at
the collection level.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>