Cherry pick from pr
#38311
Related to #37630
Add secondary index with vchannel to reduce `GetBy` rlock holding time
when segment number is large.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/38142
current balance channel policy only consider current collection's
distribution, so if all collections has 1 channel, and all channels has
been loaded on same querynode, after querynode num increase, balance
channel won't be triggered.
This PR enable score based balance channel policy, to achieve:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channel evenly across multiple
querynodes.
pr: https://github.com/milvus-io/milvus/pull/38143
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
pr: #37815
- remove the rpc layer of coordinator when enabling standalone or
mixcoord
- move health check into init
---------
Signed-off-by: chyezh <chyezh@outlook.com>
1. DataNode: Skip generating BF during the insert phase (BF will be
regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining BF for growing segments;
deletion checks will be handled in the segcore.
issue: https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/38129
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/37764
- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.
Signed-off-by: chyezh <chyezh@outlook.com>
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Zhen Ye <chyezh@outlook.com>
1. Introduce a data view mechanism for DataCoord, attempting to update
each collection's data view periodically.
2. QueryCoord maintains a cache of data view versions. Before
batch-fetching recovery info, it retrieves all versions and only fetches
recovery info for collections with updated versions.
3. Return DataCoord's current data view when fetching RecoverInfo.
issue: https://github.com/milvus-io/milvus/issues/37743,
https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/37863
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33285
pr: #37722
- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.
Signed-off-by: chyezh <chyezh@outlook.com>
Cherry-pick from master
pr: #37405
Cgo API cost is not observerable since not metrics is related to them.
This PR add metrics for some sync cgo call related to load & write
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36977
pr: #36968
with node_label_filter on resource group, user can add label on
querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will
prefer to accept node which match it's node_label_filter.
then querynode's can't be group by labels, and put querynodes with same
label to same resource groups.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #37416
See also #37404#37402
IP address in paramtable need validation and fail fast with reasonable
error message
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #37166
pr: #37433
cause the misuse of timer.Reset, which cause dispatcher failed to send
msg to virtual channel buffer, and dispatcher do splitting again and
again, which hold the dispatcher manager's lock, block watching channel
progress.
This PR fix the misuse of timer.Reset
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #37468
Previously failed label is used for canceled storage op, which may cause
wrong alarm when user cancel load operation or etc. This PR utilizes
cancel label when such case happens.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Timeout is a bad design for long running tasks, especially using a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.
This pr is a small patch to stop DC from marking compaction tasks
timeout, while still waiting for DN to finish. The design is
self-conflicted. After this pr, mix and L0 compaction are no longer
controlled by DC timeout, but clustering is still under timeout control.
The compaction queue capacity grows larger for priority calc, hence
timeout compactions appears more often, and when timeout, the queuing
tasks will be timeout too, no compaction will success after.
See also: #37108, #37015
pr: #37118
---------
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Cherry pick from master
pr: #37223
Related to #36102
Previous PR #36107 add grpc inteceptor to observe rpc stats. Using same
strategy, this pr add gin middleware to observer restful v2 rpc stats.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry pick from master
pr: #37081
Related to #35303
This PR add metrics for querynode delegator delete buffer information,
which is related to dml quota logic.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>