Cherry-pick from master
pr: #36107
Related to #36102
This PR uses the newly added `grpcSizeStatsHandler` to reduce calls to
`proto.Size`, since the request and response size info is already recorded
by the gRPC framework.
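The idea can be sketched without gRPC itself. A handler accumulates payload sizes that the framework already reports, so the metrics path never re-serializes a message just to measure it. This is a minimal, stdlib-only sketch. `payloadEvent` and `sizeStatsHandler` are hypothetical stand-ins, not the actual `grpcSizeStatsHandler`. In grpc-go the sizes arrive as payload events delivered to a `stats.Handler`, which this sketch does not import.

```go
package main

import "sync"

// payloadEvent is a hypothetical stand-in for a framework-reported payload
// event; Length is the message size the framework has already computed.
type payloadEvent struct {
	Method  string
	Inbound bool // true for a received request, false for a sent response
	Length  int
}

// sizeStatsHandler accumulates request/response bytes per method from
// payload events, so no extra proto.Size-style call is needed.
type sizeStatsHandler struct {
	mu       sync.Mutex
	reqSize  map[string]int
	respSize map[string]int
}

func newSizeStatsHandler() *sizeStatsHandler {
	return &sizeStatsHandler{reqSize: map[string]int{}, respSize: map[string]int{}}
}

// HandlePayload is called once per payload event.
func (h *sizeStatsHandler) HandlePayload(ev payloadEvent) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ev.Inbound {
		h.reqSize[ev.Method] += ev.Length
	} else {
		h.respSize[ev.Method] += ev.Length
	}
}

// Sizes returns the accumulated request and response bytes for a method.
func (h *sizeStatsHandler) Sizes(method string) (req, resp int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.reqSize[method], h.respSize[method]
}
```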
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #36125
Related to #35941
Previous PR: #36034
This patch corrects the switch branching logic and makes the unit test work
for cases that do not select the whole dataset.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36064
pr: #36065
When the delegator has high memory usage, loading an L0 segment will fail,
and the balance segment task will be blocked by the load segment task. The
delegator then can't free memory by moving out some segments, which causes a
logical deadlock.
This PR removes the limit on balance, permitting segment load and balance
tasks to execute in parallel. This won't cause side effects because:
1. one segment can only have one task in querycoord's scheduler, and a
load/release task will replace the balance task if necessary;
2. balance speed has been limited, so it won't block the load segment task;
3. if a collection has a load task and a balance task at the same time, the
load task will be scheduled first due to its higher priority.
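The three rules above can be modeled with a small scheduler. This is an illustrative sketch under assumed names (`taskKind`, `scheduler`, `Add`, `Next`), not querycoord's actual scheduler. Each segment holds at most one pending task, a load/release task replaces a pending balance task, and among pending tasks the higher priority is dispatched first.

```go
package main

import "sort"

type taskKind int

const (
	kindBalance taskKind = iota
	kindLoad
	kindRelease
)

// priority is an assumed ordering: load/release tasks outrank balance tasks.
func (k taskKind) priority() int {
	if k == kindBalance {
		return 0
	}
	return 1
}

type task struct {
	SegmentID int64
	Kind      taskKind
}

// scheduler keeps at most one pending task per segment.
type scheduler struct {
	pending map[int64]task
}

func newScheduler() *scheduler { return &scheduler{pending: map[int64]task{}} }

// Add registers t. If the segment already has a task, only a load/release
// task may replace a balance task; otherwise the existing task is kept and
// Add returns false.
func (s *scheduler) Add(t task) bool {
	if cur, ok := s.pending[t.SegmentID]; ok {
		if !(cur.Kind == kindBalance && t.Kind != kindBalance) {
			return false
		}
	}
	s.pending[t.SegmentID] = t
	return true
}

// Next removes and returns the highest-priority pending task; ties are broken
// by segment ID so the order is deterministic.
func (s *scheduler) Next() (task, bool) {
	if len(s.pending) == 0 {
		return task{}, false
	}
	all := make([]task, 0, len(s.pending))
	for _, t := range s.pending {
		all = append(all, t)
	}
	sort.Slice(all, func(i, j int) bool {
		pi, pj := all[i].Kind.priority(), all[j].Kind.priority()
		if pi != pj {
			return pi > pj
		}
		return all[i].SegmentID < all[j].SegmentID
	})
	delete(s.pending, all[0].SegmentID)
	return all[0], true
}
```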
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #36034
Related to #35941
Previous PR: #35943
This PR makes the `Trie` index use `MARISA_LABEL_ORDER`, which makes
predictive search iterate in lexicographic order.
When the trie index is built in label order, the lexicographic order can be
used to accelerate `Range` operations.
However, according to the official documentation, using `MARISA_LABEL_ORDER`
will make "exact match lookup, common prefix search, and predictive
search" slower.
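Why the iteration order matters can be sketched with a sorted key slice standing in for a label-ordered trie. Once keys arrive in lexicographic order, a range scan can stop at the first key past the upper bound. The slice and `rangeInOrder` are illustrative assumptions, not the marisa-trie API, and the binary search here replaces the trie traversal.

```go
package main

import "sort"

// rangeInOrder returns keys k with lower <= k <= upper. keys must already be
// sorted lexicographically, as label-ordered iteration would yield them.
// It binary-searches the start and stops at the first key past upper.
func rangeInOrder(keys []string, lower, upper string) []string {
	var out []string
	for i := sort.SearchStrings(keys, lower); i < len(keys); i++ {
		if keys[i] > upper {
			break // every later key is also > upper, so the scan ends early
		}
		out = append(out, keys[i])
	}
	return out
}
```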
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #35928
Related to #35927
This PR addresses several issues:
- Use the `ResetTraceConfig` method instead of initializing a new one in the update event handler
- Implement a dynamic `stats.Handler` to receive tracing config update events
- Update the `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` to avoid a data race
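A dynamic handler of this kind can be sketched as a wrapper that delegates to whichever handler is currently installed and swaps it atomically when a config update event arrives. The enable flag is an atomic boolean, the Go counterpart of the `std::atomic<bool>` change. `eventHandler`, `dynamicHandler`, and `OnTraceConfigUpdate` are hypothetical names, and the single-method interface is a simplification of gRPC's `stats.Handler`.

```go
package main

import "sync/atomic"

// eventHandler is a simplified, hypothetical stand-in for a stats handler.
type eventHandler interface {
	Handle(event string) string
}

type tracingHandler struct{ exporter string }

func (h tracingHandler) Handle(event string) string { return h.exporter + ":" + event }

// dynamicHandler forwards events to the currently installed handler, which can
// be replaced at runtime without locking readers.
type dynamicHandler struct {
	current     atomic.Pointer[eventHandler]
	enableTrace atomic.Bool
}

// OnTraceConfigUpdate installs a new handler and updates the enable flag.
// Each field is individually atomic, so a concurrent reader may briefly see
// the new handler together with the old flag value.
func (d *dynamicHandler) OnTraceConfigUpdate(h eventHandler, enabled bool) {
	d.current.Store(&h)
	d.enableTrace.Store(enabled)
}

// Handle is safe to call concurrently with OnTraceConfigUpdate.
func (d *dynamicHandler) Handle(event string) (string, bool) {
	if !d.enableTrace.Load() {
		return "", false
	}
	h := d.current.Load()
	if h == nil {
		return "", false
	}
	return (*h).Handle(event), true
}
```

An `atomic.Pointer` to the interface is used instead of `atomic.Value` because `atomic.Value` panics when successive stores use different concrete types.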
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33550
pr: #35919
A concurrency issue may occur between removing a partition in the target
manager and syncing the segment list to the delegator. When it happens, some
segments may be released in the delegator, and those segments may also be
synced to the delegator, which makes the delegator unserviceable due to a
lack of necessary segments, so search/query fails.
This PR makes sure that all write access to target_manager is executed
serially to avoid these concurrency issues.
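Serializing writes can be sketched with a single mutex that every mutating method takes, so a partition removal and a segment-list update can never interleave. `targetManager`, `RemovePartition`, `SetSegments`, and `Segments` are hypothetical names, not the actual target_manager API, and a plain `sync.Mutex` is an assumed mechanism.

```go
package main

import "sync"

// targetManager is a hypothetical, simplified model: partitionID -> segment IDs.
type targetManager struct {
	mu       sync.Mutex // one lock for every write, so writes run serially
	segments map[int64][]int64
}

func newTargetManager() *targetManager {
	return &targetManager{segments: map[int64][]int64{}}
}

// RemovePartition drops a partition from the target.
func (m *targetManager) RemovePartition(partitionID int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.segments, partitionID)
}

// SetSegments replaces a partition's segment list.
func (m *targetManager) SetSegments(partitionID int64, ids []int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.segments[partitionID] = append([]int64(nil), ids...)
}

// Segments returns a copy taken under the same lock, so a reader such as a
// delegator sync never observes a write that is only partly applied.
func (m *targetManager) Segments(partitionID int64) []int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]int64(nil), m.segments[partitionID]...)
}
```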
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #35943
Related to #35941
With the default behavior of marisa trie `predictive_search`, the values are
not iterated in lexicographic order.
This PR is a brute-force fix that makes the range operator return correct
values.
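The brute-force approach can be sketched as visiting every candidate the unordered iteration yields and keeping those inside the bounds. Because nothing can be assumed about the order, the scan cannot stop early. The input slice stands in for marisa's iteration, `rangeBruteForce` is a hypothetical name, and sorting the result is an assumed choice for deterministic output.

```go
package main

import "sort"

// rangeBruteForce returns all keys k with lower <= k <= upper from an
// arbitrarily ordered sequence. Every key must be visited because a later
// key may still fall inside the range.
func rangeBruteForce(unordered []string, lower, upper string) []string {
	var out []string
	for _, k := range unordered {
		if k >= lower && k <= upper {
			out = append(out, k)
		}
	}
	sort.Strings(out) // deterministic output order
	return out
}
```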
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #35846
pr: #35850
querycoord notifies proxy to update the shard leader cache after a
delegator's location changes. However, during querynode failure recovery,
some delegators may become unserviceable due to missing segments and become
serviceable again after the segments are loaded, so we also need to notify
proxy to invalidate the shard leader cache when a delegator's serviceable
state changes.
This PR maintains querynode's serviceable state during heartbeat and
notifies proxy to invalidate the shard leader cache if the serviceable state
changes.
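The heartbeat-side bookkeeping can be sketched as remembering the last reported serviceable state per delegator and invoking an invalidation callback only on a transition. `delegatorKey`, `serviceableTracker`, `OnHeartbeat`, and the `invalidate` callback are hypothetical; the real notification to proxy is an RPC that is not modeled here.

```go
package main

// delegatorKey identifies a delegator (hypothetical: node ID plus channel).
type delegatorKey struct {
	NodeID  int64
	Channel string
}

// serviceableTracker remembers the last serviceable state reported per
// delegator. It assumes heartbeats are processed by a single goroutine.
type serviceableTracker struct {
	last       map[delegatorKey]bool
	invalidate func(delegatorKey) // e.g. ask proxy to drop its shard leader cache
}

func newServiceableTracker(invalidate func(delegatorKey)) *serviceableTracker {
	return &serviceableTracker{last: map[delegatorKey]bool{}, invalidate: invalidate}
}

// OnHeartbeat records the reported state and calls invalidate when it differs
// from the previous report. The first report is only recorded (assumption:
// a new delegator is already announced through the location-change path).
func (t *serviceableTracker) OnHeartbeat(key delegatorKey, serviceable bool) bool {
	prev, seen := t.last[key]
	t.last[key] = serviceable
	if !seen || prev == serviceable {
		return false
	}
	t.invalidate(key)
	return true
}
```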
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #35905
Related to #35415
This PR makes querycoord report an error when a load request tries to update
the load fields list, which is currently not supported.
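The check can be sketched as comparing the requested load fields with the ones the collection is already loaded with and rejecting any difference. `checkLoadFields`, `sameFields`, and `errLoadFieldsChange` are hypothetical names. Ignoring field order, and treating a nil loaded list as "not loaded yet", are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errLoadFieldsChange is a hypothetical sentinel error for this sketch.
var errLoadFieldsChange = errors.New("updating load fields list is not supported")

// sameFields reports whether a and b contain the same field IDs, ignoring
// order. It sorts copies so the callers' slices are not mutated.
func sameFields(a, b []int64) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]int64(nil), a...)
	y := append([]int64(nil), b...)
	sort.Slice(x, func(i, j int) bool { return x[i] < x[j] })
	sort.Slice(y, func(i, j int) bool { return y[i] < y[j] })
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// checkLoadFields rejects a load request whose field list differs from the
// one already loaded. A nil loaded list means the collection is not loaded.
func checkLoadFields(loaded, requested []int64) error {
	if loaded == nil || sameFields(loaded, requested) {
		return nil
	}
	return fmt.Errorf("%w: loaded %v, requested %v", errLoadFieldsChange, loaded, requested)
}
```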
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #35806
The debug log for "RateLimiter register for rateType" is too frequent: in e2e
cases, it may be printed 18M times in one run.
This PR makes the log print only when the rate limit is updated.
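Logging only on change can be sketched by remembering the last limit registered for each rate type and emitting the debug line only when the new value differs. `rateLimitLogger` and its `Register` method are hypothetical names, and the standard `log` package stands in for the project's logger.

```go
package main

import "log"

// rateLimitLogger logs a rate limit registration only when the limit changes.
type rateLimitLogger struct {
	last map[string]float64 // rate type -> last logged limit
	logf func(format string, args ...any)
}

func newRateLimitLogger(logf func(string, ...any)) *rateLimitLogger {
	if logf == nil {
		logf = log.Printf
	}
	return &rateLimitLogger{last: map[string]float64{}, logf: logf}
}

// Register records the limit for rateType and logs only if it changed.
func (l *rateLimitLogger) Register(rateType string, limit float64) bool {
	if prev, ok := l.last[rateType]; ok && prev == limit {
		return false // unchanged: skip the log line
	}
	l.last[rateType] = limit
	l.logf("RateLimiter register for rateType %s, limit %v", rateType, limit)
	return true
}
```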
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #35749
The "skip balance" log is too frequent at debug level. This PR changes it
into a rated (rate-limited) one.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #35778
Related to #35767
Prometheus counters cannot add a negative value. When a response is not
written (say, on timeout or a broken network connection), a panic may happen
if the value is not checked first.
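The guard can be sketched as a check placed in front of the counter update. `counter` is a hypothetical interface standing in for a Prometheus counter (client_golang's `Counter.Add` panics on negative values), and `strictCounter` and `addResponseSize` are illustrative names.

```go
package main

// counter mirrors the shape of a monotonic counter's Add method.
type counter interface {
	Add(v float64)
}

// strictCounter panics on negative input, like a Prometheus counter does.
type strictCounter struct{ total float64 }

func (c *strictCounter) Add(v float64) {
	if v < 0 {
		panic("counter cannot decrease in value")
	}
	c.total += v
}

// addResponseSize records size only when it is non-negative; a negative
// value, which can reach this point when no response was written, is
// skipped instead of panicking.
func addResponseSize(c counter, size int) bool {
	if size < 0 {
		return false
	}
	c.Add(float64(size))
	return true
}
```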
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>