issue: #45728
When mixcoord is in standby mode and shutdown is triggered, the
ProcessActiveStandBy goroutine may panic if context cancellation occurs.
This happens because the error handling didn't check for
context.Canceled errors before panicking.
Changes:
- Add context cancellation check in mix_coord Register() before panic
- Check s.ctx.Err() == context.Canceled and gracefully exit
- Remove unused ForceActiveStandby() function from session_util
This ensures standby mixcoord can shutdown gracefully without panic when
context is cancelled during the standby process.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44620
Related to unstable ut "internal/querycoordv2 TestServer/TestNodeUp"
Introduce SessionWatcher interface to fix race condition and goroutine
leak that caused unstable unit test TestServer/TestNodeUp.
Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw
channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing
Fixes race condition between session context cancellation and internal
loop exit. Eliminates goroutine leak by providing explicit lifecycle
management.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44620
Make unittest stable by waiting eventch instead of async read
len(eventCh), which is unstable
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36977
with node_label_filter on resource group, user can add label on
querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will
prefer to accept node which match it's node_label_filter.
then querynode's can't be group by labels, and put querynodes with same
label to same resource groups.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #31628
The `Revoke` operation shall delete all keys related to the lease
attaching to. This `deleteSession` operation may also remove the session
key in next epoch by mistake and cause chaos session status
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR mainly improve two items:
1. Target observer should refresh loading status during init time. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for
no reason. That's also the reason why we constantly see false log
"collection unloaded" upon mixcoord restarts.
2. Delete session when service is stopped. So that the new service
doesn't need to wait for the previous session to expire (~10s).
Item 1 is the major improvement of this PR, which should speed up init
time by 10s.
Item 2 is not a big concern in most cases as coordinators usually shut
down after stop(). In those cases, coordinator restart triggers serverID
change which further triggers an existing logic that deletes expired
session. This PR only fixes rare cases where serverID doesn't change.
integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic
tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.
issue: #29409
Signed-off-by: yiwangdr <yiwangdr@gmail.com>