milvus/internal
wei liu 2232dfc3de
fix: Prevent Close from hanging on etcd reconnection (#45622)
issue: #45623
When etcd reconnects, DataCoord rewatches DataNodes and calls
ChannelManager.Startup again without closing the previous instance.
Each repeated call adds another context and more goroutines, so Close
hangs indefinitely while waiting for goroutines it no longer tracks.

Root cause:
- Etcd reconnection triggers rewatch flow and calls Startup again
- Startup was not idempotent, allowing repeated calls
- Contexts, their cancel functions, and goroutines accumulated across calls
- Close would wait indefinitely for untracked goroutines

Changes:
- Add started field to ChannelManagerImpl
- Refactor Startup to detect a repeated call and handle the restart scenario
- Add a state check in Close to prevent it from hanging

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-19 12:49:06 +08:00