Cherry-pick from master
pr: #45969
Related to #45960
Follow-up to #45961
After #45961 ensured that handleNodeUp is always called for nodes
discovered during rewatchNodes (including stopping nodes), this change
adds a safeguard in ResourceManager.handleNodeUp to skip adding stopping
nodes to resource groups.
1. **resource_manager.go**: Add check for IsStoppingState() in
handleNodeUp to prevent stopping nodes from being added to incomingNode
set and assigned to resource groups.
2. **server.go**:
- Delete processed nodes from sessionMap to avoid duplicate processing
in the subsequent loop
- Add warning logs for stopping state transitions during rewatch
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45147
Related to #44509
Fix a bug where QueryNodeNumEntities metrics were not updated for
collections with zero segments, causing stale metrics when all segments
are flushed or compacted.
The previous implementation used separate loops: one to update size
metrics for all collections, and another to update num entities metrics
only for collections present in the grouped segments map. Collections
with no segments were skipped in the second loop, leaving their
NumEntities metrics stale.
Changes:
- Consolidate size and num entities metric updates into single loop
- Iterate over all collections instead of grouped segments
- Get collection metadata from manager instead of segment instances
- Correctly set NumEntities to 0 for collections with no segments
- Apply the same fix to both growing and sealed segment processing
- Add nil check for collection metadata before processing
This ensures all collection metrics are updated consistently, even when
segment count drops to zero.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45961
Related to #45960
When QueryCoord restarts or reconnects to etcd, the rewatchNodes
function previously skipped handleNodeUp for QueryNodes in stopping
state. This caused stopping balance to fail because necessary components
were not initialized:
- Task scheduler executor was not added
- Dist handler was not started
- Node was not registered in resource manager
This fix ensures handleNodeUp is always called for new nodes regardless
of their stopping state, followed by handleNodeStopping if the node is
stopping. This allows the graceful shutdown process to correctly migrate
segments and channels away from stopping nodes.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Array.h: Add output_data(ScalarFieldProto&) overload for both Array
and ArrayView classes
2. Use std::string_view instead of std::string for VARCHAR and GEOMETRY
types to avoid extra string copies
3. Call Reserve(length_) before writing to proto objects to reduce
memory reallocations
a simple test shows those optimizations improve the Array of Varchar
bulk_subscript performance by 20%
issue: https://github.com/milvus-io/milvus/issues/45679
pr: https://github.com/milvus-io/milvus/pull/45743
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #45623
When etcd reconnects, the DataCoord rewatches DataNodes and calls
ChannelManager.Startup again without closing the previous instance. This
causes multiple contexts and goroutines to accumulate, leading to Close
hanging indefinitely waiting for untracked goroutines.
Root cause:
- Etcd reconnection triggers rewatch flow and calls Startup again
- Startup was not idempotent, allowing repeated calls
- Multiple context cancellations and goroutines accumulated
- Close would wait indefinitely for untracked goroutines
Changes:
- Add started field to ChannelManagerImpl
- Refactor Startup to check and handle restart scenario
- Add state check in Close to prevent hanging
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45608
pr: #45609
When component.Prepare() fails (e.g., net listener creation error), the
sign channel was never closed, causing runComponent to block
indefinitely at <-sign. This resulted in the entire process hanging
after logging the error message.
Changes:
- Move close(sign) to defer statement in runComponent goroutine
- Ensures sign channel is always closed regardless of success/failure
- Allows proper error propagation through future.Await() mechanism
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
When there're no growing segments in the collection, L0 Compaction will
try to choose all L0 segments that hits all L1/L2 segments.
However, if there's Sealed Segment still under flushing in DataNode at
the same time L0 Compaction selects satisfied L1/L2 segments, L0
Compaction will ignore this Segment because it's not in "FlushState",
which is wrong, causing missing deletes on the Sealed Segment.
This quick solution here is to fail this L0 compaction task once
selected a Sealed segment.
See also: #45339
pr: #45340
pr: #45341
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Cherry-pick from master
pr: #45037Fixes#45035
This commit addresses a data race issue where refreshCollection was
updating the collection notifier without proper lock protection.
Changes:
- Add UpdateCollection method to CollectionManager with proper locking
- Introduce CollectionOperator pattern for thread-safe collection
updates
- Make setRefreshNotifier private and use it through the operator
pattern
- Update refreshCollection to use the new UpdateCollection method
- Handle collection not found error gracefully in refreshCollection
The CollectionOperator pattern ensures all collection modifications go
through the CollectionManager's lock, preventing concurrent access
issues.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44909
pr: #44917
When requery optimization is enabled, search results contain IDs but
empty FieldsData. During reduce/rerank operations, if the first shard
has empty FieldsData while others have data, PrepareResultFieldData
initializes an empty array, causing AppendFieldData to panic when
accessing array indices.
Changes:
- Find first non-empty FieldsData as template in 5 functions:
reduceAdvanceGroupBY, reduceSearchResultDataWithGroupBy,
reduceSearchResultDataNoGroupBy, rankSearchResultDataByGroup,
rankSearchResultDataByPk
- Add length check before 4 AppendFieldData calls to prevent panic
- Add unit tests for empty and partial empty FieldsData scenarios
This fix handles both pure requery (all empty) and mixed scenarios
(some empty, some with data) without breaking normal search flow.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44961
master pr: #44966
This PR fixes 3 geometry related bugs:
1. Implement ToString interface for GisFunctionFilter.
2. Ignore GisFunctionFilter MoveCursor for growing segment.
3. Don't skip null geometry for building R-Tree index, should be record
in null_offsets.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>