10564 Commits

Author SHA1 Message Date
cai.zhang
9b4b0cb808
enhance: [2.5] Estimate the taskSlot based on whether scalar or vector index (#46260)
issue: #45186
master pr: #45850

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-11 15:43:14 +08:00
Buqian Zheng
dcc3975f17
fix: [2.5] move cursor after skip index skipped a chunk (#46078)
issue: https://github.com/milvus-io/milvus/issues/46053
pr: https://github.com/milvus-io/milvus/pull/46054

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-05 13:59:11 +08:00
zhagnlu
f1f11b336b
fix: fix undefined behavior when dumping snapshot (#45613)
pr: #45611

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-05 10:49:12 +08:00
cai.zhang
6ce2df9944
fix: [2.5]Fix setting default value for geometry by restful (#46058) (#46065)
issue: https://github.com/milvus-io/milvus/issues/46056
master pr: https://github.com/milvus-io/milvus/pull/46058

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-04 16:35:11 +08:00
congqixia
e70e70699c
enhance: [2.5] skip adding stopping node to resource group in handleNodeUp (#45969) (#45982)
Cherry-pick from master
pr: #45969
Related to #45960
Follow-up to #45961

After #45961 ensured that handleNodeUp is always called for nodes
discovered during rewatchNodes (including stopping nodes), this change
adds a safeguard in ResourceManager.handleNodeUp to skip adding stopping
nodes to resource groups.

1. **resource_manager.go**: Add check for IsStoppingState() in
handleNodeUp to prevent stopping nodes from being added to incomingNode
set and assigned to resource groups.

2. **server.go**:
- Delete processed nodes from sessionMap to avoid duplicate processing
in the subsequent loop
   - Add warning logs for stopping state transitions during rewatch
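
The safeguard in item 1 can be sketched roughly as follows. This is a minimal illustration, not the actual Milvus code: `Node`, `ResourceManager`, `NewResourceManager`, and the `incoming` map are hypothetical stand-ins, and only the `IsStoppingState()` check and the `handleNodeUp` name come from the message above.

```go
package main

// Node is a hypothetical stand-in for a QueryNode session entry.
type Node struct {
	ID       int64
	Stopping bool
}

// IsStoppingState mirrors the check named in the commit message.
func (n Node) IsStoppingState() bool { return n.Stopping }

// ResourceManager is a reduced, hypothetical model that keeps only the
// set of incoming nodes waiting for resource-group assignment.
type ResourceManager struct {
	incoming map[int64]struct{}
}

// NewResourceManager returns a manager with an empty incoming set.
func NewResourceManager() *ResourceManager {
	return &ResourceManager{incoming: make(map[int64]struct{})}
}

// handleNodeUp adds the node to the incoming set unless it is stopping.
// It reports whether the node was added.
func (rm *ResourceManager) handleNodeUp(n Node) bool {
	if n.IsStoppingState() {
		// A stopping node must not be assigned to any resource group.
		return false
	}
	rm.incoming[n.ID] = struct{}{}
	return true
}
```

Calling `handleNodeUp` with a stopping node returns false and leaves the incoming set unchanged, so later assignment logic never sees that node.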

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-02 10:23:13 +08:00
congqixia
61c80235dd
fix: [2.5] update QueryNode NumEntities metrics when collection has no segments (#45147) (#45981)
Cherry-pick from master
pr: #45147
Related to #44509

Fix a bug where QueryNodeNumEntities metrics were not updated for
collections with zero segments, causing stale metrics when all segments
are flushed or compacted.

The previous implementation used separate loops: one to update size
metrics for all collections, and another to update num entities metrics
only for collections present in the grouped segments map. Collections
with no segments were skipped in the second loop, leaving their
NumEntities metrics stale.

Changes:
- Consolidate size and num entities metric updates into single loop
- Iterate over all collections instead of grouped segments
- Get collection metadata from manager instead of segment instances
- Correctly set NumEntities to 0 for collections with no segments
- Apply the same fix to both growing and sealed segment processing
- Add nil check for collection metadata before processing

This ensures all collection metrics are updated consistently, even when
segment count drops to zero.
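
The single-loop approach can be illustrated with a small sketch. The `Segment` type and the `numEntitiesByCollection` function are hypothetical and only model the idea of iterating over all collections rather than over the grouped segments.

```go
package main

// Segment is a hypothetical, minimal view of a loaded segment.
type Segment struct {
	CollectionID int64
	NumRows      int64
}

// numEntitiesByCollection returns one value per known collection, so a
// collection without segments is reported as 0 instead of being skipped.
func numEntitiesByCollection(collections []int64, segments []Segment) map[int64]int64 {
	// Group row counts by collection first.
	grouped := make(map[int64]int64)
	for _, s := range segments {
		grouped[s.CollectionID] += s.NumRows
	}
	// Iterate over all collections, not over the grouped map.
	out := make(map[int64]int64, len(collections))
	for _, id := range collections {
		out[id] = grouped[id] // a missing key yields 0
	}
	return out
}
```

Because the outer loop runs over the collection list, a collection whose last segment was compacted away still gets an explicit 0 written for it.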

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-02 10:19:10 +08:00
1mmortal
0ece57325e
fix: [2.5] Correcting the incorrect AllSearchCount value in the result of hybrid_search. (#45843)
Correcting the incorrect AllSearchCount value in the result of
hybrid_search.
#45842

Signed-off-by: 1mmortal <lmzzzzz1@163.com>
2025-12-01 15:11:11 +08:00
congqixia
a24a0f11aa
fix: [2.5] always call handleNodeUp in rewatchNodes for proper stopping balance (#45964)
Cherry-pick from master
pr: #45961
Related to #45960

When QueryCoord restarts or reconnects to etcd, the rewatchNodes
function previously skipped handleNodeUp for QueryNodes in stopping
state. This caused stopping balance to fail because necessary components
were not initialized:
- Task scheduler executor was not added
- Dist handler was not started
- Node was not registered in resource manager

This fix ensures handleNodeUp is always called for new nodes regardless
of their stopping state, followed by handleNodeStopping if the node is
stopping. This allows the graceful shutdown process to correctly migrate
segments and channels away from stopping nodes.
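
The call order described above might look like the following sketch. The `coordinator` type, its `known` map, and the `calls` log are hypothetical and exist only to make the ordering visible; the handler names follow the message.

```go
package main

// nodeInfo is a hypothetical view of a node session found during rewatch.
type nodeInfo struct {
	ID       int64
	Stopping bool
}

// coordinator is a hypothetical model that records which handlers ran.
type coordinator struct {
	known map[int64]bool
	calls []string
}

func (c *coordinator) handleNodeUp(id int64) {
	c.known[id] = true
	c.calls = append(c.calls, "up")
}

func (c *coordinator) handleNodeStopping(id int64) {
	c.calls = append(c.calls, "stopping")
}

// rewatchNodes initializes every new node first, then applies the
// stopping transition if the node is already in stopping state.
func (c *coordinator) rewatchNodes(sessions []nodeInfo) {
	for _, n := range sessions {
		if c.known[n.ID] {
			continue // already tracked before the reconnect
		}
		c.handleNodeUp(n.ID) // no longer skipped for stopping nodes
		if n.Stopping {
			c.handleNodeStopping(n.ID)
		}
	}
}
```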

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-01 11:11:10 +08:00
Bingyi Sun
ba6198a3b8
fix: Replace json.doc() calls with json.dom_doc() in JsonContainsExpr (#45785)
issue: https://github.com/milvus-io/milvus/issues/45783
pr: https://github.com/milvus-io/milvus/pull/45573

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-11-25 20:19:07 +08:00
aoiasd
7af4e4076d
enhance: [2.5] optimize bm25 stats load. (#45780)
relate: https://github.com/milvus-io/milvus/issues/41424
pr: https://github.com/milvus-io/milvus/pull/44279
https://github.com/milvus-io/milvus/pull/44628

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-25 10:39:08 +08:00
cai.zhang
74a0363df7
fix: [2.5] Remove the incorrect reset task step (#45771)
issue: #45184

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-21 19:47:06 +08:00
Buqian Zheng
1fda4bcae4
enhance: [2.5] add ScalarFieldProto& overload to avoid unnecessary copies (#45744)
1. Array.h: Add output_data(ScalarFieldProto&) overload for both Array
and ArrayView classes
2. Use std::string_view instead of std::string for VARCHAR and GEOMETRY
types to avoid extra string copies
3. Call Reserve(length_) before writing to proto objects to reduce
memory reallocations

A simple test shows that these optimizations improve bulk_subscript
performance for Array of Varchar by 20%.

issue: https://github.com/milvus-io/milvus/issues/45679
pr: https://github.com/milvus-io/milvus/pull/45743

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-11-21 12:39:05 +08:00
wei liu
2232dfc3de
fix: Prevent Close from hanging on etcd reconnection (#45622)
issue: #45623
When etcd reconnects, the DataCoord rewatches DataNodes and calls
ChannelManager.Startup again without closing the previous instance. This
causes multiple contexts and goroutines to accumulate, leading to Close
hanging indefinitely waiting for untracked goroutines.

Root cause:
- Etcd reconnection triggers rewatch flow and calls Startup again
- Startup was not idempotent, allowing repeated calls
- Multiple context cancellations and goroutines accumulated
- Close would wait indefinitely for untracked goroutines

Changes:
- Add started field to ChannelManagerImpl
- Refactor Startup to check and handle restart scenario
- Add state check in Close to prevent hanging

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-19 12:49:06 +08:00
Bingyi Sun
f1844c9841
enhance: optimize term expr performance (#45490)
issue: https://github.com/milvus-io/milvus/issues/45641
pr: https://github.com/milvus-io/milvus/pull/45491

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-11-19 11:51:06 +08:00
7y-9
a42e847678
fix: [2.5] Fix infinite loop in ResourceManager recovery process (#45563)
relate: https://github.com/milvus-io/milvus/issues/45557

Signed-off-by: lianyu.sun <lianyu.sun@ly.com>
2025-11-17 15:19:39 +08:00
cai.zhang
6eb77ddc4d
fix: [2.5]Fix target segment marked dropped for save stats result twice (#45480)
issue: #45477 

master pr: #45478

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-13 09:41:37 +08:00
cai.zhang
1d6786545b
fix: [2.5] Fix filter geometry for growing with mmap (#45466)
issue: #45450 
master pr: #45464

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 15:41:40 +08:00
aoiasd
7ad68910d9
enhance: [2.5] skip check source id (#45383)
pr: https://github.com/milvus-io/milvus/pull/45377
relate:https://github.com/milvus-io/milvus/issues/45381

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-07 15:21:42 +08:00
XuanYang-cn
2d6c736448
fix: [2.5]Accidentally ignored sealed segments in L0 Compaction #45341 (#45342)
When there are no growing segments in the collection, L0 Compaction
tries to select all L0 segments that hit all L1/L2 segments.

However, if a Sealed Segment is still being flushed in DataNode at the
moment L0 Compaction selects the qualifying L1/L2 segments, L0
Compaction ignores that segment because it is not in "FlushState". This
is wrong and causes deletes to be missed on the Sealed Segment.

The quick solution here is to fail the L0 compaction task once a Sealed
segment is selected.

See also: #45339
pr: #45340
pr: #45341

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-06 19:21:35 +08:00
sparknack
91645d9242
enhance: [2.5] unify the aligned buffer for both buffered and direct I/O (#45324)
issue: #43040
pr: #45323

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-06 10:55:35 +08:00
sparknack
561b167f1e
fix:[2.5] avoid potential race conditions when updating the executor (#45231)
issue: #43030
pr: #45230

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-05 10:15:33 +08:00
cai.zhang
2e4502a4fc
fix: [2.5]Skip create tmp dir for growing R-Tree index (#45258)
issue: https://github.com/milvus-io/milvus/issues/45181

master pr: https://github.com/milvus-io/milvus/pull/45256

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-04 17:35:34 +08:00
cai.zhang
cc9735ff4f
enhance: [2.5]Make GeometryCache an optional configuration (#45197)
issue: #45187 
master pr: #45192

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-03 20:31:34 +08:00
cai.zhang
dfcef7d14d
fix: [2.5]Fix sort stats task failed when segment is compacting (#45185)
issue: #45184

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-03 11:25:33 +08:00
cai.zhang
0ca74f234f
fix: [2.5] Fix import null geometry data (#45163)
issue: #44787 

master pr: #45161

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-31 12:34:10 +08:00
foxspy
0f0ea4d206
enhance: [2.5] update knowhere version (#45148)
issue: #42937 
/kind branch-feature

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-10-30 10:08:08 +08:00
cai.zhang
e58cd7fcc4
fix: [2.5]Fix bug for importing Geometry data (#45091)
issue: https://github.com/milvus-io/milvus/issues/44787 ,
https://github.com/milvus-io/milvus/issues/45012
master pr: https://github.com/milvus-io/milvus/pull/45089

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-29 18:48:13 +08:00
cai.zhang
3ebd1f2f26
fix: [2.5]Fix retrieve geometry null data when enable mmap (#45142)
issue: #44648

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-29 16:48:12 +08:00
aoiasd
529a31a1bf
enhance: [2.5]support use nullable field as bm25 function input field (#44586) (#45118)
relate: https://github.com/milvus-io/milvus/pull/44586

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-28 19:20:11 +08:00
zhagnlu
78d70db6fd
fix: support skipping json stats loading when jsonstats is disabled (#45098)
pr: #45101

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-10-28 10:50:11 +08:00
congqixia
9ed77d4484
fix: [2.5] prevent data race in querycoord collection notifier update (#45037) (#45052)
Cherry-pick from master
pr: #45037
Fixes #45035

This commit addresses a data race issue where refreshCollection was
updating the collection notifier without proper lock protection.

Changes:
- Add UpdateCollection method to CollectionManager with proper locking
- Introduce CollectionOperator pattern for thread-safe collection
updates
- Make setRefreshNotifier private and use it through the operator
pattern
- Update refreshCollection to use the new UpdateCollection method
- Handle collection not found error gracefully in refreshCollection

The CollectionOperator pattern ensures all collection modifications go
through the CollectionManager's lock, preventing concurrent access
issues.
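
A rough sketch of the operator pattern follows, assuming a much simpler `Collection` than the real one. The `refreshNotifier` field, the error value, and the type layout are hypothetical; `UpdateCollection`, `CollectionOperator`, and `setRefreshNotifier` are the names used in the message.

```go
package main

import (
	"errors"
	"sync"
)

// Collection is a hypothetical, reduced collection record.
type Collection struct {
	ID              int64
	refreshNotifier chan struct{}
}

// CollectionOperator mutates a collection while the manager holds its lock.
type CollectionOperator func(*Collection)

// setRefreshNotifier is unexported, so callers must go through
// UpdateCollection to apply it.
func setRefreshNotifier(ch chan struct{}) CollectionOperator {
	return func(c *Collection) { c.refreshNotifier = ch }
}

var errCollectionNotFound = errors.New("collection not found")

// CollectionManager guards all collection records with one lock.
type CollectionManager struct {
	mu          sync.RWMutex
	collections map[int64]*Collection
}

// UpdateCollection applies the operators under the write lock and
// reports an error if the collection does not exist.
func (m *CollectionManager) UpdateCollection(id int64, ops ...CollectionOperator) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	c, ok := m.collections[id]
	if !ok {
		return errCollectionNotFound
	}
	for _, op := range ops {
		op(c)
	}
	return nil
}
```

A caller such as refreshCollection would then invoke `m.UpdateCollection(id, setRefreshNotifier(ch))` and handle the not-found error instead of writing the field directly.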

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-23 19:34:12 +08:00
wei liu
c633556fee
fix: [2.5] Handle empty FieldsData in reduce/rerank for requery scenario (#44919)
issue: #44909
pr: #44917
When requery optimization is enabled, search results contain IDs but
empty FieldsData. During reduce/rerank operations, if the first shard
has empty FieldsData while others have data, PrepareResultFieldData
initializes an empty array, causing AppendFieldData to panic when
accessing array indices.

Changes:
- Find first non-empty FieldsData as template in 5 functions:
  reduceAdvanceGroupBY, reduceSearchResultDataWithGroupBy,
  reduceSearchResultDataNoGroupBy, rankSearchResultDataByGroup,
  rankSearchResultDataByPk
- Add length check before 4 AppendFieldData calls to prevent panic
- Add unit tests for empty and partial empty FieldsData scenarios

This fix handles both pure requery (all empty) and mixed scenarios
(some empty, some with data) without breaking normal search flow.
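
The template selection and the length check can be sketched as below. All types and function names here (`FieldData`, `SearchResult`, `firstNonEmptyFields`, `prepare`, `appendRow`) are hypothetical simplifications, not the real reduce helpers.

```go
package main

// FieldData and SearchResult are hypothetical simplifications of the
// per-shard result types.
type FieldData struct {
	Name   string
	Values []int64
}

type SearchResult struct {
	IDs        []int64
	FieldsData []FieldData
}

// firstNonEmptyFields returns the first non-empty FieldsData to use as
// a template, or nil when every shard is empty (pure requery).
func firstNonEmptyFields(results []SearchResult) []FieldData {
	for _, r := range results {
		if len(r.FieldsData) > 0 {
			return r.FieldsData
		}
	}
	return nil
}

// prepare builds empty output fields shaped like the template.
func prepare(template []FieldData) []FieldData {
	out := make([]FieldData, len(template))
	for i, f := range template {
		out[i].Name = f.Name
	}
	return out
}

// appendRow copies row idx of src into dst. The length check skips
// shards that carry IDs but no field data, which avoids indexing into
// an empty slice.
func appendRow(dst []FieldData, src SearchResult, idx int) {
	if len(src.FieldsData) < len(dst) {
		return
	}
	for i := range dst {
		dst[i].Values = append(dst[i].Values, src.FieldsData[i].Values[idx])
	}
}
```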

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-21 19:48:04 +08:00
cai.zhang
d43b030b4d
fix: [2.5] Fix bug for gis function to filter geometry (#44968)
issue: #44961
master pr: #44966 

This PR fixes 3 geometry related bugs:
1. Implement ToString interface for GisFunctionFilter.
2. Ignore GisFunctionFilter MoveCursor for growing segment.
3. Don't skip null geometries when building the R-Tree index; they
should be recorded in null_offsets.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-21 17:00:13 +08:00
Bingyi Sun
a0201ef98d
enhance: optimize the performance of bitmap reverse lookup (#44804) (#44958)
pr: https://github.com/milvus-io/milvus/pull/44804

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-21 14:38:04 +08:00
cai.zhang
c6cc3d2c25
fix: [2.5] Fix the geometry return POINT(0 0) when growing mmap is enabled (#44891)
issue: #44802 
master pr: #44889 

After a Geometry object is serialized into WKB, the resulting binary may
contain '\0' bytes.
When growing mmap is enabled, the append data logic uses strcpy, which
stops copying at the first '\0' byte.
This causes only part of the WKB, typically the portion up to the
geometry type field, to be copied, leading to corrupted data.
As a result, during parsing, all POINT geometries are incorrectly
interpreted as POINT(0 0).

To fix this issue, memcpy will be used instead of strcpy.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-17 17:16:08 +08:00
cai.zhang
f27dfa4490
enhance: [2.5]Support import geometry data by json/csv (#44828)
issue: #44787 
master pr: #44826 
2.6 pr: #44827

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-17 17:14:23 +08:00
cqy123456
e4b72977dd
fix:[2.5]remove the limit of deduplicate case when disable autoindex (#44782)
issue: https://github.com/milvus-io/milvus/issues/44702
related pr: https://github.com/milvus-io/milvus/pull/44825

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-10-17 11:40:02 +08:00
congqixia
93411a388c
fix: [2.5] ensure deterministic search result ordering when scores are equal (#44870) (#44885)
Cherry-pick from master
pr: #44870

Related to #44819
This fix addresses an issue (#44819) where the offset parameter did not
work correctly during searches when multiple results had identical
scores. The problem occurred because results with equal scores were not
consistently ordered, leading to unpredictable pagination behavior.

The solution adds a new sorting step (SortEqualScoresByPks) in the
reduce phase that sorts results with identical scores by their primary
keys in ascending order. This ensures deterministic ordering and enables
proper offset functionality.

Changes:
- Add SortEqualScoresByPks() to sort results with equal scores by PK
- Add SortEqualScoresOneNQ() to handle per-query sorting logic
- Invoke sorting step after FillPrimaryKey() in Reduce() workflow
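
The tie-breaking step can be sketched as follows. The `hit` type and the function name `sortEqualScoresByPK` are hypothetical; the sketch assumes int64 primary keys and input that is already ordered by score, and it is not the actual reduce code.

```go
package main

import "sort"

// hit is a hypothetical search result row.
type hit struct {
	PK    int64
	Score float32
}

// sortEqualScoresByPK assumes hits are already ordered by score. It
// reorders only within runs of identical scores, by ascending primary key.
func sortEqualScoresByPK(hits []hit) {
	for start := 0; start < len(hits); {
		// Find the end of the run that shares hits[start].Score.
		end := start + 1
		for end < len(hits) && hits[end].Score == hits[start].Score {
			end++
		}
		run := hits[start:end]
		sort.Slice(run, func(i, j int) bool { return run[i].PK < run[j].PK })
		start = end
	}
}
```

With a stable order inside each run of equal scores, applying an offset always skips the same rows, which is what makes pagination predictable.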

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-16 19:34:08 +08:00
wei liu
82081eba1b
fix: [2.5] Fix deactivate balance checker also stops stopping balance (#44835)
issue: #43858
pr: #44834
Fix the issue introduced in PR #43992 where deactivating the balance
checker incorrectly stops stopping balance operations.

Changes:
- Move IsActive() check after stopping balance logic
- Only skip normal balance when checker is inactive
- Allow stopping balance to proceed regardless of checker state

This ensures stopping balance can execute even when the balance checker
is deactivated.
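
The reordered check can be illustrated with a tiny model. The `balanceChecker` struct and its task slices are hypothetical; only the idea of evaluating the active flag after the stopping balance logic comes from the message.

```go
package main

// balanceChecker is a hypothetical model with pre-computed task lists.
type balanceChecker struct {
	active        bool
	stoppingTasks []string // tasks that move data off stopping nodes
	normalTasks   []string // regular rebalancing tasks
}

// Check returns the tasks to run in this round.
func (b *balanceChecker) Check() []string {
	// Stopping balance runs first, regardless of the active flag.
	if len(b.stoppingTasks) > 0 {
		return b.stoppingTasks
	}
	// Only normal balance is skipped when the checker is inactive.
	if !b.active {
		return nil
	}
	return b.normalTasks
}
```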

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-15 15:56:01 +08:00
aoiasd
71fc23dd24
fix: [2.5] dropped segment in excluded segment use wrong excluded ts (#44771)
relate: https://github.com/milvus-io/milvus/issues/43114
pr: https://github.com/milvus-io/milvus/pull/43115
https://github.com/milvus-io/milvus/pull/44769

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-15 15:06:01 +08:00
wei liu
47949fd883
enhance: Implement rewatch mechanism for etcd failure scenarios (#43829) (#43920)
issue: #43828
pr: #43829 #43909
Implement robust rewatch mechanism to handle etcd connection failures
and node reconnection scenarios in DataCoord and QueryCoord, along with
heartbeat lag monitoring capabilities.

Changes include:
- Implement rewatchDataNodes/rewatchQueryNodes callbacks for etcd
reconnection scenarios
- Add idempotent rewatchNodes method to handle etcd session recovery
gracefully
- Add QueryCoordLastHeartbeatTimeStamp metric for monitoring node
heartbeat lag
- Clean up heartbeat metrics when nodes go down to prevent metric leaks

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Zhen Ye <chyezh@outlook.com>
2025-10-15 14:12:01 +08:00
congqixia
1f94b1f5f6
fix: [2.5] avoid concurrent Reset/Add operations on DataCoord metrics (#44789) (#44817)
Cherry-pick from master
pr: #44789

This commit addresses issue #44788 where the
`datacoord_stored_binlog_size` metric could become inaccurate when
multiple concurrent `GetMetrics` calls arrived at DataCoord.

### Problem

The original implementation called `Reset()` followed by `Add()`
operations on Prometheus metrics within the `GetQuotaInfo()` method.
When multiple goroutines invoked this method concurrently, race
conditions occurred:
- Thread 1: Reset() → Add(value1)
- Thread 2: Reset() → Add(value2)
- Result: Metrics could be reset multiple times and values added in an
interleaved fashion, leading to inaccurate and inflated metric values

### Solution

Changed the approach from `Reset() + Add()` to aggregating metric values
in local maps first, then using `Set()` to update metrics atomically:

1. Collect segment size data into local maps:
   - `storedBinlogSize`: tracks size per collection per segment state
   - `binlogFileSize`: tracks total file count per collection
   - `coll2DbName`: maps collection IDs to database names

2. After aggregation is complete, use `Set()` (instead of `Add()`) to
update metrics in a single operation per label combination

This ensures that concurrent `GetMetrics` calls don't interfere with
each other, as each invocation works with its own local state and only
updates the final metric value atomically.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-14 20:06:00 +08:00
jiaqizho
00ef6032c6
enhance:[2.5] Introduce sparse filter in query (#44347) (#44790)
pr: #44347

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2025-10-14 15:02:01 +08:00
congqixia
c30cb6c283
enhance: [2.5] Add accesslog field for template value length info (#44723) (#44791)
Cherry-pick from master
pr: #44723 
Related to #36672

Add an accesslog field that displays the value length of search/query
requests, which may help developers debug related issues.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-13 14:29:58 +08:00
wei liu
cbe2761e99
fix: Fix L0 segment duplicate load task generation during channel balance (#44700)
issue: #44699
Fix the issue where L0 segment checking logic incorrectly identifies L0
segments as missing when they exist on multiple delegators during
channel balance process, which blocks sealed segment loading and target
progression.

Changes include:
- Replace GetLatestShardLeaderByFilter with GetByFilter to check all
delegators instead of only the latest leader
- Iterate through all delegator views to identify which ones lack the L0
segment

The original logic only checked the latest shard leader, causing false
positive detection of missing L0 segments when they actually exist on
other delegators in the same channel during balance operations. This led
to continuous generation of duplicate L0 segment load tasks, preventing
normal sealed segment loading flow.
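
The difference between checking only the latest leader and checking every delegator can be sketched like this. `delegatorView` and `delegatorsMissingSegment` are hypothetical names; the real code works on distribution views returned by GetByFilter.

```go
package main

// delegatorView is a hypothetical summary of the segments loaded on one
// delegator of a channel.
type delegatorView struct {
	NodeID   int64
	Segments map[int64]bool
}

// delegatorsMissingSegment inspects every delegator of the channel and
// returns the node IDs that do not hold the given L0 segment.
func delegatorsMissingSegment(views []delegatorView, segmentID int64) []int64 {
	var missing []int64
	for _, v := range views {
		if !v.Segments[segmentID] {
			missing = append(missing, v.NodeID)
		}
	}
	return missing
}
```

A load task is then needed only for the nodes in the returned list, instead of being regenerated whenever the latest leader happens to lack the segment.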

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-11 10:04:00 +08:00
cai.zhang
52ab33ba88
fix: [2.5] Skip empty loop for process growing segment (#44608)
issue: #43427 
master pr: #44606 

The GISFunction asserts that segment_offsets cannot be nullptr. When
size is 0, segment_offsets is nullptr, so the loop is skipped.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-10 14:55:59 +08:00
congqixia
cb0e88632f
enhance: [2.5] Make accesslog.$consistency_level represent actual value used (#44708)
Cherry-pick from master
pr: #44706 
Related to #44703

This PR:
- Add `SetActualConsistencyLevel` to the `info.AccessInfo` interface,
  along with the related util method that processes it
- Make `$consistency_level` return the actual value if set

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 21:55:59 +08:00
Bingyi Sun
9434a3bdaa
fix: Fix bulk import with autoid (#44601)
pr: #44604 
issue: #44424

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-09 14:51:58 +08:00
congqixia
c86d68bea5
enhance: [2.5] Bump arrow/go to v17 (#44663)
Related to #40777

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 11:47:57 +08:00
wei liu
892d63d26e
enhance: [2.5] Refactor balance checker with priority queue (#43992) (#44588)
issue: #43858
pr: #43992
Refactor the balance checker implementation to use priority queues for
managing collection balance operations, improving processing efficiency
and order control.

Changes include:
- Export priority queue interfaces (Item, BaseItem, PriorityQueue)
- Replace collection round-robin with priority-based queue system
- Add BalanceCheckCollectionMaxCount configuration parameter
- Optimize balance task generation with batch processing limits
- Refactor processBalanceQueue method for different strategies
- Enhance test coverage with comprehensive unit tests

The new priority queue system processes collections based on row count
or collection ID order, providing better control over balance operation
priorities and resource utilization.
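
A priority queue with a per-round limit can be sketched with the standard `container/heap` package. The `collectionItem` type, the `nextBatch` helper, and the choice to serve larger row counts first are assumptions made for illustration; `maxCount` plays the role that BalanceCheckCollectionMaxCount has in the message.

```go
package main

import "container/heap"

// collectionItem is a hypothetical queue entry.
type collectionItem struct {
	ID       int64
	RowCount int64
}

// byRowCount implements heap.Interface. Ordering larger row counts
// first is an assumption of this sketch.
type byRowCount []collectionItem

func (q byRowCount) Len() int           { return len(q) }
func (q byRowCount) Less(i, j int) bool { return q[i].RowCount > q[j].RowCount }
func (q byRowCount) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

func (q *byRowCount) Push(x any) { *q = append(*q, x.(collectionItem)) }

func (q *byRowCount) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// nextBatch pops at most maxCount collection IDs in priority order.
func nextBatch(items []collectionItem, maxCount int) []int64 {
	q := make(byRowCount, len(items))
	copy(q, items)
	heap.Init(&q)
	var ids []int64
	for q.Len() > 0 && len(ids) < maxCount {
		ids = append(ids, heap.Pop(&q).(collectionItem).ID)
	}
	return ids
}
```

Swapping the `Less` method for a comparison on `ID` would give the collection-ID ordering mentioned above without touching the batching logic.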

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-28 19:23:05 +08:00