1287 Commits

Author SHA1 Message Date
tinswzy
74432db503
fix stuck insert by binding buffer and chan to ensure proper notification (#42505)
#41918 #42482  #42049  #42513 

cherrypick: sn release memory after pop from heap 
wp: Encapsulate buffer and chan into a single item for one-to-one
management and cleanup
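
A minimal Go sketch of the one-to-one pairing described above. The type and method names (`pendingItem`, `complete`) are hypothetical and are not taken from the Woodpecker code; the only point shown is that a payload and its notification channel live and die together.

```go
package main

import "fmt"

// pendingItem is a hypothetical type: it pairs one buffered payload with the
// channel used to notify the writer waiting on exactly that payload.
type pendingItem struct {
	data []byte
	done chan error // capacity 1, so a single notification never blocks
}

func newPendingItem(data []byte) *pendingItem {
	return &pendingItem{data: data, done: make(chan error, 1)}
}

// complete notifies the waiter and drops the payload reference in one place,
// so a buffer can never be cleaned up without its waiter being notified.
// It is meant to be called once per item.
func (p *pendingItem) complete(err error) {
	p.done <- err
	p.data = nil
}

func main() {
	item := newPendingItem([]byte("row"))
	go item.complete(nil)
	fmt.Println("append result:", <-item.done)
}
```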

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-05 10:02:32 +08:00
yihao.dai
6fda1f69c8
fix: Fix duplicate autoID between import and insert (#42519)
Remove the unlimited logID mechanism and switch to redundantly
allocating a large number of IDs.

issue: https://github.com/milvus-io/milvus/issues/42518

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-04 19:58:31 +08:00
cai.zhang
5566a85bcc
enhance: Add proxy task queue metrics (#42156)
issue: #42155

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-04 11:26:32 +08:00
Zhen Ye
fc010e44a8
fix: release memory after pop from heap (#42482)
issue: #42481

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 10:00:32 +08:00
tinswzy
f55f900c85
fix insert hang caused by WAL writer writing to a closing logfile (#42078)
related issue #42049 
wp commit
[94de4](94de4cbc60)

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-06-03 09:58:36 +08:00
yihao.dai
297331b2cc
enhance: Add slot and tasks num metrics (#42141)
issue: https://github.com/milvus-io/milvus/issues/41123

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-30 21:52:30 +08:00
Chun Han
ed0df38605
enhance: resize high priority wqthreadpool dynamically(#40838) (#41549) (#41929)
related: #40838
pr: https://github.com/milvus-io/milvus/pull/41549

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
2025-05-30 10:18:36 +08:00
Zhen Ye
4bad293655
enhance: reduce downtime when upgrading from 2.5.x (#42082)
issue: #40532

- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in the cluster have not been upgraded
to 2.6
- allow opening a read-write wal after all nodes in the cluster have been
upgraded to 2.6

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:02:29 +08:00
Zhen Ye
b94cee2413
fix: growing segment from old arch is not flushed after upgrading (#42164)
issue: #42162

- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stuck
- fix: growing segment from old arch is not flushed after upgrading

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:00:28 +08:00
aoiasd
3a74044149
fix: hybrid search sub request not set analyzer name (#41896)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-29 14:56:28 +08:00
aoiasd
2ae4d80120
enhance: support run analyzer by loaded collection field (#42113)
relate: https://github.com/milvus-io/milvus/issues/42094

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-29 10:54:30 +08:00
junjiejiangjjj
4202c775ba
feat: Support vllm and tei rerank (#41947)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-05-28 19:18:28 +08:00
Buqian Zheng
7243c1d0ce
feat: remove async warmup policy (#42123)
issue: https://github.com/milvus-io/milvus/issues/41993

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-28 10:30:28 +08:00
cqy123456
5fe7015f63
enhance: InterimIndex support more index type and data type (#41021)
issue: https://github.com/milvus-io/milvus/issues/27678
cherry pick from : https://github.com/milvus-io/milvus/pull/39180,
https://github.com/milvus-io/milvus/pull/40429

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-05-28 08:40:28 +08:00
wei liu
54619eaa2c
feat: Implement partial result support on node down (#42009)
issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:

- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support partial result required
data ratio
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic

These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.
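
A rough Go sketch of how a "required data ratio" check could gate partial results. The function name, its parameters, and the exact semantics of the ratio are assumptions for illustration; the actual logic lives in the delegator and proxy code touched by this commit.

```go
package main

import "fmt"

// allowPartialResult is a hypothetical helper: it reports whether a search
// may return a partial result, given how much of the data is still reachable.
// requiredRatio is assumed to be a value in [0, 1]; 1.0 means "no partial
// results, all data must be accessible".
func allowPartialResult(accessibleRows, totalRows int64, requiredRatio float64) bool {
	if totalRows <= 0 {
		return false
	}
	if requiredRatio >= 1.0 {
		return accessibleRows == totalRows
	}
	return float64(accessibleRows)/float64(totalRows) >= requiredRatio
}

func main() {
	// One of ten equally sized segments is on a node that went down.
	fmt.Println(allowPartialResult(90, 100, 0.8))
}
```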

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 00:12:28 +08:00
congqixia
6d0b15308d
enhance: Take nq into slow query consideration (#42109)
Related to #40756

A large nq naturally increases query time, which causes many slow-query
log entries when the user's nq is very large.

This PR makes the slow search check use the span per nq (the average
value) to decide whether a request is slow.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-27 19:56:28 +08:00
Xianhui Lin
6a0e182e13
enhance: support TTL expiration with queries returning no results (#42086)
support TTL expiration with queries returning no results
issue: https://github.com/milvus-io/milvus/issues/41959

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-05-27 18:28:27 +08:00
groot
c00005bdaa
feat: support to drop properties of field (#41996)
issue: https://github.com/milvus-io/milvus/issues/41990

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-05-27 14:32:34 +08:00
Zhen Ye
212e17c4c5
fix: modify param to use less memory when flush and sync (#42102)
issue: #42097

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-27 10:12:27 +08:00
aoiasd
0fafb706ba
enhance: add segment bm25 stats local cache (#41775)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-26 18:44:27 +08:00
wei liu
f84650ece0
enhance: Reduce session TTL from 30s to 10s for faster failure detection (#42050)
Optimize session management by reducing the TTL (Time To Live) value for
service registration from 30 seconds to 10 seconds. This change improves
the system's ability to detect service failures more quickly and
enhances overall cluster responsiveness.

Changes include:
- Update default session TTL from 30s to 10s in milvus.yaml
- Adjust DefaultSessionTTL constant from 30 to 10 seconds
- Update SessionTTL default value from 60 to 10 seconds
- Maintain consistent TTL values across configuration files

This optimization reduces the time required for the system to detect
when services become unavailable, leading to faster failover and
improved cluster stability during node failures or network issues.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-26 12:04:26 +08:00
Chun Han
d1cfa58a0a
feature: support compact expiry data(#41336) (#42056)
related: #41336

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-05-25 16:46:31 +08:00
yihao.dai
83c9527e70
enhance: Use QuerySlot interface for tasks (#41989)
Use `QuerySlot` rpc instead of `QueryTask` for querying slot.

issue: https://github.com/milvus-io/milvus/issues/41123

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-23 10:30:28 +08:00
tinswzy
1735f557ca
fix sn oom issue during small file loading in wp (#41946)
#41846  #41894 
Resolve SN OOM issue during small file loading in Woodpecker; 
Correct WP fence/close execution order;

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-05-23 01:30:28 +08:00
presburger
e878fe588e
enhance: Set the default GPU version autoindex to use the CAgra index (#41906)
issue:  #41907

Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2025-05-23 01:20:28 +08:00
yihao.dai
e04e5b41ca
enhance: Add task version monitoring (#42023)
issue: https://github.com/milvus-io/milvus/issues/41123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-22 23:24:28 +08:00
Zhen Ye
c9b0748ff9
enhance: add delete rows into delete msg header and more metric (#41952)
issue: #41544

- add delete rows into delete message header
- add more insert/delete metrics
- fix non-broadcast message has broadcast header

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 20:28:26 +08:00
Buqian Zheng
2e3539319d
feat: vector field raw data to mmap by default (#41975)
issue: https://github.com/milvus-io/milvus/issues/41435

should address https://github.com/milvus-io/milvus/issues/41774

this PR also: 
* added caching layer memory overhead metric
* re-enable TextMatch.GrowingLoadData test

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-22 11:56:25 +08:00
wei liu
78010262f0
enhance: Optimize shard serviceable mechanism (#41937)
issue: https://github.com/milvus-io/milvus/issues/41690
- Merge leader view and channel management into ChannelDistManager,
allowing a channel to have multiple delegators.
- Improve shard leader switching to ensure a single replica only has one
shard leader per channel. The shard leader handles all resource loading
and query requests.
- Refine the serviceable mechanism: after QC completes loading, sync the
query view to the delegator. The delegator then determines its
serviceable status based on the query view.
- When a delegator encounters forwarding query or deletion failures,
mark the corresponding segment as offline and transition it to an
unserviceable state.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-22 11:38:24 +08:00
wei liu
4e1208f4f6
enhance: support balancing multiple collections in single trigger (#41875)
issue: #41874
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker

This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-21 21:38:25 +08:00
SimFG
9f866dd7c3
enhance: add privilege group message types and corresponding tests (#41939)
- issue: #41938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2025-05-21 11:12:28 +08:00
yihao.dai
142bd2fc05
enhance: Pooling for data tasks (#41256)
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.

issue: https://github.com/milvus-io/milvus/issues/41123

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-20 21:06:24 +08:00
yihao.dai
65dd3982d8
fix: Fix ants.Pool goroutine leak (#41892)
1. Release the pool after it is no longer in use.
2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR
#287](https://github.com/panjf2000/ants/pull/287)).

issue: https://github.com/milvus-io/milvus/issues/41838

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-19 17:56:22 +08:00
tinswzy
3d8629de3e
fix memory reuse in woodpecker to prevent streamingNode OOM (#41918)
#41846 
Reduce woodpecker memory allocation frequency through recycled memory
pools, allowing GC to keep up with collection.
related [woodpecker issue 24](https://github.com/zilliztech/woodpecker/issues/24)
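
As an illustration of the recycling idea only, here is a small Go sketch using the standard library's `sync.Pool`. Whether Woodpecker uses `sync.Pool` or its own pool is not stated in this commit, so the pool choice, `bufPool`, and `withBuffer` are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable byte slices so that each write does not have to
// allocate a fresh buffer. Pointers to slices are stored to avoid an extra
// allocation when the slice header is boxed into an interface.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// withBuffer copies payload into a pooled buffer, reports how many bytes were
// staged, and returns the buffer to the pool for the next caller.
func withBuffer(payload []byte) int {
	bp := bufPool.Get().(*[]byte)
	buf := append((*bp)[:0], payload...)
	n := len(buf)
	*bp = buf[:0] // keep any grown capacity, drop the contents
	bufPool.Put(bp)
	return n
}

func main() {
	fmt.Println(withBuffer([]byte("log entry")))
}
```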

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-05-19 15:22:22 +08:00
cai.zhang
38ded7364f
fix: Don't create index for unsorted importing segment when enable stats (#41864)
issue: #41863

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-19 10:52:23 +08:00
Zhen Ye
59dff668dc
enhance: schema change without manual flush (#41882)
issue: #39718

- remove the manual flush message from schema change operation
- add flush segment id handle into schema change processes

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2025-05-19 10:14:22 +08:00
Ted Xu
ae32203d3a
fix: support group by with nullable grouping keys (#41797)
See #36264

In this PR:
- Enhanced error handling in parse of grouping field.
- Fixed null handling in reduce tasks in proxy nodes. 
- Updated tests to reflect changes in error handling and data processing
logic.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-05-17 20:54:22 +08:00
tinswzy
4edb1bc6f1
fix: resolve wp WALImpls concurrent read/write bug (#41763)
#41563 #41579 #41842 #41846 #41758
Upgraded the wp dependency to incorporate recent fixes addressing
multiple concurrency bugs in WALImpls.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-05-16 12:02:27 +08:00
Zhen Ye
d3fff1769e
fix: streaming node panics when binary size is set to zero (#41879)
issue: #41853

- persist the estimated binary size for insert message into wal.
- add metric to record the total growing rows of channel.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-16 11:12:22 +08:00
congqixia
ba8f62a3b2
enhance: Bump x/net fixing CVE-2025-22872 (#41861)
Related to #41291
Related to CVE-2025-22872

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-15 19:04:23 +08:00
SimFG
9fa50e0b1a
enhance: implement authorization checks for DescribeCollection and DescribeDatabase tasks (#41798)
- issue: #41694

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2025-05-15 17:52:23 +08:00
foxspy
1c794be119
enhance: Output index version information in the DescribeIndex interface (#41847)
issue: https://github.com/milvus-io/milvus/issues/41431

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-15 14:36:22 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
wei liu
2d0ae3a709
fix: unexpected password for root user (#41817)
issue: #41816 
PR #37983 introduced an issue: if `defaultRootPassword` is not specified
in milvus.yaml, `"Milvus"` (including the quotes) is used as the default
password for the root user instead of `Milvus`.

This PR fixes the unexpected root password and adds a comment noting that
a large numeric password must be wrapped in double quotes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-14 19:42:22 +08:00
yihao.dai
36e9e41627
fix: Fix no candidate segments error for small import (#41771)
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes real auto-generated IDs to determine
the target vchannel. This mismatch can lead to inaccurate row distribution
estimates in corner cases such as:
- Importing 1 row into 2 vchannels:
  - Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
  - Import: real autoID (e.g., 457975852966809057) hashes to v1
    → actual result: v0 = 0, v1 = 1

To resolve this corner case, we now allocate at least one segment for
each vchannel when autoID is enabled, ensuring all vchannels are prepared
to receive data even if no rows are estimated for them.
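
The mismatch and the fix can be sketched in Go. All function names here are hypothetical, and the FNV hash is only a stand-in for whatever hash the import task really uses, so the channel it picks for a given ID is not meaningful.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// estimatePerChannel mirrors the preimport estimate: integer division, so
// small imports round down to zero rows per vchannel.
func estimatePerChannel(numRows, vchannelNum int) int {
	return numRows / vchannelNum
}

// channelFor picks a vchannel by hashing the ID (stand-in hash, see above).
func channelFor(id int64, vchannelNum int) int {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], uint64(id))
	h := fnv.New32a()
	h.Write(b[:])
	return int(h.Sum32() % uint32(vchannelNum))
}

// segmentsNeeded rounds the estimate up to whole segments and, when autoID
// is enabled, never returns fewer than one segment per vchannel.
func segmentsNeeded(estimatedRows, rowsPerSegment int, autoID bool) int {
	n := (estimatedRows + rowsPerSegment - 1) / rowsPerSegment
	if autoID && n == 0 {
		return 1
	}
	return n
}

func main() {
	est := estimatePerChannel(1, 2) // 1 / 2 = 0 for both vchannels
	target := channelFor(457975852966809057, 2)
	fmt.Printf("estimated rows per vchannel: %d, row lands in v%d\n", est, target)
	fmt.Println("segments per vchannel:", segmentsNeeded(est, 1024, true))
}
```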

issue: https://github.com/milvus-io/milvus/issues/41759

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-14 15:30:21 +08:00
Zhen Ye
21d6d1669e
fix: wal should be reopened if wal append receives the fence error (#41807)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 01:02:56 +08:00
Zhen Ye
7beafe99a7
enhance: implement wal garbage collector with truncate api (#41770)
issue: #41544

- add a truncator implementation into wal recovery storage.
- add metrics for recovery storage.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 22:08:56 +08:00
zhagnlu
f094d026f8
fix: add params to ignore config type exception (#41776)
#41707

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-13 13:48:56 +08:00
Zhen Ye
61b6ca5b73
enhance: add in mem shard manager (#41749)
issue: #41544

- Implement in-memory shard manager to maintain the shard state at write
ahead.
- Remove all rpc and meta operation at write ahead, make the segment
assignment logic only use wal and memory.
- Refactor global stats management, add node-level flush policy.
- Fix the recovery storage inconsistency bug when graceful close.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 12:04:56 +08:00
shaoyue
5e8966ec32
enhance: update golang-jwt to v4.5.2 to fix cve (#41734)
/cc @congqixia

Signed-off-by: haorenfsa <haorenfsa@gmail.com>
2025-05-13 10:58:56 +08:00