94 Commits

Author SHA1 Message Date
wei liu
39754af727
fix: Fix L0 segment loading delegator selection in QueryCoord (#43795)
issue: #43794
Fix the issue where L0 segments were not correctly selecting appropriate
delegators during loading, which could cause load failures or incorrect
delegator assignments.

Changes include:
- Add special handling for L0 segments in delegator selection logic
- Find delegators that are missing the L0 segment for direct loading
- Fallback to existing serviceable delegator selection when no suitable
delegator is found for L0 segments
- Add comprehensive test coverage for L0 segment loading scenarios
- Test delegator selection when some delegators are missing segments
- Test fallback behavior when all delegators already have the segment
- Test error handling when no delegators are available

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-19 16:35:47 +08:00
wei liu
b08d9efe69
fix: Prevent delegator unserviceable due to shard leader change (#42689) (#43309)
issue: #42098 #42404
pr: #42689
Fix critical issue where concurrent balance segment and balance channel
operations cause delegator view inconsistency. When shard leader
switches between load and release phases of segment balance, it results
in loading segments on old delegator but releasing on new delegator,
making the new delegator unserviceable.

The root cause is that balance segment modifies delegator views, and if
these modifications happen on different delegators due to leader change,
it corrupts the delegator state and affects query availability.

Changes include:
- Add shardLeaderID field to SegmentTask to track delegator for load
- Record shard leader ID during segment loading in move operations
- Skip release if shard leader changed from the one used for loading
- Add comprehensive unit tests for leader change scenarios

This ensures balance segment operations are atomic on single delegator,
preventing view corruption and maintaining delegator serviceability.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-15 17:46:51 +08:00
congqixia
2531ebda27
fix: [2.5] Check field mmap property before apply collection level one (#43091)
Cherry-pick from master
pr: #43090
Related to #43089

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-03 14:32:45 +08:00
wei liu
f06de7eca6
fix: Fix delegator selection logic in releaseSegment (#42572)
issue: #42568
Fix incorrect delegator selection during segment release process which
introduced by pr #42410

- Add serviceable filter to prioritize available shard leaders
- Fix fallback logic with channel-specific lookup
- Add early return when no leader found
- Add comprehensive unit tests for all scenarios

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-06-06 19:24:33 +08:00
wei liu
b298218a29
enhance: [2.5] Remove balance constraints between channel and segment tasks (#42410)
issue: #42176
pr: #42177

Remove the mutual exclusion constraints between channel and segment
balance tasks to allow them to run concurrently.

Changes include:
- Remove permitBalanceChannel() and permitBalanceSegment() methods from
RoundRobinBalancer
- Update ChannelLevelScoreBalancer, MultiTargetBalancer,
RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint
checks
- Allow segment balance tasks to proceed even when channel balance tasks
are running
- Update test cases to reflect new behavior where balance tasks no
longer block each other
- Improve error handling in task executor by preferring serviceable
shard leaders for segment release operations
- Add fallback logic to find latest shard leader when serviceable leader
is not available

This change improves the efficiency of load balancing by removing
unnecessary coordination overhead between different types of balance
operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-06-03 10:16:32 +08:00
liliu-z
cb0f984155
enhance: Revert "separate for index completed (#40873)" (#41152)
This reverts commit 23e579e3240a30397f05f5b308be687f6f16b013. #40873

issue: #39519

Signed-off-by: Li Liu <li.liu@zilliz.com>
2025-04-08 17:36:30 +08:00
Chun Han
23e579e324
separate for index completed (#40873)
related: https://github.com/milvus-io/milvus/issues/40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-05 10:20:24 +08:00
Xianhui Lin
f5e9dea2aa
fix: [2.5]fix the garbage cleanup logic of jsonkey stats && improve json key stats filer (#40039)
fix: fix the garbage collection cleanup logic of jsonkey stats &&
improve json key stats filer
issue: https://github.com/milvus-io/milvus/issues/36995
https://github.com/milvus-io/milvus/issues/40034
https://github.com/milvus-io/milvus/issues/40041
https://github.com/milvus-io/milvus/issues/40106
https://github.com/milvus-io/milvus/issues/40138
pr: https://github.com/milvus-io/milvus/pull/38039

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-03-13 20:18:10 +08:00
Xianhui Lin
a4eb2ce224
fix: [2.5]Revert qc statschecker for json key stats (#40125)
Revert qc statschecker for json key stats
issue:https://github.com/milvus-io/milvus/issues/36995
pr:https://github.com/milvus-io/milvus/pull/39876

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-02-24 13:31:55 +08:00
congqixia
709594f158
enhance: [2.5] Use v2 package name for pkg module (#40117)
Cherry-pick from master
pr: #39990
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-23 00:46:01 +08:00
Xianhui Lin
f0964f769d
enhance: [2.5]Add json key inverted index in stats for optimization (#39876)
Add json key inverted index in stats for optimization
issue: https://github.com/milvus-io/milvus/issues/36995
pr: https://github.com/milvus-io/milvus/pull/38039

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-02-16 20:12:15 +08:00
Zhen Ye
95809ca767
enhance: make new go package to manage proto (#39128)
issue: #39095
pr: #39114

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:53:01 +08:00
SimFG
2afe2eaf3e
feat: support to replicate collection when the services contains the system tt msg (#37559)
- issue: #37105

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-17 09:08:46 +08:00
wei liu
f04986fceb
enhance: Remove constraint on release segment task (#38297)
issue: #38305
after we disable balance segment and balance channel happens at same
time, the constriant which require release segment must happens on
serviceable shard leader is unnessary.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-10 11:18:49 +08:00
congqixia
051bc280dd
enhance: Make dynamic load/release partition follow targets (#38059)
Related to #37849

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 16:24:40 +08:00
tinswzy
e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
yihao.dai
b6612e02b4
enhance: Reduce GetIndexInfos calls (#37695)
Batch `GetIndexInfos` calls for segments to reduce RPC calls.

issue: https://github.com/milvus-io/milvus/issues/37634

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-19 14:24:31 +08:00
wei liu
8714774305
fix: search/query failed due to segment not loaded (#37403)
issue: #36970
cause release segment and balance channel may happen at same time, and
before new delegator become serviceable, if release segment exeuctes on
new delegator, and search/query comes on old delegator, then release
segment and query segment happens in parallel, if release segment
execute first in worker, then search/query will got a SegmentNodeLoaded
error.

This PR add serviceable filter on delegator, then all load/release
segment operation will happens on serviceable delegator.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-06 15:10:25 +08:00
jaime
9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
  - sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
wei liu
3cd0b26285
enhance: Enable dynamic update loaded collection's replica (#35822)
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.

milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.

Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-25 10:13:18 +08:00
congqixia
2fbc628994
feat: Support field partial load collection (#35416)
Related to #35415

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-20 16:49:02 +08:00
wei liu
c0200eec39
enhance: limit getSegmentInfo batch size to avoid excced grpc message limit (#35394)
issue: #35395

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-15 19:17:00 +08:00
jaime
fcec4c21b9
fix: check collection health(queryable) fail for releasing collection (#34947)
issue: #34946

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-08-02 17:20:15 +08:00
wayblink
a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
SimFG
ecee7d90d4
enhance: try to speed up the loading of small collections (#33570)
- issue: #33569

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-06-07 08:25:53 +08:00
wei liu
68dec7dcd4
fix: Use correct ts to avoid exclude segment list leak (#31991)
issue: #31990

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-12 10:39:19 +08:00
wei liu
c4806b69c4
enhance: Refactor leader view manager interface (#31133)
issue: #31091
This PR add GetByFilter interface in leader view manager, instead of all
kind of get func

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-10 15:13:36 +08:00
wei liu
7471a8005f
fix: querycoord panic after node down (#31831)
issue: #30519

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-03 10:03:22 +08:00
wei liu
5d752498e7
fix: Skip release duplicate l0 segment (#31540)
issue: #31480 #31481

release duplicate l0 segment task, which execute on old delegator may
cause segment lack, and execute on new delegator may break new
delegator's leader view.

This PR skip release duplicate l0 segment by segment_checker, cause l0
segment will be released with unsub channel

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 12:53:10 +08:00
chyezh
9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
congqixia
773c64ecbb
fix: Set nodeID when remove distribution (#31259)
See also #30930

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-14 15:09:03 +08:00
congqixia
5b51c20293
fix: Use Remove sync type for distribution removal (#31215)
See also #31214

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-13 06:11:04 +08:00
wei liu
22df5061c1
fix: Leader checker can't update segment's load version (#31040)
issue: #30890

when leader checker find that leader view has an older load version of
segment, it will try to correct leader view. but the sync action doesn't
specify the latest load version. so the update operation will failed.

This PR fix leader checker can't update segment's load version and
keeping generate same task to scheduler.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 11:57:01 +08:00
congqixia
c886aa29ff
enhance: Use ListIndexes instead of DescribeIndex for qc broker (#31122)
See also #31103

Since querycoord need index meta information from datacoord only, broker
shall use `ListIndexes` to skip segment index building check logic in
datacoord

This PR is also related to #30538, in which DescribeIndex caused lots of
memory usage and lead to OOM eventually

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 21:43:03 +08:00
SimFG
ee8d6f236c
enhance: make the watch dm channel request better compatibility (#30952)
issue: #30938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-01 16:07:37 +08:00
wei liu
befe0e21fd
fix: Set indexInfo when try to set segment to leader view (#30758)
issue: #30150
see also: #30258

cause `SyncDataDistribution` will try to load delta for segment. if miss
indexInfo in request, sync action will failed due to lack of index info.

This PR set indexinfo when try to set segment to leader view

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-26 11:02:55 +08:00
congqixia
7b91fa3db8
fix: Make leader checker generate leader task instead of segment task (#30258)
See also #30150

For leader view distribution with offline nodes, a release task can
never be sent to querynode due to targetNode online check logic. Even
the request is dispatched, normal release task does not have "force"
flag when calling `delegator.ReleaseSegment`.

This PR adds a new type of querycoord task: LeaderTask, the
responsibility of which is to rectify leader view distribtion.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 11:08:51 +08:00
wei liu
f69f65ff68
fix: Leader checker can't remove segment from leader view (#30151)
issue: #30150

This PR fix three problems:
1. leader checker use wrong node id when generate release task, which
cause the release task finished immediately
2. the release request generated by leader_checker doesn't set the
`force` flag, the operation to clean leader view on delegator will fail.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00
yah01
c68c128e47
fix: level 0 segments not loaded (#29908)
the recent changes move the level 0 segments list to a new proto field,
which leads to the QueryCoord can't see the level 0 segments, handle the
new changes
fix #29907

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-16 14:40:53 +08:00
xige-16
9702cef2b5
feat: Support multiple vector search (#29433)
issue #25639 

Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-08 15:34:48 +08:00
congqixia
a3cb8e2625
fix: Add atomic method to get collection target (#29577)
Related to #29575

Add `getCollectionTarget` method which is atomic when scope is
`CurrentTargetFirst` or `NextTargetFirst`
Also return error when executor finds no channel in target manager

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-29 09:04:46 +08:00
yah01
1d6bcd1ded
enhance: speed up loading with many deletions (#29455)
the executor always fetches the latest segment info, so we could consume
from the latest checkpoint, which could save much time while deleted
many entities

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-25 20:58:45 +08:00
yah01
a0e1a1eb31
feat: support enable/disable mmap for index (#29005)
support enable/disable mmap for index, the user could alter the index's
mode by `AlterIndex` method
related: https://github.com/milvus-io/milvus/issues/21866

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-21 18:07:24 +08:00
yah01
0a87724f18
enhance: remove merger for load segments (#29062)
remove merger as now QueryNode could load segments concurrently
fix #29063

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-12 10:48:37 +08:00
yah01
fab52d167b
fix: may miss stream delta while loading (#28871)
we consume the delta data from the lastest channel checkpoint while
loading segment,

this works well without level 0 segments, but now it may lead to miss
some delta data,

so we have to consume from the current target's channel checkpoint

related: #27349

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-05 17:34:45 +08:00
yah01
ece592a42f
Deliver L0 segments delete records (#27722)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-07 01:44:18 +08:00
yah01
dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
congqixia
852be152de
Change task sourceID to stringer interface (#27965)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-27 01:08:12 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
yah01
168e82ee10
Fix panic while handling with the nil status (#27040)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-15 10:09:21 +08:00