753 Commits

Author SHA1 Message Date
congqixia
7ea9c983d2
enhance: Add mockery package config for QC&QN (#38340)
Related to #38339

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-10 19:18:42 +08:00
wei liu
856e2aad7d
fix: Leader task stuck and retry again and again (#38202)
issue: #38201
leader task require to update delegator's distribution, and only success
after the distribution change has been applyed to delegator. but the
delegator will reject the distribution change if it's version is older
than current version in delegator. which cause the leader task stuck and
retry forever.

this PR remove the leader task finish check.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-10 19:16:42 +08:00
wei liu
f04986fceb
enhance: Remove constraint on release segment task (#38297)
issue: #38305
after we disable balance segment and balance channel happens at same
time, the constriant which require release segment must happens on
serviceable shard leader is unnessary.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-10 11:18:49 +08:00
jaime
8ed019735c
enhance: add disk stats within system metrics (#38033)
issue: ##36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-06 16:32:41 +08:00
congqixia
36946cc9ce
enhance: Set loaded collection/partition number to metrics (#38271)
Related to #36456
Previous PR: #38471 #38233

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-06 16:18:40 +08:00
congqixia
6ff19481f0
enhance: Resolve compilation error due to PR conflict (#38252)
Related pr: #38233 #38059

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 19:26:40 +08:00
congqixia
051bc280dd
enhance: Make dynamic load/release partition follow targets (#38059)
Related to #37849

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 16:24:40 +08:00
congqixia
32645fc28a
enhance: Unify querycoord meta metrics (#38233)
Related to #36456

Unify collection/partition number metrics to collection manager in case
of unwant missing modification

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 15:48:39 +08:00
tinswzy
7944538ade
enhance: Add ctx param to KV operation interfaces (#38154)
issue: #35917 
Refine KV operation interfaces by adding a ctx param

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-05 15:16:41 +08:00
tinswzy
e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
jaime
7bbfe86bcd
enhance: add list index and segment index retrieval API for WebUI (#37861)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-22 16:58:34 +08:00
congqixia
b34bfb98a0
enhance: Refine Replica manager colle2Replicas secondary index (#37906)
Related to #37630

This PR add a new util coll2Replicas secondary index to reduce map
access & iteration while get replicas by collection

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-22 11:54:32 +08:00
wei liu
965bda6e60
enhance: Add channel name to shard leader log in meta cache (#37856)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 19:24:31 +08:00
wei liu
0a440e0d38
fix: Prevent simultaneous balance of segments and channels (#37850)
issue: #33550
balance segment and balance segment execute at same time, which will
cause bounch of corner case.

This PR disable simultaneous balance of segments and channels

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 17:56:55 +08:00
wei liu
b983ef9fca
fix: Channel may be released after balance (#37862)
issue: #37830
casue dist handler doesn't set channel's version, so if channel checker
try to dedup channel, it may release the new delegator after balance
finished.

this PR fix the way to set proper version for channel.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 10:40:31 +08:00
congqixia
b8d31ebed8
enhance: Remove unnecessary segment clone updating dist (#37797)
Related to #37630

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-20 11:26:31 +08:00
yihao.dai
b6612e02b4
enhance: Reduce GetIndexInfos calls (#37695)
Batch `GetIndexInfos` calls for segments to reduce RPC calls.

issue: https://github.com/milvus-io/milvus/issues/37634

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-19 14:24:31 +08:00
congqixia
6d86b9022e
enhance: Provide secondary index critria when filter leaderview (#37777)
Related to #37630

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-19 10:12:30 +08:00
jaime
257ecab84b
enhance: remove collection queryable check from health check (#37712)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-18 10:50:38 +08:00
congqixia
b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
wei liu
a1b6be1253
fix: Delegator stuck at unserviceable status (#37694)
issue: #37679

pr #36549 introduce the logic error which update current target when
only parts of channel is ready.

This PR fix the logic error and let dist handler keep pull distribution
on querynode until all delegator becomes serviceable.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-15 10:20:31 +08:00
jaime
1d06d4324b
fix: Int64 overflow in JSON encoding (#37657)
issue: ##36621

- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-14 22:52:30 +08:00
wei liu
1304b40552
fix: Balance channel may stuck at increasing replica number case (#37641)
issue: #37640
fix the pr #36549
cause balance channel will wait until new delegator becomes serviceable,
but new delegator need to sync target version then becomes serviceable,
and sync target version need to be wait all replica load done. so if
increasing replica number and balance channel happens at same time,
logic dead lock occurs.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-14 10:08:31 +08:00
jaime
1e8ea4a7e7
feat: add segment/channel/task/slow query render (#37561)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-12 17:44:29 +08:00
wei liu
266f8ef1f5
fix: Search may return less result after qn recover (#36549)
issue: #36293 #36242
after qn recover, delegator may be loaded in new node, after all segment
has been loaded, delegator becomes serviceable. but delegator's target
version hasn't been synced, and if search/query comes, delegator will
use wrong target version to filter out a empty segment list, which
caused empty search result.

This pr will block delegator's serviceable status until target version
is synced

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-12 16:34:28 +08:00
congqixia
f5b06a3c9f
enhance: Invalidate collection cache when release collection (#37577)
Related to #37395

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-12 10:16:29 +08:00
wei liu
61a5b15ada
fix: Lost loading collection's updateTs after qc restart (#37538)
issue: #37537

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-11 14:34:28 +08:00
congqixia
5e90f348fc
enhance: Handle legacy proxy load fields request (#37565)
Related to #35415

In rolling upgrade, legacy proxy may dispatch load request wit empty
load field list. The upgraded querycoord may report error by mistake
that load field list is changed.

This PR:

- Auto field empty load field list with all user field ids
- Refine the error messag when load field list updates
- Refine load job unit test with service cases

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-11 10:14:26 +08:00
sthuang
70605cf5b3
enhance: Support custom privilege group for RBAC (#37087)
issue: #37031

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-09 08:44:28 +08:00
yihao.dai
ff9bdf7029
fix: Fix load slowly (#37454)
When there're a lot of loaded collections, they would occupy the target
observer scheduler’s pool. This prevents loading collections from
updating the current target in time, slowing down the load process.
This PR adds a separate target dispatcher for loading collections.

issue: https://github.com/milvus-io/milvus/issues/37166

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-09 07:48:26 +08:00
congqixia
dcc1b506dc
enhance: Add context trace for querycoord queryable check (#37524)
When check health logic failed to collection not-queryable, the related
reason is hard to find in log.

This PR add context for log with trace id and print unqueryable
collection info log.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-08 14:00:26 +08:00
wei liu
a03157838b
enhance: Enable node assign policy on resource group (#36968)
issue: #36977
with node_label_filter on resource group, user can add label on
querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will
prefer to accept node which match it's node_label_filter.

then querynode's can't be group by labels, and put querynodes with same
label to same resource groups.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 11:18:27 +08:00
Xiaofan
e073906a19
enhance: optimize describe collection and index (#37490)
fix #37489
combine multiple describe collection and list index into one call

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-11-08 10:18:34 +08:00
jaime
f348bd9441
feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
wei liu
8714774305
fix: search/query failed due to segment not loaded (#37403)
issue: #36970
cause release segment and balance channel may happen at same time, and
before new delegator become serviceable, if release segment exeuctes on
new delegator, and search/query comes on old delegator, then release
segment and query segment happens in parallel, if release segment
execute first in worker, then search/query will got a SegmentNodeLoaded
error.

This PR add serviceable filter on delegator, then all load/release
segment operation will happens on serviceable delegator.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-06 15:10:25 +08:00
congqixia
9539739781
enhance: Release compacted growing segment if in dropped list (#37245)
See also #37205

Previously releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

Which has a worst case that the corresponding sealed segment is
compacted and the checkpoint is pinned by a growing l0 segment.

This PR introduces a new rule that: a growing segment could be released
if the segment id appeared in current target dropped segment id list.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 18:04:21 +08:00
jaime
9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
  - sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
wei liu
39a91eb100
fix: Delegator may becomes unserviceable after querycoord restart (#37055)
issue: #37054
after querycoord restart, segment_checker may release segment by mistake
due to next target isn't ready yet.

This PR requires release segment must happens after next target is
ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-24 12:21:28 +08:00
congqixia
f43527ef6f
enhance: Batch forward delete when using DirectForward (#37076)
Relatedt #36887

DirectFoward streaming delete will cause memory usage explode if the
segments number was large. This PR add batching delete API and using it
for direct forward implementation.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 10:39:28 +08:00
wei liu
f029314e20
fix: Dynamic release parition may fail search/query. (#37049)
issue: #33550
cause wrong impl of UpdateCollectionNextTarget, if ReleaseCollection and
UpdateCollectionNextTarget happens at same time, the the released
partition's segment list may be add to target again, and delegator will
be marked as unserviceable due to lack of segment.

This PR fix the impl of UpdateCollectionNextTarget

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-24 01:03:28 +08:00
jaime
4746f47282
feat: management WebUI homepage (#36822)
issue: #36784
1. Implement an embedded web server for WebUI access.  
2. Complete the homepage development.

Home page demo:
<img width="2177" alt="iShot_2024-10-10_17 57 34"
src="https://github.com/user-attachments/assets/38539917-ce09-4e54-a5b5-7f4f7eaac353">

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-23 11:29:28 +08:00
Bingyi Sun
6851738fd1
fix: fix make generate-mockery panic with go1.22 (#36830)
https://github.com/milvus-io/milvus/issues/36831
Fix `make generate-mockery` panic.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-17 12:11:31 +08:00
congqixia
3fe0f82923
enhance: Add balance report log for qc balancer (#36747)
Related to #36746

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-11 10:25:24 +08:00
aoiasd
db34572c56
feat: support load and query with bm25 metric (#36071)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
wei liu
470bb0cc3f
enhance: Enable balance on querynode with different mem capacity (#36466)
issue: #36464
This PR enable balance on querynode with different mem capacity, for
query node which has more mem capactity will be assigned more records,
and query node with the largest difference between assignedScore and
currentScore will have a higher priority to carry the new segment.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-30 16:15:17 +08:00
cai.zhang
ecb2b242e2
enhance: Add sorted for segment info (#36469)
issue: #33744

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-30 10:01:16 +08:00
wei liu
55be814a58
enhance: make TransferChannel/TransferSegment idempotent (#36489)
issue: #36488
when call TransferChannel/TransferSegment, querycoord will generate and
submit balance task to scheduler, if segment/channel's task already
exist in scheduler, submit task will failed.

to make TransferChannel/TransferSegment idempotent, we should skip to
submit if task already exist in scheduler.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-26 18:11:23 +08:00
wei liu
5dfa1c3397
fix: Segment unbalance after many times load/release (#36537)
issue: #36536
query coord use `segmentTaskDeleta/channelTaskDelta` to measure the
executing workload for querynode in scheduler, and we maintains the
`segmentTaskDeleta/channelTaskDelta` by `scheulder.Add(task)` and
`scheduler.remove(task)`, but `scheduler.remove(task)` has been called
in unexpected way, which cause a wrong
`segmentTaskDeleta/channelTaskDelta` value and affect the segment assign
logic, causes segment unbalance.

This PR moves to compute the `segmentTaskDeleta/channelTaskDelta` when
access, to avoid the wrong value affect.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-26 15:13:15 +08:00
sthuang
4493aa2142
fix: querycoord collection num metric (#36471)
related to: #36456

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-09-26 14:23:13 +08:00
wei liu
3cd0b26285
enhance: Enable dynamic update loaded collection's replica (#35822)
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.

milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.

Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-25 10:13:18 +08:00