92 Commits

Author SHA1 Message Date
Chun Han
001619aef9
feat: supporing load priority for loading (#42413)
related: #40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-06-17 15:22:38 +08:00
wei liu
78010262f0
enhance: Optimize shard serviceable mechanism (#41937)
issue: https://github.com/milvus-io/milvus/issues/41690
- Merge leader view and channel management into ChannelDistManager,
allowing a channel to have multiple delegators.
- Improve shard leader switching to ensure a single replica only has one
shard leader per channel. The shard leader handles all resource loading
and query requests.
- Refine the serviceable mechanism: after QC completes loading, sync the
query view to the delegator. The delegator then determines its
serviceable status based on the query view.
- When a delegator encounters forwarding query or deletion failures,
mark the corresponding segment as offline and transition it to an
unserviceable state.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-22 11:38:24 +08:00
wei liu
b0806bb900
fix: task delta cache leak due to duplicate task id (#40183)
issue: #40052

task delta cache rely on the taskID is unique, so it incDeltaCache at
AddTask, and decDeltaCache at RemoveTask, but the taskID allocator is
not atomic, which cause two task with same taskID, in such case, it will
call incDeltaCache twice, but call decDeltaCacheOnce, which cause delta
cache leak.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-28 10:22:08 +08:00
wei liu
69b8b89369
enhance: Remove QueryCoord's scheduling of L0 segments (#39552)
issue: #39551
This PR remove querycoord's scheduling of l0 segments:
  - only load l0 segment when watch channel
- only release l0 segment when release channel or sync data distribution

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-26 21:38:00 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
wei liu
7d2c948c69
fix: task delta cache leak on reduce task (#40055)
issue: #40052

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-21 16:47:54 +08:00
wei liu
c12c4b4fff
fix: [skip e2e] pr conflict cause ut failed (#39811)
Related to https://github.com/milvus-io/milvus/pull/39701 &
https://github.com/milvus-io/milvus/issues/39681

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-12 11:44:51 +08:00
congqixia
7b51e4839f
fix: Resolve conflict on qc task test (#39796)
Related to #39701 & #39681

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-11 18:40:45 +08:00
wei liu
ff5c680c99
fix: load collection stucks if compaction/gc happens (#39701)
issue: #39680
if compaction/gc happens, load collection may stuck due to
SegmentNotFound, we should trigger UpdateNextTarget to get a new data
view to execute loading operation.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-11 15:48:50 +08:00
wei liu
85c9f92ff4
fix: uneven distribution caused by executing task delta cache leak (#39702)
issue: #39681 

this PR maintain workload effect in action instead of computing workload
effect from target, which may cause leak if target changes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-11 14:30:46 +08:00
yihao.dai
e0b26260f2
enhance: enable task delta cache (#39307)
When there are many segment tasks in the querycoord scheduler, the
traversal in `GetSegmentTaskDelta` checks becomes time-consuming. This
PR adds caching for segment deltas.

issue: https://github.com/milvus-io/milvus/issues/37630

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
2025-01-23 14:31:16 +08:00
yihao.dai
657550cf06
fix: Fix slow dist handle and slow observe (#38566)
1. Provide partition&channel level indexing in the collection target.
2. Make `SegmentAction` not wait for distribution.
3. Remove scheduler and target manager mutex.
4. Optimize logging to reduce CPU overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-15 20:17:00 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
wei liu
9c3f59dbbe
fix: Prevent balancer from overloading the same QueryNode (#38719)
issue: #38718
The balancer calculates the workload of executing tasks as an ongoing
score for target nodes. However, a logic issue arises when
GetSegmentTaskDelta or GetChannelTaskDelta is called with
collectionID=-1, which incorrectly returns zero.

Due to the incorrect global score, the executing task's workload is not
properly reflected for each collection. Consequently, each collection
submits its own balance task, leading to the balancer assigning
excessive tasks to the same QueryNode.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-25 16:36:49 +08:00
SimFG
2afe2eaf3e
feat: support to replicate collection when the services contains the system tt msg (#37559)
- issue: #37105

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-17 09:08:46 +08:00
tinswzy
7944538ade
enhance: Add ctx param to KV operation interfaces (#38154)
issue: #35917 
Refine KV operation interfaces by adding a ctx param

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-05 15:16:41 +08:00
tinswzy
e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
yihao.dai
b6612e02b4
enhance: Reduce GetIndexInfos calls (#37695)
Batch `GetIndexInfos` calls for segments to reduce RPC calls.

issue: https://github.com/milvus-io/milvus/issues/37634

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-19 14:24:31 +08:00
congqixia
6d86b9022e
enhance: Provide secondary index critria when filter leaderview (#37777)
Related to #37630

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-19 10:12:30 +08:00
congqixia
b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
jaime
1d06d4324b
fix: Int64 overflow in JSON encoding (#37657)
issue: ##36621

- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-14 22:52:30 +08:00
jaime
9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
  - sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
wei liu
c84ea5465c
fix: Fix some replicas don't participate in the query after the failure recovery (#35850)
issue: #35846
querycoord will notify proxy to update shard leader cache after
delegator location changes, but during querynode's failure recovery,
some delegator may become unserviceable due to lacking of segments, and
back to serviceable after segment loaded, so we also need to notify
proxy to invalidate shard leader cache when delegator serviceable state
changes.

This PR will maintain querynode's serviceable state during heartbeat,
and notify proxy to invalidate shard leader cache if serviceable state
changes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-03 15:39:03 +08:00
wei liu
c0200eec39
enhance: limit getSegmentInfo batch size to avoid excced grpc message limit (#35394)
issue: #35395

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-15 19:17:00 +08:00
jaime
9630974fbb
enhance: move rocksmq from internal to pkg module (#33881)
issue: #33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-25 21:18:15 +08:00
wayblink
a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
SimFG
ecee7d90d4
enhance: try to speed up the loading of small collections (#33570)
- issue: #33569

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-06-07 08:25:53 +08:00
congqixia
16d869c57e
enhance: Add EmbedEtcd testutil and remove etcd dep of task pkg (#31802)
See also #20478

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-02 09:59:14 +08:00
wei liu
5d752498e7
fix: Skip release duplicate l0 segment (#31540)
issue: #31480 #31481

release duplicate l0 segment task, which execute on old delegator may
cause segment lack, and execute on new delegator may break new
delegator's leader view.

This PR skip release duplicate l0 segment by segment_checker, cause l0
segment will be released with unsub channel

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 12:53:10 +08:00
chyezh
9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
wei liu
c26c1b33c2
fix: Transfer l0 segment to new delegator after balance (#31319)
issue: #30186

during channel balance, after new delegator loaded, instead of syncing
l0 segment's location to new delegator, we should load l0 segment on new
delegator, and release the old l0 segment, then start to release old
delegator.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-19 09:59:05 +08:00
chyezh
ff4237bb90
enhance: add hostname into node info (#30673)
issue: https://github.com/milvus-io/milvus/issues/30647

- Address may be reused in k8s environment. Using hostname can be
better.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-15 10:45:06 +08:00
wei liu
22df5061c1
fix: Leader checker can't update segment's load version (#31040)
issue: #30890

when leader checker find that leader view has an older load version of
segment, it will try to correct leader view. but the sync action doesn't
specify the latest load version. so the update operation will failed.

This PR fix leader checker can't update segment's load version and
keeping generate same task to scheduler.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 11:57:01 +08:00
congqixia
c886aa29ff
enhance: Use ListIndexes instead of DescribeIndex for qc broker (#31122)
See also #31103

Since querycoord need index meta information from datacoord only, broker
shall use `ListIndexes` to skip segment index building check logic in
datacoord

This PR is also related to #30538, in which DescribeIndex caused lots of
memory usage and lead to OOM eventually

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 21:43:03 +08:00
Bingyi Sun
e3cce11dd9
fix: data race in querynode task test (#31019)
issue: https://github.com/milvus-io/milvus/issues/31022

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-03-05 16:26:59 +08:00
congqixia
7b91fa3db8
fix: Make leader checker generate leader task instead of segment task (#30258)
See also #30150

For leader view distribution with offline nodes, a release task can
never be sent to querynode due to targetNode online check logic. Even
the request is dispatched, normal release task does not have "force"
flag when calling `delegator.ReleaseSegment`.

This PR adds a new type of querycoord task: LeaderTask, the
responsibility of which is to rectify leader view distribtion.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 11:08:51 +08:00
yah01
0a87724f18
enhance: remove merger for load segments (#29062)
remove merger as now QueryNode could load segments concurrently
fix #29063

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-12 10:48:37 +08:00
yah01
fab52d167b
fix: may miss stream delta while loading (#28871)
we consume the delta data from the lastest channel checkpoint while
loading segment,

this works well without level 0 segments, but now it may lead to miss
some delta data,

so we have to consume from the current target's channel checkpoint

related: #27349

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-05 17:34:45 +08:00
wei liu
043ac87be0
fix: Balance channel may cause channel not availble error (#28829)
issue: #28831
release old delegator before new delegator update it's distribution may
cause `channel not availble` error

This PR will block release old delgator before new delegator finish
`syncDistribution`

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-01 10:08:34 +08:00
yah01
dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
congqixia
852be152de
Change task sourceID to stringer interface (#27965)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-27 01:08:12 +08:00
yah01
be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00
yah01
a8ce1b6686
Refine QueryCoord stopping (#27371)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-27 16:27:27 +08:00
yah01
6539a5ae2c
Refine DataCoord status (#27262)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-26 17:15:27 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Enwei Jiao
fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
yah01
f9c060e0d2
Treat balance task with released source segment as stale (#26453)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-22 17:12:22 +08:00
wei liu
6f89620a43
remove pull target rpc from lock (#26054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-04 10:31:06 +08:00
yah01
2180ef180c
Record only failed task error (#26033)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-08-01 10:11:05 +08:00
Bingyi Sun
a3e22786ed
Move meta store to kv catalog (#25915)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-31 13:57:04 +08:00