80 Commits

Author SHA1 Message Date
congqixia
d7ff1bbe5c
enhance: Make querycoordv2 collection observer task driven (#32441)
See also #32440

- Add loadTask in collection observer
- For load collection/partitions, the load task shall time out as a whole
- Change related constructor to load jobs

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-22 10:39:22 +08:00
chyezh
a8c8a6bb0f
fix: parameter check of TransferReplica and TransferNode (#32297)
issue: #30647 

- The same source and destination resource group should not be allowed in
`TransferReplica` and `TransferNode`.

- Remove redundant parameter checks.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-17 15:27:19 +08:00
chyezh
48fe977a9d
enhance: declarative resource group api (#31930)
issue: #30647

- Add declarative resource group api

- Add config for resource group management

- Resource group recovery enhancement

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-15 08:13:19 +08:00
wei liu
c4806b69c4
enhance: Refactor leader view manager interface (#31133)
issue: #31091
This PR adds a `GetByFilter` interface to the leader view manager, replacing
the various individual get functions.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-10 15:13:36 +08:00
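A filter-based getter replaces many purpose-built lookup functions with one method that takes composable predicates. The sketch below uses hypothetical, simplified types (`LeaderView`, `LeaderViewFilter`, `WithCollectionID`, `WithChannel`). It is not the exact Milvus signature, and it assumes that filters combine with logical AND.

```go
package main

import "fmt"

// LeaderView is a simplified, hypothetical stand-in for a shard leader view.
type LeaderView struct {
	NodeID       int64
	CollectionID int64
	Channel      string
}

// LeaderViewFilter decides whether a view should be returned.
type LeaderViewFilter func(view *LeaderView) bool

// WithCollectionID keeps views belonging to the given collection.
func WithCollectionID(id int64) LeaderViewFilter {
	return func(v *LeaderView) bool { return v.CollectionID == id }
}

// WithChannel keeps views serving the given channel.
func WithChannel(ch string) LeaderViewFilter {
	return func(v *LeaderView) bool { return v.Channel == ch }
}

// LeaderViewManager holds views; one GetByFilter replaces many getters.
type LeaderViewManager struct {
	views []*LeaderView
}

// GetByFilter returns every view that satisfies all filters (logical AND).
// With no filters it returns every view.
func (m *LeaderViewManager) GetByFilter(filters ...LeaderViewFilter) []*LeaderView {
	var out []*LeaderView
	for _, v := range m.views {
		match := true
		for _, f := range filters {
			if !f(v) {
				match = false
				break
			}
		}
		if match {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	m := &LeaderViewManager{views: []*LeaderView{
		{NodeID: 1, CollectionID: 100, Channel: "dml_0"},
		{NodeID: 2, CollectionID: 100, Channel: "dml_1"},
		{NodeID: 3, CollectionID: 200, Channel: "dml_0"},
	}}
	fmt.Println(len(m.GetByFilter(WithCollectionID(100))))                       // 2
	fmt.Println(len(m.GetByFilter(WithCollectionID(100), WithChannel("dml_0")))) // 1
}
```

Adding a new query then only requires a new filter constructor, not a new manager method. The same pattern applies to the segment dist manager change further down this log.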
chyezh
a2502bde75
enhance: replica manager enhancement (#31496)
issue: #30647 

- ReplicaManager now manages read-only nodes and always persists the node
distribution of each replica.

- All segment/channel checkers now use ReplicaManager, rather than
ResourceManager, to get read-only or read-write nodes.

- ReplicaManager now guarantees that each QueryNode is assigned to only one
replica of a collection (replicas of the same collection never hold the same
QueryNode at the same time).

- ReplicaManager guarantees a fair node-count assignment policy when
multiple replicas of a collection are assigned to one resource group.

- Move some parameter checks into ReplicaManager to avoid data races.

- Allow transferring a replica to a resource group that already has a replica
of the same collection loaded.

- Allow transferring nodes between resource groups that have replicas of the
same collection loaded.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-05 04:57:16 +08:00
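The two ReplicaManager guarantees described above can both hold at the same time: a node belongs to at most one replica of a collection, and node counts stay fair when several replicas share one resource group. The sketch below shows one simple way to satisfy both. It is an assumption and not the actual Milvus policy. The hypothetical `assignNodes` deals sorted node IDs round-robin, so replica sizes differ by at most one and no node appears twice.

```go
package main

import (
	"fmt"
	"sort"
)

// assignNodes is a hypothetical sketch of a fair assignment: the nodes of one
// resource group are split across replicaCount replicas of the same
// collection so that (1) no node belongs to two replicas and (2) replica
// sizes differ by at most one. It returns nil for a non-positive count.
func assignNodes(nodes []int64, replicaCount int) [][]int64 {
	if replicaCount <= 0 {
		return nil
	}
	// Sort a copy so the result is deterministic and the input is untouched.
	sorted := append([]int64(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	out := make([][]int64, replicaCount)
	for i, n := range sorted {
		// Round-robin: each node goes to exactly one replica.
		out[i%replicaCount] = append(out[i%replicaCount], n)
	}
	return out
}

func main() {
	fmt.Println(assignNodes([]int64{5, 1, 4, 2, 3}, 2)) // [[1 3 5] [2 4]]
}
```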
congqixia
c2aad513c0
fix: Check collection nil before check load status (#31850)
See also #31849

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-03 10:07:13 +08:00
wei liu
7471a8005f
fix: querycoord panic after node down (#31831)
issue: #30519

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-03 10:03:22 +08:00
wei liu
92971707de
enhance: Add restful api for devops to execute rolling upgrade (#29998)
issue: #29261
This PR adds a RESTful API for DevOps to execute rolling upgrades, including
suspending/resuming balance and manually transferring segments/channels.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 16:15:19 +08:00
wei liu
ddd918ba04
enhance: change frequent log to rated level (#31084)
This PR changes the frequent shard leader check log to rated level.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:39:02 +08:00
wei liu
efe8cecc88
enhance: refactor segment dist manager interface (#31073)
issue: #31091
This PR adds a `GetByFilter` interface to the segment dist manager, replacing
the various individual get functions.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:29:01 +08:00
zhenshan.cao
bb93b22c84
fix: should return collectionName in response of ListAliases (#30532)
issue: https://github.com/milvus-io/milvus/issues/30369

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-12 08:30:55 +08:00
aoiasd
f84d9a589a
fix: channel checker reduce balancing channels. (#30087)
Ignore leader unavailability when the channel checker identifies repeated
channels, so that the channel checker does not remove channels that are being balanced.
related: https://github.com/milvus-io/milvus/issues/29841
https://github.com/milvus-io/milvus/issues/29838

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-26 10:59:00 +08:00
wei liu
5474bce9d2
fix: Choose wrong shard leader during balance channel (#29529)
issue: #29523

The readable shard leader should remain the old one during channel
balance if the new shard leader is not ready.
This PR fixes QueryCoord choosing the wrong shard leader during channel
balance.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-28 15:22:51 +08:00
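The fix keeps reads on the old shard leader until the new one is ready. A minimal sketch of that selection rule follows. It assumes a hypothetical `leaderCandidate` with a version number (a higher value means a newer view) and a `Serviceable` readiness flag, and it is not the actual QueryCoord code. The newest leader is chosen only among candidates that are ready to serve.

```go
package main

import "fmt"

// leaderCandidate is a hypothetical view of a node that hosts a channel.
type leaderCandidate struct {
	NodeID      int64
	Version     int64 // higher means the view was created later
	Serviceable bool  // all data loaded and ready to serve reads
}

// pickReadableLeader returns the newest serviceable candidate. During a
// channel balance the new leader has a higher version but is not yet
// serviceable, so the old leader keeps serving reads until the new one is
// ready. ok is false when no candidate is serviceable.
func pickReadableLeader(cands []leaderCandidate) (leader leaderCandidate, ok bool) {
	for _, c := range cands {
		if !c.Serviceable {
			continue
		}
		if !ok || c.Version > leader.Version {
			leader, ok = c, true
		}
	}
	return leader, ok
}

func main() {
	old := leaderCandidate{NodeID: 1, Version: 1, Serviceable: true}
	next := leaderCandidate{NodeID: 2, Version: 2, Serviceable: false}
	l, _ := pickReadableLeader([]leaderCandidate{old, next})
	fmt.Println(l.NodeID) // 1: the new leader is not ready yet
	next.Serviceable = true
	l, _ = pickReadableLeader([]leaderCandidate{old, next})
	fmt.Println(l.NodeID) // 2
}
```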
wei liu
d081fd5481
enhance: Change some frequent logs to rated level (#28897)
This PR changes the level of some frequent logs to rated.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-04 10:38:35 +08:00
wei liu
e0222b2ce3
refine target manager code style (#27883)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 00:44:12 +08:00
wayblink
e3f2122618
Expose metrics of standby coordinators (#27698)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-10-16 15:04:09 +08:00
yah01
be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00
yah01
63ac43a3b8
Refine errors for import (#27379)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-30 10:31:28 +08:00
yah01
6539a5ae2c
Refine DataCoord status (#27262)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-26 17:15:27 +08:00
SimFG
0901b76732
Avoid the panic when the status of rpc response is nil (#26910)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-07 19:23:15 +08:00
yah01
3349db4aa7
Refine errors to remove changes breaking design (#26521)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-04 09:57:09 +08:00
Enwei Jiao
7d61355ab0
Refactor log for Query (#26310)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-08-14 18:57:32 +08:00
wei liu
b47a72bfcf
fix set dirty segment distribution to leader view (#26180)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-11 11:21:32 +08:00
yah01
2180ef180c
Record only failed task error (#26033)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-08-01 10:11:05 +08:00
yah01
dc37b4587e
Fix panic if channel not watched while getting shard leaders (#25820)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-24 14:13:02 +08:00
yah01
948d1f1f4a
Handle errors by merr for QueryCoord (#24926)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-17 14:59:34 +08:00
yihao.dai
3be502c155
Fix not fully loaded error after restart (#25243)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-07-03 16:58:28 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
congqixia
39d31f8bbf
Trigger checker while waiting collection/partition released (#24523)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-30 17:41:28 +08:00
wei liu
8965ea2a08
refine err msg about no available node in replica (#24256)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-22 11:59:26 +08:00
yihao.dai
1a3dca9b5e
Fix dynamic partitions loading (#24112)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-05-18 09:17:23 +08:00
Bingyi Sun
a53beba14f
Move release collection metrics to job (#24079)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-05-17 11:17:22 +08:00
yihao.dai
3827ac30bc
Remove load cache (#23287)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-05-09 10:36:41 +08:00
wei liu
4336ed8609
fix node up (#23415)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-20 09:52:31 +08:00
yah01
296380d6e6
Support async refresh (#23107)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-12 15:06:28 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01
75737c65ac
Refine error handle of QueryCoord (#23068)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-31 10:54:29 +08:00
wei liu
74da53c027
fix update load percentage (#23054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-30 10:48:23 +08:00
yah01
68b9cabb87
Fix GetShardLeader returns old leader (#22887)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-21 16:57:57 +08:00
yihao.dai
1f718118e9
Dynamic load/release partitions (#22655)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-03-20 14:55:57 +08:00
SimFG
b57e476089
Fix the nil pointer issue in the session (#22748)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-03-14 20:07:54 +08:00
yah01
1a4732bb19
Use new errors to handle load failures cache (#22672)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-10 17:15:54 +08:00
yah01
90a5aa6265
Refine errors, re-define error codes (#22501)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-09 15:47:52 +08:00
wei liu
11f1f4226a
support replica observer assign node (#22604)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-08 18:57:51 +08:00
wei liu
c162c6ecc0
fix assign node err (#22479)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-01 11:11:47 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
wei liu
a9a263d5a8
fix assign node to replica in nodeUp (#22323)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-23 14:15:45 +08:00
wei liu
c3e8ad3629
fix balance generate reduce task (#22236)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-21 19:06:27 +08:00
wei liu
87a4ddc7e2
fix rg e2e (#22187)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-16 10:48:34 +08:00
wei liu
7b4511b8f4
fix transfer node (#22120)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-14 16:16:34 +08:00