wei liu
2013d97243
enhance: Enable to dynamic update balancer policy in querycoord ( #33037 )
...
issue: #33036
This PR enables dynamically updating the balancer policy without
restarting querycoord.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-21 14:29:39 +08:00
wei liu
a7f6193bfc
fix: query node may stuck at stopping progress ( #33104 )
...
issue: #33103
When doing a stopping balance for a stopping query node, the balancer
gets the node list from replica.GetNodes and checks whether any node is
stopping; if so, a stopping balance is triggered for that replica.
After the replica refactor, replica.GetNodes returns only rwNodes, while
stopping nodes are kept in roNodes. The balancer therefore can't find
the replica that contains the stopping node, the stopping balance is
never triggered for it, and the query node is stuck forever because its
segments/channels are never moved out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-20 10:21:38 +08:00
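The failure described in a7f6193bfc can be illustrated with a minimal Go sketch. The `Replica` type, the `GetRONodes` accessor, and `replicasWithStoppingNodes` are simplified, hypothetical stand-ins, not the querycoord code; the sketch only assumes what the commit message states, namely that `GetNodes` returns rwNodes only and that stopping nodes live in roNodes.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified replica holding two node lists.
type Replica struct {
	ID      int64
	rwNodes []int64 // read-write nodes
	roNodes []int64 // read-only nodes; stopping nodes end up here
}

// GetNodes mimics the post-refactor behavior: only rwNodes are returned.
func (r *Replica) GetNodes() []int64 { return r.rwNodes }

// GetRONodes is an assumed accessor for the read-only nodes.
func (r *Replica) GetRONodes() []int64 { return r.roNodes }

// replicasWithStoppingNodes returns the IDs of replicas that contain at
// least one stopping node. With includeRO == false it reproduces the bug:
// stopping nodes in roNodes are never seen.
func replicasWithStoppingNodes(replicas []*Replica, stopping map[int64]bool, includeRO bool) []int64 {
	var hit []int64
	for _, r := range replicas {
		nodes := append([]int64{}, r.GetNodes()...)
		if includeRO {
			nodes = append(nodes, r.GetRONodes()...)
		}
		for _, n := range nodes {
			if stopping[n] {
				hit = append(hit, r.ID)
				break
			}
		}
	}
	return hit
}

func main() {
	replicas := []*Replica{{ID: 1, rwNodes: []int64{1, 2}, roNodes: []int64{3}}}
	stopping := map[int64]bool{3: true}
	fmt.Println(replicasWithStoppingNodes(replicas, stopping, false)) // []
	fmt.Println(replicasWithStoppingNodes(replicas, stopping, true))  // [1]
}
```

With only rwNodes inspected, no replica qualifies for a stopping balance, which matches the stuck-node symptom; including roNodes makes the replica visible again.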
wei liu
ba02d54a30
enhance: update shard leader cache when leader location changed ( #32470 )
...
issue: #32466
This PR updates the proxy's shard leader cache when a shard leader's
location changes, so that after a query node failover the proxy can
find the recovered replica.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-08 10:05:29 +08:00
chyezh
f06509bf97
fix: get replica should not report error when no querynode serve ( #32536 )
...
issue: #30647
- Remove the error report when no query node is serving; the error made
resource management hard for programmers.
- Change the resource group `transferNode` logic to stay compatible with
old SDK versions.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-25 19:25:24 +08:00
chyezh
a2502bde75
enhance: replica manager enhancement ( #31496 )
...
issue: #30647
- ReplicaManager now manages read-only nodes and always persists the
node distribution of each replica.
- All segment/channel checkers use ReplicaManager, not ResourceManager,
to get read-only or read-write nodes.
- ReplicaManager now guarantees that a querynode is assigned to at most
one replica of a collection (replicas of the same collection never hold
the same querynode at the same time).
- ReplicaManager guarantees a fair node-count assignment policy when
multiple replicas of a collection are assigned to one resource group.
- Move some parameter checks into ReplicaManager to avoid data races.
- Allow transferring a replica to a resource group that already loads a
replica of the same collection.
- Allow transferring nodes between resource groups that load replicas of
the same collection.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-05 04:57:16 +08:00
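Two of the guarantees listed in a2502bde75 (each querynode belongs to at most one replica of a collection, and node counts are assigned fairly) can be pictured with a round-robin split. This is a sketch under assumptions: `assignNodes` is a hypothetical helper, and the commit message does not say which algorithm ReplicaManager actually uses.

```go
package main

import "fmt"

// assignNodes distributes nodes over replicaCount replicas round-robin.
// Assuming the input node IDs are unique, every node lands in exactly one
// replica and replica sizes differ by at most one.
func assignNodes(nodes []int64, replicaCount int) [][]int64 {
	if replicaCount <= 0 {
		return nil
	}
	out := make([][]int64, replicaCount)
	for i, n := range nodes {
		idx := i % replicaCount
		out[idx] = append(out[idx], n)
	}
	return out
}

func main() {
	// Five nodes in one resource group, two replicas of one collection.
	fmt.Println(assignNodes([]int64{1, 2, 3, 4, 5}, 2)) // [[1 3 5] [2 4]]
}
```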
wei liu
92971707de
enhance: Add restful api for devops to execute rolling upgrade ( #29998 )
...
issue: #29261
This PR adds a RESTful API for devops to execute rolling upgrades,
including suspending/resuming balance and manually transferring
segments/channels.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 16:15:19 +08:00
chyezh
9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load ( #30936 )
...
issue: #30931
Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
wei liu
efe8cecc88
enhance: refactor segment dist manager interface ( #31073 )
...
issue: #31091
This PR adds a `GetByFilter` interface to the segment dist manager,
replacing the various get functions.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:29:01 +08:00
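The idea behind efe8cecc88, one `GetByFilter` entry point instead of many specialized getters, can be sketched with composable filter functions. Only the name `GetByFilter` comes from the commit message; the `Segment` fields, the `SegmentFilter` type, and the `WithCollectionID` / `WithNodeID` helpers below are illustrative assumptions rather than the actual Milvus definitions.

```go
package main

import "fmt"

// Segment is a hypothetical, trimmed-down segment distribution record.
type Segment struct {
	ID, CollectionID, NodeID int64
}

// SegmentFilter reports whether a segment should be included.
type SegmentFilter func(s *Segment) bool

// WithCollectionID keeps segments of the given collection.
func WithCollectionID(id int64) SegmentFilter {
	return func(s *Segment) bool { return s.CollectionID == id }
}

// WithNodeID keeps segments located on the given node.
func WithNodeID(id int64) SegmentFilter {
	return func(s *Segment) bool { return s.NodeID == id }
}

// SegmentDistManager is a simplified stand-in holding a flat segment list.
type SegmentDistManager struct {
	segments []*Segment
}

// GetByFilter returns the segments that satisfy every filter; with no
// filters it returns all segments.
func (m *SegmentDistManager) GetByFilter(filters ...SegmentFilter) []*Segment {
	var out []*Segment
	for _, s := range m.segments {
		match := true
		for _, f := range filters {
			if !f(s) {
				match = false
				break
			}
		}
		if match {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	m := &SegmentDistManager{segments: []*Segment{
		{ID: 1, CollectionID: 100, NodeID: 1},
		{ID: 2, CollectionID: 100, NodeID: 2},
		{ID: 3, CollectionID: 200, NodeID: 1},
	}}
	for _, s := range m.GetByFilter(WithCollectionID(100), WithNodeID(1)) {
		fmt.Println(s.ID) // prints 1
	}
}
```

New query shapes then need only a new filter constructor, not a new getter on the manager.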
aoiasd
f84d9a589a
fix: channel checker reduce balancing channels. ( #30087 )
...
Ignore leader unavailability when the channel checker looks for repeated
channels, so that it does not remove channels that are being balanced.
relate: https://github.com/milvus-io/milvus/issues/29841
https://github.com/milvus-io/milvus/issues/29838
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-26 10:59:00 +08:00
wei liu
9abc868d15
fix: Remove heartbeat lag logic during get shard leaders ( #29999 )
...
issue: #29677 #29838
While getting shard leaders, if a querynode has not acked the heartbeat
for more than 10s, querycoord treats it as unavailable and won't return
a shard leader on it. But when a querynode is at full CPU usage, it can
easily stall for more than 10s without acking the heartbeat, which
leaves no shard leader for search/query.
This PR removes the heartbeat lag logic from getting shard leaders.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-17 11:22:52 +08:00
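The effect of the heartbeat lag check removed in 9abc868d15 can be shown with a small Go sketch. The 10s threshold and the name `heartbeatAvailableInterval` appear in this log; `NodeInfo`, `availableLeaders`, and `sampleNodes` are hypothetical, and any other availability checks querycoord performs are left out.

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatAvailableInterval is the 10s lag threshold named in this log.
const heartbeatAvailableInterval = 10 * time.Second

// NodeInfo is a hypothetical record of a querynode's last acked heartbeat.
type NodeInfo struct {
	ID            int64
	LastHeartbeat time.Time
}

// availableLeaders returns node IDs eligible to serve as shard leaders.
// With checkLag == true, nodes whose last heartbeat is older than the
// threshold are skipped (the old behavior); with false they are kept.
func availableLeaders(nodes []NodeInfo, now time.Time, checkLag bool) []int64 {
	var out []int64
	for _, n := range nodes {
		if checkLag && now.Sub(n.LastHeartbeat) > heartbeatAvailableInterval {
			continue
		}
		out = append(out, n.ID)
	}
	return out
}

// sampleNodes builds a fixed scenario: node 2 is busy and 15s behind.
func sampleNodes() ([]NodeInfo, time.Time) {
	now := time.Date(2024, 1, 17, 0, 0, 0, 0, time.UTC)
	return []NodeInfo{
		{ID: 1, LastHeartbeat: now.Add(-2 * time.Second)},
		{ID: 2, LastHeartbeat: now.Add(-15 * time.Second)},
	}, now
}

func main() {
	nodes, now := sampleNodes()
	fmt.Println(availableLeaders(nodes, now, true))  // [1]
	fmt.Println(availableLeaders(nodes, now, false)) // [1 2]
}
```

If node 2 were the only leader of a shard, the lag check would leave that shard with no leader at all, which is the search/query failure the commit describes.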
wei liu
5b45a138b1
disable auto balance when old node exists ( #28191 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
congqixia
852be152de
Change task sourceID to stringer interface ( #27965 )
...
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-27 01:08:12 +08:00
wei liu
e0222b2ce3
refine target manager code style ( #27883 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 00:44:12 +08:00
zhenshan.cao
020ad9a6bc
Rectify wrong exception messages associated with Array datatype ( #27769 )
...
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-10-19 17:24:07 +08:00
SimFG
26f06dd732
Format the code ( #27275 )
...
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
xige-16
d1d0169fa3
Delete useless config ( #26173 )
...
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-08-08 10:23:08 +08:00
yah01
948d1f1f4a
Handle errors by merr for QueryCoord ( #24926 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-17 14:59:34 +08:00
congqixia
7d00020c9e
Reduce DataScope to historical for segment release task ( #25489 )
...
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-12 09:12:28 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto ( #24770 )
...
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
wei liu
b6ae70db43
fix get replica return wrong node list ( #23792 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-28 19:48:36 +08:00
XuanYang-cn
d56771b7b7
Fix return too many nodeIDs ( #23397 )
...
See also: #23396
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-04-20 13:50:31 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package ( #22572 )
...
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
MrPresent-Han
afd874b736
enhance segment balance by considering global rowCount(##22914) ( #23056 )
...
Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
Co-authored-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-04-03 14:16:25 +08:00
yah01
68b9cabb87
Fix GetShardLeader returns old leader ( #22887 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-21 16:57:57 +08:00
yah01
3d8f0156c7
Refine scheduler & executor of QueryCoord ( #22761 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-16 17:43:55 +08:00
SimFG
b57e476089
Fix the nil point about the session ( #22748 )
...
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-03-14 20:07:54 +08:00
yah01
21ba8182ee
Refine task create errors ( #22745 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-14 18:51:53 +08:00
yah01
1a4732bb19
Use new errors to handle load failures cache ( #22672 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-10 17:15:54 +08:00
yah01
90a5aa6265
Refine errors, re-define error codes ( #22501 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-09 15:47:52 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg ( #22390 )
...
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
wei liu
73c44d4b29
resource group impl ( #21609 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-01-30 10:19:48 +08:00
yah01
9ebaa10dec
Add more logs for GetShardLeaders ( #21046 )
...
Also increase the heartbeatAvailableInterval from 2.5s to 10s
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-12-08 19:09:18 +08:00
Enwei Jiao
89b810a4db
Refactor all params into ParamItem ( #20987 )
...
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-12-07 18:01:19 +08:00
wei liu
67403fcb3b
fix mannual balance with empty segment list ( #20738 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2022-11-21 19:29:12 +08:00
MrPresent-Han
d44d50e735
fix getting query segment info error during the period of loading and unloading segments ( #20549 )
...
issue: #20281
Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
2022-11-21 10:41:10 +08:00
yah01
31872f436c
Only balance segments in targets ( #20635 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-11-16 14:33:11 +08:00
wei liu
c5cd92d36e
update target ( #19296 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2022-11-07 19:37:04 +08:00
Enwei Jiao
956c5e1b9d
Make Params singleton ( #20088 )
...
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-11-04 14:25:38 +08:00
wei liu
4412cfcaaf
reduce querycoord unnecessary panic ( #19925 )
...
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2022-10-28 17:15:32 +08:00
yah01
8bfa55e560
Fix memory & goroutine leak ( #20152 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-10-28 14:55:32 +08:00
yah01
5429a973b4
Fix forget to fill the channel name ( #20070 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-10-25 21:49:32 +08:00
SimFG
a55f739608
Separate public proto files ( #19782 )
...
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
xige-16
a1db9038fb
Move disk index params to config file ( #19714 )
...
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 17:51:24 +08:00
yah01
1c71844b8d
Add license header ( #19678 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-10-11 11:39:22 +08:00
yah01
377f856833
Fix balance may confuse leader observer ( #19435 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-09-28 12:10:54 +08:00
yah01
3c5ce74843
Fix LoadBalance not check segment exists ( #19448 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-09-27 16:00:54 +08:00
yah01
6ba52366c5
Fix load segments can't be retried ( #19414 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-09-26 10:54:52 +08:00
yah01
6d6e14e67d
Fix manual balance failed with TaskStale ( #19400 )
...
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-09-23 16:18:51 +08:00
Bingyi Sun
5117017355
Remove manual balance timeout ( #19358 )
...
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2022-09-22 16:52:51 +08:00
xige-16
1cd6e80c8a
Increase timeout of activate task ( #19330 )
...
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-22 14:14:52 +08:00