548 Commits

Author SHA1 Message Date
SimFG
95e47bfcf8
fix: force to set the metric type in the search request (#36279)
- issue: #35960
- pr: #35962

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-18 19:21:11 +08:00
wei liu
efed3d3ed0
fix: [skip e2e] Fix unstable ut TestCollectionObserver (#36231) (#36260)
issue: #36237
pr: #36231

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-14 15:43:08 +08:00
wei liu
38b307e230
fix: Clean dirty segment/channel on querynode (#36202) (#36259)
issue: #36201
pr: #36202
After a querynode has been removed from a replica, all dirty
segments/channels on it should be released.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-14 14:41:09 +08:00
wei liu
cc414d53b7
fix: Fix logic dead lock when delegator has high memory usage (#36066)
issue: #36064
pr: #36065
When a delegator has high memory usage, loading an L0 segment fails, and
the balance segment task is blocked by the load segment task. The
delegator then can't free memory by moving out segments, which causes a
logical deadlock.

This PR removes the limit on balance and permits load segment and balance
tasks to execute in parallel. This won't cause side effects because:
1. A segment can have only one task in QC's scheduler, and a load/release
task will replace a balance task if necessary.
2. Balance speed has been limited, so it won't block load segment tasks.
3. If a collection has a load task and a balance task at the same time,
the load task will be scheduled first due to its higher priority.
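
The one-task-per-segment rule and the priority ordering described above can be illustrated with a minimal Python sketch. The `Scheduler` and `Task` classes, the priority constants, and the replacement rule are assumptions made for illustration; they are not QueryCoord's actual types.

```python
from dataclasses import dataclass

# Hypothetical priority levels; a higher value wins.
LOW, NORMAL, HIGH = 0, 1, 2


@dataclass
class Task:
    segment_id: int
    kind: str      # "load", "release", or "balance"
    priority: int


class Scheduler:
    """Keeps at most one pending task per segment."""

    def __init__(self):
        self.tasks = {}  # segment_id -> Task

    def add(self, task):
        current = self.tasks.get(task.segment_id)
        # A higher-priority task (e.g. load/release) replaces a pending
        # lower-priority task (e.g. balance) on the same segment.
        if current is None or task.priority > current.priority:
            self.tasks[task.segment_id] = task
            return True
        return False

    def next_batch(self):
        # Higher-priority tasks are scheduled first.
        return sorted(self.tasks.values(), key=lambda t: -t.priority)
```

In this sketch a balance task never blocks a load task: a load on the same segment replaces it, and a load on another segment is ordered ahead of it.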

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-06 22:01:07 +08:00
congqixia
b34b035edc
fix: [2.4] Use SliceSetEqual to compare load field list (#36062)
Cherry-pick from master
pr: #36051
Related to #36037

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-06 19:17:05 +08:00
congqixia
e21b09cc90
fix: [2.4] Fill load field list from old version load info (#35993) (#36018)
Cherry-pick from master
pr: #35993
See also #35959

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-06 17:27:06 +08:00
wei liu
10211ea056
fix: Fix dynamic release partition may fail search/query request (#35919) (#36019)
issue: #33550
pr: #35919
A concurrency issue may occur between removing a partition in the target
manager and syncing the segment list to the delegator. When it happens,
some segments may be released in the delegator while also being synced
to it, which leaves the delegator unserviceable due to a lack of
necessary segments, so search/query fails.

This PR makes sure that all write access to target_manager is executed
serially to avoid such concurrency issues.
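
As a rough illustration of serializing writes, the Python sketch below routes every mutation through a single lock. The `TargetManager` class here is a hypothetical stand-in with invented method names, not the real target_manager.

```python
import threading


class TargetManager:
    """Hypothetical stand-in: every write goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._partitions = {}  # partition_id -> set of segment IDs

    def update_partition(self, partition_id, segment_ids):
        with self._lock:
            self._partitions[partition_id] = set(segment_ids)

    def remove_partition(self, partition_id):
        with self._lock:
            self._partitions.pop(partition_id, None)

    def segments(self):
        # Readers take the same lock, so they never observe a half-applied write.
        with self._lock:
            return set().union(*self._partitions.values())
```

Because removal and update cannot interleave, a segment list read for syncing always reflects either the state before or the state after a partition was removed.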

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-06 10:49:05 +08:00
wei liu
c87711d903
fix: Fix some replicas don't participate in the query after the failure recovery (#35850) (#35925)
issue: #35846
pr: #35850
Querycoord notifies the proxy to update the shard leader cache after a
delegator's location changes. But during a querynode's failure recovery,
some delegators may become unserviceable due to a lack of segments and
return to serviceable after the segments are loaded, so we also need to
notify the proxy to invalidate the shard leader cache when a delegator's
serviceable state changes.

This PR maintains the querynode's serviceable state during heartbeat and
notifies the proxy to invalidate the shard leader cache if the
serviceable state changes.
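
The notify-on-change idea can be sketched in a few lines of Python. `LeaderCacheNotifier`, `on_heartbeat`, and the `invalidate` callback are hypothetical names chosen for this sketch; the real heartbeat handling in querycoord may differ.

```python
class LeaderCacheNotifier:
    """Remembers each channel's last serviceable state and notifies on change."""

    def __init__(self, invalidate):
        self._invalidate = invalidate  # callback taking a channel name
        self._serviceable = {}         # channel -> last reported state

    def on_heartbeat(self, channel, serviceable):
        previous = self._serviceable.get(channel)
        self._serviceable[channel] = serviceable
        # Notify only when the state flips, not on every heartbeat.
        if previous is not None and previous != serviceable:
            self._invalidate(channel)
```

Tracking the previous state keeps the proxy from being flooded with invalidations when nothing has changed between heartbeats.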

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-05 10:09:04 +08:00
SimFG
084b3efaa1
fix: [2.4] fill the metric type field in the LoadMetaInfo object (#35963)
- issue: #35960
- pr: #35962

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-04 16:21:05 +08:00
congqixia
df8d1c7ca3
enhance: [2.4] Check load fields for previous loaded collection (#35905) (#35910)
Cherry-pick from master
pr: #35905
Related to #35415

This PR makes querycoord report an error when a load request tries to
update the load field list, which is currently not supported.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-03 11:25:03 +08:00
congqixia
cfc99e63b1
fix: [2.4] Make sure querycoord observers started once (#35811) (#35817)
Cherry-pick from master
pr: #35811
Related to #35809

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-29 19:15:01 +08:00
congqixia
8928c9d570
enhance: [2.4] Change frequent balancer debug log to rated one (#35749) (#35796)
Cherry-pick from master
pr: #35749
"skip balance" log is too frequent at debug level. This PR changes it
into a rated one.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-29 12:31:00 +08:00
SimFG
fc324b4254
feat: [2.4] add the rbac msg and send them to the replicate channel (#35562)
- issue: #35391
- pr: #35392

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-08-27 14:45:00 +08:00
congqixia
ab261d0f8b
feat: [2.4] Support field partial load collection (#35416) (#35696)
Cherry-pick from master
pr: #35416
Related to #35415

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-27 14:07:00 +08:00
Xiaofan
7269d5eda2
enhance: [2.4] reduce the log level of frequent log (#35653)
pr: #35651

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-08-25 17:48:57 +08:00
SimFG
5b5119a51f
feat: [2.4] provide more general configuration to control mmap behavior (#35609)
- issue: #35273
- pr: #35359

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-08-23 12:35:02 +08:00
wei liu
e2542a1bf5
enhance: Update protobuf-go to protobuf-go v2 (#34394) (#35555)
issue: #34252
pr: #34394 #35072 #35084

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-21 18:50:58 +08:00
wei liu
4bf4cbad85
enhance: Mark query node as read only after suspend (#35492) (#35586)
issue: #34985 #35493
pr: #35492
After a querynode has been suspended, loading segments/channels on it is
not allowed, which means the node is read-only. To be compatible with
the resource group design, after a query node has been suspended, we
remove it from its original resource group and make it a read-only
query node in the replica. Then two things happen:
1. Its original resource group will lack query nodes, so query coord
will assign a new node to it.
2. Querycoord will try to move all segments/channels off the suspended
querynode.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-20 19:00:56 +08:00
wei liu
4610dafb2e
enhance: make configure load param feature be compatible with old sdk(#35520) (#35573)
issue: #31570 #35521
pr: #35520 #35546

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-20 18:20:57 +08:00
wei liu
8cd6718672
enhance: limit getSegmentInfo batch size to avoid exceeding grpc message limit (#35432)
issue: #35395 
pr: #35394
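
The batching named in this commit's title can be sketched as follows in Python. The helper names, the `fetch` callback, and the default batch size of 1000 are assumptions for illustration; the commit message does not state the actual limit.

```python
def batched(ids, batch_size):
    """Yield consecutive slices of ids, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def get_segment_info(ids, fetch, batch_size=1000):
    """Call fetch once per batch so no single request carries every ID."""
    results = []
    for chunk in batched(ids, batch_size):
        results.extend(fetch(chunk))
    return results
```

Splitting the request bounds the size of each response message, at the cost of more round trips.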

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-13 11:42:19 +08:00
wei liu
b316040634
fix: force update next target if target can't be loaded (#35366)
issue: #35361
pr: #35365

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-13 10:24:20 +08:00
wei liu
0201e00a2f
enhance: enable to set load config in cluster level (#35293)
issue: #35170
pr: #35169
This PR enables setting load configs at the cluster level, such as
replicas and resource groups. Collections are then loaded with these
load configs.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-07 12:38:21 +08:00
wei liu
2ac1bf7532
enhance: Enable setting the replica number and resource group during collection creation (#34403) (#34561)
issue: #30040
pr: #34403

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-06 15:06:17 +08:00
wei liu
d48c690cb3
enhance: Avoid unnecessary syncTargetVersion func call after querycoord recover (#34954) (#35234)
pr: #34954
Before querycoord stops gracefully, we save the current target to the
meta store and recover it after querycoord starts up, to shorten
querycoord's recovery time. But the target version isn't recovered as
expected; the latest timestamp is used as the current target's version,
which has no effect on querycoord other than an unnecessary
syncTargetVersion func call.

This PR recovers the correct target version to avoid the unnecessary
syncTargetVersion func call.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-05 10:18:16 +08:00
Chun Han
58f7c35b75
enhance: add log for partition stats(#30376) (#35220)
related: #30376
pr: https://github.com/milvus-io/milvus/pull/35219

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-08-02 19:34:21 +08:00
wei liu
11578772ef
fix: Set legacy level to l0 segment after qc restart (#35197) (#35211)
issue: #35087
pr: #35197
After QC restarts and the target is not ready yet, if dist_handler tries
to update the segment dist, it will set the legacy level on an L0
segment, which may cause the L0 segment to be moved to another node and
make search/query fail.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-02 18:22:15 +08:00
cai.zhang
756922ebec
fix: [cherry-pick] Maintain load idempotency even when building new indexes (#35179)
issue: #34404 

master pr: #35178

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-02 17:28:15 +08:00
wei liu
5f601fcc50
enhance: Reduce delegator memory overloaded factor to 0.1 (#35092) (#35164)
pr: #35092

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-01 14:20:13 +08:00
congqixia
8991dc211e
enhance: [2.4] Fix go&cpp lint issues (#35107)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 20:25:55 +08:00
Jiquan Long
86edca8c1b
fix: support auto index for array (#35095)
/kind branch-feature
pr: #34450

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Co-authored-by: Zhagnlu <lu.zhang@zilliz.com>
2024-07-30 17:57:50 +08:00
congqixia
d16320705e
enhance: [2.4] Add Segment Level in milvus segment info APIs (#34763) (#35023)
Cherry-pick from master
pr: #34763
See also #34746

This PR adds a segment level field to the responses of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-29 10:11:52 +08:00
wei liu
b3bc7f3985
enhance: Limit collection's normal balance speed (#34810) (#34987)
issue: #34798
pr: #34810

After we removed the task priority on query coord, to avoid load/release
segment tasks being blocked by too many balance tasks, we limit the
number of balance tasks in each round. At the same time, we reduce the
balance interval to trigger balance more frequently.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-26 10:13:46 +08:00
jaime
77ae127a62
fix: check collection health(queryable) fail for releasing collection (#34948)
issue: #34946
pr: #34947

---------

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-25 10:25:57 +08:00
wei liu
8c96026722
fix: Segment may bounce between delegator and worker (#34904)
issue: #34595
pr: #34830

In pr #34596 we added an overloaded factor to segments in the delegator,
which causes the same segment to get different scores in the delegator
and the worker. This may cause the segment to bounce between delegator
and worker.

This PR uses the average score to compute the delegator overloaded
factor, to avoid segments bouncing between delegator and worker.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-23 15:57:49 +08:00
wei liu
ebbccb870c
fix: Avoid segment lack caused by deduplicate segment task (#34782) (#34903)
issue: #34781
pr: #34782

When a segment balance hasn't finished yet, query coord may find 2
loaded copies of the segment and generate a task to deduplicate them,
which may cancel the balance task. The old copy has then been released
while the new copy is not ready yet but canceled, so search fails due
to the missing segment.

This PR sets the deduplicate segment task's priority to low, to avoid
the balance segment task being canceled by the deduplicate task.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-23 11:06:15 +08:00
wei liu
cf701a9bf0
enhance: Preserve fixed-size memory in delegator node for growing segment (#34600)
issue: #34595
pr: #34596
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-15 20:51:46 +08:00
wayblink
c62bf8a0b0
fix: [Cherry-pick] Pick major compaction fixes and optimizations (#34360)
This PR cherry-picks the following commits:

- fix: sync partition stats blocking balance task #33742
- fix: Fix meta prefix overlap bug #33830
- fix: Small fixes of major compaction #33929
- fix: Fix memory buffer error & some renaming #33850
- fix: sync part stats task cannot be finished #34027 
- Add an option to enable/disable vector field clustering key #34097
- fix: fix error ignore in compactor #34169
- fix:load major compaction partial result #34052
- Use new stream segment reader in clustering compaction #34232

issue: #30633
pr: #33742 #33830 #33929 #33850 #34027 #34097 #34169 #34052 #34232

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-03 09:53:37 +08:00
wayblink
99586066f5
feat: [cherry-pick] Major compaction (#34326)
This PR cherry-picks the following commits:
- fix: speed up segment lookup via channel name in datacoord (#33530),
  needed by the next commit
- feat: Major compaction (#33620)

issue: #30633
pr: #33620

---------

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-07-02 18:29:01 +08:00
congqixia
4aa8a12ce8
fix: [2.4] Check partition in current target when observing partition load status (#34282) (#34305)
Cherry-pick from master
pr: #34282
See also #34234

`LoadPartitions` does not guarantee the current target has loading
partitions if there are some partitions already loaded before.

This PR checks that the current target contains the partition to load
when advancing the loading percentage to 100.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-02 15:48:10 +08:00
wei liu
92b7eebb53
enhance: Skip update index for L0 segment (#34099) (#34280)
pr: #34280
Trying to update the index for an L0 segment fails with `index not found`.

This PR skips updating the index for L0 segments.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-01 16:32:07 +08:00
wei liu
b18de95817
enhance: Avoid assign too much segment/channels to new querynode (#34096) (#34245)
issue: #34095
pr: #34096

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
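
A minimal Python sketch of the idea, assuming workload is measured simply as a segment count: segments already loaded on a node plus segments in tasks still executing against it. The `pick_node` function and its inputs are hypothetical, and the real scoring in query coord is likely more involved.

```python
def pick_node(loaded, executing):
    """Pick the node with the lowest combined workload.

    loaded:    node_id -> number of segments already in the distribution
    executing: node_id -> number of segments in tasks still executing
    """
    def workload(node):
        return loaded.get(node, 0) + executing.get(node, 0)

    nodes = set(loaded) | set(executing)
    # Sort first so ties resolve deterministically to the lowest node ID.
    return min(sorted(nodes), key=workload)
```

Counting in-flight tasks means a freshly started node that already has many pending loads no longer looks empty to the checkers.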

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-01 10:32:06 +08:00
jaime
0992f10694
enhance: improve check health (#34265)
issue: https://github.com/milvus-io/milvus/issues/34264
pr: #33800

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-01 10:18:07 +08:00
jaime
6423b6c718
enhance: move rocksmq from internal to pkg (#34165)
pr: https://github.com/milvus-io/milvus/pull/33881
issue: https://github.com/milvus-io/milvus/issues/33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-26 13:36:05 +08:00
congqixia
26b2e1d43c
fix: [2.4] Make querycoord panic when rg metastore sync fail (#34106) (#34127)
Cherry-pick from master
pr: #34106
See also #34047

Make querycoord panic when `unassignNode` fails to sync the resource
group to the metastore after a node is removed.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-26 10:04:03 +08:00
wei liu
061a00c58f
enhance: Enable database level replica num and resource groups for loading collection (#33052) (#33981)
pr: #33052

issue: #30040

This PR introduces two database-level props:
1. database.replica.number
2. database.resource_groups

Users can set these two database props via the AlterDatabase API and
then load a collection without specifying replica_num and resource
groups. The load then uses the database-level load params.
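
The fallback from request parameters to database-level props might look like the Python sketch below. The two prop keys come from the commit message; the function name, the comma-separated value format for resource groups, and the default of one replica are assumptions for illustration.

```python
def resolve_load_config(replica_num, resource_groups, db_props):
    """Prefer values from the load request; fall back to database props."""
    if not replica_num:
        # Assumed default of 1 when neither the request nor the database sets it.
        replica_num = int(db_props.get("database.replica.number", 1))
    if not resource_groups:
        # Assumed format: a comma-separated list of resource group names.
        raw = db_props.get("database.resource_groups", "")
        resource_groups = [g.strip() for g in raw.split(",") if g.strip()]
    return replica_num, resource_groups
```

An explicit value in the load request always wins in this sketch, so existing callers that pass both parameters see no change in behavior.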

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 16:56:02 +08:00
wei liu
7d1d5a838a
fix: Fix GetReplicas API return nil status (#33715) (#34019)
issue: #33702
pr: #33715

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 10:26:02 +08:00
wei liu
fbc8fb3cb2
enhance: Skip return data distribution if no change happen (#32814) (#33985)
issue: #32813
pr: #32814

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 10:24:12 +08:00
wei liu
87508c3390
enhance: Avoid to iterate whole segment list for each task's process(#33943) (#33976)
pr: #33943

When querycoord processes a segment task, it iterates over the whole
segment list to check whether the segment is loaded, which costs too
much CPU if there are thousands of segments.
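
One common way to avoid such a full scan is to index the distribution by segment ID, as in the Python sketch below. `SegmentDist` and its methods are hypothetical names; the commit message does not describe the actual data structure used.

```python
from collections import defaultdict


class SegmentDist:
    """Indexes loaded segments by ID so lookups don't scan the whole list."""

    def __init__(self):
        self._nodes_by_segment = defaultdict(set)  # segment_id -> node IDs

    def add(self, segment_id, node_id):
        self._nodes_by_segment[segment_id].add(node_id)

    def remove(self, segment_id, node_id):
        nodes = self._nodes_by_segment.get(segment_id)
        if nodes is None:
            return
        nodes.discard(node_id)
        if not nodes:
            del self._nodes_by_segment[segment_id]

    def is_loaded(self, segment_id, node_id):
        # Average O(1) lookup instead of iterating over every segment.
        return node_id in self._nodes_by_segment.get(segment_id, ())
```

The index costs a little extra bookkeeping on every add and remove, which is usually cheaper than scanning thousands of segments for each task.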

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-20 10:00:05 +08:00
SimFG
f664b51ebe
enhance: [2.4] try to speed up the loading of small collections (#33746)
- issue: #33569
- pr: #33570

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-06-11 15:07:55 +08:00
yihao.dai
ed1dee9e38
enhance: Support L0 import (#33514) (#33712)
issue: https://github.com/milvus-io/milvus/issues/33157

pr: https://github.com/milvus-io/milvus/pull/33514

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-08 11:17:52 +08:00