milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-02-02 01:06:41 +08:00

Author	SHA1	Message	Date
wei liu	93063ce1f9	fix: Prevent simultaneous balance of segments and channels (#37850) (#37939) issue: #33550 pr: #37850 Balancing segments and balancing channels may execute at the same time, which causes a bunch of corner cases. This PR disables simultaneous balancing of segments and channels. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-26 10:26:40 +08:00
congqixia	8601f3ed66	enhance: [2.4] Refine Replica manager coll2Replicas secondary index (#37906) (#37970) Cherry-pick from master pr: #37906 Related to #37630 This PR adds a new util coll2Replicas secondary index to reduce map access & iteration when getting replicas by collection --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-26 10:20:35 +08:00
wei liu	bb66636448	fix: Channel may be released after balance (#37862) (#37940) issue: #37830 pr: #37862 Because the dist handler doesn't set the channel's version, if the channel checker tries to dedup channels, it may release the new delegator after balance finishes. This PR fixes the way the proper version is set for a channel. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-25 11:26:44 +08:00
congqixia	0bd26171d5	enhance: [2.4] Provide secondary index criteria when filter leaderview (#37777) (#37802) Cherry-pick from master pr: #37777 Related to #37630 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-21 10:48:33 +08:00
congqixia	28adfe4629	enhance: [2.4] Remove unnecessary segment clone updating dist (#37797) (#37833) Cherry-pick from master pr: #37797 Related to #37630 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-20 19:48:33 +08:00
jaime	3ce27ca689	enhance: remove collection queryable check from health check (#37731) pr: #37712 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-18 10:50:38 +08:00
wei liu	1bd502b585	fix: Delegator stuck at unserviceable status (#37694) (#37702) issue: #37679 pr: #37694 pr #36549 introduced a logic error which updates the current target when only part of the channels are ready. This PR fixes the logic error and lets the dist handler keep pulling distribution from querynodes until all delegators become serviceable. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-15 14:52:30 +08:00
wei liu	28bcd85bd0	fix: Balance channel may stuck at increasing replica number case (#37642) issue: #37640 pr: #37641 Fixes pr #36549: balance channel waits until the new delegator becomes serviceable, but the new delegator needs to sync the target version before it becomes serviceable, and syncing the target version has to wait until all replicas finish loading. So if increasing the replica number and balancing channels happen at the same time, a logical deadlock occurs. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-13 14:26:30 +08:00
congqixia	8801322371	enhance: [2.4] Invalidate collection cache when release collection (#37577) (#37628) Cherry-pick from master pr: #37577 Related to #37395 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-13 14:00:31 +08:00
wei liu	6dc879b1e2	enhance: Enable node assign policy on resource group (#36968) (#37588) issue: #36977 pr: #36968 With node_label_filter on a resource group, users can add a label to a querynode with the env `MILVUS_COMPONENT_LABEL`, and the resource group will then prefer to accept nodes which match its node_label_filter. Querynodes can then be grouped by label, putting querynodes with the same label into the same resource group. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-13 11:10:29 +08:00
wei liu	7d1c899155	fix: Search may return less result after qn recover (#36549) (#37610) issue: #36293 #36242 pr: #36549 After a qn recovers, a delegator may be loaded on a new node, and once all segments have been loaded the delegator becomes serviceable. But the delegator's target version hasn't been synced yet, so if a search/query comes, the delegator will use the wrong target version and filter out an empty segment list, which causes an empty search result. This PR blocks the delegator's serviceable status until the target version is synced --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-12 19:16:30 +08:00
wei liu	074f8ee696	enhance: optimize describe collection and index (#37490) (#37605) fix #37489 pr: #34790 combine multiple describe collection and list index into one call Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com> Signed-off-by: Wei Liu <wei.liu@zilliz.com> Co-authored-by: Xiaofan <83447078+xiaofan-luan@users.noreply.github.com>	2024-11-12 16:54:29 +08:00
wei liu	25c96991f6	fix: Lost loading collection's updateTs after qc restart (#37538) (#37580) issue: #37537 pr: #37538 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-11 17:50:30 +08:00
congqixia	2fbb157dc8	enhance: [2.4] Handle legacy proxy load fields request (#37565) (#37569) Cherry-pick from master pr: #37565 Related to #35415 In a rolling upgrade, a legacy proxy may dispatch a load request with an empty load field list. The upgraded querycoord may then report by mistake that the load field list has changed. This PR: - Auto fills an empty load field list with all user field ids - Refines the error message when the load field list updates - Refines the load job unit test with service cases Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-11 14:06:29 +08:00
congqixia	cedc34053c	enhance: [2.4] Add context trace for querycoord queryable check (#37524) (#37534) Cherry-pick from master pr: #37524 When the check health logic fails because a collection is not queryable, the related reason is hard to find in the log. This PR adds context to the log with the trace id and prints an info log for the unqueryable collection. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-08 18:58:27 +08:00
wei liu	7b71411b60	fix: search/query failed due to segment not loaded (#37403) (#37544) issue: #36970 pr: #37403 Release segment and balance channel may happen at the same time. Before the new delegator becomes serviceable, if release segment executes on the new delegator while search/query comes to the old delegator, then release segment and query segment happen in parallel; if release segment executes first in the worker, search/query will get a SegmentNotLoaded error. This PR adds a serviceable filter on delegators, so that all load/release segment operations happen on a serviceable delegator. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-08 18:56:26 +08:00
congqixia	1a09d6385e	enhance: [2.4] Release compacted growing segment if in dropped list (#37245) (#37266) Cherry-pick from master pr: #37245 See also #37205 Previously, releasing growing segments could be triggered by two conditions: - Sealed segment with the same id is loaded - Segment start position is before the target checkpoint ts The worst case is that the corresponding sealed segment is compacted and the checkpoint is pinned by a growing l0 segment. This PR introduces a new rule: a growing segment can be released if its segment id appears in the current target's dropped segment id list. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-31 10:14:22 +08:00
wei liu	057bfbe678	fix: Delegator may become unserviceable after querycoord restart (#37055) (#37100) issue: #37054 pr: #37055 After querycoord restarts, segment_checker may release segments by mistake because the next target isn't ready yet. This PR requires that release segment happens only after the next target is ready. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-10-25 14:55:31 +08:00
congqixia	6bc8aba17f	enhance: [2.4] Batch forward delete when using DirectForward (#37076) (#37107) Cherry pick from master pr: #37076 Related #36887 DirectForward streaming delete causes memory usage to explode if the number of segments is large. This PR adds a batching delete API and uses it for the direct forward implementation. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-25 11:53:29 +08:00
wei liu	59b2563029	fix: Dynamic release partition may fail search/query. (#37049) (#37099) issue: #33550 pr: #37049 Because of a wrong impl of UpdateCollectionNextTarget, if ReleaseCollection and UpdateCollectionNextTarget happen at the same time, the released partition's segment list may be added to the target again, and the delegator will be marked as unserviceable due to lack of segments. This PR fixes the impl of UpdateCollectionNextTarget Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-10-24 18:01:30 +08:00
congqixia	b24788b2c7	enhance: [2.4] Add balance report log for qc balancer (#36749) Cherry pick from master pr: #36747 Related to #36746 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-11 10:25:24 +08:00
wei liu	2428adea3b	enhance: Enable balance on querynode with different mem capacity (#36466) (#36625) issue: #36464 pr: #36466 This PR enables balance on querynodes with different mem capacity: a query node with more mem capacity will be assigned more records, and the query node with the largest difference between assignedScore and currentScore has a higher priority to carry the new segment. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-30 18:11:18 +08:00
wei liu	4120320074	enhance: make TransferChannel/TransferSegment idempotent (#36489) (#36552) issue: #36488 pr: #36489 When TransferChannel/TransferSegment is called, querycoord generates a balance task and submits it to the scheduler; if a task for that segment/channel already exists in the scheduler, submitting the task fails. To make TransferChannel/TransferSegment idempotent, we should skip the submit if the task already exists in the scheduler. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-30 14:25:26 +08:00
wei liu	74af00ba8c	fix: Segment unbalance after many times load/release (#36537) (#36543) issue: #36536 pr: #36537 Query coord uses `segmentTaskDelta/channelTaskDelta` to measure the executing workload of a querynode in the scheduler, and we maintain `segmentTaskDelta/channelTaskDelta` via `scheduler.Add(task)` and `scheduler.remove(task)`, but `scheduler.remove(task)` has been called in an unexpected way, which causes a wrong `segmentTaskDelta/channelTaskDelta` value, affects the segment assign logic, and causes segment unbalance. This PR computes `segmentTaskDelta/channelTaskDelta` on access instead, to avoid the effect of the wrong value. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-26 20:57:14 +08:00
wei liu	975a9797a2	enhance: Enable dynamic update loaded collection's replica (#36417) issue: #35821 pr: #35822 After a collection is loaded, if we need to increase/decrease the collection's replica number, we need to release and load it again. milvus offers 4 solutions to update a loaded collection's replica; this PR aims to dynamically change the replica number without release. After the replica number changes, milvus will execute load replica or release replica asynchronously, and the replica loaded status can be checked by the getReplicas API. Notice that if more replicas are set than the querynodes can afford, the new replica won't be loaded successfully until enough querynodes join. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-26 10:43:15 +08:00
wei liu	bdc59f3b15	fix: Fix corner case that segment can't be moved out from stopping node (#36431) (#36475) issue: #36426 pr: #36431 The old constraint requires that only segments on the current target can be balanced, which is wrong, and means that a segment can't be moved out from a stopping node if it only exists in the next target. By design, stopping balance needs to move out all segments on the node via balance tasks, thus the unfair old constraint should be removed. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-25 10:13:18 +08:00
SimFG	95e47bfcf8	fix: force to set the metric type in the search request (#36279) - issue: #35960 - pr: #35962 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-09-18 19:21:11 +08:00
wei liu	efed3d3ed0	fix: [skip e2e] Fix unstable ut TestCollectionObserver (#36231) (#36260) issue: #36237 pr: #36231 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-14 15:43:08 +08:00
wei liu	38b307e230	fix: Clean dirty segment/channel on querynode (#36202) (#36259) issue: #36201 pr: #36202 After a querynode has been removed from a replica, all dirty segments/channels on it should be released. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-14 14:41:09 +08:00
wei liu	cc414d53b7	fix: Fix logic dead lock when delegator has high memory usage (#36066) issue: #36064 pr: #36065 When a delegator has high memory usage, loading an l0 segment will fail, and the balance segment task will be blocked by the load segment task, so the delegator can't free memory by moving out some segments, which causes a logical deadlock. This PR removes the limit for balance: we permit load segment and balance to execute in parallel, which won't cause side effects because: 1. one segment can only have one task in qc's scheduler, and a load/release task will replace a balance task if necessary 2. balance speed has been limited, and it won't block the load segment task. 3. if a collection has a load task and a balance task at the same time, the load task will be scheduled first due to its higher priority. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-06 22:01:07 +08:00
congqixia	b34b035edc	fix: [2.4] Use SliceSetEqual to compare load field list (#36062) Cherry-pick from master pr: #36051 Related to #36037 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-06 19:17:05 +08:00
congqixia	e21b09cc90	fix: [2.4] Fill load field list from old version load info (#35993) (#36018) Cherry-pick from master pr: #35993 See also #35959 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-06 17:27:06 +08:00
wei liu	10211ea056	fix: Fix dynamic release partition may fail search/query request (#35919) (#36019) issue: #33550 pr: #35919 A concurrency issue may occur between removing a partition in the target manager and syncing the segment list to the delegator. When it happens, some segments may be released in the delegator while those same segments are also synced to the delegator, which causes the delegator to become unserviceable due to lack of necessary segments, and then search/query fails. This PR makes sure that all write access to target_manager is executed serially to avoid the concurrency issue. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-06 10:49:05 +08:00
wei liu	c87711d903	fix: Fix some replicas don't participate in the query after the failure recovery (#35850) (#35925) issue: #35846 pr: #35850 querycoord notifies the proxy to update the shard leader cache after a delegator's location changes, but during a querynode's failure recovery, some delegators may become unserviceable due to lack of segments, and go back to serviceable after the segments are loaded, so we also need to notify the proxy to invalidate the shard leader cache when a delegator's serviceable state changes. This PR maintains the querynode's serviceable state during heartbeat, and notifies the proxy to invalidate the shard leader cache if the serviceable state changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-05 10:09:04 +08:00
SimFG	084b3efaa1	fix: [2.4] fill the metric type field in the LoadMetaInfo object (#35963) - issue: #35960 - pr: #35962 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-09-04 16:21:05 +08:00
congqixia	df8d1c7ca3	enhance: [2.4] Check load fields for previous loaded collection (#35905) (#35910) Cherry-pick from master pr: #35905 Related to #35415 This PR makes querycoord report an error when a load request tries to update the load field list, which is currently not supported. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-03 11:25:03 +08:00
congqixia	cfc99e63b1	fix: [2.4] Make sure querycoord observers started once (#35811) (#35817) Cherry-pick from master pr: #35811 Related to #35809 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-29 19:15:01 +08:00
congqixia	8928c9d570	enhance: [2.4] Change frequent balancer debug log to rated one (#35749) (#35796) Cherry-pick from master pr: #35749 The "skip balance" log is too frequent at debug level. This PR changes it into a rated one. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-29 12:31:00 +08:00
SimFG	fc324b4254	feat: [2.4] add the rbac msg and send them to the replicate channel (#35562) - issue: #35391 - pr: #35392 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-08-27 14:45:00 +08:00
congqixia	ab261d0f8b	feat: [2.4] Support field partial load collection (#35416) (#35696) Cherry-pick from master pr: #35416 Related to #35415 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-27 14:07:00 +08:00
Xiaofan	7269d5eda2	enhance: [2.4] reduce the log level of frequent log (#35653) pr: #35651 Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-08-25 17:48:57 +08:00
SimFG	5b5119a51f	feat: [2.4] provide more general configuration to control mmap behavior (#35609) - issue: #35273 - pr: #35359 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-08-23 12:35:02 +08:00
wei liu	e2542a1bf5	enhance: Update protobuf-go to protobuf-go v2 (#34394) (#35555) issue: #34252 pr: #34394 #35072 #35084 Signed-off-by: Wei Liu <wei.liu@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-21 18:50:58 +08:00
wei liu	4bf4cbad85	enhance: Mark query node as read only after suspend (#35492) (#35586) issue: #34985 #35493 pr: #35492 After a querynode has been suspended, loading segments/channels on it is not allowed, which means the node is read only. To be compatible with the resource group design, after a query node has been suspended, we remove it from its original resource group and make it a read only query node in the replica. Then two things will happen: 1. its original resource group will lack query nodes, so query coord will assign a new node to it. 2. querycoord will try to move out all segments/channels after the querynode has been suspended Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-20 19:00:56 +08:00
wei liu	4610dafb2e	enhance: make configure load param feature be compatible with old sdk (#35520) (#35573) issue: #31570 #35521 pr: #35520 #35546 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-20 18:20:57 +08:00
wei liu	8cd6718672	enhance: limit getSegmentInfo batch size to avoid exceeding grpc message limit (#35432) issue: #35395 pr: #35394 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-13 11:42:19 +08:00
wei liu	b316040634	fix: force update next target if target can't be loaded (#35366) issue: #35361 pr: #35365 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-13 10:24:20 +08:00
wei liu	0201e00a2f	enhance: enable to set load config in cluster level (#35293) issue: #35170 pr: #35169 This PR enables setting load configs at cluster level, such as replicas and resource groups. Loading a collection will then use the cluster-level load config. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-07 12:38:21 +08:00
wei liu	2ac1bf7532	enhance: Enable setting the replica number and resource group during collection creation (#34403) (#34561) issue: #30040 pr: #34403 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-06 15:06:17 +08:00
wei liu	d48c690cb3	enhance: Avoid unnecessary syncTargetVersion func call after querycoord recover (#34954) (#35234) pr: #34954 Before querycoord stops gracefully, we save the current target to the meta store and recover it after querycoord starts up, to speed up querycoord's recovery. But the target version hasn't been recovered as expected, and the latest timestamp is used as the current target's version, which has no effect on querycoord other than an unnecessary syncTargetVersion func call. This PR recovers the correct target version to avoid the unnecessary syncTargetVersion func call Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-05 10:18:16 +08:00


574 Commits