milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 17:48:29 +08:00

Author	SHA1	Message	Date
wei liu	38833b0e1d	fix: Fix deactivate balance checker also stops stopping balance (#44834 ) issue: #43858 Fix the issue introduced in PR #43992 where deactivating the balance checker incorrectly stops stopping balance operations. Changes: - Move IsActive() check after stopping balance logic - Only skip normal balance when checker is inactive - Allow stopping balance to proceed regardless of checker state This ensures stopping balance can execute even when the balance checker is deactivated. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-10-15 15:50:04 +08:00
wei liu	6d4961b978	enhance: Refactor balance checker with priority queue (#43992 ) issue: #43858 Refactor the balance checker implementation to use priority queues for managing collection balance operations, improving processing efficiency and order control. Changes include: - Export priority queue interfaces (Item, BaseItem, PriorityQueue) - Replace collection round-robin with priority-based queue system - Add BalanceCheckCollectionMaxCount configuration parameter - Optimize balance task generation with batch processing limits - Refactor processBalanceQueue method for different strategies - Enhance test coverage with comprehensive unit tests The new priority queue system processes collections based on row count or collection ID order, providing better control over balance operation priorities and resource utilization. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-09-19 17:46:01 +08:00
Zhen Ye	df7e507c49	fix: balance may not trigger at balance checker when upgrading (#43462 ) issue: #43416 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-22 16:02:53 +08:00
Zhen Ye	ecb24e7232	enhance: use multi-process framework in integration test (#42976 ) issue: #41609 - add env `MILVUS_NODE_ID_FOR_TESTING` to set up a node id for milvus process. - add env `MILVUS_CONFIG_REFRESH_INTERVAL` to set up the refresh interval of paramtable. - Init paramtable when calling `paramtable.Get()`. - add new multi process framework for integration test. - change all integration test into multi process. - merge some test case into one suite to speed up it. - modify some test, which need to wait for issue #42966, #42685. - remove the waittssync for delete collection to fix issue: #42989 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-30 14:22:43 +08:00
wei liu	78010262f0	enhance: Optimize shard serviceable mechanism (#41937 ) issue: https://github.com/milvus-io/milvus/issues/41690 - Merge leader view and channel management into ChannelDistManager, allowing a channel to have multiple delegators. - Improve shard leader switching to ensure a single replica only has one shard leader per channel. The shard leader handles all resource loading and query requests. - Refine the serviceable mechanism: after QC completes loading, sync the query view to the delegator. The delegator then determines its serviceable status based on the query view. - When a delegator encounters forwarding query or deletion failures, mark the corresponding segment as offline and transition it to an unserviceable state. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-22 11:38:24 +08:00
wei liu	4e1208f4f6	enhance: support balancing multiple collections in single trigger (#41875 ) issue: #41874 - Optimize balance_checker to support balancing multiple collections simultaneously - Add new parameters for segment and channel balancing batch sizes - Add enableBalanceOnMultipleCollections parameter - Update tests for balance checker This change improves resource utilization by allowing the system to balance multiple collections in a single trigger with configurable batch sizes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-21 21:38:25 +08:00
wei liu	a839d94c9e	fix: balance checker may enter infinite normal balance loop after balance suspension (#41195 ) issue: #41194 - Refactor hasUnbalancedCollection flag handling to function scope - Ensure tracking sets clearance when no balance needed - Add deferred cleanup for both normal/stopping balance paths - Add unit tests for collection tracking scenarios The changes ensure tracking sets (normalBalanceCollectionsCurrentRound and stoppingBalanceCollectionsCurrentRound) are properly cleared when: - All collections in current round are balanced - Balance checks return early due to unready targets - Balance feature flags are disabled Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-10 15:22:29 +08:00
wei liu	bf8547578f	fix: Address manual balance and balance check issues (#41037 ) issue: #37651 - Fix context propagation for manual balance segment task creation from PR #38080. - Optimize stopping balance by preventing redundant checks per round, addressing performance regression from PR #40297. - Decrease default `checkBalanceInterval` from 3000ms to 300ms. - Correct minor log messages in `BalanceChecker`. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-03 15:48:27 +08:00
wei liu	c02892e9fb	enhance: Balance the collection with the largest row count first (#40297 ) issue: #37651 this PR enable to balance the collection with largest row count first, to avoid temporary migration of small table data to new nodes during their onboarding, only to be moved out again after the large table balance, which would cause unnecessary load. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-31 16:00:19 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
wei liu	b9e3ec7175	enhance: Add trigger interval config for auto balance (#39154 ) issue: #39156 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-14 16:12:15 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
wei liu	f49d618382	fix: Querycoord will trigger unexpected balance task after restart (#38630 ) issue: #38606 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 19:30:48 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916 ) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
wei liu	30a99b66c1	fix: Fix logic dead lock when delegator has high memory usage (#36065 ) issue: #36064 when delegator has high memory usage, load l0 segment will failed. and balance segment task will blocked by load segment task, then delegator cann't free memory by moving out some segment, causes a logic dead lock. this PR remove the limit for balance, we permit segment and balance execute in parallel. which won't cause side effect due to: 1. one segment can only has one task in qc's scheduler, and load/release task will replace balance task if necessary 2. balance speed has been limited, and it won't block load segment task. 3. if collection has load task and balance task at same time, load task will be scheduled first due to high proirity. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-09 10:21:06 +08:00
congqixia	86691656f3	enhance: Change frequent balancer debug log to rated one (#35749 ) "skip balance" log is too frequent in debug level. This PR changes it into rated on. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-29 10:07:00 +08:00
wei liu	166fc902b0	enhance: Limit collection's normal balance speed (#34810 ) issue: #34798 after we remove the task priority on query coord, to avoid load/release segment blocked by too much balance task, we limit the balance task size in each round. at same time, we reduce the balance interval to trigger balance more frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-24 19:11:44 +08:00
congqixia	b284b81a47	fix: Check partition in current target when observing partition load status (#34282 ) See also #34234 `LoadPartitions` does not guarantee the current target has loading partitions if there are some partitions already loaded before. This PR check current target contains the partition to load when advancing loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-01 17:40:07 +08:00
wei liu	f7ecafe77d	enhance: Skip update index for L0 segment (#34099 ) try to update index for l0 segment, will failed by `index not found` This PR skip update index for l0 segment Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 10:26:06 +08:00
Chun Han	f7af323d1e	fix: sync partitiion stats blocking balance task(#33741 ) (#33742 ) related: #33741 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-11 14:21:56 +08:00
wei liu	2013d97243	enhance: Enable to dynamic update balancer policy in querycoord (#33037 ) issue: #33036 This PR enable to dynamic update balancer policy without restart querycoord. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 14:29:39 +08:00
wei liu	a7f6193bfc	fix: query node may stuck at stopping progress (#33104 ) issue: #33103 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
wei liu	fad8f0afa5	enhance: enable stopping balance after balance has been suspended (#32812 ) issue: #32811 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:15:29 +08:00
wei liu	06b191b164	fix: Balance channel stuck forever due to logic dead lock (#31202 ) issue: #30816 cause balance channel will stuck until leader view catch up the current target, then start to unsub the old delegator. which make sure that the new delegator can provide search before release old delegator. but another logic in segment_checker skip loading segment during balance channel. so during balance channel, if query node crash, new delegator can't catch up target forever, then stuck forever. This PR remove the rule that skip loading segment during balance channel to avoid the logic dead lock here. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-13 15:05:04 +08:00
wei liu	6dd7297178	fix: Skip generate balance task when target not ready (#30724 ) issue: #30723 This PR skip generate balance task when collection's target isn't ready. also refine the check stale logic in query coord's scheduler, if channel exist in current or next target, task won't be canceled. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-23 10:32:53 +08:00
wei liu	e98c62abbb	enhance: refactor leader_observer to leader_checker (#29454 ) issue: #29453 sync distribution by rpc will also call loadSegment/releaseSegment, which may cause all kinds of concurrent case on same segment, such as concurrent load and release on one segment. This PR add leader_checker which generate load/release task to correct the leader view, instead of calling sync distribution by rpc --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-05 15:54:55 +08:00
Bingyi Sun	45e6801ce4	feat: Add checker activation service interfaces (#28850 ) issue: #28610 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2023-12-04 17:38:37 +08:00
Bingyi Sun	8514a39d1a	feat: Add checker activation (#28611 ) issue: https://github.com/milvus-io/milvus/issues/28610 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2023-11-24 18:08:24 +08:00
congqixia	852be152de	Change task sourceID to stringer interface (#27965 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-10-27 01:08:12 +08:00
SimFG	26f06dd732	Format the code (#27275 ) Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-09-21 09:45:27 +08:00
congqixia	76e03fe6d3	Set reason for balance, index checker generated tasks (#25865 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-07-24 17:07:00 +08:00
Bingyi Sun	5afea0e5bf	Fix querycoord crash (#25638 ) Signed-off-by: sunby <bingyi.sun@zilliz.com> Co-authored-by: sunby <bingyi.sun@zilliz.com>	2023-07-17 16:56:35 +08:00
yah01	e962a8ba31	Limit the frequency of debug logs (#25009 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-06-20 14:14:41 +08:00
yah01	67cf23d050	Fix panic while balancing releasing collection (#24003 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-05-11 12:47:20 +08:00
Enwei Jiao	240c5625cd	Fix nil pointer access (#23919 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-05-08 10:08:39 +08:00
MrPresent-Han	b517bc9e6a	refine balance mechanism including:(#23454 ) (#23763 ) (#23791 ) 1. balance granuity to replica to avoid influence unrelated replicas 2. avoid balance back and forth Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>	2023-05-04 12:22:40 +08:00
wei liu	dbbd703667	fix balance generate unexpected task (#23299 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-04-11 14:38:30 +08:00
MrPresent-Han	6fb3542f2a	enable auto balance paramter(#21504 ) (#21507 ) Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>	2023-01-06 14:45:35 +08:00
Enwei Jiao	89b810a4db	Refactor all params into ParamItem (#20987 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com> Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2022-12-07 18:01:19 +08:00
SimFG	f8cff79804	Support the graceful stop for the query node (#20851 ) Signed-off-by: SimFG <bang.fu@zilliz.com> Signed-off-by: SimFG <bang.fu@zilliz.com>	2022-12-06 22:59:19 +08:00
yah01	4112c667d0	Refine task's priority (#20364 ) Signed-off-by: yah01 <yang.cen@zilliz.com> Signed-off-by: yah01 <yang.cen@zilliz.com>	2022-11-07 14:53:02 +08:00
yah01	1c71844b8d	Add license header (#19678 ) Signed-off-by: yah01 <yang.cen@zilliz.com> Signed-off-by: yah01 <yang.cen@zilliz.com>	2022-10-11 11:39:22 +08:00
Bingyi Sun	626854cf0c	Refactor QueryCoord (#18836 ) Signed-off-by: sunby <bingyi.sun@zilliz.com> Co-authored-by: yah01 <yang.cen@zilliz.com> Co-authored-by: Wei Liu <wei.liu@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com> Signed-off-by: sunby <bingyi.sun@zilliz.com> Co-authored-by: sunby <bingyi.sun@zilliz.com> Co-authored-by: yah01 <yang.cen@zilliz.com> Co-authored-by: Wei Liu <wei.liu@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2022-09-15 18:48:32 +08:00

43 Commits