milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-06 17:18:35 +08:00

Author	SHA1	Message	Date
wei liu	bf5fde1431	fix: Prevent delegator unserviceable due to shard leader change (#42689 ) issue: #42098 #42404 Fix critical issue where concurrent balance segment and balance channel operations cause delegator view inconsistency. When shard leader switches between load and release phases of segment balance, it results in loading segments on old delegator but releasing on new delegator, making the new delegator unserviceable. The root cause is that balance segment modifies delegator views, and if these modifications happen on different delegators due to leader change, it corrupts the delegator state and affects query availability. Changes include: - Add shardLeaderID field to SegmentTask to track delegator for load - Record shard leader ID during segment loading in move operations - Skip release if shard leader changed from the one used for loading - Add comprehensive unit tests for leader change scenarios This ensures balance segment operations are atomic on single delegator, preventing view corruption and maintaining delegator serviceability. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-19 12:10:38 +08:00
wei liu	0b4a17c22b	fix: Fix exclude nodes clearing logic position in load balancer retry (#42577 ) issue: #42561 Move the exclude nodes clearing logic from ExecuteWithRetry to selectNode after shard leader cache refresh to ensure proper retry behavior: - Remove premature exclude clearing in ExecuteWithRetry that happened before shard leader cache update - Add exclude clearing logic in selectNode after refreshing shard leader cache when all replicas are excluded - Ensure multiple retries can properly update shard leader cache and clear exclude list when needed - Add comprehensive tests for edge cases including empty shard leaders and mixed serviceable node scenarios --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-17 08:15:24 +08:00
aoiasd	2ae4d80120	enhance: support run analyzer by loaded collection field (#42113 ) relate: https://github.com/milvus-io/milvus/issues/42094 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-05-29 10:54:30 +08:00
wei liu	54619eaa2c	feat: Implement partial result support on node down (#42009 ) issue: https://github.com/milvus-io/milvus/issues/41690 This commit implements partial search result functionality when query nodes go down, improving system availability during node failures. The changes include: - Enhanced load balancing in proxy (lb_policy.go) to handle node failures with retry support - Added partial search result capability in querynode delegator and distribution logic - Implemented tests for various partial result scenarios when nodes go down - Added metrics to track partial search results in querynode_metrics.go - Updated parameter configuration to support partial result required data ratio - Replaced old partial_search_test.go with more comprehensive partial_result_on_node_down_test.go - Updated proto definitions and improved retry logic These changes improve query resilience by returning partial results to users when some query nodes are unavailable, ensuring that queries don't completely fail when a portion of data remains accessible. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 00:12:28 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
wei liu	fceff6ed7f	enhance: Optimize param cost in search (#37738 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-19 10:14:40 +08:00
wei liu	261212ee4a	fix: Get shard client failed by client is closed (#37729 ) issue: #37718 This PR refine the shard client ref counter, dec ref counter won't release client anymore, and only permit shard client manager to remove client. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-18 18:22:30 +08:00
wei liu	c5485bb1b1	fix: Make GetShardLeaders skip retry on not loaded collection (#37684 ) issue: #37532 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-14 21:14:40 +08:00
congqixia	e7fb87d491	enhance: Fix lb policy retry log after targetNode type changed (#37646 ) Related to previous pr: #37371 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-13 19:52:29 +08:00
wei liu	2a4c00de9d	enhance: Decouple shard client manager from shard cache (#37371 ) issue: #37115 the old implementation update shard cache and shard client manager at same time, which causes lots of conor case due to concurrent issue without lock. This PR decouple shard client manager from shard cache, so only shard cache will be updated if delegator changes. and make sure shard client manager will always return the right client, and create a new client if not exist. in case of client leak, shard client manager will purge client in async for every 10 minutes. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-12 10:30:28 +08:00
wei liu	b83b376cfc	fix: Search/Query may failed during updating delegator cache. (#37116 ) issue: #37115 casue init query node client is too heavy, so we remove updateShardClient from leader mutex, which cause much more concurrent cornor cases. This PR delay query node client's init operation until `getClient` is called, then use leader mutex to protect updating shard client progress to avoid concurrent issues. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-05 10:52:23 +08:00
wei liu	bd658a6510	enhance: Enable dynamic update replica selection policy (#35860 ) issue: #35859 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-13 17:05:15 +08:00
wei liu	d900e68440	fix: fix GetShardLeaders return empty node list (#32685 ) issue: #32449 to avoid GetShardLeaders return empty node list, this PR add node list check in both client side and server side. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-29 14:19:26 +08:00
zhenshan.cao	60e88fb833	fix: Restore the MVCC functionality. (#29749 ) When the TimeTravel functionality was previously removed, it inadvertently affected the MVCC functionality within the system. This PR aims to reintroduce the internal MVCC functionality as follows: 1. Add MvccTimestamp to the requests of Search/Query and the results of Search internally. 2. When the delegator receives a Query/Search request and there is no MVCC timestamp set in the request, set the delegator's current tsafe as the MVCC timestamp of the request. If the request already has an MVCC timestamp, do not modify it. 3. When the Proxy handles Search and triggers the second phase ReQuery, divide the ReQuery into different shards and pass the MVCC timestamp to the corresponding Query requests. issue: #29656 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-01-09 11:38:48 +08:00
yah01	bf633bb5d7	enhance: refine the retry error (#28573 ) return the last error but not combining all errors, to improve readability and erorr handling resolve: #28572 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-30 18:34:32 +08:00
yah01	bfccfcd0ca	enhance: refine error messages (#28424 ) - Split the simple reason and full detail - Refine existing error messages related: #28422 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-21 17:02:24 +08:00
congqixia	49516d44b4	Add ctx parameter and log tracer for watch and selectNodes (#27809 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-10-20 04:22:11 +08:00
jaime	7f7c71ea7d	Decoupling client and server API in types interface (#27186 ) Co-authored-by:: aoiasd <zhicheng.yue@zilliz.com> Signed-off-by: jaime <yun.zhang@zilliz.com>	2023-09-26 09:57:25 +08:00
yah01	3349db4aa7	Refine errors to remove changes breaking design (#26521 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-04 09:57:09 +08:00
wei liu	4aac7a6642	refine error reason of query/search failed (#26342 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-08-16 19:38:18 +08:00
wei liu	518b6310a2	refine retry times on replica (#26043 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-08-03 15:55:09 +08:00
smellthemoon	9614e61f14	Fix collection and channel not match (#25859 ) Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2023-08-01 17:33:06 +08:00
aoiasd	77a9553c3f	Add some log when node being reachable or unreachable (#25572 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2023-07-18 10:35:20 +08:00
yah01	948d1f1f4a	Handle errors by merr for QueryCoord (#24926 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-07-17 14:59:34 +08:00
SimFG	69d274d233	Improve the operation log (#25589 ) Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-07-14 16:08:31 +08:00
wei liu	17796743dd	add metrics for replica selection (#25037 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-06-27 09:52:44 +08:00
wei liu	7b999b42bd	enable config policy on replica selection (#25067 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-06-25 19:46:44 +08:00
jaime	18df2ba6fd	[Cherry-Pick] Support Database (#24769 ) Support Database(#23742) Fix db nonexists error for FlushAll (#24222) Fix check collection limits fails (#24235) backward compatibility with empty DB name (#24317) Fix GetFlushAllState with DB (#24347) Remove db from global meta cache after drop database (#24474) Fix db name is empty for describe collection response (#24603) Add RBAC for Database API (#24653) Fix miss load the same name collection during recover stage (#24941) RBAC supports Database validation (#23609) Fix to list grant with db return empty (#23922) Optimize PrivilegeAll permission check (#23972) Add the default db value for the rbac request (#24307) Signed-off-by: jaime <yun.zhang@zilliz.com> Co-authored-by: SimFG <bang.fu@zilliz.com> Co-authored-by: longjiquan <jiquan.long@zilliz.com>	2023-06-25 17:20:43 +08:00
wei liu	46f7d903a3	enable look aside balancer on replica selection (#24791 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-06-16 18:38:39 +08:00
wei liu	a3437e0ab5	refactor replica selection and retry policy on channel (#24367 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-06-13 10:20:37 +08:00

31 Commits