milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
wei liu	293838bb67	enhance: add delegator catching up streaming data state tracking (#46551 ) issue: #46550 - Add CatchUpStreamingDataTsLag parameter to control tolerable lag threshold for delegator to be considered caught up - Add catchingUpStreamingData field in delegator to track whether delegator has caught up with streaming data - Add catching_up_streaming_data field in LeaderViewStatus proto - Check catching up status in CheckDelegatorDataReady, return not ready when delegator is still catching up streaming data - Add unit tests for the new functionality When tsafe lag exceeds the threshold, the distribution will not be considered serviceable, preventing queries from timing out in waitTSafe. This is useful when streaming message queue consumption is slow. - Core invariant: a delegator must not be considered serviceable while its tsafe lags behind the latest committed timestamp beyond a configurable tolerance; a delegator is "caught-up" only when (latestTsafe - delegator.GetTSafe()) < CatchUpStreamingDataTsLag (configured by queryNode.delegator.catchUpStreamingDataTsLag, default 1s). - New capability and where it takes effect: adds streaming-catchup tracking to QueryNode/QueryCoord — an atomic catchingUpStreamingData flag on shardDelegator (internal/querynodev2/delegator/delegator.go), a new param CatchUpStreamingDataTsLag (pkg/util/paramtable/component_param.go), and a LeaderViewStatus.catching_up_streaming_data field in the proto (pkg/proto/query_coord.proto). The flag is exposed in GetDataDistribution (internal/querynodev2/services.go) and used by QueryCoord readiness checks (internal/querycoordv2/utils/util.go::CheckDelegatorDataReady) to reject leaders that are still catching up. - What logic is simplified/added (not removed): instead of relying solely on segment distribution/worker heartbeats, the PR adds an explicit readiness gate that returns "not available" when the delegator reports catching-up-streaming-data. This is strictly additive — no existing checks are removed; the new precondition runs before segment availability validation to prevent premature routing to slow-consuming delegators. - Why this does NOT cause data loss or regress behavior: the change only controls serviceability visibility and routing — it never drops or mutates data. Concretely: shardDelegator starts with catchingUpStreamingData=true and flips to false in UpdateTSafe once the sampled lag falls below the configured threshold (internal/querynodev2/delegator/delegator.go::UpdateTSafe). QueryCoord will short-circuit in CheckDelegatorDataReady when leader.Status.GetCatchingUpStreamingData() is true (internal/querycoordv2/utils/util.go), returning a channel-not-available error before any segment checks; when the flag clears, existing segment-distribution checks (same code paths) resume. Tests added cover both catching-up and caught-up paths (internal/querynodev2/delegator/delegator_test.go, internal/querycoordv2/utils/util_test.go, internal/querynodev2/services_test.go), demonstrating convergence without changed data flows or deletion of data. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-12-29 17:15:21 +08:00
aoiasd	0c54875832	enhance: ValidateAnalyzer return ValidateAnalyzerResponse instead common.Status (#46292 ) Prepares for returning more info when validating an analyzer. relate: https://github.com/milvus-io/milvus/issues/43687 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-12 10:35:14 +08:00
aoiasd	947c8855f3	feat: support search bm25 with highlight (#44923 ) relate: https://github.com/milvus-io/milvus/issues/42589 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-11-18 16:09:39 +08:00
cai.zhang	216c576da2	fix: Retain collection early to prevent it from being released before query completion (#45413 ) issue: #45314 This PR only ensures that no panic occurs. However, we still need to provide protection for the delegator handling ongoing query tasks. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-11 20:29:37 +08:00
aoiasd	cfeb095ad7	enhance: forbid build analyzer at proxy (#44067 ) relate: https://github.com/milvus-io/milvus/issues/43687 We used to run the temporary analyzer and validate analyzer on the proxy, but the proxy should not be a computation-heavy node. This PR moves all analyzer calculations to the streaming node. --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-23 10:58:12 +08:00
yihao.dai	51f69f32d0	feat: Add CDC support (#44124 ) This PR implements a new CDC service for Milvus 2.6, providing log-based cross-cluster replication. issue: https://github.com/milvus-io/milvus/issues/44123 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com> Signed-off-by: chyezh <chyezh@outlook.com> Co-authored-by: chyezh <chyezh@outlook.com>	2025-09-16 16:32:01 +08:00
Zhen Ye	5551d99425	enhance: remove old arch non-streaming arch code (#43651 ) issue: #41609 - remove all dml dead code at proxy - remove dead code at l0_write_buffer - remove msgstream dependency at proxy - remove timetick reporter from proxy - remove replicate stream implementation --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:41:40 +08:00
Zhen Ye	15a6631147	enhance: add quota limit based on sn consuming lag (#43105 ) issue: #42995 - The consuming lag at streaming node will be reported to coordinator. - The consuming lag will trigger the write limit and deny by quota center. - Set the ttProtection by default. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-11 14:10:49 +08:00
aoiasd	2ae4d80120	enhance: support run analyzer by loaded collection field (#42113 ) relate: https://github.com/milvus-io/milvus/issues/42094 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-05-29 10:54:30 +08:00
wei liu	54619eaa2c	feat: Implement partial result support on node down (#42009 ) issue: https://github.com/milvus-io/milvus/issues/41690 This commit implements partial search result functionality when query nodes go down, improving system availability during node failures. The changes include: - Enhanced load balancing in proxy (lb_policy.go) to handle node failures with retry support - Added partial search result capability in querynode delegator and distribution logic - Implemented tests for various partial result scenarios when nodes go down - Added metrics to track partial search results in querynode_metrics.go - Updated parameter configuration to support partial result required data ratio - Replaced old partial_search_test.go with more comprehensive partial_result_on_node_down_test.go - Updated proto definitions and improved retry logic These changes improve query resilience by returning partial results to users when some query nodes are unavailable, ensuring that queries don't completely fail when a portion of data remains accessible. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 00:12:28 +08:00
wei liu	78010262f0	enhance: Optimize shard serviceable mechanism (#41937 ) issue: https://github.com/milvus-io/milvus/issues/41690 - Merge leader view and channel management into ChannelDistManager, allowing a channel to have multiple delegators. - Improve shard leader switching to ensure a single replica only has one shard leader per channel. The shard leader handles all resource loading and query requests. - Refine the serviceable mechanism: after QC completes loading, sync the query view to the delegator. The delegator then determines its serviceable status based on the query view. - When a delegator encounters forwarding query or deletion failures, mark the corresponding segment as offline and transition it to an unserviceable state. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-22 11:38:24 +08:00
congqixia	b5443ddbd0	enhance: [AddField] Reopen loaded segments after AddField (#41529 ) Related to #39718 This PR: - Add reopen logic for growing & sealed segments - Lazy reopen when schema version increases - Add FinishLoad api for loading progress --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-26 08:48:39 +08:00
congqixia	b36c88f3c8	enhance: [AddField] Broadcast schema change via WAL (#41373 ) Related to #39718 Add Broadcast logic for collection schema change and notifies: - Streamnode - Delegator - Streamnode - Flush component - QueryNodes via grpc --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-22 16:28:37 +08:00
congqixia	94a859c028	enhance: Add buffer forwarder for stream delta loading (#40559 ) See also #40558 Related to #35303 & #38066 as well This PR: - Add `BufferedForward` to limit memory usage forwarding stream delete - Add `UseLoad` flag to determine `Delete` shall use `segment.Delete` or `segment.LoadDelta` - Fix delegator accidentally use always true candidate while load streaming delta --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-03-17 15:24:10 +08:00
yihao.dai	004a1875dc	enhance: Introduce batch subscription in msgdispatcher (#39863 ) Introduce a batch subscription mechanism in msgdispatcher: the msgdispatcher now includes a vchannel watch task queue, where all vchannels in the queue will subscribe to the MQ only once and pull messages from the oldest vchannel checkpoint to the latest. issue: https://github.com/milvus-io/milvus/issues/39862 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-05 14:38:02 +08:00
wei liu	69b8b89369	enhance: Remove QueryCoord's scheduling of L0 segments (#39552 ) issue: #39551 This PR removes querycoord's scheduling of l0 segments: - only load l0 segment when watch channel - only release l0 segment when release channel or sync data distribution --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-26 21:38:00 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
aoiasd	24d2bbc441	enhance: unmashall ts msg in dispatcher instead in msgstream (#38656 ) relate: https://github.com/milvus-io/milvus/issues/38655 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-02-14 12:04:13 +08:00
yihao.dai	f0b7446e6b	enhance: Remove unnecessary collection and partition label from the metrics (#39536 ) /kind improvement --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-02-05 11:01:10 +08:00
congqixia	bca2a62b78	enhance: Handle PutOrRef collection schema failure error (#39310 ) Related to previous pr #39279 When NewCollection returns nil, the error shall be returned and handled by caller Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-16 10:13:06 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
cai.zhang	bb5f38e574	enhance: Return collection not loaded rather than not found on querynode (#38593 ) issue: #38586 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-12-23 14:36:48 +08:00
jaime	78438ef41e	fix: revert optimize CPU usage for CheckHealth requests (#35589 ) (#38555 ) issue: #35563 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-19 00:38:45 +08:00
jaime	28fdbc4e30	enhance: optimize CPU usage for CheckHealth requests (#35589 ) issue: #35563 1. Use an internal health checker to monitor the cluster's health state, storing the latest state on the coordinator node. The CheckHealth request retrieves the cluster's health from this latest state on the proxy sides, which enhances cluster stability. 2. Each health check will assess all collections and channels, with detailed failure messages temporarily saved in the latest state. 3. Use CheckHealth request instead of the heavy GetMetrics request on the querynode and datanode Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-17 11:02:45 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080 ) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
congqixia	10460ed3f8	enhance: Simplify querynode tsafe & reduce goroutine number (#38416 ) Related to #37630 The TSafe manager is too complex for the current implementation, and each delegator needs one goroutine waiting for tsafe update events. Tsafe updating could be executed in the pipeline. This PR removes the tsafe manager and simplifies the entire logic of tsafe updating. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-13 10:56:43 +08:00
Gao	8977454311	enhance: support recall estimation (#38017 ) issue: #37899 Only `search` api will be supported --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-12-11 20:40:48 +08:00
congqixia	051bc280dd	enhance: Make dynamic load/release partition follow targets (#38059 ) Related to #37849 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-05 16:24:40 +08:00
Zhen Ye	c6dcef7b84	enhance: move segcore codes of segment into one package (#37722 ) issue: #33285 - move most cgo operations related to search/query into segcore package for reuse by streamingnode. - add go unittest for segcore operations. Signed-off-by: chyezh <chyezh@outlook.com>	2024-11-29 10:22:36 +08:00
congqixia	b0bd290a6e	enhance: Use internal json(sonic) to replace std json lib (#37708 ) Related to #35020 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-18 10:46:31 +08:00
cai.zhang	14e007d6fb	fix: Fix the bug that retrieved from wrong field for L0 segments (#37598 ) issue: #37574 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-11-12 20:58:29 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002 ) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug model to the webpage by using debug=true or debug=false in the URL query parameters to enable or disable debug mode. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
Gao	1d61b604e1	enhance: support retry search when topk is reduced and result not enough (#35645 ) issue: #35576 This PR covers the cases where queryHook optimizes search params and makes the result size insufficient; it adds a retry search mechanism and related metrics for alerting. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-10-23 19:19:30 +08:00
yihao.dai	3685edb264	enhance: Use common gc config (#36668 ) Use the GC config from `common` and remove the GC config from `queryNode`. issue: https://github.com/milvus-io/milvus/issues/36667 related pr: https://github.com/milvus-io/milvus/pull/34949 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-10-09 19:47:19 +08:00
congqixia	8593c4580a	enhance: Add delete buffer related quota logic (#35918 ) See also #35303 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-05 11:39:03 +08:00
SimFG	731d45abbe	enhance: provide more general configuration to control mmap behavior (#35359 ) - issue: #35273 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-08-21 00:22:54 +08:00
wei liu	c45f38aa61	enhance: Update protobuf-go to protobuf-go v2 (#34394 ) issue: #34252 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-29 11:31:51 +08:00
congqixia	531092c031	enhance: Add lint rule to forbid gogo protobuf (#34594 ) github.com/gogo/protobuf is deprecated and could be error-prone after upgrading protobuf messages to v2. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-12 10:19:35 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
SimFG	cb99e3db34	enhance: add the includeCurrentMsg param for the Seek method (#33326 ) /kind improvement - issue: #33325 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-05-27 10:31:41 +08:00
congqixia	40728ce83d	enhance: Add `metautil.Channel` to convert string compare to int (#32749 ) See also #32748 This PR: - Add `metautil.Channel` utility, which converts virtual name to physical channel name, collectionID and shard idx - Add channel mapper interface & implementation to convert limited physical channel name into int index - Apply `metautil.Channel` filter in querynode segment manager logic --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-07 19:13:35 +08:00
Bingyi Sun	fecd9c21ba	feat: LRU cache implementation (#32567 ) issue: https://github.com/milvus-io/milvus/issues/32783 This pr is the implementation of lru cache on branch lru-dev. Signed-off-by: sunby <sunbingyi1992@gmail.com> Co-authored-by: chyezh <chyezh@outlook.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com> Co-authored-by: Ted Xu <ted.xu@zilliz.com> Co-authored-by: jaime <yun.zhang@zilliz.com> Co-authored-by: wayblink <anyang.wang@zilliz.com>	2024-05-06 20:29:30 +08:00
wei liu	1a98ce39f5	enhance: Remove useless logic about FromShardLeader (#32029 ) issue: #32047 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 20:11:19 +08:00
wei liu	df208d538c	fix: Check exclude segment before add new growing segment (#31803 ) issue: #31479 #31797 Milvus adds a released segment to the excluded info and filters out its stream data in filter_node. But if data buffered in insert_node's channel belongs to a growing segment that has already been released, it will add the growing segment back again. This PR maintains `excluded segments` in the delegator and checks the excluded segments before adding a new growing segment. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:29:17 +08:00
zhenshan.cao	089c805e0a	enhance:Refactor hybrid search (#32020 ) issue: https://github.com/milvus-io/milvus/issues/25639 https://github.com/milvus-io/milvus/issues/31368 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-04-09 14:21:18 +08:00
wei liu	bb500d66c7	fix: Remove segment from leader view can't be executed (#31663 ) issue: #31664 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:39:12 +08:00
Chun Han	c3264ca3e3	feat: support segment pruner (#31003 ) related: #30376	2024-03-22 13:57:06 +08:00
congqixia	d90e01532f	enhance: Release level zero segments when channel unsub (#31486 ) Related to #27349 See also #30816 Level zero segments are not allowed to balance among delegators; they shall always serve the current delegator. This PR releases all level zero segments after the channel is unsubscribed, preventing level zero segments from blocking graceful stop. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-22 10:27:17 +08:00
Buqian Zheng	3c80083f51	feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630 ) add sparse float vector support to different milvus components, including proxy, data node to receive and write sparse float vectors to binlog, query node to handle search requests, index node to build index for sparse float column, etc. https://github.com/milvus-io/milvus/issues/29419 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-13 14:32:54 -07:00
wei liu	9cfe183253	enhance: remove duplicated target node id check (#31087 ) issue: #31109 This PR removes the duplicate target node id check, since the server id has already been checked in the rpc's interceptor --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-11 15:31:02 +08:00


110 Commits