milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 01:28:27 +08:00

Author	SHA1	Message	Date
congqixia	7f10c98321	fix: [2.6] update QueryNode NumEntities metrics when collection has no segments (#45147 ) (#45160 ) Cherry-pick from master pr: #45147 Related to #44509 Fix a bug where QueryNodeNumEntities metrics were not updated for collections with zero segments, causing stale metrics when all segments are flushed or compacted. The previous implementation used separate loops: one to update size metrics for all collections, and another to update num entities metrics only for collections present in the grouped segments map. Collections with no segments were skipped in the second loop, leaving their NumEntities metrics stale. Changes: - Consolidate size and num entities metric updates into single loop - Iterate over all collections instead of grouped segments - Get collection metadata from manager instead of segment instances - Correctly set NumEntities to 0 for collections with no segments - Apply the same fix to both growing and sealed segment processing - Add nil check for collection metadata before processing This ensures all collection metrics are updated consistently, even when segment count drops to zero. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-30 14:08:09 +08:00
sparknack	64b76b723f	enhance: [2.6] add a disk quota for the loaded binlog size to prevent load failures of querynode (#44932 ) issue: #41435 pr: #44893 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-19 19:46:07 +08:00
Zhen Ye	a110d8cc49	fix: don't use logical resource for metrics of quota center on streaming node (#44613 ) issue: #44599 Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-29 21:34:13 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
Zhen Ye	15a6631147	enhance: add quota limit based on sn consuming lag (#43105 ) issue: #42995 - The consuming lag at streaming node will be reported to coordinator. - The consuming lag will trigger the write limit and deny by quota center. - Set the ttProtection by default. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-11 14:10:49 +08:00
cai.zhang	9eebb9b464	fix: Collect entities num group by collection instead of partition (#41788 ) issue: #41787 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-05-15 12:04:22 +08:00
wei liu	f79391dea9	fix: remove metrics reset calls to ensure accurate reporting (#41049 ) issue: #41048 Fixes issue introduced in PR #33522 where metric resets caused incomplete data collection by monitoring systems. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-08 11:38:36 +08:00
yihao.dai	27c7cbbc72	fix: Fix QueryNodeNumEntities metric (#40602 ) fix QueryNodeNumEntities metric introduced by pr https://github.com/milvus-io/milvus/pull/39536 issue: https://github.com/milvus-io/milvus/issues/38162 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-12 19:08:05 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
yihao.dai	f0b7446e6b	enhance: Remove unnecessary collection and partition label from the metrics (#39536 ) /kind improvement --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-02-05 11:01:10 +08:00
jaime	f03a85725a	enhance: add db name in replica (#38672 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2025-01-09 19:40:59 +08:00
jaime	78438ef41e	fix: revert optimize CPU usage for CheckHealth requests (#35589 ) (#38555 ) issue: #35563 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-19 00:38:45 +08:00
jaime	28fdbc4e30	enhance: optimize CPU usage for CheckHealth requests (#35589 ) issue: #35563 1. Use an internal health checker to monitor the cluster's health state, storing the latest state on the coordinator node. The CheckHealth request retrieves the cluster's health from this latest state on the proxy sides, which enhances cluster stability. 2. Each health check will assess all collections and channels, with detailed failure messages temporarily saved in the latest state. 3. Use CheckHealth request instead of the heavy GetMetrics request on the querynode and datanode Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-17 11:02:45 +08:00
congqixia	10460ed3f8	enhance: Simplify querynode tsafe & reduce goroutine number (#38416 ) Related to #37630 The TSafe manager is too complex for the current implementation, and each delegator needs one goroutine waiting for tsafe update events. Tsafe updating could be executed in the pipeline. This PR removes the tsafe manager and simplifies the entire logic of tsafe updating. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-13 10:56:43 +08:00
jaime	8ed019735c	enhance: add disk stats within system metrics (#38033 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-06 16:32:41 +08:00
jaime	7bbfe86bcd	enhance: add list index and segment index retrieval API for WebUI (#37861 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-22 16:58:34 +08:00
congqixia	b0bd290a6e	enhance: Use internal json(sonic) to replace std json lib (#37708 ) Related to #35020 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-18 10:46:31 +08:00
jaime	f348bd9441	feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-07 11:52:25 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002 ) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug mode to the webpage by using debug=true or debug=false in the URL query parameters to enable or disable debug mode. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
aoiasd	22b917a1e6	enhance: Add collection name label for some metric (#36951 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-10-25 14:29:47 +08:00
congqixia	8593c4580a	enhance: Add delete buffer related quota logic (#35918 ) See also #35303 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-05 11:39:03 +08:00
jaime	24fb10114b	enhance: remove cooling off in rate limiter for read requests (#35935 ) issue: #35934 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-09-04 14:39:10 +08:00
jaime	8858fcb40a	fix: fix loaded entity num is inaccurate (#33521 ) issue: #33520 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-04 20:09:54 +08:00
yihao.dai	281a583eda	fix: Correct the negative queryable num entities metric (#32361 ) issue: https://github.com/milvus-io/milvus/issues/32281 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-04-24 15:55:24 +08:00
Jiquan Long	dc2cdbe387	enhance: add more metrics (#31271 ) /kind improvement fix: #31272 This PR adds more metrics, which are: - Slow query count, where the duration considered slow is configurable; - Number of deleted entities; - Number of entities imported; - Number of entities per collection; - Number of loaded entities per collection; - Number of indexed entities; - Number of indexed entities, per collection, per index and whether it's a vector index; - Quota states (LongTimeTickDelay, MemoryExhuasted, DiskQuotaExhuasted) per database; --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-03-19 15:23:06 +08:00
congqixia	08aba2e05f	fix: Remove `QueryNodeEntitiesSize` after segment/collection released (#31290 ) See also #31289 This PR: - Set collection level `QueryNodeEntitiesSize` to zero if all segment released - Delete `QueryNodeEntitiesSize` metrics value after collection ref is zero Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-15 15:43:04 +08:00
yiwangdr	c6665c2a4c	test: support multiple data/querynodes in integration test (#30618 ) issue: https://github.com/milvus-io/milvus/issues/29507 Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-02-21 11:54:53 +08:00
yah01	be980fbc38	Refine state check (#27541 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-10-11 21:01:35 +08:00
yah01	00c65fa0d7	Refine QueryNode errors (#27013 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-12 16:07:18 +08:00
yah01	3349db4aa7	Refine errors to remove changes breaking design (#26521 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-04 09:57:09 +08:00
yihao.dai	4c93495587	Add segment size metric in querynode (#25406 ) Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-07-12 14:26:28 +08:00
congqixia	41af0a98fa	Use go-api/v2 for milvus-proto (#24770 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-06-09 01:28:37 +08:00
aoiasd	44e5daae3a	Fix querynode read quota (#24412 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2023-06-08 17:24:36 +08:00
yihao.dai	7384d83d2c	Support rate limit based on growing segments size (#24121 ) Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-05-17 09:57:22 +08:00
jaime	c9d0c157ec	Move some modules from internal to public package (#22572 ) Signed-off-by: jaime <yun.zhang@zilliz.com>	2023-04-06 19:14:32 +08:00
yah01	081572d31c	Refactor QueryNode (#21625 ) Signed-off-by: yah01 <yang.cen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>	2023-03-27 00:42:00 +08:00

36 Commits
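
The most recent commit (7f10c98321) describes replacing a loop over grouped segments with a loop over all collections, so that a collection with no segments reports 0 entities instead of keeping a stale value. The following is a minimal Go sketch of that idea only, not the actual Milvus code: the `Segment` type, the `collectNumEntities` function, and the plain map standing in for the metrics gauge are all hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Segment is a hypothetical stand-in for a loaded segment.
type Segment struct {
	CollectionID int64
	RowCount     int64
}

// collectNumEntities returns the entity count for every known collection.
// It iterates over the collection list rather than over the grouped
// segments, so a collection with zero segments still gets an entry (0).
func collectNumEntities(collections []int64, segments []Segment) map[int64]int64 {
	// Group segments by collection ID.
	grouped := make(map[int64][]Segment)
	for _, s := range segments {
		grouped[s.CollectionID] = append(grouped[s.CollectionID], s)
	}

	result := make(map[int64]int64, len(collections))
	for _, id := range collections {
		var total int64
		// Ranging over a missing key yields no iterations, leaving total at 0.
		for _, s := range grouped[id] {
			total += s.RowCount
		}
		result[id] = total
	}
	return result
}

func main() {
	collections := []int64{1, 2} // collection 2 has no segments
	segments := []Segment{
		{CollectionID: 1, RowCount: 100},
		{CollectionID: 1, RowCount: 50},
	}
	counts := collectNumEntities(collections, segments)
	fmt.Println(counts[1], counts[2]) // prints "150 0"
}
```

Had the outer loop ranged over `grouped` instead of `collections`, collection 2 would never be visited and its previously reported value would remain unchanged, which is the stale-metric behavior the commit message describes.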