878 Commits

Author SHA1 Message Date
yihao.dai
9b2b2a2689
enhance: [10kcp] Remove scheduler and target manager mutex (#38968)
supplement to PR https://github.com/milvus-io/milvus/pull/38566

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-03 11:18:52 +08:00
yihao.dai
ecd55596cf
enhance: [10kcp] Optimize GetLocalDiskSize and segment loader mutex (#38600)
1. Make the segment loader lock protect only the resource.
2. Optimize GetDiskUsage to avoid excessive overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38599

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-19 21:14:26 +08:00
yihao.dai
de78de7689
fix: [10kcp] Fix consume blocked due to too many consumers (#38456)
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.
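
A per-pchannel cap like the one described above can be sketched as a small counter guarded by a mutex. This is an illustration only: `consumerLimiter`, `Acquire`, `Release`, and the choice to reject (rather than block) once the cap is reached are assumptions, not the code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxConsumersPerPChannel mirrors the limit of 10 stated in the commit message.
const maxConsumersPerPChannel = 10

var errTooManyConsumers = errors.New("too many consumers on pchannel")

// consumerLimiter is a hypothetical helper, not a Milvus type.
type consumerLimiter struct {
	mu     sync.Mutex
	counts map[string]int // pchannel name -> number of active consumers
}

func newConsumerLimiter() *consumerLimiter {
	return &consumerLimiter{counts: make(map[string]int)}
}

// Acquire registers one consumer on the pchannel, or fails once the cap is reached.
func (l *consumerLimiter) Acquire(pchannel string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[pchannel] >= maxConsumersPerPChannel {
		return errTooManyConsumers
	}
	l.counts[pchannel]++
	return nil
}

// Release frees one consumer slot on the pchannel.
func (l *consumerLimiter) Release(pchannel string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[pchannel] > 0 {
		l.counts[pchannel]--
	}
}

func main() {
	l := newConsumerLimiter()
	for i := 0; i < 11; i++ {
		if err := l.Acquire("pchannel-0"); err != nil {
			fmt.Println("consumer", i, "rejected:", err)
		}
	}
}
```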

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38455

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>
2024-12-13 21:20:47 +08:00
yihao.dai
df4d5e1096
enhance: [10kcp] Read metadata concurrently to accelerate recovery (#38404)
Read metadata such as segments, binlogs, and partitions concurrently at
the collection level.
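
The idea of reading metadata concurrently per collection can be sketched with a bounded set of goroutines. The names `collectionMeta`, `fakeLoad`, and `loadAll`, and the `parallel` bound, are hypothetical and stand in for whatever the PR actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// collectionMeta is a hypothetical stand-in for per-collection metadata.
type collectionMeta struct {
	CollectionID int64
	Segments     []string
}

var errUnknownCollection = errors.New("unknown collection")

// fakeLoad simulates reading one collection's metadata; negative IDs fail.
func fakeLoad(id int64) (collectionMeta, error) {
	if id < 0 {
		return collectionMeta{}, errUnknownCollection
	}
	return collectionMeta{CollectionID: id, Segments: []string{fmt.Sprintf("seg-%d", id)}}, nil
}

// loadAll runs load for every collection with at most `parallel` loads in
// flight. It returns the successful results and the first error recorded.
func loadAll(ids []int64, parallel int, load func(int64) (collectionMeta, error)) (map[int64]collectionMeta, error) {
	if parallel < 1 {
		parallel = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	result := make(map[int64]collectionMeta, len(ids))
	sem := make(chan struct{}, parallel) // bounds the number of concurrent loads
	for _, id := range ids {
		wg.Add(1)
		sem <- struct{}{}
		go func(id int64) {
			defer wg.Done()
			defer func() { <-sem }()
			meta, err := load(id)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			result[id] = meta
		}(id)
	}
	wg.Wait()
	return result, firstErr
}

func main() {
	metas, err := loadAll([]int64{1, 2, 3}, 2, fakeLoad)
	fmt.Println(len(metas), err)
}
```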

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38403

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-12 16:39:06 +08:00
congqixia
24a055996b
enhance: [10kcp] Add secondary index for querynode segment manager (#38312)
Cherry pick from pr
#38311
Related to #37630

Add secondary index with vchannel to reduce `GetBy` rlock holding time
when segment number is large.
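
A minimal sketch of such a secondary index follows, assuming a manager that keeps a primary map by segment ID and a second map keyed by vchannel. The types and method names (`segmentManager`, `Put`, `Remove`, `GetByVChannel`) are hypothetical and not the querynode's real API.

```go
package main

import (
	"fmt"
	"sync"
)

type segment struct {
	ID       int64
	VChannel string
}

// segmentManager is a hypothetical sketch, not the querynode type.
type segmentManager struct {
	mu         sync.RWMutex
	byID       map[int64]*segment            // primary index
	byVChannel map[string]map[int64]*segment // secondary index: vchannel -> segments
}

func newSegmentManager() *segmentManager {
	return &segmentManager{
		byID:       make(map[int64]*segment),
		byVChannel: make(map[string]map[int64]*segment),
	}
}

// dropFromBucket removes s from its vchannel bucket; the caller holds the write lock.
func (m *segmentManager) dropFromBucket(s *segment) {
	bucket := m.byVChannel[s.VChannel]
	delete(bucket, s.ID)
	if len(bucket) == 0 {
		delete(m.byVChannel, s.VChannel)
	}
}

// Put inserts or replaces a segment and keeps both indexes in sync.
func (m *segmentManager) Put(s *segment) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if old, ok := m.byID[s.ID]; ok {
		m.dropFromBucket(old)
	}
	m.byID[s.ID] = s
	bucket := m.byVChannel[s.VChannel]
	if bucket == nil {
		bucket = make(map[int64]*segment)
		m.byVChannel[s.VChannel] = bucket
	}
	bucket[s.ID] = s
}

// Remove deletes a segment from both indexes.
func (m *segmentManager) Remove(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if s, ok := m.byID[id]; ok {
		delete(m.byID, id)
		m.dropFromBucket(s)
	}
}

// GetByVChannel visits only one bucket under the read lock instead of
// scanning every segment, which shortens the time the lock is held.
func (m *segmentManager) GetByVChannel(vchannel string) []*segment {
	m.mu.RLock()
	defer m.mu.RUnlock()
	bucket := m.byVChannel[vchannel]
	out := make([]*segment, 0, len(bucket))
	for _, s := range bucket {
		out = append(out, s)
	}
	return out
}

func main() {
	m := newSegmentManager()
	m.Put(&segment{ID: 1, VChannel: "ch-a"})
	m.Put(&segment{ID: 2, VChannel: "ch-b"})
	fmt.Println(len(m.GetByVChannel("ch-a")))
}
```

The trade-off is extra bookkeeping on every write in exchange for lookups whose cost depends on the size of one vchannel's bucket rather than on the total segment count.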

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-09 19:56:16 +08:00
yihao.dai
3e65cc5850
enhance: [10kcp] Enable score based balance channel policy (#38301)
issue: https://github.com/milvus-io/milvus/issues/38142
The current balance channel policy only considers the current collection's
distribution. If every collection has one channel and all channels are
loaded on the same querynode, channel balancing is not triggered after the
number of querynodes increases.

This PR enables the score-based balance channel policy, to achieve:

1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channels evenly across multiple
querynodes.

pr: https://github.com/milvus-io/milvus/pull/38143

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
2024-12-09 19:50:05 +08:00
yihao.dai
2fe6423552
enhance: [10kcp] Speed up meta recovery (#38298)
Increase the batchSize in WalkWithPrefix operations to 10000.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38285

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-09 19:49:35 +08:00
Zhen Ye
99279e0bef
enhance: remove the rpc layer of coordinator when enabling standalone or mixcoord (#38246)
issue: #33285
pr: #37815

- remove the rpc layer of coordinator when enabling standalone or
mixcoord
- move health check into init

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-05 17:27:53 +08:00
yihao.dai
338ccc9ff9
enhance: [10kcp] Reduce memory usage of BF in DataNode and QueryNode (#38133)
1. DataNode: Skip generating BF during the insert phase (BF will be
regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining BF for growing segments;
deletion checks will be handled in the segcore.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38129

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-02 14:41:19 +08:00
yihao.dai
0930430a68
enhance: [10kcp] Skip creating partition rate limiters when not enable (#38062)
issue: https://github.com/milvus-io/milvus/issues/37630

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-28 10:45:46 +08:00
yihao.dai
312475d1f1
enhance: [10kcp] remove the rpc level of coordinator (#37984)
issue: https://github.com/milvus-io/milvus/issues/37764

- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.

Signed-off-by: chyezh <chyezh@outlook.com>

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Zhen Ye <chyezh@outlook.com>
2024-11-25 14:50:42 +08:00
yihao.dai
4845e4d679
enhance: [10kcp] Revert "enhance: remove the rpc level of coordinator (#37914)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-21 21:35:29 +08:00
yihao.dai
bf90e55319
enhance: [10kcp] Reduce GetRecoveryInfo calls (#37891)
1. Introduce a data view mechanism for DataCoord, attempting to update
each collection's data view periodically.
2. QueryCoord maintains a cache of data view versions. Before
batch-fetching recovery info, it retrieves all versions and only fetches
recovery info for collections with updated versions.
3. Return DataCoord's current data view when fetching recovery info.
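
The version check in step 2 can be sketched as a comparison between a cached version map and the versions currently reported. The function name `staleCollections` and the use of plain `int64` versions are assumptions made for illustration; the PR's actual types may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// staleCollections returns, in ascending order, the IDs of collections whose
// current data view version differs from the cached one or is not cached yet.
// Only these collections would need a fresh recovery-info fetch.
func staleCollections(cached, current map[int64]int64) []int64 {
	var stale []int64
	for id, version := range current {
		if old, ok := cached[id]; !ok || old != version {
			stale = append(stale, id)
		}
	}
	sort.Slice(stale, func(i, j int) bool { return stale[i] < stale[j] })
	return stale
}

func main() {
	cached := map[int64]int64{1: 5, 2: 7}
	current := map[int64]int64{1: 5, 2: 8, 3: 1}
	fmt.Println(staleCollections(cached, current))
}
```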

issue: https://github.com/milvus-io/milvus/issues/37743,
https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/37863

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-21 15:43:13 +08:00
Zhen Ye
1a6b98be77
enhance: remove the rpc level of coordinator (#37876)
issue: #33285
pr: #37722

- move most cgo operations related to search/query into the segcore package
so that streamingnode can reuse them.
- add go unittest for segcore operations.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-21 15:21:11 +08:00
yihao.dai
92ab65ada0
enhance: [10kcp] Reduce GetIndexInfos calls (#37877)
Batch GetIndexInfos calls for segments to reduce RPC calls.

issue: https://github.com/milvus-io/milvus/issues/37634

pr: https://github.com/milvus-io/milvus/pull/37695

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-21 15:09:39 +08:00
congqixia
a10f95d71c
enhance: Bump milvus & proto version to v2.4.16 (#37762)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 20:36:31 +08:00
yihao.dai
13f83df019
enhance: [2.4] Remove segment-level tag from monitoring metrics (#37737)
When there are a large number of segments, the metrics consume a lot of
memory. This PR removes the segment-level tag from monitoring metrics.

issue: https://github.com/milvus-io/milvus/issues/37636

pr: https://github.com/milvus-io/milvus/pull/37696

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-16 23:04:33 +08:00
congqixia
cdf703aabc
enhance: [2.4] Enable RemoteLoad l0 forward policy by default (#37678) (#37713)
Cherry-pick from master
pr: #37678
Related to #35303

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-15 18:28:31 +08:00
smellthemoon
b3e6482367
enhance: add search params in search request in restful(#36304) (#37673)
pr: #36304 
pr: #36714 
pr: #36448

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-11-15 17:54:30 +08:00
congqixia
e222289038
fix: [2.4] Store default value if ErrKeyNotFound is returned (#37691) (#37705)
Cherry-pick from master
pr: #37691
Related to #37690

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-15 14:50:32 +08:00
congqixia
d073f322a4
enhance: [2.4] Add cgo call metrics for load/write API (#37405) (#37627)
Cherry-pick from master
pr: #37405

Cgo API cost is not observable since no metrics are related to these calls.
This PR adds metrics for some sync cgo calls related to load & write.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 13:58:30 +08:00
wei liu
6dc879b1e2
enhance: Enable node assign policy on resource group (#36968) (#37588)
issue: #36977
pr: #36968
With node_label_filter on a resource group, a user can add a label to a
querynode with the env `MILVUS_COMPONENT_LABEL`, and the resource group will
then prefer to accept nodes that match its node_label_filter.

Querynodes can then be grouped by label, with querynodes that share a
label placed in the same resource group.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-13 11:10:29 +08:00
sthuang
9e8b6ace6d
enhance: [2.4] RBAC custom privilege group (#37560)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/37087,
https://github.com/milvus-io/milvus/pull/37558
issue: #37031

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-11 14:20:29 +08:00
congqixia
4f4261157d
fix: [2.4] Add IP address validation from paramtable (#37416) (#37500)
Cherry-pick from master
pr: #37416
See also #37404 #37402

IP addresses in paramtable need validation and should fail fast with a
reasonable error message.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-11 10:12:28 +08:00
wei liu
a9beca44ef
fix: watch channel stuck due to misuse of timer.Reset (#37433) (#37542)
issue: #37166
pr: #37433
The misuse of timer.Reset caused the dispatcher to fail to send
messages to the virtual channel buffer. The dispatcher then split again and
again while holding the dispatcher manager's lock, which blocked the
watch-channel progress.

This PR fixes the misuse of timer.Reset.
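
The commit message does not say which misuse was involved. One well-known pitfall with Go's `time.Timer` is calling `Reset` on a timer that has already fired without draining its channel, so a stale tick is read as an immediate timeout. The sketch below shows the stop-drain-reset pattern under that assumption; `resetTimer`, `trySend`, and `demo` are hypothetical names, not Milvus code.

```go
package main

import (
	"fmt"
	"time"
)

// resetTimer stops the timer, drains any pending tick without blocking, and
// then re-arms it, so a later receive on t.C reflects only the new duration.
func resetTimer(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		select {
		case <-t.C: // discard a stale tick if one is pending
		default:
		}
	}
	t.Reset(d)
}

// trySend tries to deliver msg to buf and gives up after timeout.
// The timer is reused across calls instead of being allocated each time.
func trySend(buf chan<- string, msg string, t *time.Timer, timeout time.Duration) bool {
	resetTimer(t, timeout)
	select {
	case buf <- msg:
		return true
	case <-t.C:
		return false
	}
}

// demo reuses one timer for two sends: the first fits in the buffer, the
// second times out because the buffer is full and nobody is receiving.
func demo() (first, second bool) {
	buf := make(chan string, 1)
	t := time.NewTimer(time.Millisecond)
	time.Sleep(5 * time.Millisecond) // let the timer fire before it is reused
	first = trySend(buf, "a", t, 50*time.Millisecond)
	second = trySend(buf, "b", t, 10*time.Millisecond)
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```

Since Go 1.23 the runtime no longer delivers stale ticks after `Stop` or `Reset`, but the drain step stays harmless and keeps the code correct on older toolchains.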

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 18:46:27 +08:00
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
congqixia
c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously the failed label was used for canceled storage ops, which could
trigger false alarms when a user cancels a load operation or similar. This PR
uses the cancel label in such cases.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24] Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
congqixia
b7c80f9b83
enhance: Bump milvus & proto version to v2.4.15 (#37435)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:46:24 +08:00
SimFG
d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
Timeout is a bad design for long-running tasks, especially with a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This PR is a small patch that stops DC from marking compaction tasks as
timed out while it is still waiting for DN to finish, a design that
contradicts itself. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout control.

The compaction queue capacity grew larger for priority calculation, so
compactions time out more often. When one times out, the queued
tasks time out too, and no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-29 17:38:22 +08:00
congqixia
0b284ccc23
enhance: Bump milvus & proto version to v2.4.14 (#37198)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:44:25 +08:00
congqixia
49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

Previous PR #36107 added a grpc interceptor to observe rpc stats. Using the
same strategy, this PR adds gin middleware to observe restful v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00
SimFG
ae4ce9bbba
enhance: [2.4] allow to delete data when disk quota exhausted (#37139)
- issue: #37133
- pr: #37134

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-25 16:07:32 +08:00
congqixia
3db137f4ad
enhance: [2.4] Add metrics for querynode delete buffer info (#37081) (#37097)
Cherry pick from master
pr: #37081
Related to #35303

This PR adds metrics for querynode delegator delete buffer information,
which is related to dml quota logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 16:21:37 +08:00
presburger
27a4fe002a
enhance: change gpu default mem pool size (#36969)
Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2024-10-23 17:17:28 +08:00
yihao.dai
539f56220f
enhance: Remove bf from datanode (#36367) (#37027)
Remove bf from datanode:
1. When watching vchannels, skip loading **flushed** segments' bf. For
generating merged bf, we need to keep loading **growing** segments' bf.
2. Bypass bloom filter checks for delete messages, directly writing to
L0 segments.
3. In version 2.4, when dropping a partition, marking segments as
dropped depends on having the full segment list in the DataNode. So, we
need to keep syncing the segments every 10 minutes.

issue: https://github.com/milvus-io/milvus/issues/34585

pr: https://github.com/milvus-io/milvus/pull/35902,
https://github.com/milvus-io/milvus/pull/36367,
https://github.com/milvus-io/milvus/pull/36592

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-22 11:15:28 +08:00
yihao.dai
4e0f5845a1
enhance: Limit import job number (#36891) (#36892)
issue: https://github.com/milvus-io/milvus/issues/36890

pr: https://github.com/milvus-io/milvus/pull/36891

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 18:13:25 +08:00
yihao.dai
8923936c9a
enhance: Support memory mode chunk cache (#35347) (#35836)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

pr: https://github.com/milvus-io/milvus/pull/35347

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 17:03:25 +08:00
Ted Xu
22838a8413
enhance: Datacoord to support prioritization of compaction tasks (#36979)
See #36550

pr: #36547 
pr: #36956

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-18 14:15:25 +08:00
cqy123456
6934e8da3a
enhance: [2.4]use growingMmapEnabled to control the behavior of interim index, not vectorField (#36391)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36500

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:23:25 +08:00
congqixia
3252d7a64c
fix: [2.4] Load original key if ts is MaxTimestamp (#36934) (#36950)
Cherry-pick from master
pr: #36934 

Related to #36933

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 16:05:29 +08:00
SimFG
8743752ac3
enhance: [2.4] force to stop buffer message when receiving the drop collection message (#36917)
/kind improvement
pr: #36916

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-17 12:53:29 +08:00
SimFG
6b9e28bc8f
enhance: [2.4] update the expr version to support automatic conversion of variable types (#36847)
/kind improvement
- pr: #36832

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-15 10:55:23 +08:00
SimFG
1d9c7462ba
enhance: [2.4] support to execute the method which contains the ctx param (#36798)
/kind improvement
- pr: #36797

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-11 23:17:21 +08:00
XuanYang-cn
e976b41f97
fix: Remove enableLevelZeroSegment config (#36507)
See also: #36504
pr: #36535

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-11 16:41:21 +08:00
congqixia
bacbfae542
enhance: Bump milvus & proto version to v2.4.13 (#36758)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-11 16:01:22 +08:00
yihao.dai
a4ef93457d
enhance: Optimize import scheduling and add time cost metric (#36601) (#36684)
1. Optimize import scheduling strategy:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

pr: https://github.com/milvus-io/milvus/pull/36601

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-11 10:27:22 +08:00
SimFG
548f8e80c3
enhance: [2.4] the estimate method when loading the collection (#36728)
- pr: #36307
- issue: #36530

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-11 10:20:45 +08:00