364 Commits

Author SHA1 Message Date
Zhen Ye
5d2f454ce4
enhance: add multiply factor when loading index (#38721)
issue: #38715
pr: #38716

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-25 10:50:58 +08:00
jaime
ee7dffc758
fix: sync task still running after DataNode has stopped (#38441)
issue: #38319
pr: #38377

---------

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-18 12:38:47 +08:00
wei liu
83e162f5f1
enhance: Enable score based balance channel policy (#38143) (#38378)
issue: #38142
pr: #38143
The current balance channel policy only considers the current collection's
distribution, so if every collection has 1 channel and all channels have
been loaded on the same querynode, balance channel won't be triggered
after the number of querynodes increases.

This PR enables the score based balance channel policy, to achieve:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channel evenly across multiple
querynodes.
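
The two goals above can be illustrated with a small sketch. This is not the Milvus implementation; the function name `assign_channels`, the `collection_weight` parameter, and the scoring formula are assumptions chosen only to show how a single score can balance both the total channel count and each collection's spread.

```python
from collections import defaultdict


def assign_channels(channels, nodes, collection_weight=10):
    """Greedy, score-based placement (hypothetical sketch).

    channels: list of (collection, channel_name) tuples.
    nodes: list of querynode ids.
    A node's score for a channel is the number of channels it already
    holds, plus a heavier penalty for channels of the same collection.
    """
    total = defaultdict(int)           # node -> channels held
    per_collection = defaultdict(int)  # (node, collection) -> channels held
    placement = {}
    for collection, channel in channels:
        target = min(
            nodes,
            key=lambda n: (
                per_collection[(n, collection)] * collection_weight + total[n],
                n,  # deterministic tie-break
            ),
        )
        placement[channel] = target
        total[target] += 1
        per_collection[(target, collection)] += 1
    return placement


# Four single-channel collections end up spread over both nodes.
single = [("c1", "c1-ch0"), ("c2", "c2-ch0"), ("c3", "c3-ch0"), ("c4", "c4-ch0")]
single_placement = assign_channels(single, [1, 2])

# Two channels of one collection land on different nodes.
multi_placement = assign_channels([("a", "a-ch0"), ("a", "a-ch1")], [1, 2])
```

With a per-collection-only policy, the four single-channel collections would all look balanced on one node; the global term in the score is what moves them apart.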

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-13 10:28:44 +08:00
SimFG
df73f93126
enhance: [2.4] pick some master improvements to 2.4 branch (#38128)
- issue: #38127

master pr list:
- #37759
- #37835
- #37845
- #37874
- #37894
- #37969
- #37983
- #38005
- #38035

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-13 10:24:45 +08:00
Zhen Ye
6b310e16dc
enhance: remove the rpc layer of coordinator when enabling standalone or mixcoord (#38207)
issue: #37764
pr: #37815 
also see: #38259

- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.
- after removing the rpc layer from mixcoord, the querycoord in standby mode
will be blocked forever during a rolling deployment

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-10 20:38:44 +08:00
cai.zhang
ddc40a7266
enhance: [2.4]Determine the number of buffers based on the resource limits of the DataNode (#38210)
issue: #28410 

master pr: #38209

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-12-08 17:54:41 +08:00
jaime
319f5494cd
enhance: optimize CPU usage for CheckHealth requests (#35595)
issue: #35563
pr: #35589

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-04 14:26:41 +08:00
XuanYang-cn
c32ad6573c
enhance: [24]Increase task capacity and clean illegal task (#37896) (#38095)
1. taskQueueCapacity 256 is too small for production when we want to
re-write the entire collection

2. tasks should be cleaned up when they cannot be recovered; otherwise the
meta will remain in etcd forever.

pr: #37896

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-12-02 11:58:38 +08:00
Gao
165afbba91
enhance: support retry search when topk is reduced and result not enough (#37093)
issue: #35576 
pr: #35645

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-11-28 10:12:37 +08:00
jaime
09a7b55c87
enhance: set the maximum database configuration to be refreshable (#37932)
pr: #37931

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-27 11:26:36 +08:00
wei liu
93063ce1f9
fix: Prevent simultaneous balance of segments and channels (#37850) (#37939)
issue: #33550
pr: #37850
balance segment and balance channel may execute at the same time, which
causes a bunch of corner cases.

This PR disables simultaneous balance of segments and channels.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-26 10:26:40 +08:00
sthuang
d8f1af68e9
enhance: [2.4] RBAC built in privilege groups and grant v2 (#37787)
cherry-pick from master: https://github.com/milvus-io/milvus/pull/37720,
https://github.com/milvus-io/milvus/pull/37785
issue: https://github.com/milvus-io/milvus/issues/37031

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-25 11:24:54 +08:00
zhenshan.cao
9b3de3ac3e
fix: Revert "enhance: [2.4] Enable RemoteLoad l0 forward policy" (#37875)
issue https://github.com/milvus-io/milvus/issues/35303
pr: https://github.com/milvus-io/milvus/pull/37867
This reverts commit cdf703aabc2ec7e4addded68e808ba6add3ab2cb.

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-11-22 12:40:33 +08:00
congqixia
cdf703aabc
enhance: [2.4] Enable RemoteLoad l0 forward policy by default (#37678) (#37713)
Cherry-pick from master
pr: #37678
Related to #35303

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-15 18:28:31 +08:00
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24]Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
Timeout is a bad design for long running tasks, especially using a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.
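
The progress-based alternative described above could look roughly like the following. It is a sketch only: the `ProgressWatchdog` class, its method names, and the use of plain numeric timestamps are assumptions for illustration and do not correspond to code in this PR.

```python
class ProgressWatchdog:
    """Fail a task when its progress stops changing, not when a fixed
    wall-clock budget runs out (hypothetical sketch)."""

    def __init__(self, stale_after, now):
        self.stale_after = stale_after  # seconds without progress allowed
        self.last_progress = None
        self.last_change = now

    def report(self, progress, now):
        # Only a change in progress resets the staleness clock.
        if progress != self.last_progress:
            self.last_progress = progress
            self.last_change = now

    def is_stale(self, now):
        return now - self.last_change >= self.stale_after


watchdog = ProgressWatchdog(stale_after=30, now=0)
watchdog.report(progress=10, now=5)
watchdog.report(progress=20, now=100)    # long-running, but still advancing
still_running = watchdog.is_stale(now=120)
watchdog.report(progress=20, now=125)    # no change since t=100
stalled = watchdog.is_stale(now=131)
```

A static timeout of, say, 60 seconds would have failed this task at t=60 even though it was still making progress at t=100.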

This pr is a small patch to stop DC from marking compaction tasks as
timed out, while still waiting for DN to finish. The design is
self-conflicted. After this pr, mix and L0 compaction are no longer
controlled by DC timeout, but clustering is still under timeout control.

The compaction queue capacity has grown larger for priority calc, hence
compaction timeouts appear more often, and when one task times out, the
queued tasks time out too, so no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
presburger
27a4fe002a
enhance: change gpu default mem pool size (#36969)
Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2024-10-23 17:17:28 +08:00
yihao.dai
539f56220f
enhance: Remove bf from datanode (#36367) (#37027)
Remove bf from datanode:
1. When watching vchannels, skip loading **flushed** segments' bf. For
generating merged bf, we need to keep loading **growing** segments' bf.
2. Bypass bloom filter checks for delete messages, directly writing to
L0 segments.
3. In version 2.4, when dropping a partition, marking segments as
dropped depends on having the full segment list in the DataNode. So, we
need to keep syncing the segments every 10 minutes.

issue: https://github.com/milvus-io/milvus/issues/34585

pr: https://github.com/milvus-io/milvus/pull/35902,
https://github.com/milvus-io/milvus/pull/36367,
https://github.com/milvus-io/milvus/pull/36592

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-22 11:15:28 +08:00
yihao.dai
4e0f5845a1
enhance: Limit import job number (#36891) (#36892)
issue: https://github.com/milvus-io/milvus/issues/36890

pr: https://github.com/milvus-io/milvus/pull/36891

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 18:13:25 +08:00
yihao.dai
8923936c9a
enhance: Support memory mode chunk cache (#35347) (#35836)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

pr: https://github.com/milvus-io/milvus/pull/35347

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 17:03:25 +08:00
Ted Xu
22838a8413
enhance: Datacoord to support prioritization of compaction tasks (#36979)
See #36550

pr: #36547 
pr: #36956

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-18 14:15:25 +08:00
cqy123456
6934e8da3a
enhance: [2.4]use growingMmapEnabled to control the behavior of interim index, not vectorField (#36391)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36500

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:23:25 +08:00
congqixia
3252d7a64c
fix: [2.4] Load original key if ts is MaxTimestamp (#36934) (#36950)
Cherry-pick from master
pr: #36934 

Related to #36933

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 16:05:29 +08:00
XuanYang-cn
e976b41f97
fix: Remove enableLevelZeroSegment config (#36507)
See also: #36504
pr: #36535

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-11 16:41:21 +08:00
yihao.dai
a4ef93457d
enhance: Optimize import scheduling and add time cost metric (#36601) (#36684)
1. Optimize import scheduling strategy:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

pr: https://github.com/milvus-io/milvus/pull/36601

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-11 10:27:22 +08:00
yihao.dai
9cb5396cf6
enhance: Use common gc config (#36668) (#36670)
Use the GC config from `common` and remove the GC config from
`queryNode`.

issue: https://github.com/milvus-io/milvus/issues/36667

pr: https://github.com/milvus-io/milvus/pull/36668

related pr: https://github.com/milvus-io/milvus/pull/34949

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 19:49:20 +08:00
congqixia
3a80d1f602
enhance: [2.4] Add streaming forward policy switch for delegator (#36330) (#36712)
Cherry pick from master
pr: #36330
Related to #35303

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-09 17:41:20 +08:00
XuanYang-cn
05f96f5298
fix: [24]raise l0 compaction memory ratio to 0.5 (#36691)
5 percent of free memory is too little for l0 compaction. This pr
raises it to 50 percent.

See also: #36614
pr: #36690

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-09 17:19:24 +08:00
SimFG
58a763c529
enhance: [2.4] avoid creating many timer objects in the target (#36573)
/kind improvement
- pr: #36570

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-29 19:27:16 +08:00
wei liu
ad5d24be65
enhance: Optimize workload based replica selection policy (#36181) (#36384)
issue: #35859
pr: #36181

This PR introduces two new params: toleranceFactor and checkRequestNum.
After every checkRequestNum requests have been assigned, it tries to
compute each querynode's workload score.

If the diff is less than the toleranceFactor, the replica selection policy
falls back to round_robin, which reduces the average cost to about
500ns.

If the diff is larger than the toleranceFactor, the replica selection policy
computes each querynode's score and selects the target node with the
smallest score in every assignment.
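
A minimal sketch of this switching behavior follows. Only the two parameter names come from the PR description; the `ReplicaSelector` class, the per-request `cost`, and the definition of the diff as a relative gap between the highest and lowest workload are assumptions made for illustration.

```python
class ReplicaSelector:
    """Switch between round robin and score-based selection
    (hypothetical sketch, not the Milvus implementation)."""

    def __init__(self, nodes, tolerance_factor, check_request_num):
        self.nodes = list(nodes)
        self.tolerance_factor = tolerance_factor
        self.check_request_num = check_request_num
        self.workload = {node: 0.0 for node in self.nodes}
        self.assigned = 0
        self.next_index = 0
        self.round_robin = True

    def _recheck(self):
        low = min(self.workload.values())
        high = max(self.workload.values())
        # Assumed definition of "diff": gap relative to the lowest workload.
        if high == low:
            diff = 0.0
        elif low == 0:
            diff = float("inf")
        else:
            diff = (high - low) / low
        self.round_robin = diff < self.tolerance_factor

    def select(self, cost=1.0):
        # Re-evaluate the mode once per check_request_num assignments.
        if self.assigned % self.check_request_num == 0:
            self._recheck()
        if self.round_robin:
            node = self.nodes[self.next_index % len(self.nodes)]
            self.next_index += 1
        else:
            node = min(self.nodes, key=lambda n: self.workload[n])
        self.workload[node] += cost
        self.assigned += 1
        return node


balanced = ReplicaSelector(["a", "b"], tolerance_factor=0.5, check_request_num=2)
balanced_picks = [balanced.select() for _ in range(4)]

skewed = ReplicaSelector(["a", "b"], tolerance_factor=0.5, check_request_num=2)
skewed.workload["a"] = 10.0  # node "a" is already heavily loaded
skewed_picks = [skewed.select() for _ in range(3)]
```

The cheap path does no per-request scoring at all; the score is only consulted on every request once a periodic check finds the nodes out of balance.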

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-26 11:19:14 +08:00
jaime
b92daa1532
fix: inaccurate size estimation for encoded array data (#36379)
issue: #36029
pr: #36373

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-23 21:17:13 +08:00
SimFG
a35d99eabf
fix: [2.4] long buffering causes mq to be unable to receive messages. (#36425)
- issue: #36397
- pr: #36420

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-23 16:33:17 +08:00
congqixia
13d443eb2e
enhance: [2.4] Add L0 forward policy to support remote load (#36189) (#36208)
Cherry-pick from master
pr: #36189
Related to #35303

This PR adds a param item that supports changing the l0 forward behavior
from bf filtering and forwarding to remote load.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-12 19:09:08 +08:00
XuanYang-cn
835c9d5c65
fix: Change l0SegmentsRowCount limits to a reasonable value (#36015)
pr: #36014
See also: #36028

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-08 16:55:05 +08:00
Ted Xu
45b2049d5d
fix: fallback params may be overridden (#35972) (#36006)
See #35756

---------

pr: #35972

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-09-05 19:05:05 +08:00
congqixia
da0bc22a5f
enhance: [2.4] Add delete buffer related quota logic (#35918) (#35997)
Cherry pick from master
pr: #35128 #35918
See also #35303

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <45024769+aoiasd@users.noreply.github.com>
2024-09-05 16:43:06 +08:00
jaime
2c1fa50412
enhance: remove cooling off in rate limiter for read requests (#35936)
issue: #35934
pr: #35935

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-04 14:39:10 +08:00
SimFG
5b5119a51f
feat: [2.4] provide more general configuration to control mmap behavior (#35609)
- issue: #35273
- pr: #35359

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-08-23 12:35:02 +08:00
wei liu
e014ad9280
fix: dynamic config update doesn't work for some params (#35572) (#35637)
issue: #35570
pr: #35572
Milvus supports a config cache to speed up config access, but it only
evicts a param's cache when that param has been updated. However, a param
may rely on another param's value. Say paramA relies on paramB: when paramB
is updated, paramB's cache is evicted, but paramA's cache still keeps
the old value.

This PR evicts all config cache to solve the above issue, because dynamic
config updates won't be very frequent.
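
The stale-cache problem and the evict-all fix can be reproduced in a few lines. The `ConfigStore` class, its methods, and the `evict_all` switch are hypothetical names used only for this sketch; they are not the Milvus paramtable API.

```python
class ConfigStore:
    """Cached config lookup with derived params (hypothetical sketch)."""

    def __init__(self, raw):
        self.raw = dict(raw)
        self.cache = {}
        self.derived = {}  # name -> function(store) computing the value

    def define(self, name, compute):
        self.derived[name] = compute

    def get(self, name):
        if name not in self.cache:
            compute = self.derived.get(name)
            self.cache[name] = compute(self) if compute else self.raw[name]
        return self.cache[name]

    def update(self, name, value, evict_all=True):
        self.raw[name] = value
        if evict_all:
            self.cache.clear()          # behavior after the fix
        else:
            self.cache.pop(name, None)  # old behavior: evict only this key


store = ConfigStore({"paramB": 2})
store.define("paramA", lambda s: s.get("paramB") * 10)

first = store.get("paramA")
store.update("paramB", 3, evict_all=False)
stale = store.get("paramA")   # still computed from the old paramB
store.update("paramB", 4)     # evicts every cached entry
fresh = store.get("paramA")
```

Clearing the whole cache trades a few extra recomputations for not having to track dependencies between params, which is reasonable when updates are rare.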

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-22 16:00:58 +08:00
wei liu
14ec3dc357
enhance: Enable ReadOnly/ReadWrite/Admin Privilege Group to simplify RBAC grant process (#35472) (#35543)
issue: #35471
pr: #35472 #35515

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-19 16:24:54 +08:00
Ted Xu
57d4bcbf15
enhance: adding the msgchannel section in generated yaml (#35466)
See #32168

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-14 19:03:11 +08:00
Ted Xu
ce53e79f12
fix: enable milvus.yaml check (#34567) (#35446)
See #32168

pr: #34567 #35152

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-13 19:00:23 +08:00
aoiasd
a20cb727eb
enhance: [Cherry-pick] Check by proxy rate limiter when delete gets data by query. (#30891) (#35262)
relate: https://github.com/milvus-io/milvus/issues/30927
pr: https://github.com/milvus-io/milvus/pull/30891

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-08-13 14:32:21 +08:00
wei liu
0201e00a2f
enhance: enable to set load config in cluster level (#35293)
issue: #35170
pr: #35169
This PR enables setting load configs at the cluster level, such as replicas
and resource groups. Loading a collection then uses this load
config.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-07 12:38:21 +08:00
cai.zhang
2534b30e39
enhance: [cherry-pick] Add monitoring metrics for task execution time in datacoord (#35141)
issue: #35138 

master pr: #35139

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-02 19:46:16 +08:00
wei liu
d767f8977a
enhance: Refine param init for MmapDirPath (#35181) (#35214)
pr: #35181

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-02 16:30:15 +08:00
congqixia
f8444b900f
enhance: [2.4] Support proxy/delegator qn client pooling (#35195)
Cherry pick from master
pr: #35194
See also #35196
Add param item for proxy/delegator query node client pooling and
implement pooling logic

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-02 11:24:19 +08:00
wei liu
5f601fcc50
enhance: Reduce delegator memory overloaded factor to 0.1 (#35092) (#35164)
pr: #35092

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-01 14:20:13 +08:00
cai.zhang
c340f387cf
enhance: [cherry-pick] Change the fixed value to a ratio for clustering segment size (#35075)
issue: #34495 

master pr: #35076

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-07-31 10:32:00 +08:00