120 Commits

Author SHA1 Message Date
yihao.dai
16d1947d5c
enhance: [2.5] Adjust import task concurrency based on CPU count (#43817)
pr: https://github.com/milvus-io/milvus/pull/43132

issue: https://github.com/milvus-io/milvus/issues/43131

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-08-18 01:11:45 +08:00
wei liu
73210303a9
fix: Fix exclude nodes clearing logic position in load balancer retry (#43002)
issue: #42994
pr: #42577 #40438

cp partial logic from pr #42577 and #40438

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-06-30 16:10:44 +08:00
cai.zhang
78b66a29b6
fix: [2.5] Reduce task slot for standalone to 1/4 of normal datanode (#42809)
issue: #42129 

master pr: #42808

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-20 16:38:46 +08:00
yihao.dai
f978641d6a
enhance: [2.5] Enhance import integration tests and logs (#42696)
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs

issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518

pr: https://github.com/milvus-io/milvus/pull/42612

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-16 20:06:38 +08:00
yihao.dai
28aa364bf7
enhance: [2.5] Adjust default import buffer size (#42542)
Increase insert buffer size from 16MB to 64MB, while keeping delete
buffer size at 16MB.

issue: https://github.com/milvus-io/milvus/issues/42518

pr: https://github.com/milvus-io/milvus/pull/42541

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-05 18:46:33 +08:00
wei liu
4a05180f88
enhance: [2.5] support balancing multiple collections in single trigger (#41875) (#42134)
issue: #41874
pr: #41875
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker

This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 23:18:30 +08:00
yihao.dai
83ca664150
fix: [2.5] Fix import slot assignment (#41982)
Assign the import task to the worker with the most available slots, even
if availableSlots < requiredSlots. This ensures tasks won’t be blocked
indefinitely.

issue: https://github.com/milvus-io/milvus/issues/41981

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-23 01:36:30 +08:00
cai.zhang
0db5e0c4f6
enhance: [2.5]Deprecate disk params about indexing (#41078)
issue: #40863 

master pr: #41045

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:36:34 +08:00
wei liu
37a533fe6d
fix: [2.5] Address manual balance and balance check issues (#41038)
issue: #37651
pr: #41037
- Fix context propagation for manual balance segment task creation from
PR #38080.
- Optimize stopping balance by preventing redundant checks per round,
addressing performance regression from PR #40297.
- Decrease default `checkBalanceInterval` from 3000ms to 300ms.
- Correct minor log messages in `BalanceChecker`.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-03 01:26:23 +08:00
yihao.dai
27ea5d14dc
fix: [2.5] Fix delete data loss due to duplicate binlogID (#40976)
With concurrenct L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.

issue: https://github.com/milvus-io/milvus/issues/40207

pr: https://github.com/milvus-io/milvus/pull/40960

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-28 14:34:21 +08:00
wei liu
b64bb63e77
enhance: [2.5] Add trigger interval config for auto balance (#39154) (#39918)
issue: #39156
pr: #39154

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-27 16:40:23 +08:00
congqixia
709594f158
enhance: [2.5] Use v2 package name for pkg module (#40117)
Cherry-pick from master
pr: #39990
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-23 00:46:01 +08:00
cai.zhang
52434ccc78
enhance: [2.5] Limit the speed of the generating stats task (#39645)
master pr: #39644

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-02-17 16:06:17 +08:00
cqy123456
d121ac3a7a
enhance: [2.5]intermin index support different index type and more data type(fp16/bf16) (#39180)
issue: https://github.com/milvus-io/milvus/issues/27678
related: https://github.com/milvus-io/milvus/pull/39753
some raw data status will change:
Intermin index has raw data: 
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File

href="file:////Users/cqy/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip.htm">
<link rel=File-List

href="file:////Users/cqy/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_filelist.xml">
<!--table
	{mso-displayed-decimal-separator:"\.";
	mso-displayed-thousand-separator:"\,";}
@page
	{margin:.75in .7in .75in .7in;
	mso-header-margin:.3in;
	mso-footer-margin:.3in;}
.font5
	{color:windowtext;
	font-size:9.0pt;
	font-weight:400;
	font-style:normal;
	text-decoration:none;
	font-family:等线;
	mso-generic-font-family:auto;
	mso-font-charset:134;}
tr
	{mso-height-source:auto;
	mso-ruby-visibility:none;}
col
	{mso-width-source:auto;
	mso-ruby-visibility:none;}
br
	{mso-data-placement:same-cell;}
td
	{padding-top:1px;
	padding-right:1px;
	padding-left:1px;
	mso-ignore:padding;
	color:black;
	font-size:12.0pt;
	font-weight:400;
	font-style:normal;
	text-decoration:none;
	font-family:等线;
	mso-generic-font-family:auto;
	mso-font-charset:134;
	mso-number-format:General;
	text-align:general;
	vertical-align:middle;
	border:none;
	mso-background-source:auto;
	mso-pattern:auto;
	mso-protection:locked visible;
	white-space:nowrap;
	mso-rotate:0;}
ruby
	{ruby-align:left;}
rt
	{color:windowtext;
	font-size:9.0pt;
	font-weight:400;
	font-style:normal;
	text-decoration:none;
	font-family:等线;
	mso-generic-font-family:auto;
	mso-font-charset:134;
	mso-char-type:none;
	display:none;}
-->
</head>

<body link="#0563C1" vlink="#954F72">


sparse vector | growing segment | sealed segment
-- | -- | --
BM25 | no | no
IP | yes | no
  |   |  
dense vector | growing segment | sealed segment
ivf flat cc | yes | yes
scann_dvr | no | no



</body>

</html>

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-02-13 09:56:47 +08:00
yihao.dai
29dad64341
fix: [2.5] Fix consume blocked due to too many consumers (#38915)
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38455

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-16 15:19:03 +08:00
yihao.dai
5b0bb4c04e
enhance: [2.5] Reduce memory usage of BF in DataNode and QueryNode (#38913)
1. DataNode: Skip generating BF during the insert phase (BF will be
regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining BF for growing segments;
deletion checks will be handled in the segcore.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38129

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-15 13:59:00 +08:00
yihao.dai
b91c0a8079
enhance: [2.5] Optimize GetLocalDiskSize and segment loader mutex (#38907)
1. Make the segment loader lock protect only the resource.
2. Optimize GetDiskUsage to avoid excessive overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38599

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-14 20:43:06 +08:00
Zhen Ye
54036bcafd
enhance: add broadcast operation for msgstream (#39119)
issue: #38399
pr: #39040

- make broadcast service available for msgstream by reusing the
architecture streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 15:11:00 +08:00
jaime
78438ef41e
fix: revert optimize CPU usage for CheckHealth requests (#35589) (#38555)
issue: #35563

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-19 00:38:45 +08:00
jaime
29e620fa6d
fix: sync task still running after DataNode has stopped (#38377)
issue: #38319

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-17 18:06:44 +08:00
jaime
28fdbc4e30
enhance: optimize CPU usage for CheckHealth requests (#35589)
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy sides, which enhances cluster stability.
2. Each health check will assess all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-17 11:02:45 +08:00
wei liu
e279ccf109
enhance: Enable score based balance channel policy (#38143)
issue: #38142
current balance channel policy only consider current collection's
distribution, so if all collections has 1 channel, and all channels has
been loaded on same querynode, after querynode num increase, balance
channel won't be triggered.

This PR enable score based balance channel policy, to achieve:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channel evenly across multiple
querynodes.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-11 17:20:43 +08:00
SimFG
49ee46ec1d
enhance: support to config the default db properties (#38035)
- issue: #38034

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-27 10:04:34 +08:00
SimFG
2208b7c2ef
fix: the too long default root password does not take effect (#37983)
- issue: #36987

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-26 17:24:35 +08:00
Zhen Ye
2b4f211d84
enhance: add switch for local rpc enabled (#37985)
issue: #33285

- Add switch for local rpc

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-26 17:00:54 +08:00
wei liu
0a440e0d38
fix: Prevent simultaneous balance of segments and channels (#37850)
issue: #33550
balance segment and balance segment execute at same time, which will
cause bounch of corner case.

This PR disable simultaneous balance of segments and channels

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 17:56:55 +08:00
yihao.dai
0fc0d1a888
fix: Limit the concurrency of channel tasks (#37740)
Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.

issue: https://github.com/milvus-io/milvus/issues/37665

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-18 16:26:30 +08:00
Zhen Ye
81fa7dd52c
fix: add ddl and dcl concurrency to avoid competition (#37672)
issue: #37166

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-15 15:04:31 +08:00
yihao.dai
f0b3942a08
enhance: Limit import job number (#36891)
issue: https://github.com/milvus-io/milvus/issues/36890

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-23 16:01:28 +08:00
yihao.dai
0fc2a4aa53
enhance: Optimize import scheduling and add time cost metric (#36601)
1. Optimize import scheduling strategic:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 14:41:20 +08:00
Zhen Ye
a6545b2e29
fix: refactor milvus config and change default txn timeout (#36522)
issue: #36498

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-29 11:01:15 +08:00
SimFG
c50fe71163
fix: long buffering causes mq to be unable to receive messages. (#36420)
- issue: #36397

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-23 16:33:18 +08:00
wei liu
3b10085f61
enhance: Optimize workload based replica selection policy (#36181)
issue: #35859

This PR introduce two new param: toleranceFactor and checkRequestNum,
after every checkRequestNum request has been assigned, try to compute
querynode's workload score.

if the diff is less than the toleranceFactor, replica selection policy
will fallback to round_robin, which reduce the average cost to about
500ns.

if the diff is larger than the toleranceFactor, replica selection policy
will compute querynode's score to select the target node with smallest
score in every assigment.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-20 12:33:11 +08:00
yihao.dai
763fd0dfc5
enhance: Use a separate mmap config for chunk cache (#36276)
issue: https://github.com/milvus-io/milvus/issues/35273

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-15 16:23:09 +08:00
Ted Xu
d9a40784a2
fix: fallback params may be overridden (#35972)
See #35756

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-09-05 16:19:04 +08:00
wei liu
cf242f9e09
fix: fix dynamic update config doesn't works for some param (#35572)
issue: #35570
milvus support config cache to spped up config access, but only evict
param's cache when param has been updated. but milvus's param may rely
on other param's value, let's say ParamsA relys on paramsB, when paramsB
updated, it will evict paramB's cache, but the paramA's cache still keep
the old value.

This PR evict all config cache to solve the above issue, cause dynamic
update config won't be much frequetly.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-21 11:02:56 +08:00
wei liu
a570567644
enhance: Enable ReadOnly/ReadWrite/Admin Privilege Group to simplify RBAC grant progress (#35472)
issue: #35471

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-16 14:18:54 +08:00
wei liu
344dc6a9f8
enhance: enable to set load config in cluster level (#35169)
issue: #35170
This PR enable to set load configs in cluster level, such as replicas
and resource groups. then when load collections will use the load
config.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-07 12:38:21 +08:00
cai.zhang
6542c1ab0e
enhance: Add monitoring metrics for task execution time in datacoord (#35139)
issue: #35138

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-05 16:26:17 +08:00
jaime
fcec4c21b9
fix: check collection health(queryable) fail for releasing collection (#34947)
issue: #34946

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-08-02 17:20:15 +08:00
wei liu
3b735b4b02
enhance: Refine param init for MmapDirPath (#35181)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-02 11:12:14 +08:00
cai.zhang
196a7986b3
enhance: Change the fixed value to a ratio for clustering segment size (#35076)
issue: #34495

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-01 22:04:14 +08:00
wei liu
e9d61daa3f
enhance: Reduce delegator memory overloaded factor to 0.1 (#35092)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-01 10:21:50 +08:00
wayblink
ce3f836876
fix: compaction task not be cleaned correctly (#34765)
1.fix compaction task not be cleaned correctly
2.add a new parameter to control compaction gc loop interval
3.remove some useless configs of clustering compaction

bug: #34764

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-30 20:21:56 +08:00
cai.zhang
2372452fac
enhance: Optimized the GC logic to ensure that memory is released in time (#34949)
issue: #34703

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-07-28 23:53:47 +08:00
wei liu
166fc902b0
enhance: Limit collection's normal balance speed (#34810)
issue: #34798

after we remove the task priority on query coord, to avoid load/release
segment blocked by too much balance task, we limit the balance task size
in each round. at same time, we reduce the balance interval to trigger
balance more frequently.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-24 19:11:44 +08:00
yihao.dai
b22e549844
enhance: Rename config of sealing by growing segmetns size (#34787)
/kind improvement

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-19 20:27:41 +08:00
wayblink
c79d1af390
enhance: Add compaction task slot usage logic (#34581)
#34544

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-18 10:27:41 +08:00
yihao.dai
4939f82d4f
enhance: Seal by total growing segments size (#34692)
Seals the largest growing segment if the total size of growing segments
of each shard exceeds the size threshold(default 4GB). Introducing this
policy can help keep the size of growing segments within a suitable
level, alleviating the pressure on the delegator.

issue: https://github.com/milvus-io/milvus/issues/34554

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-17 21:45:41 +08:00
SimFG
203fb554a4
enhance: support to config root user's password (#34752)
- issue: #33058

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-07-17 20:19:42 +08:00