853 Commits

Author SHA1 Message Date
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
congqixia
c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously, the failed label was used for canceled storage ops, which could
trigger false alarms when a user canceled a load operation or similar. This PR
uses the cancel label in such cases.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24]Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
congqixia
b7c80f9b83
enhance: Bump milvus & proto version to v2.4.15 (#37435)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:46:24 +08:00
SimFG
d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
A timeout is a poor fit for long-running tasks, especially one driven by a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.
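
The progress-based alternative described above could be sketched as follows. This is an illustrative Python sketch, not the Milvus implementation: the `ProgressWatchdog` class, its `stale_after` parameter, and the injectable `clock` are all hypothetical names introduced here.

```python
import time


class ProgressWatchdog:
    """Flags a task as stale when its reported progress stops changing.

    Hypothetical helper for illustration only; names are not from Milvus.
    """

    def __init__(self, stale_after, clock=time.monotonic):
        self.stale_after = stale_after      # seconds without progress before failing
        self._clock = clock                 # injectable so the logic can be tested
        self._last_progress = None
        self._last_change = clock()

    def report(self, progress):
        # Only a *change* in progress resets the staleness window.
        if progress != self._last_progress:
            self._last_progress = progress
            self._last_change = self._clock()

    def is_stale(self):
        return self._clock() - self._last_change >= self.stale_after
```

Unlike a static timeout, a slow task that keeps advancing is never failed here; only a task whose progress value has not moved for `stale_after` seconds is.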

This PR is a small patch that stops DC from marking compaction tasks as
timed out while it is still waiting for DN to finish. The design is
self-contradictory. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout control.

The compaction queue capacity has grown larger for priority calculation, so
compactions time out more often, and when one times out, the queued
tasks time out too, and no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-29 17:38:22 +08:00
congqixia
0b284ccc23
enhance: Bump milvus & proto version to v2.4.14 (#37198)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:44:25 +08:00
congqixia
49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

A previous PR, #36107, added a grpc interceptor to observe rpc stats. Using the
same strategy, this PR adds gin middleware to observe restful v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00
SimFG
ae4ce9bbba
enhance: [2.4] allow to delete data when disk quota exhausted (#37139)
- issue: #37133
- pr: #37134

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-25 16:07:32 +08:00
congqixia
3db137f4ad
enhance: [2.4] Add metrics for querynode delete buffer info (#37081) (#37097)
Cherry pick from master
pr: #37081
Related to #35303

This PR adds metrics for the querynode delegator's delete buffer information,
which is related to the dml quota logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 16:21:37 +08:00
presburger
27a4fe002a
enhance: change gpu default mem pool size (#36969)
Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2024-10-23 17:17:28 +08:00
yihao.dai
539f56220f
enhance: Remove bf from datanode (#36367) (#37027)
Remove bf from datanode:
1. When watching vchannels, skip loading the bf of **flushed** segments. To
generate the merged bf, we still need to load the bf of **growing** segments.
2. Bypass bloom filter checks for delete messages, directly writing to
L0 segments.
3. In version 2.4, when dropping a partition, marking segments as
dropped depends on having the full segment list in the DataNode. So, we
need to keep syncing the segments every 10 minutes.

issue: https://github.com/milvus-io/milvus/issues/34585

pr: https://github.com/milvus-io/milvus/pull/35902,
https://github.com/milvus-io/milvus/pull/36367,
https://github.com/milvus-io/milvus/pull/36592

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-22 11:15:28 +08:00
yihao.dai
4e0f5845a1
enhance: Limit import job number (#36891) (#36892)
issue: https://github.com/milvus-io/milvus/issues/36890

pr: https://github.com/milvus-io/milvus/pull/36891

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 18:13:25 +08:00
yihao.dai
8923936c9a
enhance: Support memory mode chunk cache (#35347) (#35836)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

pr: https://github.com/milvus-io/milvus/pull/35347

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 17:03:25 +08:00
Ted Xu
22838a8413
enhance: Datacoord to support prioritization of compaction tasks (#36979)
See #36550

pr: #36547 
pr: #36956

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-18 14:15:25 +08:00
cqy123456
6934e8da3a
enhance: [2.4]use growingMmapEnabled to control the behavior of interim index, not vectorField (#36391)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36500

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:23:25 +08:00
congqixia
3252d7a64c
fix: [2.4] Load original key if ts is MaxTimestamp (#36934) (#36950)
Cherry-pick from master
pr: #36934 

Related to #36933

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 16:05:29 +08:00
SimFG
8743752ac3
enhance: [2.4] force to stop buffer message when receiving the drop collection message (#36917)
/kind improvement
pr: #36916

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-17 12:53:29 +08:00
SimFG
6b9e28bc8f
enhance: [2.4] update the expr version to support automatic conversion of variable types (#36847)
/kind improvement
- pr: #36832

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-15 10:55:23 +08:00
SimFG
1d9c7462ba
enhance: [2.4] support to execute the method which contains the ctx param (#36798)
/kind improvement
- pr: #36797

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-11 23:17:21 +08:00
XuanYang-cn
e976b41f97
fix: Remove enableLevelZeroSegment config (#36507)
See also: #36504
pr: #36535

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-11 16:41:21 +08:00
congqixia
bacbfae542
enhance: Bump milvus & proto version to v2.4.13 (#36758)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-11 16:01:22 +08:00
yihao.dai
a4ef93457d
enhance: Optimize import scheduling and add time cost metric (#36601) (#36684)
1. Optimize import scheduling strategy:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

pr: https://github.com/milvus-io/milvus/pull/36601

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-11 10:27:22 +08:00
SimFG
548f8e80c3
enhance: [2.4] the estimate method when loading the collection (#36728)
- pr: #36307
- issue: #36530

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-11 10:20:45 +08:00
yihao.dai
9cb5396cf6
enhance: Use common gc config (#36668) (#36670)
Use the GC config from `common` and remove the GC config from
`queryNode`.

issue: https://github.com/milvus-io/milvus/issues/36667

pr: https://github.com/milvus-io/milvus/pull/36668

related pr: https://github.com/milvus-io/milvus/pull/34949

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 19:49:20 +08:00
congqixia
3a80d1f602
enhance: [2.4] Add streaming forward policy switch for delegator (#36330) (#36712)
Cherry pick from master
pr: #36330
Related to #35303

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-09 17:41:20 +08:00
XuanYang-cn
05f96f5298
fix: [24]raise l0 compaction memory ratio to 0.5 (#36691)
5 percent of free memory is too little for L0 compaction. This PR
raises it to 50 percent.

See also: #36614
pr: #36690

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-09 17:19:24 +08:00
Zhen Ye
bcc661dbd0
fix: rocksmq consume too slow if the channel is full (#36618)
issue: #36569
pr: #36617

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-09 11:59:31 +08:00
congqixia
1955738ab8
enhance: [2.4] Produce messages of multiple topics in parallel (#36344) (#36462)
Cherry-pick from master
pr: #36344 
Related to #36343

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-08 18:31:20 +08:00
yihao.dai
c5088b8527
enhance: Add metrics to monitor import throughput and imported rows (#36519) (#36588)
issue: https://github.com/milvus-io/milvus/issues/36518

pr: https://github.com/milvus-io/milvus/pull/36519

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-30 10:03:18 +08:00
SimFG
58a763c529
enhance: [2.4] avoid to create many timer object in the target (#36573)
/kind improvement
- pr: #36570

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-29 19:27:16 +08:00
aoiasd
2231aeab4d
fix: [Cherry-Pick] Split delete task msg to MaxMessageSize (#36574)
relate: https://github.com/milvus-io/milvus/issues/36089
pr: https://github.com/milvus-io/milvus/pull/36197
Split delete task messages by MaxMessageSize to avoid the mq "message too
large" error.
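
The splitting idea can be sketched as a greedy batcher that closes a batch before it would exceed the size limit. This is a minimal Python illustration under assumptions, not the code from the PR: `split_by_size`, `max_bytes`, and `size_of` are hypothetical names, and item size is approximated with `len`.

```python
def split_by_size(items, max_bytes, size_of=len):
    """Group items into batches whose total size stays within max_bytes.

    An item that is larger than max_bytes on its own still gets a batch
    to itself, so nothing is dropped.
    """
    batches, current, current_size = [], [], 0
    for item in items:
        item_size = size_of(item)
        # Close the current batch if adding this item would exceed the limit.
        if current and current_size + item_size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += item_size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be sent as its own message, so no single message exceeds the limit unless one item alone does.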

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-09-27 18:15:19 +08:00
Zhen Ye
e34fa0461b
fix: port listen racing in mix or standalone mode (#36459)
issue: #36441
pr: #36442

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-26 21:01:15 +08:00
SimFG
07e1bc8c08
enhance: [2.4] get msg type from the msg header to reduce the Unmarshal usage (#36454)
/kind improvement
- pr: #36409

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-26 16:41:15 +08:00
SimFG
6604bbda8f
enhance: [2.4] update the expr version and format the expr http response (#36467)
/kind improvement
- pr: #36406

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-26 14:27:20 +08:00
wei liu
ad5d24be65
enhance: Optimize workload based replica selection policy (#36181) (#36384)
issue: #35859
pr: #36181

This PR introduces two new params, toleranceFactor and checkRequestNum.
After every checkRequestNum requests have been assigned, the policy
computes each querynode's workload score.

If the diff is less than toleranceFactor, the replica selection policy
falls back to round_robin, which reduces the average cost to about
500ns.

If the diff is larger than toleranceFactor, the replica selection policy
computes each querynode's score and selects the target node with the
smallest score on every assignment.
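
The policy above can be sketched roughly as follows. This Python sketch is illustrative only and is not the Milvus implementation: the `ReplicaSelector` class is hypothetical, the workload score is assumed to be the accumulated cost of assigned requests, the diff is assumed to be the relative gap between the highest and lowest score, and a diff exactly equal to the tolerance is treated as balanced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicaSelector:
    nodes: List[str]
    tolerance_factor: float = 0.1    # hypothetical default
    check_request_num: int = 10      # hypothetical default
    workload: Dict[str, float] = field(default_factory=dict)
    _assigned: int = field(default=0, init=False)
    _rr_index: int = field(default=0, init=False)
    _balanced: bool = field(default=True, init=False)

    def _score(self, node: str) -> float:
        return self.workload.get(node, 0.0)

    def _refresh(self) -> None:
        # Assumed definition of "diff": relative gap between extreme scores.
        scores = [self._score(n) for n in self.nodes]
        lo, hi = min(scores), max(scores)
        if lo > 0:
            diff = (hi - lo) / lo
        else:
            diff = 0.0 if hi == 0 else float("inf")
        self._balanced = diff <= self.tolerance_factor

    def select(self, cost: float = 1.0) -> str:
        # Re-evaluate the balance only once per check_request_num requests.
        if self._assigned % self.check_request_num == 0:
            self._refresh()
        self._assigned += 1
        if self._balanced:
            # Cheap path: plain round robin, no per-request score computation.
            node = self.nodes[self._rr_index % len(self.nodes)]
            self._rr_index += 1
        else:
            # Expensive path: pick the node with the smallest score.
            node = min(self.nodes, key=self._score)
        self.workload[node] = self._score(node) + cost
        return node
```

The point of the design is that the score comparison runs only on the periodic check; between checks, a balanced cluster pays just the round-robin cost per request.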

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-26 11:19:14 +08:00
wei liu
975a9797a2
enhance: Enable dynamic update loaded collection's replica (#36417)
issue: #35821
pr: #35822
After a collection is loaded, if we need to increase or decrease its
replica number, we currently need to release and load it again.

Milvus offers 4 solutions to update a loaded collection's replicas. This PR
aims to change the replica number dynamically without a release. After the
replica number changes, Milvus loads or releases replicas
asynchronously, and the replica load status can be checked with the
getReplicas API.

Note that if more replicas are set than the querynodes can afford, the new
replicas won't be loaded successfully until enough querynodes join.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-26 10:43:15 +08:00
jaime
b92daa1532
fix: inaccurate size estimation for encoded array data (#36379)
issue: #36029
pr: #36373

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-23 21:17:13 +08:00
SimFG
a35d99eabf
fix: [2.4] long buffering causes mq to be unable to receive messages. (#36425)
- issue: #36397
- pr: #36420

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-23 16:33:17 +08:00
congqixia
2b796b180b
enhance: Bump milvus & proto version to v2.4.12 (#36376)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-23 14:51:13 +08:00
cai.zhang
eb47150f66
enhance: [cherry-pick]Disallow the keywords as a field name or dynamic field name (#36108)
issue: #35873

master pr: #36101

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-15 15:19:14 +08:00
XuanYang-cn
c1dab50fed
enhance: [cp]Add metrics for Delete entries num of L0seg (#36227)
- Add metrics *DataCoordL0DeleteEntriesNum*
- Remove metrics *DataCoordRateStoredL0Segment*

See also: #36147
pr: #36175

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-14 10:37:08 +08:00
congqixia
13d443eb2e
enhance: [2.4] Add L0 forward policy to support remote load (#36189) (#36208)
Cherry-pick from master
pr: #36189
Related to #35303

This PR adds a param item that allows changing the L0 forward behavior from
bf filtering and forwarding to remote load.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-12 19:09:08 +08:00
Buqian Zheng
089790a459
enhance: [2.4]Allow empty sparse row (#36061)
pr: #34700

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-12 10:13:09 +08:00
congqixia
1cd8d1bd80
enhance: [2.4] Use stats Handler to record request/response size metrics (#36107) (#36118)
Cherry-pick from master
pr: #36107 
Related to #36102

This PR uses the newly added `grpcSizeStatsHandler` to reduce calls to
`proto.Size`, since the request and response size info is already recorded
by the grpc framework.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-10 17:13:08 +08:00
XuanYang-cn
835c9d5c65
fix: Change l0SegmentsRowCount limits to a reasonable value (#36015)
pr: #36014
See also: #36028

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-08 16:55:05 +08:00
congqixia
9d0378ae84
enhance: Bump milvus & proto version to v2.4.11 (#36069)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-06 23:39:06 +08:00
congqixia
55b33cd3cf
fix: [2.4] Fix tracing config update logic (#35928) (#35998)
Cherry-pick from master
pr: #35928 
Related to #35927

This PR addresses several issues:
- Use the `ResetTraceConfig` method instead of the init one in the update event handler
- Implement a dynamic stats.Handler to receive tracing config update events
- Update the `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` to avoid a data race

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-06 11:19:05 +08:00
Ted Xu
45b2049d5d
fix: fallback params may be overridden (#35972) (#36006)
See #35756

---------

pr: #35972

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-09-05 19:05:05 +08:00