20567 Commits

Author SHA1 Message Date
yihao.dai
bff0113cbb
fix: Fix subscription leak (#37382) (#37541)
Close (unsubscribe) the msg stream after completing the PreCreatedTopic
check to prevent backlog issue.

issue: https://github.com/milvus-io/milvus/issues/36021

pr: https://github.com/milvus-io/milvus/pull/37382

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:38:28 +08:00
yihao.dai
fd1ca73b61
fix: Fix large growing segment (#37388) (#37540)
Consider the `sealProportion` factor during segment allocation.

issue: https://github.com/milvus-io/milvus/issues/37387

pr: https://github.com/milvus-io/milvus/pull/37388

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:34:27 +08:00
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
nico
a3c1fc1848
test: update test cases (#37476)
pr: #36841

Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-11-07 16:56:25 +08:00
wei liu
349924615b
fix: [skip e2e]unstable integration test TestNodeDownOnSingleReplica(#37480) (#37499)
issue: #37289
pr: #37480

Because PR #37116 introduced a retry when getting the shard leader, search
no longer fails while a query node is down.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-07 16:48:26 +08:00
XuanYang-cn
dd0cf20ee0
fix: [cp24]Correct dropped segment num metrics (#37471)
See also: #31891
pr: #37410

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-07 16:46:33 +08:00
smellthemoon
60f963102e
enhance: refactor createIndex in RESTful API(#37235) (#37237)
pr: #37235 
2.5: #37236

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-07 14:18:31 +08:00
congqixia
c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously, the failed label was used for canceled storage operations, which
could trigger false alarms when a user cancels a load operation, for example.
This PR uses the cancel label in such cases.
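
The labeling rule described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the label strings and the `statusLabel` helper are hypothetical, and the only assumption is that a canceled operation surfaces as an error wrapping `context.Canceled`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Label values are illustrative; the real metric label names are not
// given in the commit message.
const (
	labelSuccess = "success"
	labelCancel  = "cancel"
	labelFail    = "fail"
)

// statusLabel maps the result of a storage operation to a metric label.
// A canceled context gets its own label so it does not count as a failure.
func statusLabel(err error) string {
	switch {
	case err == nil:
		return labelSuccess
	case errors.Is(err, context.Canceled):
		return labelCancel
	default:
		return labelFail
	}
}

// Sample errors used by main (hypothetical).
var (
	errCanceledRead = fmt.Errorf("read object: %w", context.Canceled)
	errBrokenRead   = errors.New("read object: connection reset")
)

func main() {
	fmt.Println(statusLabel(nil))
	fmt.Println(statusLabel(errCanceledRead))
	fmt.Println(statusLabel(errBrokenRead))
}
```

Using `errors.Is` rather than a direct comparison keeps the check working when the cancellation error has been wrapped on its way up the call stack.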

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
cai.zhang
651a56e3dd
enhance: [2.4]Update the template expression proto to improve transmission efficiency (#37485)
issue: #36672 

master pr: #37484

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 12:12:25 +08:00
Zhen Ye
cea8c756d4
fix: repeated error code in milvus and segcore (#37449)
issue: #37357
pr: #37359

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:46:25 +08:00
cai.zhang
4ae5337343
enhance: [2.4] Refine error message for contains array (#37443)
issue: #36221 

master pr: #37383

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 10:40:25 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24]Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
yellow-shine
af5e32d00b
enhance: refine the pipeline (#37456)
https://github.com/milvus-io/milvus/pull/37412

---------

Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-11-06 10:24:30 +08:00
sre-ci-robot
28cb357de3
[automated] Bump milvus version to v2.4.15 (#37457)
Bump milvus version to v2.4.15
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-05 21:18:32 +08:00
congqixia
b7c80f9b83
enhance: Bump milvus & proto version to v2.4.15 (#37435)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
v2.4.15
2024-11-05 14:46:24 +08:00
congqixia
c195f9f76a
enhance: [2.4] Pass rpc stats via gin.Context (#37440)
Cherry pick from master
pr: #37439
Related #37223

RPC stats worked in middleware but failed to get method & collection info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:24:24 +08:00
wei liu
6b69170a64
fix: proxy retry to get shard leader on unloaded collection (#37326)
issue: #37115

PR #37116 lets the proxy retry getting the shard leader when an error occurs.
As a result, a search/query on an unloaded collection keeps retrying
until the ctx is done.

This PR adds an error type check to skip the retry on ErrCollectionLoaded.
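
A minimal sketch of the idea, assuming a generic retry helper: `retry`, `failWith`, `errSkipRetry`, and `errTransient` are hypothetical names for illustration and are not the proxy's actual code. The loop stops as soon as it sees the sentinel error instead of exhausting its attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errSkipRetry is a hypothetical sentinel standing in for the error type
// that the proxy treats as non-retriable.
var errSkipRetry = errors.New("collection not loaded")

// errTransient is a hypothetical error that is worth retrying.
var errTransient = errors.New("shard leader unavailable")

// retry calls fn up to attempts times and returns the number of calls made.
// It returns early on success or when the error matches errSkipRetry.
func retry(attempts int, fn func() error) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return i, nil
		}
		if errors.Is(err, errSkipRetry) {
			return i, err // retrying cannot help, give up immediately
		}
	}
	return attempts, err
}

// failWith returns a function that always returns err.
func failWith(err error) func() error {
	return func() error { return err }
}

func main() {
	calls, err := retry(5, failWith(fmt.Errorf("search: %w", errSkipRetry)))
	fmt.Println(calls, err)

	calls, err = retry(5, failWith(errTransient))
	fmt.Println(calls, err)
}
```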

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 11:02:25 +08:00
yihao.dai
380662153f
fix: [2.4] Revert "enhance: Support db for bulkinsert (#37012) (#37017)" (#37421)
This reverts commit d6adc62765665d1555039c4d256a75d1144d49d0.

issue: https://github.com/milvus-io/milvus/issues/31273

pr: https://github.com/milvus-io/milvus/pull/37420

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-05 10:48:24 +08:00
wei liu
eb712f0db9
fix: deadlock if query node crashes during shard client init (#37354)
issue: #37115

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 10:46:32 +08:00
XuanYang-cn
28fd217e27
fix: [cp24]l0RowCount metrics value always empty (#37307)
See also: #36953
pr: #37306

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-04 15:34:24 +08:00
cai.zhang
4fb86eb17d
fix: [2.4] Fix the bug where some expressions do not correctly parse the value (#37342)
issue: #37274

master pr: #37341

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-03 18:42:23 +08:00
congqixia
ce7d4090f1
enhance: [2.4] Move forward l0 logic out of delta lock (#37340)
Cherry pick from master
pr: #37337
Related to #35303

`deleteMut` is meant to protect the streaming delete buffer, so forwarding l0
can be moved out of the rlock section to reduce the tsafe impact from
loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-01 14:42:22 +08:00
wei liu
3c09d42bfc
fix: [skip e2e] TestNodeDownOnSingleReplica has unstable result (#37288) (#37350)
issue: #37289
pr: #37288
These test cases use search to verify the replica's status, but if the gap
between searches is 1s, the effect of the node going down may already be
repaired by balancing.

This PR removes the 1-second gap between search operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-01 13:48:22 +08:00
SimFG
d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
Timeout is a bad design for long-running tasks, especially with a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This PR is a small patch that stops DC from marking compaction tasks as
timed out while it is still waiting for DN to finish. That design
contradicts itself. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout control.

The compaction queue capacity has grown larger for priority calculation, so
compactions time out more often, and when one times out, the queued
tasks time out too, and no compaction succeeds afterwards.
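
The progress-based alternative mentioned above could look roughly like this. It is a sketch under assumptions, not something this PR implements: the `task` type, `maxStall`, and the `at` helper are hypothetical. A task is failed only when its progress has stopped advancing for too long, regardless of total running time.

```go
package main

import (
	"fmt"
	"time"
)

// maxStall is an illustrative limit on how long progress may stand still.
const maxStall = 30 * time.Second

// task tracks the last reported progress of a long-running job.
type task struct {
	progress     int       // last reported progress, e.g. percent complete
	lastAdvanced time.Time // when progress last increased
}

// report records a progress value; only forward movement refreshes the timestamp.
func (t *task) report(progress int, now time.Time) {
	if progress > t.progress {
		t.progress = progress
		t.lastAdvanced = now
	}
}

// stale reports whether progress has not advanced for longer than limit.
func (t *task) stale(now time.Time, limit time.Duration) bool {
	return now.Sub(t.lastAdvanced) > limit
}

// at returns a fixed reference time plus sec seconds, to keep the demo deterministic.
func at(sec int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(sec) * time.Second)
}

func main() {
	job := &task{lastAdvanced: at(0)}
	// The job runs for 100s, far past a 30s static timeout, but keeps advancing.
	for sec := 20; sec <= 100; sec += 20 {
		job.report(sec, at(sec))
	}
	fmt.Println(job.stale(at(110), maxStall)) // false: last progress was 10s ago
	fmt.Println(job.stale(at(140), maxStall)) // true: no progress for 40s
}
```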

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
congqixia
1a09d6385e
enhance: [2.4] Release compacted growing segment if in dropped list (#37245) (#37266)
Cherry-pick from master
pr: #37245
See also #37205

Previously releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

The worst case is that the corresponding sealed segment has been
compacted and the checkpoint is pinned by a growing l0 segment, so neither
condition can be met.

This PR introduces a new rule: a growing segment can be released
if its segment id appears in the current target's dropped segment id list.
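
The three release conditions can be written as one predicate. The sketch below is illustrative only: the `growingSegment` and `target` types and their field names are assumptions, not the querynode's real data structures.

```go
package main

import "fmt"

// growingSegment holds the fields the release decision needs (hypothetical).
type growingSegment struct {
	ID      int64
	StartTs uint64 // segment start position timestamp
}

// target is a hypothetical view of the current target.
type target struct {
	SealedLoaded map[int64]bool // sealed segments already loaded, by id
	CheckpointTs uint64         // target checkpoint timestamp
	Dropped      map[int64]bool // dropped segment id list
}

// canRelease reports whether a growing segment may be released.
func canRelease(seg growingSegment, tgt target) bool {
	// Condition 1: a sealed segment with the same id is loaded.
	if tgt.SealedLoaded[seg.ID] {
		return true
	}
	// Condition 2: the segment starts before the target checkpoint.
	if seg.StartTs < tgt.CheckpointTs {
		return true
	}
	// New rule: the segment id appears in the dropped segment list.
	return tgt.Dropped[seg.ID]
}

func main() {
	seg := growingSegment{ID: 7, StartTs: 100}

	// Worst case: the sealed segment was compacted away and the checkpoint is pinned.
	pinned := target{CheckpointTs: 50}
	fmt.Println(canRelease(seg, pinned))

	// With the new rule, the dropped list allows the release.
	pinned.Dropped = map[int64]bool{7: true}
	fmt.Println(canRelease(seg, pinned))
}
```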

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-31 10:14:22 +08:00
nico
771fad51b3
test: update pymilvus version and test cases (#37301)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-10-31 09:40:22 +08:00
congqixia
37d691f458
fix: [2.4] Rectify OffsetOrderedArray contain logic (#37309)
Cherry pick from master
pr: #37305 
Related to #36887

The logic that removes delete records for non-hit pks does not work because
`insert_record_.contain` is itself broken by a logic error.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 21:16:22 +08:00
congqixia
a2a51c489e
fix: [2.4] Check resource when loading deltalogs (#37195) (#37263)
Cherry pick from master
pr: #37195
Related to #36887

The `LoadDeltaLogs` API did not check memory usage. When the system is under
heavy delete load, this could result in an OOM exit.

This PR adds a resource check for `LoadDeltaLogs` actions and separates the
internal deltalog loading function from the public one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 11:54:41 +08:00
yellow-shine
ce7fbb9439
Bump milvus version to v2.4.14 (#37252)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-29 21:34:29 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
v2.4.14
2024-10-29 17:38:22 +08:00
cai.zhang
05c40522ce
enhance: [cherry-pick ]Enhance the expression template to support AND and OR operations (#37217)
issue: #36672

master pr: #37033

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-29 15:38:40 +08:00
congqixia
3d1e81fb31
fix: [2.4] Use singleton delete pool and avoid goroutine leakage (#37225)
Cherry-pick from master
pr: #37220
Related to #36887

Previously, a new pool was created per request, which caused goroutine
leakage. This PR changes this behavior to use a singleton delete pool.
This change can also provide better concurrency control over delete
memory usage.
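
A minimal sketch of the singleton approach, assuming a simple channel-based worker pool: `pool`, `newPool`, and `getDeletePool` are hypothetical names, and the pool size of 4 is arbitrary. `sync.Once` guarantees the workers are started only once, so repeated requests reuse the same goroutines instead of spawning a new set each time.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a minimal fixed-size worker pool (illustrative, not the Milvus pool).
type pool struct {
	tasks chan func()
}

// newPool starts the given number of workers, which run until the process exits.
func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		go func() {
			for fn := range p.tasks {
				fn()
			}
		}()
	}
	return p
}

// Submit runs fn on a worker and waits for it to finish.
func (p *pool) Submit(fn func()) {
	done := make(chan struct{})
	p.tasks <- func() {
		defer close(done)
		fn()
	}
	<-done
}

var (
	deletePool     *pool
	deletePoolOnce sync.Once
)

// getDeletePool returns the process-wide pool, creating it on first use.
func getDeletePool() *pool {
	deletePoolOnce.Do(func() { deletePool = newPool(4) })
	return deletePool
}

func main() {
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for i := 1; i <= 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Every request shares the same pool and the same four workers.
			getDeletePool().Submit(func() {
				mu.Lock()
				total += n
				mu.Unlock()
			})
		}(i)
	}
	wg.Wait()
	fmt.Println("sum:", total, "same pool:", getDeletePool() == getDeletePool())
}
```

Because every caller goes through the same fixed set of workers, the pool size also bounds how many delete tasks run at once.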

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 14:44:23 +08:00
congqixia
0b284ccc23
enhance: Bump milvus & proto version to v2.4.14 (#37198)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:44:25 +08:00
congqixia
49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

Previous PR #36107 added a grpc interceptor to observe rpc stats. Using the same
strategy, this PR adds gin middleware to observe restful v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00
congqixia
b44ef8207e
fix: [2.4] Check whether new collection name is alias (#36981) (#37208)
Cherry pick from master
pr: #36981

Related to #36963

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 22:46:24 +08:00
wei liu
79e6ef2617
fix: Search/Query may fail during updating delegator cache (#37174)
issue: #37115
pr: #37116
Because initializing the query node client is too heavy, updateShardClient was
removed from the leader mutex, which caused many more concurrent
corner cases.

This PR delays the query node client's init operation until `getClient` is
called, then uses the leader mutex to protect the shard client update
and avoid concurrency issues.
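
Lazy initialization under a mutex can be sketched as below. This is a simplified illustration: only the name `getClient` comes from the commit message, while the `shardClients` type, its fields, and the `client` type are hypothetical. The expensive construction is deferred to the first lookup, and the lock guarantees that concurrent callers never build the same client twice.

```go
package main

import (
	"fmt"
	"sync"
)

// client stands in for an expensive-to-create query node client.
type client struct {
	addr string
}

// shardClients creates clients lazily, on first use (hypothetical type).
type shardClients struct {
	mu      sync.Mutex
	clients map[string]*client
	created int // number of clients constructed, kept for demonstration
}

// getClient returns the client for addr, constructing it on the first call.
func (s *shardClients) getClient(addr string) *client {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.clients[addr]; ok {
		return c
	}
	if s.clients == nil {
		s.clients = make(map[string]*client)
	}
	c := &client{addr: addr} // the heavy init would happen here
	s.clients[addr] = c
	s.created++
	return c
}

func main() {
	s := &shardClients{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.getClient("querynode-1")
		}()
	}
	wg.Wait()
	fmt.Println("clients created:", s.created)
}
```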

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-28 20:08:25 +08:00
cai.zhang
9c0f59488a
feat: [cherry-pick]The expression supports filling elements through templates (#37058)
issue: #36672 

master pr: #37033 

milvus-proto pr: https://github.com/milvus-io/milvus-proto/pull/332

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:17:30 +08:00
XuanYang-cn
4cb5b2c3b5
fix: [cp24]Exclude L0 compaction when clustering is executing (#37142)
Also remove the conflict check when executing L0. Exclusivity is already
guaranteed by the scheduler.

See also: #37140
pr: #37141

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-28 15:01:30 +08:00
congqixia
223badc482
fix: [2.4] Ref collection meta when load l0 segment meta only (#37179)
Cherry pick from master
pr: #37178
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading l0 segments under the `RemoteLoad`
policy, which causes the collection meta to be released when many l0 segments
are released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 14:07:30 +08:00
congqixia
9d37ade24f
enhance: [2.4] Make skip load work for all branches (#37161)
Cherry-pick from master
pr: #37160
Related to #37112

Skip load logic used to work only when there were multiple segment load
info entries in the load request. In the continuous delete case, the delegator
still loaded l0 segments, which occupied a lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 22:11:30 +08:00
yihao.dai
d30e27e6f9
enhance: Make dataNode.import.maxConcurrentTaskNum dynamic (#37102) (#37103)
Resize the import execution pool when the config
`dataNode.import.maxConcurrentTaskNum` is updated.

issue: https://github.com/milvus-io/milvus/issues/37095

pr: https://github.com/milvus-io/milvus/pull/37102

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:21:29 +08:00
yihao.dai
da897e41f4
fix: Fix collection leak in querynode (#37061) (#37079)
Unref the removed L0 segment count.

issue: https://github.com/milvus-io/milvus/issues/36918

pr: https://github.com/milvus-io/milvus/pull/37061

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:19:39 +08:00
SimFG
ae4ce9bbba
enhance: [2.4] allow to delete data when disk quota exhausted (#37139)
- issue: #37133
- pr: #37134

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-25 16:07:32 +08:00
Xiaofan
2dc89b1cad
enhance: upgrade minio dependency (#37089)
fix #34910
upgrade minio dependency

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-10-25 15:05:30 +08:00
wei liu
057bfbe678
fix: Delegator may become unserviceable after querycoord restart (#37055) (#37100)
issue: #37054
pr: #37055
After querycoord restarts, segment_checker may release segments by mistake
because the next target isn't ready yet.

This PR requires that segment release happen only after the next target is
ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-25 14:55:31 +08:00
foxspy
ba8328727f
enhance: Update Knowhere version (#37132)
/kind branch-feature

release note:
https://github.com/zilliztech/knowhere/releases/tag/v2.3.12

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-25 14:47:30 +08:00
yihao.dai
ca2057c57d
enhance: Tidy import options (#37077) (#37078)
1. Tidy import options.
2. Tidy common import util functions.

issue: https://github.com/milvus-io/milvus/issues/34150

pr: https://github.com/milvus-io/milvus/pull/37077

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 14:35:45 +08:00
congqixia
6bc8aba17f
enhance: [2.4] Batch forward delete when using DirectForward (#37076) (#37107)
Cherry pick from master
pr: #37076
Related #36887

DirectForward streaming delete causes memory usage to explode when the
number of segments is large. This PR adds a batching delete API and uses it
for the direct forward implementation.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:53:29 +08:00
congqixia
79891f047d
enhance: [2.4]Skip load delta data in delegator when using RemoteLoad (#37082) (#37112)
Cherry-pick from master
pr: #37082
Related to #35303

Delta data is not needed when using `RemoteLoad` l0 forward policy. By
skipping load delta data, memory pressure could be eased if l0 segment
size/number is large.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:41:30 +08:00