20582 Commits

Author SHA1 Message Date
XuanYang-cn
d23da2db4f
fix: [cp24]Correct varchar primarykey size calculation (#37619)
See also: #37582
pr: #37617

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-14 14:16:38 +08:00
wei liu
28bcd85bd0
fix: Balance channel may stuck at increasing replica number case (#37642)
issue: #37640
pr: #37641
Fixes PR #36549.
Balance channel waits until the new delegator becomes serviceable, but
the new delegator must sync its target version before it becomes
serviceable, and syncing the target version must wait until all replicas
finish loading. So if increasing the replica number and balancing
channels happen at the same time, a logical deadlock occurs.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-13 14:26:30 +08:00
congqixia
8801322371
enhance: [2.4] Invalidate collection cache when release collection (#37577) (#37628)
Cherry-pick from master
pr: #37577
Related to #37395

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 14:00:31 +08:00
congqixia
d073f322a4
enhance: [2.4] Add cgo call metrics for load/write API (#37405) (#37627)
Cherry-pick from master
pr: #37405

Cgo API cost is not observable since no metrics are related to it.
This PR adds metrics for some sync cgo calls related to load & write.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 13:58:30 +08:00
wei liu
6dc879b1e2
enhance: Enable node assign policy on resource group (#36968) (#37588)
issue: #36977
pr: #36968
With node_label_filter on a resource group, users can add a label to a
querynode with the env `MILVUS_COMPONENT_LABEL`, and the resource group
will prefer to accept nodes which match its node_label_filter.

Querynodes can then be grouped by label, with querynodes sharing the
same label placed in the same resource group.
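
A minimal sketch of the "prefer matching nodes" idea described above, written in Python for illustration only. The function names, the dict-based label representation, and the fallback to non-matching nodes are assumptions; the actual Milvus policy and the format of `MILVUS_COMPONENT_LABEL` are not shown in this commit message.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], label_filter: Dict[str, str]) -> bool:
    """True when every key/value pair in the filter is present on the node."""
    return all(labels.get(k) == v for k, v in label_filter.items())


def assign_nodes(nodes: Dict[str, Dict[str, str]],
                 label_filter: Dict[str, str],
                 wanted: int) -> List[str]:
    """Pick up to `wanted` node names, preferring nodes that match the filter.

    Hypothetical policy: matching nodes come first, others are only a fallback.
    """
    preferred = sorted(n for n, lb in nodes.items() if matches(lb, label_filter))
    fallback = sorted(n for n, lb in nodes.items() if not matches(lb, label_filter))
    return (preferred + fallback)[:wanted]


# Hypothetical querynodes and labels.
nodes = {
    "qn-1": {"tier": "hot"},
    "qn-2": {"tier": "cold"},
    "qn-3": {"tier": "hot"},
}
print(assign_nodes(nodes, {"tier": "hot"}, 2))
```

With this ordering, a resource group only takes a non-matching node when too few matching nodes are available.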

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-13 11:10:29 +08:00
wei liu
7d1c899155
fix: Search may return less result after qn recover (#36549) (#37610)
issue: #36293 #36242
pr: #36549
After a qn recovers, the delegator may be loaded on a new node; after all
segments have been loaded, the delegator becomes serviceable. But the
delegator's target version hasn't been synced, and if a search/query
comes, the delegator will use the wrong target version and filter out an
empty segment list, which causes an empty search result.

This PR blocks the delegator's serviceable status until the target
version is synced.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-12 19:16:30 +08:00
cai.zhang
3456e241ac
fix: [2.4]Fix the bug that retrieved from wrong field for L0 segments (#37599)
issue: #37574 

master pr: #37598

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-12 19:02:31 +08:00
wei liu
074f8ee696
enhance: optimize describe collection and index (#37490) (#37605)
fix #37489
pr: #34790
combine multiple describe collection and list index into one call

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Xiaofan <83447078+xiaofan-luan@users.noreply.github.com>
2024-11-12 16:54:29 +08:00
wei liu
25c96991f6
fix: Lost loading collection's updateTs after qc restart (#37538) (#37580)
issue: #37537
pr: #37538

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-11 17:50:30 +08:00
sthuang
9e8b6ace6d
enhance: [2.4] RBAC custom privilege group (#37560)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/37087,
https://github.com/milvus-io/milvus/pull/37558
issue: #37031

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-11 14:20:29 +08:00
congqixia
2fbb157dc8
enhance: [2.4] Handle legacy proxy load fields request (#37565) (#37569)
Cherry-pick from master
pr: #37565
Related to #35415

During a rolling upgrade, a legacy proxy may dispatch a load request with
an empty load field list. The upgraded querycoord may mistakenly report
an error that the load field list has changed.

This PR:

- Auto-fill an empty load field list with all user field ids
- Refine the error message when the load field list updates
- Refine load job unit test with service cases
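
The auto-fill behaviour in the list above can be sketched roughly as follows. This is an illustrative Python model, not the querycoord code; the function names and the error text are assumptions.

```python
from typing import Iterable, List


def fill_load_fields(requested: Iterable[int], user_field_ids: Iterable[int]) -> List[int]:
    """Treat an empty request (as sent by a legacy proxy) as 'all user fields'."""
    requested = list(requested)
    return requested if requested else list(user_field_ids)


def check_load_fields(current: Iterable[int],
                      requested: Iterable[int],
                      user_field_ids: Iterable[int]) -> List[int]:
    """Reject a load request only if the resolved field list really differs."""
    current = list(current)
    resolved = fill_load_fields(requested, user_field_ids)
    if set(resolved) != set(current):
        raise ValueError(
            f"load field list changed: loaded {sorted(current)}, "
            f"requested {sorted(resolved)}"
        )
    return resolved


# A legacy proxy sends an empty list; the collection was loaded with all fields.
print(check_load_fields(current=[100, 101], requested=[], user_field_ids=[100, 101]))
```

Resolving the empty list before comparing is what avoids the false "load field list is changed" error in this sketch.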

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-11 14:06:29 +08:00
congqixia
4f4261157d
fix: [2.4] Add IP address validation from paramtable (#37416) (#37500)
Cherry-pick from master
pr: #37416
See also #37404 #37402

IP addresses in paramtable need validation and should fail fast with a
reasonable error message.
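
As an illustration of fail-fast validation, here is a small Python sketch using the standard `ipaddress` module. The function name and the config key are hypothetical; the actual paramtable check is not shown in this commit message.

```python
import ipaddress


def validate_ip(key: str, value: str) -> str:
    """Return the normalized address, or raise immediately with a clear message."""
    try:
        return str(ipaddress.ip_address(value))
    except ValueError as exc:
        raise ValueError(
            f"invalid IP address {value!r} for config key {key!r}"
        ) from exc


# "proxy.ip" is a made-up key used only for this example.
print(validate_ip("proxy.ip", "127.0.0.1"))
```

Validating at configuration load time surfaces a bad value at startup rather than later, when a connection attempt fails.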

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-11 10:12:28 +08:00
congqixia
cedc34053c
enhance: [2.4] Add context trace for querycoord queryable check (#37524) (#37534)
Cherry-pick from master
pr: #37524

When the check health logic fails because a collection is not queryable,
the related reason is hard to find in the log.

This PR adds context to the log with a trace id and prints an info log
for the unqueryable collection.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-08 18:58:27 +08:00
wei liu
7b71411b60
fix: search/query failed due to segment not loaded (#37403) (#37544)
issue: #36970
pr: #37403
Release segment and balance channel may happen at the same time. Before
the new delegator becomes serviceable, if release segment executes on the
new delegator while a search/query arrives on the old delegator, then
release segment and query segment happen in parallel. If release segment
executes first in the worker, the search/query will get a
SegmentNotLoaded error.

This PR adds a serviceable filter on the delegator, so that all
load/release segment operations happen on a serviceable delegator.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 18:56:26 +08:00
wei liu
a9beca44ef
fix: watch channel stuck due to misuse of timer.Reset (#37433) (#37542)
issue: #37166
pr: #37433
A misuse of timer.Reset caused the dispatcher to fail to send messages
to the virtual channel buffer, so the dispatcher split again and again
while holding the dispatcher manager's lock, which blocked the watch
channel progress.

This PR fixes the misuse of timer.Reset.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 18:46:27 +08:00
yihao.dai
bff0113cbb
fix: Fix subscription leak (#37382) (#37541)
Close (unsubscribe) the msg stream after completing the PreCreatedTopic
check to prevent backlog issue.

issue: https://github.com/milvus-io/milvus/issues/36021

pr: https://github.com/milvus-io/milvus/pull/37382

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:38:28 +08:00
yihao.dai
fd1ca73b61
fix: Fix large growing segment (#37388) (#37540)
Consider the `sealProportion` factor during segment allocation.

issue: https://github.com/milvus-io/milvus/issues/37387

pr: https://github.com/milvus-io/milvus/pull/37388

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:34:27 +08:00
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
nico
a3c1fc1848
test: update test cases (#37476)
pr: #36841

Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-11-07 16:56:25 +08:00
wei liu
349924615b
fix: [skip e2e]unstable integration test TestNodeDownOnSingleReplica(#37480) (#37499)
issue: #37289
pr: #37480

PR #37116 introduced a retry on getting the shard leader, so search no
longer fails while a query node is down.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-07 16:48:26 +08:00
XuanYang-cn
dd0cf20ee0
fix: [cp24]Correct dropped segment num metrics (#37471)
See also: #31891
pr: #37410

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-07 16:46:33 +08:00
smellthemoon
60f963102e
enhance: refactor createIndex in RESTful API(#37235) (#37237)
pr: #37235 
2.5: #37236

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-07 14:18:31 +08:00
congqixia
c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously the failed label was used for canceled storage ops, which may
cause false alarms when a user cancels a load operation, etc. This PR
uses the cancel label when such a case happens.
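
A rough Python sketch of the labeling idea follows, assuming a simple counter keyed by status label. `OperationCanceled`, `observe_storage_op`, and the label strings are hypothetical stand-ins, not the Milvus metric names.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class OperationCanceled(Exception):
    """Hypothetical stand-in for a context-canceled error."""


def observe_storage_op(op: Callable[[], T], counter: Counter) -> T:
    """Run `op` and count its outcome as success, cancel, or fail."""
    try:
        result = op()
    except OperationCanceled:
        counter["cancel"] += 1  # user-initiated, should not trigger alarms
        raise
    except Exception:
        counter["fail"] += 1    # real failure
        raise
    counter["success"] += 1
    return result


def canceled_read() -> bytes:
    raise OperationCanceled("load canceled by user")


stats: Counter = Counter()
try:
    observe_storage_op(canceled_read, stats)
except OperationCanceled:
    pass
print(dict(stats))
```

Keeping cancellations under their own label lets an alert rule watch only the failure count.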

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
cai.zhang
651a56e3dd
enhance: [2.4]Update the template expression proto to improve transmission efficiency (#37485)
issue: #36672 

master pr: #37484

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 12:12:25 +08:00
Zhen Ye
cea8c756d4
fix: repeated error code in milvus and segcore (#37449)
issue: #37357
pr: #37359

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:46:25 +08:00
cai.zhang
4ae5337343
enhance: [2.4] Refine error message for contains array (#37443)
issue: #36221 

master pr: #37383

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 10:40:25 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24]Saperate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
yellow-shine
af5e32d00b
enhance: refine the pipeline (#37456)
https://github.com/milvus-io/milvus/pull/37412

---------

Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-11-06 10:24:30 +08:00
sre-ci-robot
28cb357de3
[automated] Bump milvus version to v2.4.15 (#37457)
Bump milvus version to v2.4.15
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-05 21:18:32 +08:00
congqixia
b7c80f9b83
enhance: Bump milvus & proto version to v2.4.15 (#37435)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
v2.4.15
2024-11-05 14:46:24 +08:00
congqixia
c195f9f76a
enhance: [2.4] Pass rpc stats via gin.Context (#37440)
Cherry pick from master
pr: #37439
Related #37223

RPC stats worked in middleware but failed to get method & collection info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:24:24 +08:00
wei liu
6b69170a64
fix: proxy retry to get shard leader on unloaded collection (#37326)
issue: #37115

PR #37116 lets the proxy retry getting the shard leader if an error
happens. As a result, a search/query on an unloaded collection keeps
retrying until ctx is done.

This PR adds an error type check to skip retry on ErrCollectionLoaded.
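
The "skip retry for a specific error type" pattern can be sketched in Python as below. `UnloadedCollectionError` and `retry` are hypothetical names used only for illustration; they are not the proxy's actual retry helper or error type.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class UnloadedCollectionError(Exception):
    """Hypothetical stand-in for the non-retriable 'collection not loaded' case."""


def retry(fn: Callable[[], T],
          attempts: int,
          non_retriable: Tuple[Type[BaseException], ...] = ()) -> T:
    """Call `fn` up to `attempts` times, re-raising non-retriable errors at once."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Exception = RuntimeError("unreachable")
    for _ in range(attempts):
        try:
            return fn()
        except non_retriable:
            raise               # retrying cannot help, so fail immediately
        except Exception as exc:
            last_error = exc    # transient error, try again
    raise last_error


calls = {"n": 0}


def get_shard_leader() -> str:
    calls["n"] += 1
    raise UnloadedCollectionError("collection is not loaded")


try:
    retry(get_shard_leader, attempts=5, non_retriable=(UnloadedCollectionError,))
except UnloadedCollectionError:
    print("gave up after", calls["n"], "call")
```

Without the `non_retriable` check, the same call would be repeated until the attempt budget (or, in the commit's case, the context) ran out.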

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 11:02:25 +08:00
yihao.dai
380662153f
fix: [2.4] Revert "enhance: Support db for bulkinsert (#37012) (#37017)" (#37421)
This reverts commit d6adc62765665d1555039c4d256a75d1144d49d0.

issue: https://github.com/milvus-io/milvus/issues/31273

pr: https://github.com/milvus-io/milvus/pull/37420

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-05 10:48:24 +08:00
wei liu
eb712f0db9
fix: dead lock if query node crash during shard client init (#37354)
issue: #37115

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 10:46:32 +08:00
XuanYang-cn
28fd217e27
fix: [cp24]l0RowCount metrics value always empty (#37307)
See also: #36953
pr: #37306

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-04 15:34:24 +08:00
cai.zhang
4fb86eb17d
fix: [2.4] Fix the bug where some expressions do not correctly parse the value (#37342)
issue: #37274

master pr: #37341

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-03 18:42:23 +08:00
congqixia
ce7d4090f1
enhance: [2.4] Move forward l0 logic out of delta lock (#37340)
Cherry pick from master
pr: #37337
Related to #35303

`deleteMut` should only protect the streaming delete buffer, so forward
l0 can be moved out of the rlock section to reduce the tsafe impact from
loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-01 14:42:22 +08:00
wei liu
3c09d42bfc
fix: [skip e2e] TestNodeDownOnSingleReplica has unstable result (#37288) (#37350)
issue: #37289
pr: #37288
These test cases use search to verify the replica's status, but if the
search gap is 1s, the effect of the node going down may be fixed up by
balance.

This PR removes the 1-second gap between search operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-01 13:48:22 +08:00
SimFG
d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
Timeout is a bad design for long-running tasks, especially with a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This PR is a small patch to stop DC from marking compaction tasks as
timed out while it is still waiting for DN to finish; that design is
self-contradictory. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout
control.

The compaction queue capacity has grown larger for priority calculation,
so timed-out compactions appear more often, and when one times out, the
queued tasks time out too and no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
congqixia
1a09d6385e
enhance: [2.4] Release compacted growing segment if in dropped list (#37245) (#37266)
Cherry-pick from master
pr: #37245
See also #37205

Previously releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

The worst case is that the corresponding sealed segment has been
compacted and the checkpoint is pinned by a growing l0 segment.

This PR introduces a new rule: a growing segment can be released if its
segment id appears in the current target's dropped segment id list.
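
The three release conditions (the two previous ones plus the new rule) can be written as a single predicate. This is an illustrative Python sketch; the `GrowingSegment` type and the parameter names are assumptions, not the querynode data structures.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class GrowingSegment:
    segment_id: int
    start_ts: int  # timestamp of the segment's start position


def should_release(seg: GrowingSegment,
                   loaded_sealed_ids: Set[int],
                   checkpoint_ts: int,
                   dropped_ids: Set[int]) -> bool:
    """Return True if any of the three release conditions holds."""
    return (
        seg.segment_id in loaded_sealed_ids   # sealed segment with same id is loaded
        or seg.start_ts < checkpoint_ts       # start position is before the checkpoint
        or seg.segment_id in dropped_ids      # new rule: id is in the dropped list
    )


# Worst case from the text: the sealed segment was compacted away and the
# checkpoint is pinned, so only the dropped list allows the release.
seg = GrowingSegment(segment_id=42, start_ts=200)
print(should_release(seg, loaded_sealed_ids=set(), checkpoint_ts=100, dropped_ids={42}))
```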

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-31 10:14:22 +08:00
nico
771fad51b3
test: update pymilvus version and test cases (#37301)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-10-31 09:40:22 +08:00
congqixia
37d691f458
fix: [2.4] Rectify OffsetOrderedArray contain logic (#37309)
Cherry pick from master
pr: #37305 
Related to #36887

The logic that removes non-hit pk delete records does not work, because
`insert_record_.contain` itself is broken by a logic problem.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 21:16:22 +08:00
congqixia
a2a51c489e
fix: [2.4] Check resource when loading deltalogs (#37195) (#37263)
Cherry pick from master
pr: #37195
Related to #36887

The `LoadDeltaLogs` API did not check memory usage. When the system is
under high delete load pressure, this could result in an OOM exit.

This PR adds a resource check for `LoadDeltaLogs` actions and separates
the internal deltalog loading function from the public one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 11:54:41 +08:00
yellow-shine
ce7fbb9439
Bump milvus version to v2.4.14 (#37252)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-29 21:34:29 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
v2.4.14
2024-10-29 17:38:22 +08:00
cai.zhang
05c40522ce
enhance: [cherry-pick ]Enhance the expression template to support AND and OR operations (#37217)
issue: #36672

master pr: #37033

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-29 15:38:40 +08:00
congqixia
3d1e81fb31
fix: [2.4] Use singleton delete pool and avoid goroutine leakage (#37225)
Cherry-pick from master
pr: #37220
Related to #36887

Previously, creating a new pool per request caused goroutine leakage.
This PR changes this behavior by using a singleton delete pool.
This change could also provide better concurrency control over delete
memory usage.
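
To illustrate the singleton-pool idea in general terms, here is a Python analogue using a lazily created, process-wide `ThreadPoolExecutor`. This only models the pattern; the Milvus change concerns a Go pool, and the names and worker count here are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

_delete_pool: Optional[ThreadPoolExecutor] = None
_delete_pool_lock = threading.Lock()


def get_delete_pool(max_workers: int = 4) -> ThreadPoolExecutor:
    """Return the shared pool, creating it once on first use.

    `max_workers` only takes effect on the first call.
    """
    global _delete_pool
    with _delete_pool_lock:
        if _delete_pool is None:
            _delete_pool = ThreadPoolExecutor(max_workers=max_workers)
        return _delete_pool


def handle_delete_request(pks: list) -> int:
    """Every request reuses the same pool instead of creating its own."""
    future = get_delete_pool().submit(len, pks)
    return future.result()


print(handle_delete_request([1, 2, 3]))
print(get_delete_pool() is get_delete_pool())
```

Because all requests share one pool, the number of worker threads is bounded by a single setting, which is the concurrency-control benefit the commit message mentions.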

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 14:44:23 +08:00
congqixia
0b284ccc23
enhance: Bump milvus & proto version to v2.4.14 (#37198)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:44:25 +08:00
congqixia
49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

Previous PR #36107 added a grpc interceptor to observe rpc stats. Using
the same strategy, this PR adds gin middleware to observe restful v2 rpc
stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00