9325 Commits

Author SHA1 Message Date
XuanYang-cn
d23da2db4f
fix: [cp24]Correct varchar primarykey size calculation (#37619)
See also: #37582
pr: #37617

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-14 14:16:38 +08:00
wei liu
28bcd85bd0
fix: Balance channel may stuck at increasing replica number case (#37642)
issue: #37640
pr: #37641
Follow-up fix for PR #36549.
Balance channel waits until the new delegator becomes serviceable, but the
new delegator only becomes serviceable after syncing its target version,
and syncing the target version waits for all replicas to finish loading.
So if the replica number is increased while a channel balance is in
progress, the two wait on each other and a logical deadlock occurs.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-13 14:26:30 +08:00
congqixia
8801322371
enhance: [2.4] Invalidate collection cache when release collection (#37577) (#37628)
Cherry-pick from master
pr: #37577
Related to #37395

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 14:00:31 +08:00
congqixia
d073f322a4
enhance: [2.4] Add cgo call metrics for load/write API (#37405) (#37627)
Cherry-pick from master
pr: #37405

Cgo API cost is not observable since no metrics are associated with these calls.
This PR adds metrics for some synchronous cgo calls related to load and write.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 13:58:30 +08:00
wei liu
6dc879b1e2
enhance: Enable node assign policy on resource group (#36968) (#37588)
issue: #36977
pr: #36968
With node_label_filter on a resource group, users can label a querynode
via the env `MILVUS_COMPONENT_LABEL`, and the resource group will then
prefer to accept nodes that match its node_label_filter.

This way querynodes can be grouped by label, and querynodes with the same
label are put into the same resource group.
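A minimal Go sketch of the preference rule above. It assumes a hypothetical `Node` type whose `Label` holds the value a querynode read from `MILVUS_COMPONENT_LABEL`; `preferLabeled` is illustrative only, not the querycoord implementation:

```go
package main

import "sort"

// Node is a hypothetical querynode descriptor; Label stands in for the
// value the node picked up from the MILVUS_COMPONENT_LABEL env.
type Node struct {
	ID    int64
	Label string
}

// preferLabeled returns a copy of nodes in which every node whose label
// matches the resource group's label filter comes first. The sort is
// stable, so relative order inside each group is preserved.
func preferLabeled(nodes []Node, labelFilter string) []Node {
	out := append([]Node(nil), nodes...)
	sort.SliceStable(out, func(i, j int) bool {
		mi := out[i].Label == labelFilter
		mj := out[j].Label == labelFilter
		return mi && !mj
	})
	return out
}
```

For example, with the filter "b", nodes labeled "b" are offered to the resource group before the others; a stable sort keeps the original priority among the remaining candidates.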

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-13 11:10:29 +08:00
wei liu
7d1c899155
fix: Search may return less result after qn recover (#36549) (#37610)
issue: #36293 #36242
pr: #36549
After a querynode (qn) recovers, the delegator may be loaded on a new node
and becomes serviceable once all segments have been loaded. However, its
target version has not been synced yet, so an incoming search/query
filters with the wrong target version, gets an empty segment list, and
returns an empty search result.

This PR blocks the delegator's serviceable status until the target version
is synced.
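A simplified sketch of that gating, assuming a hypothetical `delegator` struct in which a target version of 0 means "not synced yet"; the real delegator tracks far more state:

```go
package main

import "sync/atomic"

// delegator is a hypothetical, heavily simplified shard delegator.
type delegator struct {
	segmentsLoaded atomic.Bool  // all segments of the shard are loaded
	targetVersion  atomic.Int64 // 0 means "not synced yet" in this sketch
}

// Serviceable is true only when segments are loaded AND a target version
// has been synced, so search/query never filters against a stale target.
func (d *delegator) Serviceable() bool {
	return d.segmentsLoaded.Load() && d.targetVersion.Load() > 0
}

// SyncTargetVersion records the target version pushed by querycoord.
func (d *delegator) SyncTargetVersion(v int64) {
	d.targetVersion.Store(v)
}
```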

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-12 19:16:30 +08:00
cai.zhang
3456e241ac
fix: [2.4]Fix the bug that retrieved from wrong field for L0 segments (#37599)
issue: #37574 

master pr: #37598

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-12 19:02:31 +08:00
wei liu
074f8ee696
enhance: optimize describe collection and index (#37490) (#37605)
fix #37489
pr: #34790
Combine multiple describe collection and list index calls into one call.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Xiaofan <83447078+xiaofan-luan@users.noreply.github.com>
2024-11-12 16:54:29 +08:00
wei liu
25c96991f6
fix: Lost loading collection's updateTs after qc restart (#37538) (#37580)
issue: #37537
pr: #37538

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-11 17:50:30 +08:00
sthuang
9e8b6ace6d
enhance: [2.4] RBAC custom privilege group (#37560)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/37087,
https://github.com/milvus-io/milvus/pull/37558
issue: #37031

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-11 14:20:29 +08:00
congqixia
2fbb157dc8
enhance: [2.4] Handle legacy proxy load fields request (#37565) (#37569)
Cherry-pick from master
pr: #37565
Related to #35415

During a rolling upgrade, a legacy proxy may dispatch a load request with
an empty load field list. The upgraded querycoord may then mistakenly
report an error saying that the load field list has changed.

This PR:

- Auto-fill an empty load field list with all user field IDs
- Refine the error message when the load field list updates
- Refine load job unit tests with service cases
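A sketch of the auto-fill step in the first bullet, using hypothetical helper names `fillLoadFields` and `sameFields`; the actual querycoord code may differ:

```go
package main

// fillLoadFields returns the requested load field IDs, or a copy of all
// user field IDs when the request carries an empty list (as a legacy
// proxy may send during a rolling upgrade).
func fillLoadFields(requested, userFieldIDs []int64) []int64 {
	if len(requested) > 0 {
		return requested
	}
	out := make([]int64, len(userFieldIDs))
	copy(out, userFieldIDs)
	return out
}

// sameFields compares two field ID lists as sets; it assumes the IDs in
// each list are unique.
func sameFields(a, b []int64) bool {
	if len(a) != len(b) {
		return false
	}
	set := make(map[int64]struct{}, len(a))
	for _, id := range a {
		set[id] = struct{}{}
	}
	for _, id := range b {
		if _, ok := set[id]; !ok {
			return false
		}
	}
	return true
}
```

Because the empty list is expanded before the comparison, a legacy request is treated like an explicit request for all user fields, so no spurious "load field list changed" error is raised.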

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-11 14:06:29 +08:00
congqixia
cedc34053c
enhance: [2.4] Add context trace for querycoord queryable check (#37524) (#37534)
Cherry-pick from master
pr: #37524

When the health check fails because a collection is not queryable, the
related reason is hard to find in the log.

This PR adds trace ID context to the log and logs info about the
unqueryable collection.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-08 18:58:27 +08:00
wei liu
7b71411b60
fix: search/query failed due to segment not loaded (#37403) (#37544)
issue: #36970
pr: #37403
Release segment and balance channel may happen at the same time. Before
the new delegator becomes serviceable, if a release segment executes on
the new delegator while a search/query arrives at the old delegator, the
release and the query run in parallel; if the release executes first on
the worker, the search/query gets a SegmentNotLoaded error.

This PR adds a serviceable filter on delegators, so all load/release
segment operations happen on a serviceable delegator.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 18:56:26 +08:00
wei liu
a9beca44ef
fix: watch channel stuck due to misuse of timer.Reset (#37433) (#37542)
issue: #37166
pr: #37433
The misuse of timer.Reset caused the dispatcher to fail to send messages
to the virtual channel buffer, so the dispatcher kept splitting again and
again while holding the dispatcher manager's lock, which blocked the
watch channel progress.

This PR fixes the misuse of timer.Reset.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 18:46:27 +08:00
yihao.dai
bff0113cbb
fix: Fix subscription leak (#37382) (#37541)
Close (unsubscribe) the msg stream after completing the PreCreatedTopic
check to prevent backlog issue.
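A minimal sketch of the fix, assuming a hypothetical `msgStream` interface (not the real Milvus msgstream API): the stream opened for the check is closed with `defer`, so it is released on both the success and error paths:

```go
package main

// msgStream is a hypothetical minimal stand-in for a message-stream client.
type msgStream interface {
	Subscribe(topic, subName string) error
	Close()
}

// checkPreCreatedTopic subscribes only to verify the topic, then always
// closes (unsubscribes) so the subscription cannot accumulate backlog.
func checkPreCreatedTopic(newStream func() msgStream, topic string) error {
	s := newStream()
	defer s.Close()
	return s.Subscribe(topic, "precheck-"+topic)
}

// fakeStream is a fake used for illustration; it records Close calls.
type fakeStream struct{ closed *bool }

func (f fakeStream) Subscribe(string, string) error { return nil }
func (f fakeStream) Close()                         { *f.closed = true }
```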

issue: https://github.com/milvus-io/milvus/issues/36021

pr: https://github.com/milvus-io/milvus/pull/37382

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:38:28 +08:00
yihao.dai
fd1ca73b61
fix: Fix large growing segment (#37388) (#37540)
Consider the `sealProportion` factor during segment allocation.

issue: https://github.com/milvus-io/milvus/issues/37387

pr: https://github.com/milvus-io/milvus/pull/37388

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 17:34:27 +08:00
SimFG
5c166a25b9
enhance: [2.4] improve rootcoord task scheduling policy (#37523)
- issue: #30301
- pr: #37352

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-08 14:56:27 +08:00
XuanYang-cn
dd0cf20ee0
fix: [cp24]Correct dropped segment num metrics (#37471)
See also: #31891
pr: #37410

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-07 16:46:33 +08:00
smellthemoon
60f963102e
enhance: refactor createIndex in RESTful API(#37235) (#37237)
pr: #37235 
2.5: #37236

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-07 14:18:31 +08:00
congqixia
c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously, the failed label was used for canceled storage operations,
which could trigger false alarms when a user cancels a load operation or
similar. This PR uses the cancel label in such cases.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
cai.zhang
651a56e3dd
enhance: [2.4]Update the template expression proto to improve transmission efficiency (#37485)
issue: #36672 

master pr: #37484

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 12:12:25 +08:00
Zhen Ye
cea8c756d4
fix: repeated error code in milvus and segcore (#37449)
issue: #37357
pr: #37359

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:46:25 +08:00
cai.zhang
4ae5337343
enhance: [2.4] Refine error message for contains array (#37443)
issue: #36221 

master pr: #37383

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 10:40:25 +08:00
XuanYang-cn
20534a3f7b
fix: [cp24]Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
congqixia
c195f9f76a
enhance: [2.4] Pass rpc stats via gin.Context (#37440)
Cherry pick from master
pr: #37439
Related #37223

RPC stats worked in the middleware but failed to get method and collection info.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:24:24 +08:00
wei liu
6b69170a64
fix: proxy retry to get shard leader on unloaded collection (#37326)
issue: #37115

PR #37116 lets the proxy retry getting the shard leader when an error
happens, so a search/query on an unloaded collection keeps retrying
until the ctx is done.

This PR adds an error type check to skip retrying on ErrCollectionLoaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 11:02:25 +08:00
yihao.dai
380662153f
fix: [2.4] Revert "enhance: Support db for bulkinsert (#37012) (#37017)" (#37421)
This reverts commit d6adc62765665d1555039c4d256a75d1144d49d0.

issue: https://github.com/milvus-io/milvus/issues/31273

pr: https://github.com/milvus-io/milvus/pull/37420

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-05 10:48:24 +08:00
wei liu
eb712f0db9
fix: dead lock if query node crash during shard client init (#37354)
issue: #37115

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 10:46:32 +08:00
XuanYang-cn
28fd217e27
fix: [cp24]l0RowCount metrics value always empty (#37307)
See also: #36953
pr: #37306

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-04 15:34:24 +08:00
cai.zhang
4fb86eb17d
fix: [2.4] Fix the bug where some expressions do not correctly parse the value (#37342)
issue: #37274

master pr: #37341

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-03 18:42:23 +08:00
congqixia
ce7d4090f1
enhance: [2.4] Move forward l0 logic out of delta lock (#37340)
Cherry pick from master
pr: #37337
Related to #35303

`deleteMut` is only meant to protect the streaming delete buffer, so
forwarding l0 can be moved out of the rlock section to reduce the tsafe
impact from loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-01 14:42:22 +08:00
SimFG
d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn
6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
Timeout is a bad design for long-running tasks, especially with a static
timeout config. We should instead monitor execution progress and fail a
task only if its progress has been stale for a long time.

This PR is a small patch that stops DC (DataCoord) from marking
compaction tasks as timed out while still waiting for DN (DataNode) to
finish, a design that contradicts itself. After this PR, mix and L0
compactions are no longer controlled by the DC timeout, but clustering
compaction is still under timeout control.

The compaction queue capacity has grown larger for priority calculation,
so compaction timeouts appear more often; when one times out, the queued
tasks time out too, and no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
congqixia
1a09d6385e
enhance: [2.4] Release compacted growing segment if in dropped list (#37245) (#37266)
Cherry-pick from master
pr: #37245
See also #37205

Previously, releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

This has a worst case in which the corresponding sealed segment is
compacted and the checkpoint is pinned by a growing l0 segment, so
neither condition is ever met.

This PR introduces a new rule: a growing segment can be released if its
segment ID appears in the current target's dropped segment ID list.
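The release decision with the new rule added can be sketched as follows; `growingSegment` and `shouldReleaseGrowing` are hypothetical names, and timestamps are simplified to `uint64`:

```go
package main

// growingSegment is a hypothetical view of a growing segment.
type growingSegment struct {
	ID      int64
	StartTs uint64 // timestamp of the segment's start position
}

// shouldReleaseGrowing applies the two existing conditions plus the new
// one: the segment ID is in the current target's dropped segment list.
func shouldReleaseGrowing(seg growingSegment, sealedLoaded map[int64]bool, checkpointTs uint64, dropped map[int64]bool) bool {
	return sealedLoaded[seg.ID] || // sealed segment with same ID is loaded
		seg.StartTs < checkpointTs || // start position before checkpoint
		dropped[seg.ID] // new rule: listed as dropped in current target
}
```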

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-31 10:14:22 +08:00
congqixia
37d691f458
fix: [2.4] Rectify OffsetOrderedArray contain logic (#37309)
Cherry pick from master
pr: #37305 
Related to #36887

The logic that removes delete records for non-hit PKs did not work
because `insert_record_.contain` had a logic bug.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 21:16:22 +08:00
congqixia
a2a51c489e
fix: [2.4] Check resource when loading deltalogs (#37195) (#37263)
Cherry pick from master
pr: #37195
Related to #36887

The `LoadDeltaLogs` API did not check memory usage. When the system is
under high delete load, this could result in an OOM exit.

This PR adds a resource check to `LoadDeltaLogs` actions and separates
the internal deltalog loading function from the public one.
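A single-goroutine sketch of a check-before-load wrapper, assuming a hypothetical `memBudget` accountant and an estimated deltalog size supplied by the caller; the real resource check in Milvus is more involved:

```go
package main

import "errors"

var errInsufficientMemory = errors.New("insufficient memory")

// memBudget is a hypothetical, non-concurrent memory accountant.
type memBudget struct {
	limit, used uint64
}

// loadDeltaLogs is the public entry point: it reserves the estimated size
// first and refuses the load instead of risking OOM. The unchecked
// internal loader is passed in as loadInternal.
func (b *memBudget) loadDeltaLogs(estimated uint64, loadInternal func() error) error {
	if b.used+estimated > b.limit {
		return errInsufficientMemory
	}
	b.used += estimated
	if err := loadInternal(); err != nil {
		b.used -= estimated // roll back the reservation on failure
		return err
	}
	return nil
}
```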

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 11:54:41 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-29 17:38:22 +08:00
cai.zhang
05c40522ce
enhance: [cherry-pick ]Enhance the expression template to support AND and OR operations (#37217)
issue: #36672

master pr: #37033

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-29 15:38:40 +08:00
congqixia
3d1e81fb31
fix: [2.4] Use singleton delete pool and avoid goroutine leakage (#37225)
Cherry-pick from master
pr: #37220
Related to #36887

Previously, creating a new pool per request caused goroutine leakage.
This PR changes this behavior by using a singleton delete pool, which
could also provide better concurrency control over delete memory usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 14:44:23 +08:00
congqixia
49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

Previous PR #36107 added a grpc interceptor to observe rpc stats. Using
the same strategy, this PR adds a gin middleware to observe restful v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00
congqixia
b44ef8207e
fix: [2.4] Check whether new collection name is alias (#36981) (#37208)
Cherry pick from master
pr: #36981

Related to #36963

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 22:46:24 +08:00
wei liu
79e6ef2617
fix: Search/Query may failed during updating delegator cache (#37174)
issue: #37115
pr: #37116
Because initializing the query node client is too heavy, updateShardClient
was removed from the leader mutex, which caused many more concurrency
corner cases.

This PR delays the query node client's init until `getClient` is
called, then uses the leader mutex to protect the shard client update
process to avoid concurrency issues.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-28 20:08:25 +08:00
cai.zhang
9c0f59488a
feat: [cherry-pick]The expression supports filling elements through templates (#37058)
issue: #36672 

master pr: #37033 

milvus-proto pr: https://github.com/milvus-io/milvus-proto/pull/332

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:17:30 +08:00
XuanYang-cn
4cb5b2c3b5
fix: [cp24]Exclude L0 compaction when clustering is executing (#37142)
Also remove the conflict check when executing L0; exclusivity is already
guaranteed in the scheduler.

See also: #37140
pr: #37141

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-28 15:01:30 +08:00
congqixia
223badc482
fix: [2.4] Ref collection meta when load l0 segment meta only (#37179)
Cherry pick from master
pr: #37178
Related to #37177

Previous PR #37160

Collection meta was not ref-ed when loading l0 segments under the
`RemoteLoad` policy, which caused the collection meta to be released when
many l0 segments were released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 14:07:30 +08:00
congqixia
9d37ade24f
enhance: [2.4] Make skip load work for all branches (#37161)
Cherry-pick from master
pr: #37160
Related to #37112

Skip load logic used to work only when there were multiple segment load
info entries in a load request. In the continuous delete case, the
delegator still loaded l0 segments, which occupied a lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 22:11:30 +08:00
yihao.dai
d30e27e6f9
enhance: Make dataNode.import.maxConcurrentTaskNum dynamic (#37102) (#37103)
Resize the import execution pool when the config
`dataNode.import.maxConcurrentTaskNum` is updated.

issue: https://github.com/milvus-io/milvus/issues/37095

pr: https://github.com/milvus-io/milvus/pull/37102

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:21:29 +08:00
yihao.dai
da897e41f4
fix: Fix collection leak in querynode (#37061) (#37079)
Unref the removed L0 segment count.

issue: https://github.com/milvus-io/milvus/issues/36918

pr: https://github.com/milvus-io/milvus/pull/37061

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:19:39 +08:00
SimFG
ae4ce9bbba
enhance: [2.4] allow to delete data when disk quota exhausted (#37139)
- issue: #37133
- pr: #37134

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-25 16:07:32 +08:00
wei liu
057bfbe678
fix: Delegator may becomes unserviceable after querycoord restart (#37055) (#37100)
issue: #37054
pr: #37055
After querycoord restarts, segment_checker may release segments by
mistake because the next target isn't ready yet.

This PR requires that segment release happen only after the next target
is ready.
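A sketch of the guard, assuming hypothetical target sets keyed by segment ID and a `nextReady` flag; `segmentsToRelease` is illustrative, not the real segment_checker code:

```go
package main

// segmentsToRelease returns loaded segment IDs found in neither the
// current nor the next target. While the next target is not ready (e.g.
// right after a querycoord restart), it returns nothing, because absence
// from an incomplete target view does not prove a segment is obsolete.
func segmentsToRelease(loaded []int64, current, next map[int64]bool, nextReady bool) []int64 {
	if !nextReady {
		return nil
	}
	var out []int64
	for _, id := range loaded {
		if !current[id] && !next[id] {
			out = append(out, id)
		}
	}
	return out
}
```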

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-25 14:55:31 +08:00