564 Commits

Author SHA1 Message Date
yihao.dai
ecd55596cf
enhance: [10kcp] Optimize GetLocalDiskSize and segment loader mutex (#38600)
1. Make the segment loader lock protect only the resource.
2. Optimize GetDiskUsage to avoid excessive overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38599

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-19 21:14:26 +08:00
yihao.dai
ca234e7847
fix: [10kcp] Fix slow dist handle and slow observe (#38567)
1. Provide partition-level indexing in the collection target.
2. Make SegmentAction not wait for distribution.
3. Optimize logging to reduce CPU overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38566

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-18 21:00:39 +08:00
congqixia
999437e76e
enhance: [10kcp] Trim data distribution resp index info (#38521)
Related to #37630

Data distribution became too large when the number of segments was huge. This PR
trims the index info struct and returns only the needed info.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-17 15:20:26 +08:00
congqixia
28841ebdf9
enhance: [10kcp] Simplify querynode tsafe & reduce goroutine number (#38416) (#38433)
Related to #37630

The TSafe manager is too complex for the current implementation, and each
delegator needs one goroutine waiting for tsafe update events.

Tsafe updating could be executed in the pipeline instead. This PR removes the tsafe
manager and simplifies the entire tsafe updating logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-13 21:20:57 +08:00
yihao.dai
de78de7689
fix: [10kcp] Fix consume blocked due to too many consumers (#38456)
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38455

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>
2024-12-13 21:20:47 +08:00
congqixia
5521091dcd
enhance: [10kcp] Refine querynode collection number metrics (#38352)
Related to #37630

Previously, the loaded collection metric was calculated by scanning all
loaded segments in the segment manager, which was a slow and buggy
implementation.

This PR:

- Move collection num metrics to collection manager
- Remove deprecated loaded partition metrics update logic
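
The following Go sketch shows the idea of maintaining the collection count inside the collection manager. It assumes a hypothetical `collectionManager` with per-collection reference counts. A plain `loadedCollectionNum` field stands in for the real metrics gauge, and it changes only when a collection's reference count moves between zero and non-zero, so no segment scan is required.

```go
package main

import (
	"fmt"
	"sync"
)

// collectionManager is hypothetical; loadedCollectionNum stands in for the
// loaded-collection gauge and is updated incrementally.
type collectionManager struct {
	mu                  sync.Mutex
	refs                map[int64]int
	loadedCollectionNum int
}

func newCollectionManager() *collectionManager {
	return &collectionManager{refs: make(map[int64]int)}
}

// Ref adds one reference; the first reference counts the collection as loaded.
func (m *collectionManager) Ref(collectionID int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.refs[collectionID] == 0 {
		m.loadedCollectionNum++
	}
	m.refs[collectionID]++
}

// Unref drops one reference; dropping the last one removes the collection.
func (m *collectionManager) Unref(collectionID int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	n, ok := m.refs[collectionID]
	if !ok {
		return
	}
	if n == 1 {
		delete(m.refs, collectionID)
		m.loadedCollectionNum--
		return
	}
	m.refs[collectionID] = n - 1
}

func main() {
	mgr := newCollectionManager()
	mgr.Ref(1)
	mgr.Ref(1)
	mgr.Ref(2)
	fmt.Println(mgr.loadedCollectionNum)
}
```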

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-10 21:06:42 +08:00
congqixia
24a055996b
enhance: [10kcp] Add secondary index for querynode segment manager (#38312)
Cherry pick from pr
#38311
Related to #37630

Add a secondary index keyed by vchannel to reduce the `GetBy` rlock holding time
when the number of segments is large.
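
A minimal Go sketch of a vchannel secondary index follows. `segmentManager` and `GetByVChannel` are hypothetical simplifications, not the real `GetBy` filter API. The manager keeps its primary map consistent with a per-vchannel map, so a channel lookup holds the read lock only for that channel's segments instead of scanning every segment.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a simplified, hypothetical segment descriptor.
type segment struct {
	ID       int64
	VChannel string
}

// segmentManager keeps a primary map plus a secondary index by vchannel.
type segmentManager struct {
	mu         sync.RWMutex
	segments   map[int64]*segment
	byVChannel map[string]map[int64]*segment
}

func newSegmentManager() *segmentManager {
	return &segmentManager{
		segments:   make(map[int64]*segment),
		byVChannel: make(map[string]map[int64]*segment),
	}
}

// Put inserts or replaces a segment and keeps both maps consistent.
func (m *segmentManager) Put(s *segment) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if old, ok := m.segments[s.ID]; ok {
		delete(m.byVChannel[old.VChannel], old.ID)
	}
	m.segments[s.ID] = s
	idx, ok := m.byVChannel[s.VChannel]
	if !ok {
		idx = make(map[int64]*segment)
		m.byVChannel[s.VChannel] = idx
	}
	idx[s.ID] = s
}

// Remove deletes a segment from both maps.
func (m *segmentManager) Remove(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.segments[id]
	if !ok {
		return
	}
	delete(m.segments, id)
	delete(m.byVChannel[s.VChannel], id)
}

// GetByVChannel copies matching segments under the read lock; the cost is
// proportional to the channel's segment count, not the total.
func (m *segmentManager) GetByVChannel(vchannel string) []*segment {
	m.mu.RLock()
	defer m.mu.RUnlock()
	idx := m.byVChannel[vchannel]
	out := make([]*segment, 0, len(idx))
	for _, s := range idx {
		out = append(out, s)
	}
	return out
}

func main() {
	mgr := newSegmentManager()
	mgr.Put(&segment{ID: 1, VChannel: "ch-a"})
	mgr.Put(&segment{ID: 2, VChannel: "ch-b"})
	fmt.Println(len(mgr.GetByVChannel("ch-a")))
}
```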

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-09 19:56:16 +08:00
yihao.dai
338ccc9ff9
enhance: [10kcp] Reduce memory usage of BF in DataNode and QueryNode (#38133)
1. DataNode: Skip generating BF during the insert phase (BF will be
regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining BF for growing segments;
deletion checks will be handled in the segcore.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38129

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-02 14:41:19 +08:00
congqixia
876e06b862
fix: [2.4] Load l0 delta for growings when using RemoteLoad (#37772)
Cherry-pick from master
pr: #37771
Related to #37574

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 20:26:31 +08:00
wei liu
2a4f54cd4f
fix: L0 segment has been loaded to worker during channel balance (#37758)
issue: https://github.com/milvus-io/milvus/issues/37703
pr: https://github.com/milvus-io/milvus/pull/37748

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-18 17:00:32 +08:00
XuanYang-cn
d23da2db4f
fix: [cp24]Correct varchar primarykey size calculation (#37619)
See also: #37582
pr: #37617

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-14 14:16:38 +08:00
congqixia
d073f322a4
enhance: [2.4] Add cgo call metrics for load/write API (#37405) (#37627)
Cherry-pick from master
pr: #37405

Cgo API cost is not observable since no metrics are related to it.
This PR adds metrics for some sync cgo calls related to load & write.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-13 13:58:30 +08:00
cai.zhang
3456e241ac
fix: [2.4]Fix the bug that retrieved from the wrong field for L0 segments (#37599)
issue: #37574 

master pr: #37598

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-12 19:02:31 +08:00
congqixia
ce7d4090f1
enhance: [2.4] Move forward l0 logic out of delta lock (#37340)
Cherry pick from master
pr: #37337
Related to #35303

`deleteMut` is meant to protect the streaming delete buffer, so the l0
forwarding logic can be moved out of the rlock section to reduce the tsafe
impact of loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-01 14:42:22 +08:00
congqixia
a2a51c489e
fix: [2.4] Check resource when loading deltalogs (#37195) (#37263)
Cherry pick from master
pr: #37195
Related to #36887

The `LoadDeltaLogs` API did not check memory usage. When the system is under
high delete load pressure, this could result in an OOM exit.

This PR adds a resource check for `LoadDeltaLogs` actions and separates the
internal deltalog loading function from the public one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 11:54:41 +08:00
aoiasd
8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-29 17:38:22 +08:00
congqixia
3d1e81fb31
fix: [2.4] Use singleton delete pool and avoid goroutine leakage (#37225)
Cherry-pick from master
pr: #37220
Related to #36887

Previously, creating a new pool per request caused goroutine
leakage. This PR changes this behavior by using a singleton delete pool.
This change could also provide better concurrency control over delete
memory usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 14:44:23 +08:00
cai.zhang
9c0f59488a
feat: [cherry-pick]The expression supports filling elements through templates (#37058)
issue: #36672 

master pr: #37033 

milvus-proto pr: https://github.com/milvus-io/milvus-proto/pull/332

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:17:30 +08:00
congqixia
223badc482
fix: [2.4] Ref collection meta when load l0 segment meta only (#37179)
Cherry pick from master
pr: #37178
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading l0 segments under the `RemoteLoad`
policy, which causes the collection meta to be released when many l0 segments
are released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 14:07:30 +08:00
congqixia
9d37ade24f
enhance: [2.4] Make skip load work for all branches (#37161)
Cherry-pick from master
pr: #37160
Related to #37112

The skip-load logic used to work only when there were multiple segment load
info entries in the load request. In the continuous delete case, the delegator
still loads l0 segments, which occupy a lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 22:11:30 +08:00
yihao.dai
da897e41f4
fix: Fix collection leak in querynode (#37061) (#37079)
Unref the removed L0 segment count.

issue: https://github.com/milvus-io/milvus/issues/36918

pr: https://github.com/milvus-io/milvus/pull/37061

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:19:39 +08:00
congqixia
6bc8aba17f
enhance: [2.4] Batch forward delete when using DirectForward (#37076) (#37107)
Cherry pick from master
pr: #37076
Related #36887

DirectForward streaming delete causes memory usage to explode if the
number of segments is large. This PR adds a batch delete API and uses it
in the direct forward implementation.
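
The following Go sketch shows one generic way to batch forwarded deletes. It is not the batch API added in #37076: `forwardInBatches` and its `forward` callback are hypothetical. The helper splits delete records (pks plus timestamps) into sub-slices of at most `batchSize` and forwards each batch once. Sub-slicing shares the backing arrays, so this helper makes no per-batch copies.

```go
package main

import (
	"errors"
	"fmt"
)

// forwardInBatches calls forward once per batch of at most batchSize records.
func forwardInBatches(pks []int64, tss []uint64, batchSize int,
	forward func(pks []int64, tss []uint64) error) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}
	if len(pks) != len(tss) {
		return errors.New("pks and tss length mismatch")
	}
	for start := 0; start < len(pks); start += batchSize {
		end := start + batchSize
		if end > len(pks) {
			end = len(pks)
		}
		// Sub-slices reference the original arrays; nothing is copied here.
		if err := forward(pks[start:end], tss[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	pks := []int64{1, 2, 3, 4, 5}
	tss := []uint64{10, 11, 12, 13, 14}
	_ = forwardInBatches(pks, tss, 2, func(p []int64, _ []uint64) error {
		fmt.Println(p)
		return nil
	})
}
```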

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:53:29 +08:00
congqixia
79891f047d
enhance: [2.4]Skip load delta data in delegator when using RemoteLoad (#37082) (#37112)
Cherry-pick from master
pr: #37082
Related to #35303

Delta data is not needed when using the `RemoteLoad` l0 forward policy. By
skipping loading delta data, memory pressure could be eased when l0 segment
size/number is large.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:41:30 +08:00
congqixia
3db137f4ad
enhance: [2.4] Add metrics for querynode delete buffer info (#37081) (#37097)
Cherry pick from master
pr: #37081
Related to #35303

This PR adds metrics for querynode delegator delete buffer information,
which is related to the dml quota logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 16:21:37 +08:00
congqixia
7eba3aa67e
fix: [2.4] Pass full field list when partial load enabled (#37053) (#37063)
Cherry-pick from master
pr: #37053

Related to #37038

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-23 11:03:28 +08:00
congqixia
7acf1d53c1
enhance: [2.4] Preallocate delete data slice to avoid growslice (#37044)
Rewritten based on master pr
pr: #37043

Related to #36887
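
A minimal Go sketch of preallocating delete data slices follows. The `deleteData` struct and `mergeDeleteData` are hypothetical simplifications. The total row count is computed first, and the result slices are allocated once with that capacity. This way the appends never outgrow their backing arrays, so `append` never has to call `runtime.growslice` to reallocate.

```go
package main

import "fmt"

// deleteData is a simplified, hypothetical delete batch.
type deleteData struct {
	Pks []int64
	Tss []uint64
}

// mergeDeleteData allocates the result once, sized to the total row count.
func mergeDeleteData(parts ...*deleteData) *deleteData {
	total := 0
	for _, p := range parts {
		total += len(p.Pks)
	}
	out := &deleteData{
		Pks: make([]int64, 0, total),
		Tss: make([]uint64, 0, total),
	}
	for _, p := range parts {
		// Capacity already covers every row, so append never reallocates.
		out.Pks = append(out.Pks, p.Pks...)
		out.Tss = append(out.Tss, p.Tss...)
	}
	return out
}

func main() {
	merged := mergeDeleteData(
		&deleteData{Pks: []int64{1, 2}, Tss: []uint64{100, 101}},
		&deleteData{Pks: []int64{3}, Tss: []uint64{102}},
	)
	fmt.Println(merged.Pks, len(merged.Pks), cap(merged.Pks))
}
```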

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-22 14:15:28 +08:00
wei liu
1dcc393e54
fix: Query node panic during sending rpc to worker (#36975) (#36988)
issue: #36976
pr: #36975

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-21 10:15:26 +08:00
yihao.dai
8923936c9a
enhance: Support memory mode chunk cache (#35347) (#35836)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

pr: https://github.com/milvus-io/milvus/pull/35347

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-18 17:03:25 +08:00
cqy123456
6934e8da3a
enhance: [2.4]use growingMmapEnabled to control the behavior of interim index, not vectorField (#36391)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36500

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:23:25 +08:00
congqixia
dfe27ebf35
fix: [2.4] Direct forward delta exclude l0 segments (#36899) (#36914)
Cherry-pick from master
pr: #36899

Related to #36887

Forwarding deletes to an L0 segment returns an error and marks the l0 segment
offline, making the delegator unserviceable.
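
Here is a minimal Go sketch of excluding L0 segments from the direct-forward target list. The `segmentLevel` enum and its values are hypothetical; the real level values are defined in the Milvus protobufs.

```go
package main

import "fmt"

// segmentLevel is a hypothetical enum used only for this sketch.
type segmentLevel int

const (
	levelL0 segmentLevel = iota
	levelL1
	levelL2
)

type sealedSegment struct {
	ID    int64
	Level segmentLevel
}

// directForwardTargets drops L0 segments, since forwarding a delete to an
// L0 segment returns an error and would mark it offline.
func directForwardTargets(candidates []sealedSegment) []sealedSegment {
	out := make([]sealedSegment, 0, len(candidates))
	for _, s := range candidates {
		if s.Level == levelL0 {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	targets := directForwardTargets([]sealedSegment{
		{ID: 1, Level: levelL0},
		{ID: 2, Level: levelL1},
		{ID: 3, Level: levelL2},
	})
	fmt.Println(len(targets))
}
```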

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 14:11:29 +08:00
congqixia
877e9ad450
enhance: [2.4] Fill start pos & level for growing segment (#36888) (#36911)
Cherry-pick from master
pr: #36888

Start position & level info is missing for growing segments loaded in
the watch dml channel operation.

Level is important for metrics, and start position is crucial for the growing
exclude logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 14:09:32 +08:00
yihao.dai
604e346585
enhance: Enhance segment log (#36848) (#36849)
/kind improvement

pr: https://github.com/milvus-io/milvus/pull/36848

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-15 20:43:30 +08:00
SimFG
548f8e80c3
enhance: [2.4] the estimate method when loading the collection (#36728)
- pr: #36307
- issue: #36530

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-11 10:20:45 +08:00
yihao.dai
9cb5396cf6
enhance: Use common gc config (#36668) (#36670)
Use the GC config from `common` and remove the GC config from
`queryNode`.

issue: https://github.com/milvus-io/milvus/issues/36667

pr: https://github.com/milvus-io/milvus/pull/36668

related pr: https://github.com/milvus-io/milvus/pull/34949

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 19:49:20 +08:00
congqixia
3a80d1f602
enhance: [2.4] Add streaming forward policy switch for delegator (#36330) (#36712)
Cherry pick from master
pr: #36330
Related to #35303

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-09 17:41:20 +08:00
congqixia
9073f6281e
fix: [2.4] Add defer Unpin when error happens (#36620) (#36665)
Cherry-pick from master
pr: #36620
Resolves: #36619

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-08 18:29:20 +08:00
wei liu
2428adea3b
enhance: Enable balance on querynode with different mem capacity (#36466) (#36625)
issue: #36464
pr: #36466
This PR enables balancing across querynodes with different memory capacities:
a query node with more memory capacity will be assigned more records, and the
query node with the largest difference between assignedScore and
currentScore will have a higher priority to carry the new segment.
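
The following Go sketch shows capacity-aware node selection. The formulas are assumptions for illustration, not the actual balancer. Each node's assignedScore is taken as its share of total memory capacity multiplied by the total rows (including the new segment), and `pickNode` (hypothetical) returns the node whose assignedScore exceeds its currentScore by the largest margin.

```go
package main

import (
	"fmt"
	"math"
)

// nodeState is a hypothetical view of one query node.
type nodeState struct {
	ID           int64
	MemCapacity  float64 // relative memory capacity
	CurrentScore float64 // rows currently assigned
}

// pickNode returns the node with the largest assignedScore-currentScore gap,
// or -1 when no node has capacity.
func pickNode(nodes []nodeState, newRows float64) int64 {
	var totalCap, totalRows float64
	for _, n := range nodes {
		totalCap += n.MemCapacity
		totalRows += n.CurrentScore
	}
	if totalCap <= 0 {
		return -1
	}
	totalRows += newRows
	best, bestGap := int64(-1), math.Inf(-1)
	for _, n := range nodes {
		// Larger-capacity nodes are expected to hold proportionally more rows.
		assignedScore := totalRows * n.MemCapacity / totalCap
		if gap := assignedScore - n.CurrentScore; gap > bestGap {
			best, bestGap = n.ID, gap
		}
	}
	return best
}

func main() {
	nodes := []nodeState{
		{ID: 1, MemCapacity: 2, CurrentScore: 100},
		{ID: 2, MemCapacity: 1, CurrentScore: 100},
	}
	fmt.Println(pickNode(nodes, 100))
}
```

With capacities 2:1 and 100 rows each, adding 100 rows gives assignedScores of 200 and 100, so node 1 (gap 100) wins over node 2 (gap 0).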

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-30 18:11:18 +08:00
SimFG
a00523f0fd
fix: metric type error when the collection has two vec fields (#36473)
- issue: #36395

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-24 20:03:14 +08:00
congqixia
fa6354f6df
enhance: [skip e2e][2.4] Add unittest for reducing duplicated pk from multi segments (#36433) (#36460)
Cherry-pick from master
pr: #36433
Related to #35505 #36362

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-24 18:09:14 +08:00
wei liu
d421effb03
fix: fix search/query/count may access same growing and sealed segment (#36258) (#36288)
issue: #36257
pr: #36258
During syncTargetVersion, sealed segments should be excluded to avoid
their growing segments being consumed from the stream again.
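
Here is a minimal Go sketch of an exclusion filter that a stream pipeline could consult. `excludedSegments`, `Exclude` and `ShouldSkip` are hypothetical names, and the sketch ignores any timestamp bookkeeping the real pipeline may do. Once a segment ID is recorded as sealed, inserts that still arrive for that ID are skipped, so the same rows are not served from both a sealed and a growing copy.

```go
package main

import (
	"fmt"
	"sync"
)

// excludedSegments records sealed segment IDs whose stream inserts must be dropped.
type excludedSegments struct {
	mu  sync.RWMutex
	ids map[int64]struct{}
}

func newExcludedSegments() *excludedSegments {
	return &excludedSegments{ids: make(map[int64]struct{})}
}

// Exclude is called with the sealed segment IDs of the new target version.
func (e *excludedSegments) Exclude(sealedIDs ...int64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for _, id := range sealedIDs {
		e.ids[id] = struct{}{}
	}
}

// ShouldSkip reports whether an insert for segmentID must be dropped.
func (e *excludedSegments) ShouldSkip(segmentID int64) bool {
	e.mu.RLock()
	defer e.mu.RUnlock()
	_, ok := e.ids[segmentID]
	return ok
}

func main() {
	ex := newExcludedSegments()
	ex.Exclude(42)
	fmt.Println(ex.ShouldSkip(42), ex.ShouldSkip(43))
}
```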

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-18 22:21:12 +08:00
SimFG
95e47bfcf8
fix: force to set the metric type in the search request (#36279)
- issue: #35960
- pr: #35962

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-18 19:21:11 +08:00
congqixia
f7e4db943c
fix: [2.4] overwrite correct selection when pk duplicated (#35826) (#36274)
Cherry-pick from master
pr: #35826
Related to #35505

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-14 14:27:07 +08:00
congqixia
13d443eb2e
enhance: [2.4] Add L0 forward policy to support remote load (#36189) (#36208)
Cherry-pick from master
pr: #36189
Related to #35303

This PR adds a param item to support changing the l0 forward behavior from bf
filtering and forwarding to remote load.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-12 19:09:08 +08:00
XuanYang-cn
64e109d155
fix: [cp]Change deltalog memory estimation factor to one (#36035)
See also: #36031
pr: #36033

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-06 18:09:05 +08:00
XuanYang-cn
54ec290109
enhance: [cp]Remove too frequent logs in Delete (#35981)
pr: #35980

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-06 10:47:13 +08:00
congqixia
da0bc22a5f
enhance: [2.4] Add delete buffer related quota logic (#35918) (#35997)
Cherry pick from master
pr: #35128 #35918
See also #35303

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <45024769+aoiasd@users.noreply.github.com>
2024-09-05 16:43:06 +08:00
jaime
2c1fa50412
enhance: remove cooling off in rate limiter for read requests (#35936)
issue: #35934
pr: #35935

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-04 14:39:10 +08:00
Zhen Ye
a4533f1b8a
enhance: optimize milvus core building (#35660)
issue: #35549,#35611,#35633
pr: #35610

- remove milvus_segcore milvus_indexbuilder..., add libmilvus_core
- core building links only once
- move opendal compilation into cmake
- fix odr

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-27 18:55:00 +08:00
congqixia
ab261d0f8b
feat: [2.4] Support field partial load collection (#35416) (#35696)
Cherry-pick from master
pr: #35416
Related to #35415

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-27 14:07:00 +08:00
wei liu
35d2f9b210
fix: Fix index memory estimation (#35225) (#35670)
issue: https://github.com/milvus-io/milvus/issues/35229
pr: #32525

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Bingyi Sun <sunbingyi1992@gmail.com>
2024-08-24 10:28:57 +08:00