316 Commits

Author SHA1 Message Date
congqixia
3d8b6a4d2e
fix: [Cherry-pick] Release loaded growing if WatchDmlChannel fail (#30735) (#30745)
Cherry pick from master
pr: #30735
See also #30734

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-22 16:44:55 +08:00
congqixia
31f33f67e0
fix: [cherry-pick] Update disk usage metrics after segment released (#30702) (#30707)
Cherry-pick from master
pr: #30702
See also #30701

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 10:54:53 +08:00
yah01
52ac0718f0
enhance: limit the max pool size to 16 (#30371) (#30415)
According to our benchmark, a concurrency level of 16 is enough to fully
utilize the object storage network bandwidth.
pr: #30371

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 15:58:52 +08:00
zhagnlu
a209d05537
fix: erase pk empty check when pk index replaces raw data (#30432) (#30578)
pr: #30432

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-12 08:24:53 +08:00
chyezh
be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia
f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for NewLoadIndexInfo (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
yah01
655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
Older versions of Knowhere copy the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a utility function that indicates whether it will load
the index with disk; if not, we need to double the memory usage prediction
for index data.
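
A minimal sketch of the doubling rule described above. The function name and the boolean parameter are hypothetical; the boolean stands in for the answer of the Knowhere utility function, whose real name and signature are not given here.

```go
package main

import "fmt"

// predictIndexLoadMemory is a hypothetical helper, not the Milvus API.
// indexSize is the size of the index data in bytes. loadsWithDisk stands in
// for the result of the Knowhere utility function mentioned in the message.
func predictIndexLoadMemory(indexSize uint64, loadsWithDisk bool) uint64 {
	if loadsWithDisk {
		// Disk-based load: the prediction is left unchanged in this sketch.
		return indexSize
	}
	// In-memory load: the index data is copied while loading,
	// so budget twice the index size to avoid OOM.
	return 2 * indexSize
}

func main() {
	const oneGiB = uint64(1) << 30
	fmt.Println(predictIndexLoadMemory(oneGiB, false))
	fmt.Println(predictIndexLoadMemory(oneGiB, true))
}
```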

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
congqixia
69a82acc46
enhance: [Cherry-pick] Set delete scope for LoadSegment streaming data (#30245) (#30367)
Cherry pick from master
pr: #30245
See also #29474

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-02 16:05:06 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. Set the coordinator and proxy graceful stop timeout to 5s.
2. Set the graceful stop timeout of the other worker nodes to 900s; we
should potentially change this to 600s once graceful stop is smooth.
3. Change the stop order of the datacoord components.
4. `LivenessCheck` no longer performs a graceful shutdown.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai
e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
congqixia
d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo calls that were missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia
1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia
35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01
9bd94c4fab
fix: the system rejects all queries and never recovers if read rate limit is enabled (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
SimFG
be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
chyezh
c8e3a48214
fix: querynode num entity metric is broken by illegal label (#29949)
issue: #29766
also see pr: #29825
pr: #29948

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:22:59 +08:00
congqixia
227071a754
enhance: [cherry-pick] reduce delete detail log to delete range (#29916) (#29930)
Cherry-pick from master
pr: #29916
The delete detail log is large and hard to read when the log level is
debug. This PR changes the log to a stringer and prints only the pk range
and number.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:18:51 +08:00
congqixia
c21229b7bb
enhance: [cherry-pick] add trace span for wait tsafe (#29911) (#29929)
Cherry-pick from master
pr: #29911 
Add tracing span for search/query operation waiting tsafe duration

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:17:01 +08:00
wei liu
86cddd24b5
enhance: Add ctx for load index logs (#29686) (#29905)
pr: #29686
This PR adds ctx for load index logs

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 18:56:58 +08:00
yah01
4edcd4d22b
fix: the insert count is zero after setting the pointer to nil (#29870) (#29881)
As a result, the EntitiesNum metric would never be reduced.

fix: #29766
pr: #29870

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 10:20:51 +08:00
chyezh
f0db26107c
fix: panic caused by type assert LocalSegment on Segment (#29018) (#29900)
- Make implementation of LocalWorker and RemoteWorker same.

issue: #29017, #29899
pr: #29018

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-01-12 10:08:50 +08:00
jaime
c0b711e9fb
enhance: Support reading hardware metrics for cgroupv2 (#29847)
issue: #29846
pr: #29850

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-11 19:20:57 +08:00
zhenshan.cao
7cf2be09b5
fix: Restore the MVCC functionality. (#29749) (#29802)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.
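
Step 2 above can be sketched as follows. The `queryRequest` type and `fillMvccTimestamp` function are hypothetical stand-ins for the internal request and delegator logic, and the sketch assumes a zero timestamp means "not set".

```go
package main

import "fmt"

// queryRequest is a hypothetical stand-in for the internal Search/Query request.
type queryRequest struct {
	// MvccTimestamp is assumed to be 0 when the caller has not set it.
	MvccTimestamp uint64
}

// fillMvccTimestamp applies the delegator rule: if the request carries no
// MVCC timestamp, use the delegator's current tsafe; otherwise keep the
// timestamp already present in the request.
func fillMvccTimestamp(req *queryRequest, currentTSafe uint64) {
	if req.MvccTimestamp == 0 {
		req.MvccTimestamp = currentTSafe
	}
}

func main() {
	unset := &queryRequest{}
	fillMvccTimestamp(unset, 1000)
	fmt.Println(unset.MvccTimestamp) // the delegator's tsafe is used

	preset := &queryRequest{MvccTimestamp: 900}
	fillMvccTimestamp(preset, 1000)
	fmt.Println(preset.MvccTimestamp) // the existing timestamp is kept
}
```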

issue: #29656
pr: #29749

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-11 14:42:49 +08:00
yah01
e7e4561da8
fix: the entities num metric may be contributed more than once (#29767) (#29825)
Growing segments contribute to this metric both when data is inserted and
when they are put into the manager, but the current implementation inserts
data before putting the segments into the manager, which leads to double
contributions.

fix: #29766
pr: #29767

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-11 10:24:51 +08:00
yah01
38c61594c0
enhance: use GPU pool for gpu tasks (#29678) (#29706)
- this greatly improves the performance for GPU index
- this also removes one copy while parsing index meta
pr: #29678

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-09 14:04:48 +08:00
yah01
58410d8b62
enhance: skip loading duplicated index (#29715) (#29716)
This protects index loading from failure and speeds up the loading
process.
pr: #29715

Signed-off-by: yah01 <yang.cen@zilliz.com>
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-07 17:00:48 +08:00
congqixia
fc65f01ddd
enhance: [Cherry-pick] Cache segment row num, size, and insert count to reduce CGO calls (#28007) (#29679)
Cherry pick from master
pr: #28007
See also #29650

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-01-04 23:04:47 +08:00
yah01
3c3fc160e9
fix: make the entity num metric accurate (#29643) (#29644)
fix https://github.com/milvus-io/milvus/issues/29642
pr: #29643

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-04 19:26:47 +08:00
MrPresent-Han
757834602a
enhance: add param for bloomfilter (#29388) (#29614)
related: https://github.com/milvus-io/milvus/issues/29388
pr: https://github.com/milvus-io/milvus/pull/29490

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-02 18:58:47 +08:00
congqixia
67313ccc86
fix: [cherry-pick] exclude insertData before growing checkpoint (#29559)
Cherry-pick from master
pr: #29558
See also: #29556
Refine exclude segment function signature
Add exclude growing before checkpoint logic

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 18:18:54 +08:00
congqixia
687eb3955e
enhance: [Cherry-pick] Refine C.NewSegment response and handle exception (#28952) (#29550)
Cherry-pick from master
pr: #28952
See also #28795

The original `C.NewSegment` may panic if some condition is not met. This PR
changes the response struct to `CNewSegmentResult`, which contains a
`C.CStatus` and can return the caught exception.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 14:48:47 +08:00
congqixia
852547b1c5
fix: [cherry-pick] compose exclude info from flushed segment id (#29549)
Cherry-pick from master
pr: #29548
See also #29526

A previous PR removed flushed segment info from the request, which caused
the pipeline to fail to exclude flushed segment info.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 14:02:54 +08:00
congqixia
dd52a674aa
enhance: [cherry-pick] add ctx for HandleCStatus and callers (#29517) (#29546)
Cherry-pick from master
pr: #29517 
See also #29516

Make `HandleCStatus` print trace id for better logging

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 10:20:47 +08:00
MrPresent-Han
151a5c3ca8
fix: iterator loses data for duplicated result (#29406) (#29446)
related: #29406
pr: #29451

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-27 23:22:46 +08:00
congqixia
fc5dd524c5
enhance: [Cherry-pick] add log when release segment created for load failure (#29464) (#29500)
Cherry-pick from master
pr: #29464 
Add a log for releasing segments created during the load process when a
load error happens

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-27 20:00:48 +08:00
yah01
e422a62a80
enhance: improve the handling for segcore error (#29471) (#29521)
- fix lost exception details in segcore
- improve the logs of handling errors from segcore

pr: #29471

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-27 19:06:46 +08:00
aoiasd
6eeb4b7f9a
enhance: [Cherry-Pick] Refine delete by expression to prevent proxy dml task scheduler hang (#29359)
relate: https://github.com/milvus-io/milvus/issues/29146
pr: https://github.com/milvus-io/milvus/pull/29340

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-26 17:50:48 +08:00
congqixia
14d8b1fe85
fix: [Cherry-pick] Add scope limit for querynode DeleteRequest (#29476)
Cherry-pick from master
pr: #29474 
See also #27515

When the Delegator processes delete data, it forwards the delete data with
only the segment id specified. When two segments have the same segment id
but one is growing and the other is sealed, the delete is applied to both
segments, which causes delete data to be applied out of order when a
concurrent segment load occurs.
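
A minimal sketch of what a scope limit on a delete request could look like. All types and names here (`segmentType`, `deleteRequest`, `applyDelete`) are hypothetical illustrations, not the actual querynode structures.

```go
package main

import "fmt"

// segmentType distinguishes growing from sealed segments (hypothetical).
type segmentType int

const (
	growing segmentType = iota
	sealed
)

// segment is a simplified stand-in that records the primary keys deleted from it.
type segment struct {
	id      int64
	typ     segmentType
	deleted []int64
}

// deleteRequest carries a scope in addition to the segment id, so that a
// growing and a sealed segment sharing one id can be told apart.
type deleteRequest struct {
	segmentID int64
	scope     segmentType
	pks       []int64
}

// applyDelete applies the delete only to segments matching both the id and
// the scope, and returns how many segments were touched.
func applyDelete(segments []*segment, req deleteRequest) int {
	applied := 0
	for _, s := range segments {
		if s.id == req.segmentID && s.typ == req.scope {
			s.deleted = append(s.deleted, req.pks...)
			applied++
		}
	}
	return applied
}

func main() {
	segs := []*segment{
		{id: 7, typ: growing},
		{id: 7, typ: sealed},
	}
	n := applyDelete(segs, deleteRequest{segmentID: 7, scope: sealed, pks: []int64{1, 2}})
	fmt.Println(n, len(segs[0].deleted), len(segs[1].deleted))
}
```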

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-26 16:40:50 +08:00
wei liu
514da535e5
enhance: add metrics for stopping querynode balance progress (#29201) (#29390)
pr: #29201
This PR adds three metrics to track the stopping balance progress.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-26 10:02:46 +08:00
congqixia
f25d1f9b2c
enhance: [cherry-pick] change protection to RLock for loadStreamDelete (#29452)
Cherry-pick from master
pr: #29450 
See also #29332

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-25 23:27:01 +08:00
SimFG
74e72ce27e
enhance: [2.3] Support to get the param value in the runtime (#29298)
pr: #29297
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-12-21 20:36:43 +08:00
congqixia
9050b236e4
fix: [cherry-pick] delegator may mark segment offline by mistake (#29344)
cherry-pick from master
pr: #29343
See also #29332

The segment may be released before or during the request when the delegator
tries to forward the delete request to it. Currently, these two situations
return different error codes.

In this particular case, ErrSegmentNotLoaded and ErrSegmentNotFound
shall both be ignored to prevent returning search service unavailable by
mistake.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-20 21:16:41 +08:00
yah01
cd9e3c4837
fix: creating growing segments may introduce many threads (#29314)
Many growing segments may be created in a short time with no restriction
on the process, and the CGO calls will leave many threads.

related: https://github.com/milvus-io/milvus/issues/29282
pr: #29306

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-19 18:02:40 +08:00
wei liu
9092b1ae8a
feat: enable balance based on growing segment row count (#28623) (#29184)
issue: #28622 
pr: #28623
A query node with a delegator will have more rows than other query nodes
because the delegator loads all growing rows.
This PR enables segment balancing based on the number of growing rows in
the leader view.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-14 15:26:37 +08:00
MrPresent-Han
5f4ac437b2
enhance: [Cherry-pick] Moving etcd client into session (#27069) (#28996)
relate: #26694
pr: https://github.com/milvus-io/milvus/pull/27069

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
2023-12-07 16:22:34 +08:00
aoiasd
8502037cff
fix: [Cherry-pick] sync action load segment with lack collection index info list (#28956)
relate: https://github.com/milvus-io/milvus/issues/28779
https://github.com/milvus-io/milvus/issues/28637
pr: https://github.com/milvus-io/milvus/pull/28788

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-07 14:14:42 +08:00
cqy123456
8fd38c8eea
enhance: [cherry-pick] Use binlog index for better search performance (#29012)
This PR is cherry-picked from master:
pr: https://github.com/milvus-io/milvus/pull/28528
pr: https://github.com/milvus-io/milvus/pull/27673
related issue:
issue: https://github.com/milvus-io/milvus/issues/27678

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-12-07 09:52:34 +08:00
congqixia
63e1ac0846
fix: [cherry-pick] schema->size() check logic with system field (#28802) (#28841)
Cherry pick from master
pr: #28802

Segcore now loads system field info as well, so the growing segment
assertion no longer passes with the "+ 2" value.
This causes all growing segments to fail to load.
Fix #28801
Related to #28478
See also #28524

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-01 13:28:30 +08:00
Gao
ccca932cc6
fix: [2.3] correct autoindex segment num (#28429)
issue: #28386 
pr: #28387

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-11-28 19:24:26 +08:00