304 Commits

Author SHA1 Message Date
congqixia
35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01
9bd94c4fab
fix: the system rejects all queries and never recovers if enabled read rate limit (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
SimFG
be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
chyezh
c8e3a48214
fix: querynode num entity metric is broken by illegal label (#29949)
issue: #29766
also see pr: #29825
pr: #29948

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:22:59 +08:00
congqixia
227071a754
enhance: [cherry-pick] reduce delete detail log to delete range (#29916) (#29930)
Cherry-pick from master
pr: #29916
The delete detail log becomes large and hard to read when the log level is
debug. This PR changes the log to use a stringer and prints only the pk range
and number.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:18:51 +08:00
congqixia
c21229b7bb
enhance: [cherry-pick] add trace span for wait tsafe (#29911) (#29929)
Cherry-pick from master
pr: #29911 
Add a tracing span for the time search/query operations spend waiting for tsafe

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:17:01 +08:00
wei liu
86cddd24b5
enhance: Add ctx for load index logs (#29686) (#29905)
pr: #29686
This PR adds ctx for load index logs

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 18:56:58 +08:00
yah01
4edcd4d22b
fix: the insert count is zero after set the pointer to nil (#29870) (#29881)
as a result, the EntitiesNum metric would never be reduced

fix: #29766
pr: #29870

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 10:20:51 +08:00
chyezh
f0db26107c
fix: panic caused by type assert LocalSegment on Segment (#29018) (#29900)
- Make implementation of LocalWorker and RemoteWorker same.

issue: #29017, #29899
pr: #29018

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-01-12 10:08:50 +08:00
jaime
c0b711e9fb
enhance: Support read hardware metrics for cgroupv2 (#29847)
issue: #29846
pr: #29850

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-11 19:20:57 +08:00
zhenshan.cao
7cf2be09b5
fix: Restore the MVCC functionality. (#29749) (#29802)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.

issue: #29656
pr: #29749

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-11 14:42:49 +08:00
yah01
e7e4561da8
fix: the entities num metric may be contributed more than once (#29767) (#29825)
the growing segments contribute to this metric while inserting and
putting into the manager, but the current impl inserts data before
putting the segments into manager, which leads to double contributions

fix: #29766
pr: #29767

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-11 10:24:51 +08:00
yah01
38c61594c0
enhance: use GPU pool for gpu tasks (#29678) (#29706)
- this greatly improves performance for GPU indexes
- this also saves one copy while parsing index meta
pr: #29678

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-09 14:04:48 +08:00
yah01
58410d8b62
enhance: skip loading duplicated index (#29715) (#29716)
this protects index loading from failure and speeds up the loading
process
pr: #29715

Signed-off-by: yah01 <yang.cen@zilliz.com>
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-07 17:00:48 +08:00
congqixia
fc65f01ddd
enhance: [Cherry-pick] Cache segment row num, size, and insert count to reduce CGO calls (#28007) (#29679)
Cherry pick from master
pr: #28007
See also #29650

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-01-04 23:04:47 +08:00
yah01
3c3fc160e9
fix: make the entity num metric accurate (#29643) (#29644)
fix https://github.com/milvus-io/milvus/issues/29642
pr: #29643

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-04 19:26:47 +08:00
MrPresent-Han
757834602a
enhance: add param for bloomfilter(#29388) (#29614)
related: https://github.com/milvus-io/milvus/issues/29388
pr: https://github.com/milvus-io/milvus/pull/29490

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-02 18:58:47 +08:00
congqixia
67313ccc86
fix: [cherry-pick] exclude insertData before growing checkpoint (#29559)
Cherry-pick from master
pr: #29558
See also: #29556
Refine exclude segment function signature
Add exclude growing before checkpoint logic

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 18:18:54 +08:00
congqixia
687eb3955e
enhance: [Cherry-pick] Refine C.NewSegment response and handle exception (#28952) (#29550)
Cherry-pick from master
pr: #28952
See also #28795

The original `C.NewSegment` may panic if some condition is not met. This PR
changes the response struct to `CNewSegmentResult`, which contains a
`C.CStatus` and can return the caught exception

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 14:48:47 +08:00
congqixia
852547b1c5
fix: [cherry-pick] compose exclude info from flushed segment id (#29549)
Cherry-pick from master
pr: #29548
See also #29526

A previous PR removed flushed segment info from the request, which caused
the pipeline to fail to exclude flushed segment info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 14:02:54 +08:00
congqixia
dd52a674aa
enhance: [cherry-pick] add ctx for HandleCStatus and callers (#29517) (#29546)
Cherry-pick from master
pr: #29517 
See also #29516

Make `HandleCStatus` print trace id for better logging

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 10:20:47 +08:00
MrPresent-Han
151a5c3ca8
fix: iterator lose data for duplicated result(#29406) (#29446)
related: #29406
pr: #29451

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-27 23:22:46 +08:00
congqixia
fc5dd524c5
enhance: [Cherry-pick] add log when release segment created for load failure (#29464) (#29500)
Cherry-pick from master
pr: #29464 
Add log for releasing segment created during load process when load
error happens

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-27 20:00:48 +08:00
yah01
e422a62a80
enhance: improve the handling for segcore error (#29471) (#29521)
- fix lost exception details in segcore
- improve the logs of handling errors from segcore

pr: #29471

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-27 19:06:46 +08:00
aoiasd
6eeb4b7f9a
enhance: [Cherry-Pick] Refine delete by expression for forbid proxy dml task scheduler hang (#29359)
relate: https://github.com/milvus-io/milvus/issues/29146
pr: https://github.com/milvus-io/milvus/pull/29340

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-26 17:50:48 +08:00
congqixia
14d8b1fe85
fix: [Cherry-pick] Add scope limit for querynode DeleteRequest (#29476)
Cherry-pick from master
pr: #29474 
See also #27515

When the Delegator processes delete data, it forwards the delete data with
only the segment id specified. When two segments have the same segment id but
one is growing and the other is sealed, the delete is applied to both
segments, which causes delete data to go out of order when a concurrent
segment load occurs.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-26 16:40:50 +08:00
wei liu
514da535e5
enhance: add metrics for stopping querynode balance progress (#29201) (#29390)
pr: #29201
This PR adds three metrics to track the stopping balance progress.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-26 10:02:46 +08:00
congqixia
f25d1f9b2c
enhance: [cherry-pick] change protection to RLock for loadStreamDelete (#29452)
Cherry-pick from master
pr: #29450 
See also #29332

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-25 23:27:01 +08:00
SimFG
74e72ce27e
enhance: [2.3] Support to get the param value in the runtime (#29298)
pr: #29297
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-12-21 20:36:43 +08:00
congqixia
9050b236e4
fix: [cherry-pick] delegator may mark segment offline by mistake (#29344)
cherry-pick from master
pr: #29343
See also #29332

The segment may be released before or during the request when the delegator
tries to forward a delete request to it. Currently, these two situations
return different error codes.

In this particular case, ErrSegmentNotLoaded and ErrSegmentNotFound
should both be ignored, to avoid returning "search service unavailable" by
mistake.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-20 21:16:41 +08:00
yah01
cd9e3c4837
fix: creating growing segments may introduce many threads (#29314)
many growing segments may be created in a short time, and nothing restricts
this process, so the CGO calls leave many threads behind

related: https://github.com/milvus-io/milvus/issues/29282
pr: #29306

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-19 18:02:40 +08:00
wei liu
9092b1ae8a
feat: enable balance based on growing segment row count (#28623) (#29184)
issue: #28622 
pr: #28623
A query node with a delegator will have more rows than other query nodes
because the delegator loads all growing rows.
This PR enables segment balancing based on the number of growing
rows in the leader view.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-14 15:26:37 +08:00
MrPresent-Han
5f4ac437b2
enhance: [Cherry-pick] Moving etcd client into session (#27069) (#28996)
relate: #26694
pr: https://github.com/milvus-io/milvus/pull/27069

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
2023-12-07 16:22:34 +08:00
aoiasd
8502037cff
fix: [Cherry-pick] sync action load segment with lack collection index info list (#28956)
relate: https://github.com/milvus-io/milvus/issues/28779
https://github.com/milvus-io/milvus/issues/28637
pr: https://github.com/milvus-io/milvus/pull/28788

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-07 14:14:42 +08:00
cqy123456
8fd38c8eea
enhance:[cherry-pick] Use binlog index for better search performance (#29012)
this pr is cherry-pick from master:
pr: https://github.com/milvus-io/milvus/pull/28528
pr: https://github.com/milvus-io/milvus/pull/27673
related issue:
issue: https://github.com/milvus-io/milvus/issues/27678

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-12-07 09:52:34 +08:00
congqixia
63e1ac0846
fix: [cherry-pick] schema->size() check logic with system field (#28802) (#28841)
Cherry pick from master
pr: #28802

Segcore now loads system field info as well, so the growing segment
assertion cannot pass with the "+ 2" value.
This causes load failure for all growing segments.
Fix #28801
Related to #28478
See also #28524

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-01 13:28:30 +08:00
Gao
ccca932cc6
fix: [2.3] correct autoindex segment num (#28429)
issue: #28386 
pr: #28387

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-11-28 19:24:26 +08:00
congqixia
5a962a631a
fix: [cherry-pick] Change schema to atomic.Pointer to avoid data race (#28739) (#28759)
Cherry-pick from master
pr: #28739
See also #28738

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-27 19:50:27 +08:00
yihao.dai
8520ee7552
enhance: Print nq (#28507) (#28639)
Log nq in search path.

pr: https://github.com/milvus-io/milvus/pull/28507

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-11-27 10:18:26 +08:00
yah01
a1b861ed7a
enhance: improve load speed (#28518) (#28719)
This check rejects a load request when the pool runs out of workers. However,
a small segment finishes loading quickly, and other segments only start
loading again after a check interval, which makes collection loading slow.

Block the request in the go pool instead.

pr: #28518

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-26 22:10:26 +08:00
congqixia
e4ea148c0f
fix: [cherry-pick] Add IndexList check for load segment request (#28601) (#28700)
Cherry-pick from master
pr: #28601
See also #28022 #28034
The load segment request may arrive before the dml channel is watched, so the
index meta may be empty as well

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-24 15:02:39 +08:00
congqixia
0fbd610e89
fix: [cherry-pick] querynodev2 local worker failed to maintain collection ref (#28631)
Cherry-pick from master
pr: #28590 #28598
See also #28589 #28596 
Increase ref for collection during load and unref after load completed.
Use the same logic protection from services.go `LoadSegments`
Perform `Unref` after release sealed segments

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-22 10:04:23 +08:00
SimFG
598788e6b8
Delay the cancellation of ctx when stopping the node (#28249)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-11-08 01:46:20 +08:00
yah01
d10a82dba4
Fix getting incorrect CPU num (#28178)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-11-07 11:52:22 +08:00
wei liu
87e8d04ed7
fix sync distribution with wrong version (#28130) (#28170)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-06 11:38:18 +08:00
yah01
5c444218a2
Limit max thread num for pool (#28018) (#28115)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-06 10:50:17 +08:00
yah01
0ab13c935a
Fix QueryNode panic while upgrading (#28034) (#28114)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-03 17:24:18 +08:00
wei liu
4558af94d5
fix retry on offline node (#28079) (#28139)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 16:00:17 +08:00
congqixia
994bb6991b
Refine offline segments logic in shard delegator (#28073) (#28084)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-01 23:18:17 +08:00