347 Commits

Author SHA1 Message Date
wei liu
e5681e5b9c
enhance: make delegator delete buffer holding all delete from cp (#29626) (#35074)
See also #29625
pr: #29626 

This PR:
- Add a new implementation of `DeleteBuffer`: listDeleteBuffer
  - holds a slice of cacheBlocks
  - the `Put` method appends new delete data to the last block
  - when a block is full, a new block is appended to the list
- Add a `TryDiscard` method to the `DeleteBuffer` interface
  - for doubleCacheBuffer, do nothing
  - for listDeleteBuffer, try to evict "old" blocks, which are the blocks
before the first block whose start ts is behind the provided ts
- Add a checkpoint field to the `UpdateVersion` sync action, which is
used to discard old cached delete blocks
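The buffer layout and discard rule described above can be sketched in Go as follows. This is a minimal, self-contained sketch, not the Milvus source: `deleteItem`, `cacheBlock.headTs`, and the fixed `blockSize` are hypothetical, and `TryDiscard` follows one reading of the rule, keeping the newest block whose start ts is at or before the checkpoint and dropping every block before it (that block may still hold records newer than the checkpoint).

```go
package main

import (
	"fmt"
	"sync"
)

// deleteItem is a hypothetical delete record: a primary key plus its timestamp.
type deleteItem struct {
	pk int64
	ts uint64
}

// cacheBlock is a fixed-capacity block of delete records (hypothetical layout).
type cacheBlock struct {
	headTs   uint64 // timestamp of the first record put into this block
	capacity int
	items    []deleteItem
}

func (b *cacheBlock) full() bool { return len(b.items) >= b.capacity }

// listDeleteBuffer holds a slice of cacheBlocks; Put always writes to the last block.
type listDeleteBuffer struct {
	mu        sync.Mutex
	blocks    []*cacheBlock
	blockSize int
}

func newListDeleteBuffer(blockSize int) *listDeleteBuffer {
	return &listDeleteBuffer{blockSize: blockSize}
}

// Put appends item to the last block and opens a new block when the last one is full.
func (l *listDeleteBuffer) Put(item deleteItem) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if len(l.blocks) == 0 || l.blocks[len(l.blocks)-1].full() {
		l.blocks = append(l.blocks, &cacheBlock{headTs: item.ts, capacity: l.blockSize})
	}
	last := l.blocks[len(l.blocks)-1]
	last.items = append(last.items, item)
}

// TryDiscard drops every block before the newest block whose headTs <= ts.
func (l *listDeleteBuffer) TryDiscard(ts uint64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	keepFrom := 0
	for i := len(l.blocks) - 1; i >= 0; i-- {
		if l.blocks[i].headTs <= ts {
			keepFrom = i
			break
		}
	}
	for i := 0; i < keepFrom; i++ {
		l.blocks[i] = nil // release references so the GC can reclaim old blocks
	}
	l.blocks = l.blocks[keepFrom:]
}

// Len returns the number of blocks currently held.
func (l *listDeleteBuffer) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.blocks)
}

func main() {
	buf := newListDeleteBuffer(2)
	for ts := uint64(1); ts <= 6; ts++ {
		buf.Put(deleteItem{pk: int64(ts), ts: ts})
	}
	fmt.Println(buf.Len()) // 3 (blocks start at ts 1, 3, 5)
	buf.TryDiscard(4)
	fmt.Println(buf.Len()) // 2 (block starting at ts 1 is dropped)
}
```

Only whole blocks are released, so memory is reclaimed in block-sized steps; a record older than the checkpoint can stay cached until its entire block falls behind.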

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2024-08-09 18:48:18 +08:00
congqixia
3c44248105
fix: [2.3] support set up knowhere-build-pool-size on querynode (#34647)
Cherry-pick from master
pr: #30922
Related: #29650

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-07-12 19:27:36 +08:00
wei liu
7a441c39cd
enhance: Optimize grow slice cost during query (#34256)
issue: #32252
pr: #34253

This PR tries to pre-allocate FieldData for Reduce operations in the Query
chain using typeutil.PrepareResultFieldData, avoiding the overhead of
dynamically growing the slice during the appendFieldData process.
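Pre-allocation works because the total row count is known before the reduce loop starts. A minimal sketch with plain slices follows; `mergeResults` is a hypothetical helper standing in for the FieldData reduce, not `typeutil.PrepareResultFieldData` itself.

```go
package main

import "fmt"

// mergeResults concatenates per-segment result columns into one slice.
// The total size is computed first, so capacity is reserved once and the
// appends below never trigger a reallocation.
func mergeResults(parts [][]int64) []int64 {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	out := make([]int64, 0, total) // pre-allocate: no slice growth while appending
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	merged := mergeResults([][]int64{{1, 2}, {3}, {4, 5, 6}})
	fmt.Println(merged, len(merged), cap(merged)) // [1 2 3 4 5 6] 6 6
}
```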

Additionally, this PR upgrades google.golang.org/protobuf from version 1.31
to 1.33 to address the slice-growth overhead when unmarshalling repeated
proto fields, as referenced in
[#protobuffer/protobuf-go/](86bdc4705a).

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-01 16:46:08 +08:00
congqixia
ce7bceece9
fix: [2.3] Check nodeID wildcard when removing pkOracle (#33895) (#34022)
Cherry-pick from master
pr: #33895
See also #33894

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-21 17:08:02 +08:00
congqixia
9157980232
fix: [2.3] Return record with largest timestamp for entries with same PK (#33936) (#34026)
Cherry-pick from master
pr: #33936
See also #33883

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-20 19:56:00 +08:00
congqixia
aea3cfefce
fix: [2.3] Prevent using captured iteration variable partitionID (#33912)
Cherry-pick from master
pr: #33906
See also #33902

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-17 23:14:00 +08:00
wei liu
284e79cf3a
enhance: Execute bloom filter apply in parallel to speed up process delete (#33870)
issue: #33610
pr: #33611 #33793

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-17 12:06:04 +08:00
congqixia
c62e092463
enhance: [2.3] Make applyDelete work in parallel at segment level (#32291) (#33841)
Cherry-pick from master
pr: #32291
`applyDelete` used to run serially over the delete entries of each segment.
This PR makes it run in parallel with errgroup to improve performance.
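Per-segment fan-out can be sketched as below. The PR uses errgroup (`golang.org/x/sync/errgroup`); this sketch swaps in the standard library's `sync.WaitGroup` plus an error slice so it has no external dependency. `segment` and its `applyDelete` method are hypothetical stand-ins, not the QueryNode types.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a hypothetical segment that records deleted primary keys.
type segment struct {
	id      int64
	deleted map[int64]bool
}

// applyDelete marks pks as deleted in a single segment.
func (s *segment) applyDelete(pks []int64) error {
	if s.deleted == nil {
		return fmt.Errorf("segment %d: not loaded", s.id)
	}
	for _, pk := range pks {
		s.deleted[pk] = true
	}
	return nil
}

// applyDeleteParallel fans the delete batch out to one goroutine per segment
// and returns the first error found. Each goroutine touches a distinct
// segment, so no extra locking is needed in this sketch.
func applyDeleteParallel(segments []*segment, pks []int64) error {
	var wg sync.WaitGroup
	errs := make([]error, len(segments))
	for i, seg := range segments {
		wg.Add(1)
		// Pass i and seg as arguments so each goroutine gets its own copy
		// (loop variables were shared across iterations before Go 1.22).
		go func(i int, seg *segment) {
			defer wg.Done()
			errs[i] = seg.applyDelete(pks)
		}(i, seg)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	segs := []*segment{
		{id: 1, deleted: map[int64]bool{}},
		{id: 2, deleted: map[int64]bool{}},
	}
	err := applyDeleteParallel(segs, []int64{10, 20})
	fmt.Println(err, segs[0].deleted[10], segs[1].deleted[20]) // <nil> true true
}
```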

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-13 23:09:56 +08:00
Chun Han
0d4ee287e1
fix: query iterator lacks results (#33137) (#33468)
related: #33137
pr: https://github.com/milvus-io/milvus/pull/33422

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-05-31 13:54:07 +08:00
Jiquan Long
76b7c23a66
fix: try best to get enough query results (#33177)
issue: https://github.com/milvus-io/milvus/issues/33137
pr: https://github.com/milvus-io/milvus/pull/32567
Co-authored-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-05-21 14:07:45 +08:00
foxspy
560e167214
fix: add score compute consistency config for knowhere (#32584)
issue: #32583 
/kind branch-feature

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-04-25 14:07:25 +08:00
wei liu
3352805afb
fix: should update leader view when segment version not match (#32517)
issue: #31468
pr: #31643

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-25 10:45:33 +08:00
wei liu
261bb8fbdb
fix: Update segment's version in syncDistribution (#32320)
issue: #31468
pr: #31643

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-17 11:05:22 +08:00
Gao
89d95901c5
enhance: support disable search optimization (#32143)
master pr: https://github.com/milvus-io/milvus/pull/32141
2.4 pr: https://github.com/milvus-io/milvus/pull/32142

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-04-17 10:45:20 +08:00
Xiaofan
37e5728229
fix: reduce didn't handle offset without limit and reduceStopForBest … (#32087)
fix #32059
pr: #32089

This PR fixes two issues:
1. offset is not handled correctly when no limit is specified
2. reduceStopForBest doesn't guarantee returning `limit` results, even
when more results exist, if there is a small segment
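The first fix comes down to applying the offset even when no limit is set. A minimal sketch, assuming a non-positive `limit` means "no limit" (a hypothetical convention; the actual reduce code may encode this differently); `paginate` is an illustrative name.

```go
package main

import "fmt"

// paginate skips offset rows and then keeps at most limit rows.
// limit <= 0 is treated as "no limit" (hypothetical convention);
// the offset is applied in both cases.
func paginate(rows []int64, offset, limit int) []int64 {
	if offset < 0 {
		offset = 0
	}
	if offset >= len(rows) {
		return nil
	}
	rows = rows[offset:]
	if limit > 0 && limit < len(rows) {
		rows = rows[:limit]
	}
	return rows
}

func main() {
	all := []int64{1, 2, 3, 4, 5}
	fmt.Println(paginate(all, 2, 0)) // [3 4 5]
	fmt.Println(paginate(all, 1, 2)) // [2 3]
}
```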

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-04-10 21:20:37 -07:00
congqixia
d18a88a94f
fix: [2.3] Validate PlaceholderGroups before combine them (#32016) (#32045)
Cherry-pick from master
pr: #32016
See also #32015

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-10 16:57:26 +08:00
cqy123456
47f767cf32
enhance: remove float16 in 2.3 branch (#31720)
issue: https://github.com/milvus-io/milvus/issues/31696

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-03-30 10:49:13 +08:00
aoiasd
50315282bb
fix: [Cherry-Pick] delegator filter out all partition's delete msg when loading segment (#31587)
This may cause deleted data to remain queryable for a period of time.
issue: #31484
pr: https://github.com/milvus-io/milvus/pull/31585

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-25 20:33:09 +08:00
aoiasd
7c234f23c3
fix: double buffer was invalid when putting an entry whose size is larger than max size (#31549)
related: https://github.com/milvus-io/milvus/issues/31548

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-23 21:09:07 +08:00
Jiquan Long
ab059bb064
enhance: add more metrics (#31271) (#31511)
/kind improvement
pr: #31271 
fix: https://github.com/milvus-io/milvus/issues/31272

This PR adds more metrics:

- Slow query count (the duration threshold for a slow query is
configurable)
- Number of deleted entities
- Number of entities per collection
- Number of loaded entities per collection
- Number of indexed entities
- Number of indexed entities per collection, per index, and whether it's a
vector index
- Quota states (LongTimeTickDelay, MemoryExhausted, DiskQuotaExhausted)
per database

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-22 16:11:07 +08:00
groot
1ca7cba222
enhance: Support MinIO TLS connection (#31292)
issue: https://github.com/milvus-io/milvus/issues/30709
master pr: #31311

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
jaime
5ddb0b435f
fix: revoke session may be ignored due to server context cancellation in advance (#31213)
issue: #31219
pr: #31220

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-14 19:05:04 +08:00
congqixia
53f5a67112
enhance: [Cherry-pick] Fix misleading log content & possible nil panic (#31021) (#31054)
Cherry pick from master
pr: #31021 

- Change load field log from "dy pool" to "load pool"
- Also defer delete when there is no error

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-06 16:09:01 +08:00
congqixia
81b197267a
enhance: [Cherry-Pick] Add back load memory factor when estimating memory resource (#30999)
Cherry-pick from master
pr: #30994
Segment load memory usage is underestimated because the load memory
factor was removed. This PR adds it back to protect querynode from OOM in
some extreme memory cases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 09:15:00 +08:00
SimFG
ef84d40e54
enhance: [2.3] improve the compatibility of the watch dm channel request (#30954)
pr: #30952
issue: https://github.com/milvus-io/milvus/issues/30938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-01 16:09:01 +08:00
wei liu
b0c7f8653f
fix: Segment version doesn't update as expected (#30953)
issue: #30950 
pr: #30951

The segment version doesn't update as expected.
This PR defers updating the segment version until the segment becomes loaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-01 14:21:10 +08:00
congqixia
c3f831fce4
fix: [Cherry-pick] Disk resource is not requested for index loaded with disk (#30757) (#30948)
Cherry pick from master
pr: #30757
See also #30756

This PR:
- Request disk resource when the index type and version are loaded with disk
- Add attribute cache for index utility
- Add `typeutil.Pair`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-01 13:07:00 +08:00
wei liu
ee705b7ce8
enhance: Correct misleading nodeID in GetComponentStates's log (#30732)
pr: #30731
This PR corrects the misleading nodeId in GetComponentStates's log

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-28 13:50:59 +08:00
chyezh
1c8d9fa686
fix: wrong context passing into NewClient, error handling lost in session_util (#30818)
issue: #30799
pr: #30817

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-28 10:41:00 +08:00
zhenshan.cao
2f4a13a7ae
enhance: Revert (#30197 #30690 #30415) (#30795)
Revert "enhance: reduce many I/O operations while loading disk index
(#30189) (#30690)" This reverts commit
d4c4bf946b15bc537acd170dfd1d938bea237c7a.

Revert "enhance: limit the max pool size to 16 (#30371) (#30415)" This
reverts commit 52ac0718f059d4aa45c5908ec8507e6045b24e1f.

Revert "enhance: convert the `GetObject` util to async (#30166)
(#30197)" This reverts commit 4b7c5baab773366aa8084762e7321130c4f894b7.

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-24 09:07:46 +08:00
Xiaofan
2896f5eb69
enhance: [2.3] change frequent log to debug (#30781)
pr: #30782 
change the "pipeline fetch insert msg" log to debug

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-02-23 14:10:40 +08:00
congqixia
3d8b6a4d2e
fix: [Cherry-pick] Release loaded growing if WatchDmlChannel fail (#30735) (#30745)
Cherry pick from master
pr: #30735
See also #30734

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-22 16:44:55 +08:00
congqixia
31f33f67e0
fix: [cherry-pick] Update disk usage metrics after segment released (#30702) (#30707)
Cherry-pick from master
pr: #30702
See also #30701

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 10:54:53 +08:00
yah01
52ac0718f0
enhance: limit the max pool size to 16 (#30371) (#30415)
According to our benchmark, a concurrency level of 16 is enough to fully
utilize the object storage network bandwidth.
pr: #30371

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 15:58:52 +08:00
zhagnlu
a209d05537
fix: erase pk empty check when pk index replace raw data (#30432) (#30578)
pr: #30432

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-12 08:24:53 +08:00
chyezh
be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia
f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for NewLoadIndexInfo (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
yah01
655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
The old version of Knowhere copies the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a util function to indicate whether it will load the
index with disk; if not, we need to double the memory usage prediction
for index data.
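The prediction rule can be sketched as a small function. `loadsWithDisk` stands in for the Knowhere utility mentioned above, whose actual name and signature are not given here; `predictIndexMemory` and `fitsInMemory` are hypothetical, and leaving the disk-loaded estimate unchanged is an assumption of this sketch.

```go
package main

import "fmt"

// predictIndexMemory returns the estimated memory, in bytes, needed to load an index.
// For an in-memory load, older Knowhere copies the index data, so peak usage
// is assumed to be twice the index size. For a disk load the estimate is left
// unchanged (an assumption of this sketch).
func predictIndexMemory(indexSize uint64, loadsWithDisk bool) uint64 {
	if !loadsWithDisk {
		return indexSize * 2
	}
	return indexSize
}

// fitsInMemory reports whether loading the index keeps usage within total memory.
func fitsInMemory(indexSize uint64, loadsWithDisk bool, used, total uint64) bool {
	return used+predictIndexMemory(indexSize, loadsWithDisk) <= total
}

func main() {
	fmt.Println(predictIndexMemory(100, false), predictIndexMemory(100, true)) // 200 100
	fmt.Println(fitsInMemory(600, false, 0, 1000))                             // false
}
```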

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
congqixia
69a82acc46
enhance: [Cherry-pick] Set delete scope for LoadSegment streaming data (#30245) (#30367)
Cherry pick from master
pr: #30245
See also #29474

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-02 16:05:06 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. set coordinator and proxy graceful stop timeout to 5s.
2. set other work node graceful stop timeout to 900s; we should
potentially change this to 600s once graceful stop is smooth.
3. change the stop order of datacoord components.
4. `LivenessCheck` does not perform graceful shutdown now.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai
e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
congqixia
d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo call missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia
1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia
35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01
9bd94c4fab
fix: the system rejects all queries and never recovers if enabled read rate limit (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
This converts segcore errors to merr where possible.
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
SimFG
be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
chyezh
c8e3a48214
fix: querynode num entity metric is broken by illegal label (#29949)
issue: #29766
also see pr: #29825
pr: #29948

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:22:59 +08:00
congqixia
227071a754
enhance: [cherry-pick] reduce delete detail log to delete range (#29916) (#29930)
Cherry-pick from master
pr: #29916
The detailed delete log is large and hard to read when the log level is
debug. This PR changes the log to use a stringer that prints only the pk
range and count.
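A stringer that logs only the pk range and count might look like this, assuming int64 primary keys; `deleteSummary` and its output format are hypothetical. Because the summary has a fixed shape, the log line stays short regardless of batch size.

```go
package main

import "fmt"

// deleteSummary implements fmt.Stringer and renders only the count and the
// pk range of a delete batch instead of every primary key.
type deleteSummary []int64

func (d deleteSummary) String() string {
	if len(d) == 0 {
		return "deletes: 0"
	}
	minPK, maxPK := d[0], d[0]
	for _, pk := range d[1:] {
		if pk < minPK {
			minPK = pk
		}
		if pk > maxPK {
			maxPK = pk
		}
	}
	return fmt.Sprintf("deletes: %d, pk range [%d, %d]", len(d), minPK, maxPK)
}

func main() {
	pks := deleteSummary{42, 7, 19, 1000}
	fmt.Println(pks) // deletes: 4, pk range [7, 1000]
}
```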

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:18:51 +08:00
congqixia
c21229b7bb
enhance: [cherry-pick] add trace span for wait tsafe (#29911) (#29929)
Cherry-pick from master
pr: #29911 
Add tracing span for search/query operation waiting tsafe duration

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:17:01 +08:00