7959 Commits

Author SHA1 Message Date
yah01
d4c4bf946b
enhance: reduce many I/O operations while loading disk index (#30189) (#30690)
Before this change, every write of index chunk data to disk
required 4 I/O operations:
- open the file
- seek to the offset
- write the data
- close the file

This change opens the file only once and writes all the data continuously.

It also loads the files from object storage concurrently.
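
A minimal sketch of the single-open write path described above. `WriteChunksOnce` is a hypothetical helper name, not a function from the Milvus code base, and the concurrent download from object storage is not shown:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not the Milvus implementation: the file is opened
// once and every chunk is appended through the same stream, instead of
// paying open/seek/write/close for each chunk.
// Returns the total number of bytes written.
inline std::size_t
WriteChunksOnce(const std::string& path,
                const std::vector<std::string>& chunks) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        throw std::runtime_error("failed to open " + path);
    }
    std::size_t total = 0;
    for (const auto& chunk : chunks) {
        // Sequential writes: the stream position advances on its own,
        // so no explicit seek is needed between chunks.
        out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        total += chunk.size();
    }
    out.flush();
    if (!out) {
        throw std::runtime_error("failed to write " + path);
    }
    return total;
}
```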

pr: #30189

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 17:40:52 +08:00
congqixia
8734bcc645
fix: [Cherry-pick] Prevent ChunkCache use absolute path in All-in-one mode (#30666) (#30679)
Cherry pick from master
pr: #30666
See also #30651

The append operator (`/`) of `std::filesystem::path` replaces the whole
path when its right-hand operand is an absolute path.

In "All-in-one" mode, this causes ChunkCache to remove the original
vector data file when building the chunk cache during or after the load
procedure.

This PR moves the ChunkCache path generation logic into a separate
function that checks whether the file path is absolute. If it is, the
function removes the root path prefix and returns the concatenated file
path.
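
A minimal sketch of the pitfall and one way to avoid it, assuming POSIX-style paths. `CachePathFor` is a hypothetical name and uses `relative_path()` to drop the root; the actual function in the PR may strip the prefix differently:

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper, not the actual Milvus function: builds a path under
// `cache_root` without letting an absolute `file` replace the prefix.
inline fs::path
CachePathFor(const fs::path& cache_root, const fs::path& file) {
    assert(!cache_root.empty());
    // relative_path() drops the root name and root directory, so
    // "/data/1.bin" becomes "data/1.bin"; a relative input is unchanged.
    return cache_root / file.relative_path();
}
```

With plain `operator/`, `fs::path("/cache") / "/data/1.bin"` yields `/data/1.bin`, which is the original file rather than a cache copy. The helper above returns `/cache/data/1.bin` for the same inputs.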

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-20 16:48:50 +08:00
yah01
52ac0718f0
enhance: limit the max pool size to 16 (#30371) (#30415)
according to our benchmark, concurrency level 16 is enough to fully
utilize the object storage network bandwidth
pr: #30371
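
A minimal sketch of such a cap. Only the limit of 16 comes from this commit; the `cpu_num * coefficient` sizing rule and the names `PoolSize` and `kMaxPoolSize` are assumptions for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Cap taken from the commit message; everything else here is hypothetical.
constexpr std::size_t kMaxPoolSize = 16;

// Scale the pool with the CPU count, keep at least one worker,
// and never exceed the cap.
inline std::size_t
PoolSize(std::size_t cpu_num, std::size_t coefficient) {
    assert(coefficient > 0);
    const std::size_t wanted = std::max<std::size_t>(cpu_num * coefficient, 1);
    return std::min(wanted, kMaxPoolSize);
}
```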

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 15:58:52 +08:00
yah01
4b7c5baab7
enhance: convert the GetObject util to async (#30166) (#30197)
This makes it much easier to use
pr: #30166

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 11:30:52 +08:00
foxspy
35330ff8ea
enhance: Update Knowhere version (#30640)
/kind branch-feature

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-18 20:28:52 +08:00
Jiquan Long
26f012c564
fix: Add retry on unimplemented error for datacoord (#30554) (#30639)
issue: #30553
pr: #30554 

When datacoord 2.2 and querycoord 2.3 coexist during a rolling upgrade,
`DescribeIndex/GetIndexInfo` returns an `unimplemented` error.
This PR adds a retry on `DescribeIndex/GetIndexInfo` to prevent
collection loading from failing during a rolling upgrade from Milvus 2.2 to 2.3.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-18 20:26:59 +08:00
zhenshan.cao
48707f3aac
fix: should return collectionName in response of ListAliases (#30533)
issue : https://github.com/milvus-io/milvus/issues/30369
pr: https://github.com/milvus-io/milvus/pull/30532

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-12 08:30:55 +08:00
zhagnlu
a209d05537
fix: erase pk empty check when pk index replace raw data (#30432) (#30578)
pr: #30432

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-12 08:24:53 +08:00
chyezh
be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia
8fec7de472
fix: [Cherry-pick] Proxy restful api doesn't register (#30072) (#30559)
Cherry-pick from master
pr: #30072
issue: #30074
This PR fixes the management RESTful API in proxy not being registered
with the HTTP service

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-06 16:58:33 +08:00
wayblink
b2d3278c56
enhance: Add log when garbage collection resumed (#30536)
/kind enhancement
pr: #30535

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-05 17:09:53 +08:00
foxspy
88d57f1db9
enhance: Update Knowhere version (#30513)
/kind improvement

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-04 22:13:07 +08:00
aoiasd
cc2bc3f8f2
enhance: [Cherry-Pick] access log should get client info by get method (#30503)
https://github.com/milvus-io/milvus/pull/30502

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 18:57:07 +08:00
congqixia
f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for NewLoadIndexInfo (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
aoiasd
ad4a53d225
enhance: [Cherry-Pick] Fix some access log bugs (#30496)
pr: https://github.com/milvus-io/milvus/pull/30409
https://github.com/milvus-io/milvus/pull/29680

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 16:37:07 +08:00
cai.zhang
3c5ff624f8
fix: [pick]Only use bound indexnodes in bound mode (#30462)
master pr: #30461 
issue: #30463

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-03 21:59:05 +08:00
yah01
655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
The old version of Knowhere copies the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a utility function that indicates whether it will load
the index with disk. If not, we need to double the memory usage prediction
for the index data.
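
A minimal sketch of that rule. `PredictIndexMemory` and the `uses_disk_load` flag are hypothetical stand-ins; the flag represents the result of the Knowhere utility mentioned above, whose real name and signature are not given here:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimator, not the Milvus implementation.
// `uses_disk_load` stands in for the answer of the Knowhere utility.
inline std::uint64_t
PredictIndexMemory(std::uint64_t index_size_bytes, bool uses_disk_load) {
    // Guard against overflow when doubling.
    assert(index_size_bytes <= UINT64_MAX / 2);
    // Without disk-based loading the index data is copied while loading,
    // so the prediction is doubled; otherwise it is left unchanged.
    return uses_disk_load ? index_size_bytes : index_size_bytes * 2;
}
```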

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
yihao.dai
20608287b9
fix: Decoupling importing segment from flush process (#30402) (#30439)
This PR decouples importing segments from the flush process by:
1. Excluding the importing segment from the flush policy. This approach
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. Having DataCoord directly set the importing segment state to
`Flushed` when RootCoord calls Flush.

issue: https://github.com/milvus-io/milvus/issues/30359

pr: https://github.com/milvus-io/milvus/pull/30402

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-03 12:59:14 +08:00
yah01
f50799b7fd
fix: proxy may never setup if the port binded (#30035) (#30416)
The proxy mistakenly returned nil when it failed to listen on the port, so the
server continued to run but the service could not be reached. Resolves #30034
pr: #30035

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-02 16:21:06 +08:00
smellthemoon
692dcebac6
enhance: support varchar autoid when bulkinsert(#30377) (#30448)
related pr: #30377

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-02-02 16:11:08 +08:00
congqixia
69a82acc46
enhance: [Cherry-pick] Set delete scope for LoadSegment streaming data (#30245) (#30367)
Cherry pick from master
pr: #30245
See also #29474

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-02 16:05:06 +08:00
SimFG
73df0b872e
fix: [2.3] add more requests to the database interceptor (#30453)
issue: https://github.com/milvus-io/milvus/issues/30368
pr: #30452

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-02-02 16:03:06 +08:00
cqy123456
3036c19867
fix: can't get search_cache_budget_gb in create index (#30353)
issue:https://github.com/milvus-io/milvus/issues/30375
pr: https://github.com/milvus-io/milvus/pull/30119

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-31 15:49:03 +08:00
yah01
028721db25
enhance: optimize the loading strategy (#29910) (#30348)
Since the pool size is already limited, we don't need to limit the
concurrency manually
pr: #29910

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-31 15:25:04 +08:00
chyezh
3e994242d6
fix: panic with datanode negative wait group counter (#30136)
issue: #29170
pr: #30135

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-30 18:07:03 +08:00
chyezh
21c944beaa
enhance: add basic information of milvus into metrics (#29666)
add basic build information and runtime component dependency into
metrics.

issue: #29664
pr: #29665

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-29 15:49:04 +08:00
xige-16
9ab2ce0767
enhance: [Cherry-pick] Opt vector dimension mismatch error message (#30316)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/29928

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-29 14:47:03 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. set the graceful stop timeout of coordinators and proxy to 5s.
2. set the graceful stop timeout of other worker nodes to 900s; we should
potentially change this to 600s once graceful stop is smooth.
3. change the order in which the datacoord components stop.
4. `LivenessCheck` no longer performs a graceful shutdown.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai
e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
Bingyi Sun
2c4d0605ef
enhance: add a weight for growing row count when balancing segments (#30293)
Cherry-pick from master
pr: #30271

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-26 09:17:03 +08:00
congqixia
d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo call missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia
1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia
9e8eb2aa51
fix: Revert leader checker related check (#30262)
See also #30150
PR reverted: #29984 #30152

Currently this scenario cannot be covered by UT/IT/E2E test cases,
so revert it for now

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 12:39:02 +08:00
congqixia
e3114b6a4d
enhance: [2.3] Utilize partition key optimization in reQuery (#30255)
Partial cherry-pick from master due to code branching
pr: #30253 
See also #30250

This PR adds a reQuery flag to the query task. When the reQuery flag is true,
the query task skips partition name conversion and uses the pre-calculated
partitionIDs passed from the search task.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 11:05:07 +08:00
SimFG
95cd6f20d0
fix: [2.3] wrong format expr for the delete rest api (#30218)
/kind improvement
issue: https://github.com/milvus-io/milvus/issues/30092
pr: #30217

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-24 11:27:05 +08:00
cai.zhang
efea282111
feat: [Pick] Support tencent cloud object storage for milvus (#30210)
issue: #30162 
master pr: #30163

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 16:07:01 +08:00
congqixia
35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01
4d0a6dbc25
fix: written file size is over the int32 range and raises error (#30057) (#30207)
we sum the total data size in int32, which could lead to an overflow
error
related #30056
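
A minimal sketch of the kind of fix this implies, assuming the per-file sizes fit in int32 while their sum may not. `TotalSize` is a hypothetical name, not the function changed in the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: accumulate in 64 bits so that totals above
// INT32_MAX (about 2 GiB) stay exact instead of overflowing.
inline std::int64_t
TotalSize(const std::vector<std::int32_t>& sizes) {
    std::int64_t total = 0;
    for (std::int32_t s : sizes) {
        assert(s >= 0);  // sizes are byte counts
        total += s;      // each addend is widened to int64 before adding
    }
    return total;
}
```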

pr: #30057

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 13:50:56 +08:00
yah01
9bd94c4fab
fix: the system rejects all queries and never recovers if enabled read rate limit (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
yah01
c8a129756f
enhance: filter out the not needed collections while listing (#29690) (#30180)
This improves performance when many collections exist. Resolves #29631
pr: #29690

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:52:55 +08:00
MrPresent-Han
6aaccdd5f4
feat: support general capacity restrict for cloud-side resoure contro… (#30017)
related: #29844
pr: https://github.com/milvus-io/milvus/pull/29845

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 16:18:56 +08:00
SimFG
2465d86138
enhance: [2.3] support related privilege for grant api (#30154)
/kind improvement
pr: #30153

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-22 14:42:55 +08:00
yah01
ce318f3286
enhance: make the error of parsing expression to ParameterInvalid (#29681) (#29795)
Before this, such errors were reported as unexpected errors
pr: #29681

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:36:55 +08:00
yihao.dai
917a4d74f3
fix: Use channel cp as the dml&start position for import segments (#30107) (#30133)
This PR discontinues the subscription to the MQ and instead uses
the channel checkpoint as the DML and start position for the import
segments.

issue: https://github.com/milvus-io/milvus/issues/30106

pr: https://github.com/milvus-io/milvus/pull/30107

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-22 13:32:55 +08:00
yah01
a8d9b0ccba
enhance: optimize the loading index performance (#29894) (#30018)
this utilizes concurrent loading
pr: #29894

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:12:56 +08:00
congqixia
bac1a1355b
fix: [Cherry-pick] collection properties not saved for alter collection (#30145) (#30156)
Cherry-pick from master
pr: #30145
Resolves: #30144

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-22 10:08:55 +08:00
yihao.dai
b95f0cc0a1
enhance: Add a counter monitoring for the rate-limit requests (#30109) (#30132)
Add a counter metric for rate-limited RPC requests with the
labels: proxy nodeID, RPC request type, and state.

issue: https://github.com/milvus-io/milvus/issues/30052

pr: https://github.com/milvus-io/milvus/pull/30109

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-21 14:44:59 +08:00
PowderLi
3dc2585d9b
enhance: support dataType: array & json (#30077)
issue: #30075 
master pr: #30076

deal with the array<?> field data correctly

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-21 14:00:56 +08:00
wei liu
b2997eb881
fix: Leader checker can't remove segment from leader view (#30152)
issue: #30150
pr: #30151

This PR fixes three problems:

1. The load request generated by the leader checker doesn't set the load scope.
2. The leader checker uses the wrong node ID when generating a release task,
which causes the release task to finish immediately.
3. The release request generated by the leader checker doesn't set the force
flag, so the operation to clean the leader view on the delegator fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00