7991 Commits

Author SHA1 Message Date
zhagnlu
095c94305c
fix: add GetSegments optimization to avoid meta mutex competition (#31026)
pr: #31025

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-05 14:49:01 +08:00
yihao.dai
91d17870d6
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941) (#31024)
This PR includes the following adjustments:

1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
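
A minimal Go sketch of points 1 and 2 above, not the actual Milvus code: the `checkpoint` and `cpUpdater` types and the `Submit`/`Drain` methods are hypothetical names chosen for illustration. It keeps only the newest pending task per vchannel and drains the pending set in batches of a configurable size (128 in the description above).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// checkpoint is a hypothetical stand-in for a channel checkpoint.
type checkpoint struct {
	VChannel string
	Ts       uint64
}

// cpUpdater (hypothetical) holds at most one pending task per vchannel.
type cpUpdater struct {
	mu    sync.Mutex
	tasks map[string]checkpoint
}

func newCPUpdater() *cpUpdater {
	return &cpUpdater{tasks: make(map[string]checkpoint)}
}

// Submit replaces any older pending checkpoint for the same vchannel,
// so the pending set never grows beyond the number of vchannels.
func (u *cpUpdater) Submit(cp checkpoint) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if old, ok := u.tasks[cp.VChannel]; ok && old.Ts >= cp.Ts {
		return
	}
	u.tasks[cp.VChannel] = cp
}

// Drain removes all pending checkpoints and returns them in batches of
// at most batchSize, one batch per (assumed) update RPC.
func (u *cpUpdater) Drain(batchSize int) [][]checkpoint {
	if batchSize <= 0 {
		batchSize = 1
	}
	u.mu.Lock()
	pending := make([]checkpoint, 0, len(u.tasks))
	for _, cp := range u.tasks {
		pending = append(pending, cp)
	}
	u.tasks = make(map[string]checkpoint)
	u.mu.Unlock()

	// Sort for a deterministic order; map iteration order is random.
	sort.Slice(pending, func(i, j int) bool {
		return pending[i].VChannel < pending[j].VChannel
	})
	var batches [][]checkpoint
	for start := 0; start < len(pending); start += batchSize {
		end := start + batchSize
		if end > len(pending) {
			end = len(pending)
		}
		batches = append(batches, pending[start:end])
	}
	return batches
}

func main() {
	u := newCPUpdater()
	u.Submit(checkpoint{VChannel: "ch-1", Ts: 1})
	u.Submit(checkpoint{VChannel: "ch-1", Ts: 5}) // supersedes Ts 1
	u.Submit(checkpoint{VChannel: "ch-2", Ts: 3})
	for _, batch := range u.Drain(128) {
		fmt.Println(batch)
	}
}
```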

issue: https://github.com/milvus-io/milvus/issues/30004

pr: https://github.com/milvus-io/milvus/pull/30941

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 14:27:01 +08:00
congqixia
b7635ed989
enhance: [Cherry-pick] Change proxy connection manager to concurrent safe (#31009)
Cherry-pick from master
pr: #31008 
See also #31007

This PR:
- Add param item for connection manager behavior: TTL & check interval
- Change clientInfo map to concurrent map

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 14:13:00 +08:00
yihao.dai
a5350f64a5
enhance: Reduce the memory usage of the timeTickSender (#30968) (#30991)
In the cache of the timeTickSender, retain only the latest stats instead
of storing stats for every time tick.
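
The idea can be sketched in Go as follows. The `statsCache` and `channelStats` types are hypothetical and only illustrate the principle: overwriting one entry per channel keeps memory proportional to the number of channels rather than to the number of time ticks.

```go
package main

import (
	"fmt"
	"sync"
)

// channelStats is a hypothetical stand-in for per-channel segment statistics.
type channelStats struct {
	Timestamp uint64
	NumRows   map[int64]int64 // segment ID -> row count
}

// statsCache keeps only the latest stats per channel.
type statsCache struct {
	mu     sync.Mutex
	latest map[string]channelStats
}

func newStatsCache() *statsCache {
	return &statsCache{latest: make(map[string]channelStats)}
}

// Update overwrites the cached entry for a channel when ts is newer,
// instead of appending one entry per time tick.
func (c *statsCache) Update(channel string, ts uint64, rows map[int64]int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.latest[channel]; ok && cur.Timestamp >= ts {
		return
	}
	c.latest[channel] = channelStats{Timestamp: ts, NumRows: rows}
}

// Get returns the latest stats for a channel, if any.
func (c *statsCache) Get(channel string) (channelStats, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.latest[channel]
	return s, ok
}

// Len returns the number of cached entries.
func (c *statsCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.latest)
}

func main() {
	cache := newStatsCache()
	for ts := uint64(1); ts <= 1000; ts++ {
		cache.Update("ch-1", ts, map[int64]int64{100: int64(ts)})
	}
	s, _ := cache.Get("ch-1")
	fmt.Println("entries:", cache.Len(), "latest ts:", s.Timestamp)
}
```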

issue: https://github.com/milvus-io/milvus/issues/30967

pr: https://github.com/milvus-io/milvus/pull/30968

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 10:59:01 +08:00
congqixia
81b197267a
enhance: [Cherry-Pick] Add back load memory factor when estimating memory resource (#30999)
Cherry-pick from master
pr: #30994
Segment load memory usage is underestimated because the load memory
factor was removed. This PR adds it back to protect the querynode from
OOM in extreme memory cases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 09:15:00 +08:00
jaime
336e0ae45e
enhance: index meta use independent rather than global meta lock (#30986)
issue: https://github.com/milvus-io/milvus/issues/30837
pr: #30869

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-05 08:48:59 +08:00
chyezh
df09222029
fix: lock starvation caused by slow GetCompactionTo method when there are too many segments (#30965)
issue: #30823
pr: #30963

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-04 20:51:00 +08:00
XuanYang-cn
bb2de0d964
fix: [cherry-pick] Clear DN unknown compaction tasks (#30972)
If DC restarts, unknown compaction tasks never get a callback in DN,
so the segments in those tasks stay locked and cannot be synced or
compacted again, which blocks cp advance and compaction execution.

See also: #30137
pr: #30850

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-04 16:52:59 +08:00
wei liu
db49b8524d
fix: Skip generate balance task when target not ready (#30725)
issue: #30723
pr: #30724

This PR skips generating balance tasks when the collection's target
isn't ready. It also refines the stale check logic in query coord's
scheduler: if the channel exists in the current or next target, the
task won't be canceled.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:38:59 +08:00
wei liu
af54c3ba85
fix: Make datacoord client retry on index api (#30656)
pr: #30654

This PR adds retry on all interfaces that belonged to indexcoord in
milvus 2.2 and moved to data coord in milvus 2.3, to prevent
unimplemented errors during rolling upgrade from milvus 2.2 to 2.3.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:37:09 +08:00
cai.zhang
38e3d6af3e
enhance: Optimize DescribeIndex to reduce lock contention (#30975)
issue: #29313
issue: #30443
master pr: #30939

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-04 11:30:59 +08:00
SimFG
b0569f430b
enhance: [2.3] retry the read when S3 returns an unexpected EOF error (#30976)
issue: https://github.com/milvus-io/milvus/issues/30877
pr: #30861

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-04 10:42:59 +08:00
PowderLi
c93f127c7d
fix: [cherry-pick] [restful v1] bug list (#30873)
master pr: #30871 issue: #30870
fix: vector field cannot be empty on insert
check in advance whether the vector field is empty

master pr: #30740
fix:
1. spelling mistake about metricsType #30643
2. int64 precision #20415
3. insert into collection which has multi vector fields #30674

enhance: support dataType: Float16Vector & BFloat16Vector #22837
#30980(master pr: #30969)
enhance: describe collection will show the field is partition key or not
#30789

---------

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-03-03 17:56:59 +08:00
SimFG
ef84d40e54
enhance: [2.3] make the watch dm channel request better compatibility (#30954)
pr: #30952
issue: https://github.com/milvus-io/milvus/issues/30938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-01 16:09:01 +08:00
wei liu
b0c7f8653f
fix: Segment version doesn't update as expected (#30953)
issue: #30950 
pr: #30951

The segment version doesn't update as expected.
This PR updates the segment version until the segment becomes loaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-01 14:21:10 +08:00
congqixia
c3f831fce4
fix: [Cherry-pick] Disk resource is not requested for index loaded with disk (#30757) (#30948)
Cherry pick from master
pr: #30757
See also #30756

This PR:
- Request disk resource when index type, version loaded with disk
- Add attribute cache for index utility
- Add `typeutil.Pair`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-01 13:07:00 +08:00
chyezh
483a32bced
feat: add collection level flush rate control (#29568)
Add flush rate control at the collection level to avoid generating too
many segments.
The default limit is 0.1 qps.

issue: #29477
pr: #29567

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-03-01 10:23:01 +08:00
yihao.dai
2f76303989
enhance: Support varchar autoid for bulkinsertV1 (#30896) (#30913)
This PR is a supplement to PR
https://github.com/milvus-io/milvus/pull/30377.

pr: https://github.com/milvus-io/milvus/pull/30896

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-29 12:11:00 +08:00
Jiquan Long
b0d8e21445
enhance: optimize the memory usage and speed up loading variable length data (#30787) (#30900)
pr: #30787 
/kind improvement

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-29 10:39:00 +08:00
PowderLi
a4219cbb0f
fix: [cherry-pick] set proxy.http.acceptTypeAllowInt64: true as default (#30738)
issue: #30680
pr: #30720

also make the parameter item refreshable

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-02-29 09:59:07 +08:00
congqixia
df16bf6acd
fix: [Cherry-pick] Remove time tick delay metrics when nodes go offline (#30833) (#30879)
Cherry-pick from master
pr: #30833
See also #30832

This PR removes time tick delay metrics when the rootcoord GetMetrics
response no longer contains a previously existing querynode/datanode.

It also adds unit tests for this case.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi.Xia <congqi.xia@zilliz.com>
2024-02-28 18:55:00 +08:00
Jiquan Long
b10bec38c9
enhance: reduce 1x memory copy when loading json (#30753) (#30864)
/kind improvement
pr: #30753 

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-28 16:36:59 +08:00
wei liu
ee705b7ce8
enhance: Correct misleading nodeID in GetComponentStates's log (#30732)
pr: #30731
This PR corrects the misleading nodeId in GetComponentStates's log

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-28 13:50:59 +08:00
chyezh
1c8d9fa686
fix: wrong context passing into NewClient, error handling lost in session_util (#30818)
issue: #30799
pr: #30817

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-28 10:41:00 +08:00
zhenshan.cao
2f4a13a7ae
enhance: Revert (#30197 #30690 #30415) (#30795)
Revert "enhance: reduce many I/O operations while loading disk index
(#30189) (#30690)" This reverts commit
d4c4bf946b15bc537acd170dfd1d938bea237c7a.

Revert "enhance: limit the max pool size to 16 (#30371) (#30415)" This
reverts commit 52ac0718f059d4aa45c5908ec8507e6045b24e1f.

Revert "enhance: convert the `GetObject` util to async (#30166)
(#30197)" This reverts commit 4b7c5baab773366aa8084762e7321130c4f894b7.

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-24 09:07:46 +08:00
Xiaofan
2896f5eb69
enhance: [2.3] change frequent log to debug (#30781)
pr: #30782 
change the "pipeline fetch insert msg" log to debug

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-02-23 14:10:40 +08:00
chyezh
a9625ec1ae
fix: nil ptr is used as nil interface in grpc client (#30755)
issue: #30715
pr: #30754

- Bug: A nil struct pointer was used to represent a nil interface.
Calling a method on this nil struct pointer panics with a segmentation
violation.
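
This pitfall is a general Go behavior and can be reproduced in a few lines. The `closer` interface, the `grpcClient` type, and the two wrapper functions below are hypothetical names for illustration, not the code touched by this commit.

```go
package main

import "fmt"

// closer is a hypothetical interface standing in for a client interface.
type closer interface {
	Close() error
}

// grpcClient is a hypothetical concrete client.
type grpcClient struct {
	addr string
}

// Close dereferences c, so it panics when c is a nil pointer.
func (c *grpcClient) Close() error {
	fmt.Println("closing", c.addr)
	return nil
}

// wrapBuggy returns an interface value that holds a typed nil pointer.
// The interface itself is NOT nil, so a later `!= nil` check passes and a
// method call on it panics.
func wrapBuggy(c *grpcClient) closer {
	return c
}

// wrapFixed returns a true nil interface when the pointer is nil.
func wrapFixed(c *grpcClient) closer {
	if c == nil {
		return nil
	}
	return c
}

func main() {
	var c *grpcClient
	fmt.Println(wrapBuggy(c) == nil) // false: typed nil inside a non-nil interface
	fmt.Println(wrapFixed(c) == nil) // true
}
```

An interface value is nil only when both its dynamic type and its value are unset, which is why the explicit nil check before the conversion matters.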

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-23 10:08:54 +08:00
zhagnlu
e17775a20f
fix: fix upsert using wrong field to compute partition key (#30773)
pr: #30772

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-22 23:38:53 +08:00
cai.zhang
ef086dc0ca
fix: [Pick] Skip filling segmentID in indexBuildCh to prevent flush blocked (#30749)
issue: #30580 
master pr: #30747

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-22 20:42:56 +08:00
congqixia
3d8b6a4d2e
fix: [Cherry-pick] Release loaded growing if WatchDmlChannel fail (#30735) (#30745)
Cherry pick from master
pr: #30735
See also #30734

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-22 16:44:55 +08:00
congqixia
31f33f67e0
fix: [cherry-pick] Update disk usage metrics after segment released (#30702) (#30707)
Cherry-pick from master
pr: #30702
See also #30701

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 10:54:53 +08:00
cai.zhang
e8e221ca38
[Pick]enhance: Use virtual host for tencent cloud (#30685)
master pr: #30650

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-21 09:52:59 +08:00
yah01
d4c4bf946b
enhance: reduce many I/O operations while loading disk index (#30189) (#30690)
Before this change, every write of index chunk data to disk took
4 I/O operations:
- open the file
- seek to the offset
- write the data
- close the file

This change opens the file only once and writes all the data continuously.

It also loads the files from object storage concurrently.

pr: #30189

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 17:40:52 +08:00
congqixia
8734bcc645
fix: [Cherry-pick] Prevent ChunkCache use absolute path in All-in-one mode (#30666) (#30679)
Cherry pick from master
pr: #30666
See also #30651

The append operator (`/`) of `std::filesystem::path` replaces the whole
path when its right-hand operand is an absolute path.

In "All-in-one" mode, this causes ChunkCache to remove the original
vector data file when building the chunk cache during or after the load
procedure.

This PR moves the ChunkCache path generation logic into a separate
function that checks whether the file path is absolute. If it is, the
function removes the root path prefix and returns the concatenated file
path.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-20 16:48:50 +08:00
yah01
52ac0718f0
enhance: limit the max pool size to 16 (#30371) (#30415)
according to our benchmark, concurrency level 16 is enough to fully
utilize the object storage network bandwidth
pr: #30371

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 15:58:52 +08:00
yah01
4b7c5baab7
enhance: convert the GetObject util to async (#30166) (#30197)
This makes it much easier to use
pr: #30166

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 11:30:52 +08:00
foxspy
35330ff8ea
enhance: Update Knowhere version (#30640)
/kind branch-feature

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-18 20:28:52 +08:00
Jiquan Long
26f012c564
fix: Add retry on unimplemented error for datacoord (#30554) (#30639)
issue: #30553
pr: #30554 

When datacoord at version 2.2 and querycoord at version 2.3 coexist
during a rolling upgrade, `DescribeIndex/GetIndexInfo` returns an
`unimplemented` error.
This PR adds retry on `DescribeIndex/GetIndexInfo` to prevent load
collection failures during rolling upgrade from milvus 2.2 to 2.3.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-18 20:26:59 +08:00
zhenshan.cao
48707f3aac
fix: should return collectionName in response of ListAliases (#30533)
issue : https://github.com/milvus-io/milvus/issues/30369
pr: https://github.com/milvus-io/milvus/pull/30532

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-12 08:30:55 +08:00
zhagnlu
a209d05537
fix: erase pk empty check when pk index replace raw data (#30432) (#30578)
pr: #30432

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-12 08:24:53 +08:00
chyezh
be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia
8fec7de472
fix: [Cherry-pick] Proxy restful api doesn't register (#30072) (#30559)
Cherry-pick from master
pr: #30072
issue: #30074
This PR fixes the management restful api in proxy not being registered
with the http service

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-06 16:58:33 +08:00
wayblink
b2d3278c56
enhance: Add log when garbage collection resumed (#30536)
/kind enhancement
pr: #30535

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-05 17:09:53 +08:00
foxspy
88d57f1db9
enhance: Update Knowhere version (#30513)
/kind improvement

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-04 22:13:07 +08:00
aoiasd
cc2bc3f8f2
enhance: [Cherry-Pick] access log should get client info by get method (#30503)
https://github.com/milvus-io/milvus/pull/30502

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 18:57:07 +08:00
congqixia
f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for NewLoadIndexInfo (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
aoiasd
ad4a53d225
enhance: [Cherry-Pick] Fix some access log bugs (#30496)
pr: https://github.com/milvus-io/milvus/pull/30409
https://github.com/milvus-io/milvus/pull/29680

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 16:37:07 +08:00
cai.zhang
3c5ff624f8
fix: [pick]Only use bound indexnodes in bound mode (#30462)
master pr: #30461 
issue: #30463

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-03 21:59:05 +08:00
yah01
655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
The old version of Knowhere copies the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a util function that indicates whether it will load
the index with disk; if not, we need to double the memory usage
prediction for index data.
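
The rule above amounts to a small piece of arithmetic, sketched here in Go. The function name and the `loadedWithDisk` flag are hypothetical, and leaving the estimate unchanged in the disk case is an assumption of this sketch; the commit only states that the prediction is doubled when the index is not loaded with disk.

```go
package main

import "fmt"

// estimateIndexMemory returns the predicted memory usage for index data.
// When the index is not loaded with disk, the loader copies the index
// data, so the prediction is doubled. For the disk case this sketch
// simply keeps the index size (an assumption, not taken from the commit).
func estimateIndexMemory(indexSize uint64, loadedWithDisk bool) uint64 {
	if loadedWithDisk {
		return indexSize
	}
	return 2 * indexSize
}

func main() {
	const indexSize = 512 << 20 // 512 MiB, an arbitrary example value
	fmt.Println(estimateIndexMemory(indexSize, false))
	fmt.Println(estimateIndexMemory(indexSize, true))
}
```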

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
yihao.dai
20608287b9
fix: Decoupling importing segment from flush process (#30402) (#30439)
This PR decouples importing segments from the flush process by:
1. Excluding the importing segment from the flush policy. This approach
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. When RootCoord calls Flush, DataCoord directly sets the importing
segment state to `Flushed`.

issue: https://github.com/milvus-io/milvus/issues/30359

pr: https://github.com/milvus-io/milvus/pull/30402

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-03 12:59:14 +08:00