472 Commits

Author SHA1 Message Date
Jiquan Long
ab059bb064
enhance: add more metrics (#31271) (#31511)
/kind improvement
pr: #31271 
fix: https://github.com/milvus-io/milvus/issues/31272

This PR adds the following metrics:

Slow query count, where the duration threshold for a slow query is
configurable;
Number of deleted entities;
Number of entities per collection;
Number of loaded entities per collection;
Number of indexed entities;
Number of indexed entities per collection, per index, and by whether it
is a vector index;
Quota states (LongTimeTickDelay, MemoryExhausted, DiskQuotaExhausted)
per database;

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-22 16:11:07 +08:00
wei liu
c8658d17f8
fix: Grpcclient return unrecoverable error (#31256) (#31452)
issue: #31222
pr: #31256

grpcclient's `call` func returns an unrecoverable error, which also
breaks the caller's retry policy.

This PR introduces `retry.Handle`, which takes a `func() (bool, error)`
as its input. That function returns `shouldRetry` directly, so grpcclient
no longer needs to return an unrecoverable error to stop retrying.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:59:12 +08:00
groot
1ca7cba222
enhance: Support MinIO TLS connection (#31292)
issue: https://github.com/milvus-io/milvus/issues/30709
master pr: #31311

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
congqixia
94f3aec80a
enhance: [Cherry-pick] Add metrics for querycoord current target cp lag (#31391) (#31463)
Cherry-pick from master
pr: #31391 #31399
See also #31390

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-21 10:17:07 +08:00
congqixia
86e347a1a4
enhance: [2.3] Cache formatted key for param item (#31388) (#31402)
Cherry-pick from master
pr: #31388 
See also #30806

`formatKey` can consume a lot of CPU on string processing in high-QPS
scenarios. This PR adds a formattedKeys cache so the string operations
are not repeated on every param value lookup.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 19:25:10 +08:00
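Here is a sketch of the formatted-key cache from 86e347a1a4. It assumes a hypothetical `keyFormatter` wrapper around `sync.Map`. The normalization rule in `formatKeyUncached` (lower-case, then strip `.`, `_` and `/`) is an assumption and may not match Milvus's `formatKey`.

```go
package main

import (
	"strings"
	"sync"
)

// keyFormatter caches normalized config keys so the string processing runs
// once per distinct key instead of on every lookup.
type keyFormatter struct {
	cache sync.Map // raw key -> formatted key
}

// Assumed normalization rule, for illustration only.
var keyReplacer = strings.NewReplacer(".", "", "_", "", "/", "")

func formatKeyUncached(key string) string {
	return keyReplacer.Replace(strings.ToLower(key))
}

// Format returns the cached formatted key, computing it on first use.
func (f *keyFormatter) Format(key string) string {
	if v, ok := f.cache.Load(key); ok {
		return v.(string)
	}
	formatted := formatKeyUncached(key)
	f.cache.Store(key, formatted)
	return formatted
}
```

`sync.Map` suits this read-mostly workload, where each key is written once and then read many times.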
congqixia
4e48a4de0e
enhance: Bump milvus & proto version to v2.3.12 (#31193)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-14 19:09:04 +08:00
wei liu
9d712f4dd4
fix: Balance param use duplicated key (#31112) (#31141)
pr: #31112
issue: #31115
This PR fixes the balance check interval param, which used a duplicated key.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-11 15:03:02 +08:00
Jiquan Long
c37b7792f4
enhance: purge client infos periodically (#31037) (#31092)
https://github.com/milvus-io/milvus/issues/31007
pr: #31037 

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-08 10:17:01 +08:00
congqixia
6b5e19f6b7
enhance: Bump milvus & proto version to v2.3.11 (#31035)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 17:15:00 +08:00
yihao.dai
91d17870d6
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941) (#31024)
This PR includes the following adjustments:

1. To prevent a backlog of channelCP update tasks, the updater retains
only one task per vchannel. Additionally, lastUpdateTime is refreshed
when the flowgraph submits the update task, rather than in the callback
function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.

issue: https://github.com/milvus-io/milvus/issues/30004

pr: https://github.com/milvus-io/milvus/pull/30941

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 14:27:01 +08:00
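Points 1 and 2 of 91d17870d6 keep one pending task per vchannel and then send the updates in batched RPCs. The sketch below shows both steps. `cpUpdater`, `channelCP` and `DrainBatches` are hypothetical names, and the RPC itself is not shown.

```go
package main

import (
	"sort"
	"sync"
)

// channelCP is a hypothetical checkpoint update for one vchannel.
type channelCP struct {
	VChannel  string
	Timestamp uint64
}

// cpUpdater keeps at most one pending update per vchannel: a newer
// checkpoint replaces the older one, so slow RPCs cannot build a backlog.
type cpUpdater struct {
	mu      sync.Mutex
	pending map[string]channelCP
}

func newCPUpdater() *cpUpdater {
	return &cpUpdater{pending: make(map[string]channelCP)}
}

// Submit records the latest checkpoint for a vchannel.
func (u *cpUpdater) Submit(cp channelCP) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.pending[cp.VChannel] = cp
}

// DrainBatches takes all pending checkpoints and splits them into batches
// of at most batchSize, each meant for one UpdateChannelCheckpoint call.
func (u *cpUpdater) DrainBatches(batchSize int) [][]channelCP {
	if batchSize <= 0 {
		batchSize = 128 // default batch size per the commit message
	}
	u.mu.Lock()
	all := make([]channelCP, 0, len(u.pending))
	for _, cp := range u.pending {
		all = append(all, cp)
	}
	u.pending = make(map[string]channelCP)
	u.mu.Unlock()

	// Sorted only to make the output deterministic.
	sort.Slice(all, func(i, j int) bool { return all[i].VChannel < all[j].VChannel })
	var batches [][]channelCP
	for start := 0; start < len(all); start += batchSize {
		end := start + batchSize
		if end > len(all) {
			end = len(all)
		}
		batches = append(batches, all[start:end])
	}
	return batches
}
```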
congqixia
b7635ed989
enhance: [Cherry-pick] Change proxy connection manager to concurrent safe (#31009)
Cherry-pick from master
pr: #31008 
See also #31007

This PR:
- Add param item for connection manager behavior: TTL & check interval
- Change clientInfo map to concurrent map

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 14:13:00 +08:00
SimFG
b0569f430b
enhance: [2.3] retry reads when S3 returns an unexpected EOF error (#30976)
issue: https://github.com/milvus-io/milvus/issues/30877
pr: #30861

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-04 10:42:59 +08:00
groot
5b695d7e86
fix: Clean kafka default configuration (#30925)
issue: #30917
pr: #30924

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-03-01 18:15:29 +08:00
congqixia
430e10c8e2
fix: [Cherry-pick] Use localStorage path to check disk cap (#30944) (#30966)
Cherry-pick from master
pr: #30944
See also #30943

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-01 15:11:01 +08:00
congqixia
c3f831fce4
fix: [Cherry-pick] Disk resource is not requested for index loaded with disk (#30757) (#30948)
Cherry pick from master
pr: #30757
See also #30756

This PR:
- Request disk resources when the index type and version are loaded with disk
- Add attribute cache for index utility
- Add `typeutil.Pair`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-01 13:07:00 +08:00
chyezh
483a32bced
feat: add collection level flush rate control (#29568)
Add flush rate control at the collection level to avoid generating too
many segments. The default limit is 0.1 QPS, i.e. at most one flush per
collection every 10 seconds.

issue: #29477
pr: #29567

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-03-01 10:23:01 +08:00
PowderLi
a4219cbb0f
fix: [cherry-pick] set proxy.http.acceptTypeAllowInt64: true as default (#30738)
issue: #30680
pr: #30720

Also make the parameter item refreshable.

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-02-29 09:59:07 +08:00
congqixia
df16bf6acd
fix: [Cherry-pick] Remove time tick delay metrics when nodes go offline (#30833) (#30879)
Cherry-pick from master
pr: #30833
See also #30832

This PR removes time tick delay metrics for querynodes/datanodes that
existed previously but are absent from the rootcoord GetMetrics response.

It also adds unit tests for this case.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi.Xia <congqi.xia@zilliz.com>
2024-02-28 18:55:00 +08:00
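Here is a sketch of the stale-metric cleanup in df16bf6acd. A plain map stands in, hypothetically, for the labeled time tick delay gauge. Nodes missing from the latest GetMetrics snapshot are deleted before the current values are written.

```go
package main

// timeTickDelay maps a node label to its latest time tick delay (ms).
type timeTickDelay map[string]float64

// refresh drops entries for nodes absent from the latest snapshot, so
// offline nodes stop reporting a stale delay, then writes current values.
func (m timeTickDelay) refresh(latest map[string]float64) {
	for node := range m {
		if _, ok := latest[node]; !ok {
			delete(m, node) // deleting while ranging over a map is safe in Go
		}
	}
	for node, delay := range latest {
		m[node] = delay
	}
}
```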
groot
2009c3c783
fix: Support TLS for kafka connection (#30466)
issue: https://github.com/milvus-io/milvus/discussions/27977
pr: #30468 

Add extra configuration items to milvus.yaml to pass certificates to Kafka.

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-02-28 18:43:07 +08:00
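2009c3c783 lets certificates be passed to a Kafka client. The sketch below turns certificate paths into a Go `*tls.Config`. The function name and parameters are hypothetical, and the actual milvus.yaml keys are not reproduced. `MinVersion: tls.VersionTLS12` is an illustrative choice.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// buildKafkaTLSConfig builds a client TLS config from PEM file paths.
// Empty paths are skipped, leaving Go's defaults in place.
func buildKafkaTLSConfig(caPemPath, clientPemPath, clientKeyPath string) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if caPemPath != "" {
		pem, err := os.ReadFile(caPemPath)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, errors.New("no valid CA certificates found")
		}
		cfg.RootCAs = pool
	}
	if clientPemPath != "" && clientKeyPath != "" {
		// Client certificate for mutual TLS.
		cert, err := tls.LoadX509KeyPair(clientPemPath, clientKeyPath)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return cfg, nil
}
```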
congqixia
e389909547
enhance: Bump milvus version to 2.3.10 (#30776)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-23 13:48:53 +08:00
congqixia
3a1338436a
enhance: Bump milvus version to v2.3.9 (#30635)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-18 17:48:49 +08:00
chyezh
be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
jaime
7e7722ed43
enhance: [skip e2e] set logrus log level to reduce output error logs (#30478)
issue: #30295

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-02-04 17:23:06 +08:00
congqixia
cea5396c33
enhance: Bump milvus & milvus-proto version to v2.3.8 (#30492)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 12:57:05 +08:00
cqy123456
3036c19867
fix: cannot get search_cache_budget_gb in create index (#30353)
issue: https://github.com/milvus-io/milvus/issues/30375
pr: https://github.com/milvus-io/milvus/pull/30119

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-31 15:49:03 +08:00
chyezh
21c944beaa
enhance: add basic information of milvus into metrics (#29666)
Add basic build information and runtime component dependencies to
metrics.

issue: #29664
pr: #29665

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-29 15:49:04 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. Set the coordinator and proxy graceful stop timeout to 5s.
2. Set the graceful stop timeout of other worker nodes to 900s; this
could potentially be lowered to 600s once graceful stop is smooth.
3. Change the stop order of datacoord components.
4. `LivenessCheck` no longer performs a graceful shutdown.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
jaime
650dcc512e
fix: deadlock while getting configs (#30319)
issue: https://github.com/milvus-io/milvus/issues/30295
pr: #30318

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-26 20:15:01 +08:00
congqixia
26df754514
enhance: Bump milvus version to 2.3.7 (#30297)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-26 11:07:02 +08:00
yihao.dai
e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
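e0f987ee9b warms up the chunk cache asynchronously, running the warm-up in background goroutines so the load call returns immediately. The sketch below shows that shape. `chunkCache`, `WarmupAsync` and the `fetch` callback are hypothetical. A plain in-memory map stands in for the real cache, which the commit notes consumes additional disk.

```go
package main

import "sync"

// chunkCache is a hypothetical cache of raw vector chunks keyed by chunk ID.
type chunkCache struct {
	mu     sync.Mutex
	chunks map[string][]float32
	wg     sync.WaitGroup
}

func newChunkCache() *chunkCache {
	return &chunkCache{chunks: make(map[string][]float32)}
}

// WarmupAsync starts loading chunks in the background and returns at once.
// fetch stands in for reading a chunk from remote storage.
func (c *chunkCache) WarmupAsync(ids []string, fetch func(id string) []float32) {
	for _, id := range ids {
		c.wg.Add(1)
		go func(id string) {
			defer c.wg.Done()
			data := fetch(id)
			c.mu.Lock()
			c.chunks[id] = data
			c.mu.Unlock()
		}(id)
	}
}

// Wait blocks until all warm-up goroutines have finished.
func (c *chunkCache) Wait() { c.wg.Wait() }

// Get returns a cached chunk, if present.
func (c *chunkCache) Get(id string) ([]float32, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	data, ok := c.chunks[id]
	return data, ok
}
```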
Bingyi Sun
2c4d0605ef
enhance: add a weight for growing row count when balancing segments (#30293)
Cherry-pick from master
pr: #30271

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-26 09:17:03 +08:00
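One plausible reading of 2c4d0605ef is a balance score in which growing rows are scaled by a weight. The formula, `nodeRowCount` and `pickLeastLoaded` are assumptions for illustration. They are not the actual balancer code.

```go
package main

// nodeRowCount is a hypothetical balance score: sealed rows count fully,
// growing rows are scaled by growingWeight.
func nodeRowCount(sealedRows, growingRows int64, growingWeight float64) float64 {
	return float64(sealedRows) + growingWeight*float64(growingRows)
}

// pickLeastLoaded returns the index of the node with the lowest score.
func pickLeastLoaded(sealed, growing []int64, growingWeight float64) int {
	best := -1
	bestScore := 0.0
	for i := range sealed {
		s := nodeRowCount(sealed[i], growing[i], growingWeight)
		if best == -1 || s < bestScore {
			best, bestScore = i, s
		}
	}
	return best
}
```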
congqixia
565d37ced8
enhance: Bump version to 2.3.6 (#30184)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-22 19:04:55 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
This converts segcore errors to merr errors where possible.
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
yah01
1cc5a613d5
enhance: adjust the GPU pool size (#29937) (#30177)
According to benchmarks, a GPU pool size of 6 performs best.
pr: #29937

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:55:04 +08:00
MrPresent-Han
6aaccdd5f4
feat: support general capacity restrict for cloud-side resource contro… (#30017)
related: #29844
pr: https://github.com/milvus-io/milvus/pull/29845

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 16:18:56 +08:00
SimFG
2465d86138
enhance: [2.3] support related privilege for grant api (#30154)
/kind improvement
pr: #30153

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-22 14:42:55 +08:00
yihao.dai
b95f0cc0a1
enhance: Add a counter monitoring for the rate-limit requests (#30109) (#30132)
Add a counter monitoring metric for the rate-limited RPC requests with
labels: proxy nodeID, RPC request type, and state.

issue: https://github.com/milvus-io/milvus/issues/30052

pr: https://github.com/milvus-io/milvus/pull/30109

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-21 14:44:59 +08:00
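The labeled counter from b95f0cc0a1 is sketched below, keyed by the three labels the commit lists. This in-memory `rateLimitCounter` is hypothetical, and the label values used with it are illustrative. Milvus's actual metrics are Prometheus-based.

```go
package main

import "sync"

// rateLimitKey holds the three labels named in the commit message.
type rateLimitKey struct {
	NodeID  int64  // proxy node ID
	RPCType string // RPC request type (values illustrative)
	State   string // request state (values illustrative)
}

// rateLimitCounter counts rate-limited requests per label combination.
type rateLimitCounter struct {
	mu     sync.Mutex
	counts map[rateLimitKey]int
}

func newRateLimitCounter() *rateLimitCounter {
	return &rateLimitCounter{counts: make(map[rateLimitKey]int)}
}

// Inc increments the counter for one label combination.
func (c *rateLimitCounter) Inc(k rateLimitKey) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[k]++
}

// Get returns the current count for one label combination.
func (c *rateLimitCounter) Get(k rateLimitKey) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[k]
}
```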
foxspy
0700434c58
fix: patching search cache param when index meta does not hold one (#30116)
Patch the search cache param from index configs when the index meta does
not contain the search cache size key.

issue: #30113 
pr: #30119

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-19 11:50:56 +08:00
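The fallback in 0700434c58 copies the search cache key from the index configs only when the index params lack it. `patchSearchCacheParam` is a hypothetical name. Using `search_cache_budget_gb` (the key named in 3036c19867) as the patched key is an assumption.

```go
package main

// searchCacheKey is assumed to be the key being patched.
const searchCacheKey = "search_cache_budget_gb"

// patchSearchCacheParam fills in the search cache budget from indexConfigs
// when indexParams does not hold one; existing values are left untouched.
func patchSearchCacheParam(indexParams, indexConfigs map[string]string) {
	if _, ok := indexParams[searchCacheKey]; ok {
		return
	}
	if v, ok := indexConfigs[searchCacheKey]; ok {
		indexParams[searchCacheKey] = v
	}
}
```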
SimFG
be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
congqixia
7f32576f36
enhance: [cherry-pick] replace magic number with ParamItem for dist handler (#30020) (#30070)
Cherry-pick from master
pr: #30020
See also #28817

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 15:58:54 +08:00
wei liu
0447ef5df3
fix: Unexpected rpc msg size limit (#29682) (#29983)
pr: #29682
Because `clientMaxSendSize` and `serverMaxRecvSize` both limit the RPC
request size, they should use the same config value. Likewise,
`serverMaxSendSize` and `clientMaxRecvSize` both limit the RPC response
size, so they should also share a config value.

This PR fixes the unexpected RPC message size limit caused by
misunderstanding and misusing these RPC config items.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-16 11:18:52 +08:00
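The size-limit pairing in 0447ef5df3 can be made concrete. A request must fit both the client's send limit and the server's receive limit. A response must fit both the server's send limit and the client's receive limit. `grpcSizeConfig` and `effectiveLimits` are hypothetical names.

```go
package main

// grpcSizeConfig is a hypothetical holder for the four config items (bytes).
type grpcSizeConfig struct {
	ClientMaxSendSize int
	ClientMaxRecvSize int
	ServerMaxSendSize int
	ServerMaxRecvSize int
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// effectiveLimits returns the largest request and response a client/server
// pair can actually exchange.
func effectiveLimits(c grpcSizeConfig) (maxRequest, maxResponse int) {
	return minInt(c.ClientMaxSendSize, c.ServerMaxRecvSize),
		minInt(c.ServerMaxSendSize, c.ClientMaxRecvSize)
}
```

For example, a 256 MiB client send limit paired with a 64 MiB server receive limit caps requests at 64 MiB. Sharing one config value avoids that kind of mismatch.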
XuanYang-cn
962b3ea5fa
fix: [cherry-pick]Remove logging data when logging skip msg (#29708)
See also: #29696
pr: #29707

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-15 17:46:53 +08:00
zhenshan.cao
a689ea3228
feat: Add RBAC functionality to alias (#29885) (#29947)
issue: https://github.com/milvus-io/milvus/issues/29781
issue: https://github.com/milvus-io/milvus-proto/issues/237
pr: https://github.com/milvus-io/milvus/pull/29885

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-12 23:36:52 +08:00
jaime
fb956536b9
fix: remove checking if running inside container (#29941)
issue: https://github.com/milvus-io/milvus/issues/29846
pr: #29940

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-12 21:12:52 +08:00
congqixia
6d8146a09a
enhance: bump milvus & proto version to 2.3.5 (#29946)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:54:51 +08:00
wayblink
e1446da83c
feat: [Cherry-pick] Implement DescribeAlias and ListAliases interfaces (#29896)
#22882
pr: #29641

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-01-12 16:30:51 +08:00
wei liu
5520bfbb05
enhance: Change some frequency log to rated level (#29720) (#29903)
pr: #29720
This PR changes some high-frequency logs to rated (rate-limited) level.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 11:46:52 +08:00
jaime
c0b711e9fb
enhance: Support read hardware metrics for cgroupv2 (#29847)
issue: #29846
pr: #29850

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-11 19:20:57 +08:00
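Reading memory limits under cgroup v2 (c0b711e9fb) differs from v1. In v2 the limit lives in `memory.max`, which contains either a byte count or the literal `max` (no limit). The parser below is a hypothetical helper. Locating and reading the file is left to the caller.

```go
package main

import (
	"strconv"
	"strings"
)

// parseCgroupV2MemoryMax parses the content of a cgroup v2 memory.max file.
// It returns unlimited == true when the file holds "max".
func parseCgroupV2MemoryMax(content string) (limit uint64, unlimited bool, err error) {
	s := strings.TrimSpace(content)
	if s == "max" {
		return 0, true, nil
	}
	limit, err = strconv.ParseUint(s, 10, 64)
	return limit, false, err
}
```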
congqixia
cd93954214
enhance: [Cherry-pick] pre-allocate result FieldData space to reduce growslice (#29726) (#29866)
Cherry-pick from master
pr: #29726

See also: #29113

Add a new utility function in `pkg/util/typeutil` to pre-allocate field
data slice capacity according to the search limit. This avoids copying
data during `AppendFieldData` when the previous slice runs out of capacity,
and also saves CPU time under high payload.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-11 17:59:01 +08:00
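The pre-allocation idea in cd93954214 reserves capacity for `topK * dim` values up front, so appending results never triggers `growslice`. The sketch below illustrates it. `preallocFloatVectors` is a hypothetical name, not the actual `typeutil` helper.

```go
package main

// preallocFloatVectors returns an empty slice whose capacity already fits
// topK results of dim floats each, so later appends do not reallocate.
func preallocFloatVectors(topK, dim int) []float32 {
	return make([]float32, 0, topK*dim)
}
```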
wei liu
322e9f39a3
fix: Remove Unnecessary lock in config manager (#29855)
issue: #29709 #291712
pr: #29836
To avoid a deadlock caused by a recursive RLock racing with a concurrent
Lock, this PR removes the unnecessary lock in the config manager.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-11 15:01:01 +08:00
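The deadlock described in 322e9f39a3 comes from a documented property of Go's `sync.RWMutex`. Once a writer is waiting in `Lock`, new `RLock` calls block. A goroutine that calls `RLock` while already holding the read lock can therefore deadlock. The sketch below shows a lock-once pattern that avoids recursive read locking. `configStore` is hypothetical, and the pattern is not the actual change, which removed a lock from the config manager.

```go
package main

import "sync"

// configStore is a hypothetical config manager. Public methods lock once
// and call unlocked helpers, so no goroutine re-acquires RLock.
type configStore struct {
	mu     sync.RWMutex
	values map[string]string
}

// getLocked assumes the caller already holds mu (read or write).
func (s *configStore) getLocked(key, def string) string {
	if v, ok := s.values[key]; ok {
		return v
	}
	return def
}

// GetPair reads two keys under a single RLock instead of calling a
// self-locking getter twice while already holding the read lock.
func (s *configStore) GetPair(k1, k2 string) (string, string) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.getLocked(k1, ""), s.getLocked(k2, "")
}

// Set updates a key under the write lock.
func (s *configStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[key] = value
}
```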