7935 Commits

Author SHA1 Message Date
chyezh
3e994242d6
fix: panic with datanode negative wait group counter (#30136)
issue: #29170
pr: #30135

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-30 18:07:03 +08:00
chyezh
21c944beaa
enhance: add basic information of milvus into metrics (#29666)
add basic build information and runtime component dependency into
metrics.

issue: #29664
pr: #29665

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-29 15:49:04 +08:00
xige-16
9ab2ce0767
enhance: [Cherry-pick] Opt vector dimension mismatch error message (#30316)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/29928

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-29 14:47:03 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. set the coordinator and proxy graceful stop timeout to 5s.
2. set the graceful stop timeout of other worker nodes to 900s; we should
potentially change this to 600s once graceful stop is smooth.
3. change the order of the datacoord component during stop.
4. `LivenessCheck` no longer performs a graceful shutdown.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai
e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
Bingyi Sun
2c4d0605ef
enhance: add a weight for growing row count when balancing segments (#30293)
Cherry-pick from master
pr: #30271

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-26 09:17:03 +08:00
congqixia
d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo call missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia
1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia
9e8eb2aa51
fix: Revert leader checker related check (#30262)
See also #30150
PR reverted: #29984 #30152

This scenario cannot currently be covered by ut/it/e2e test cases, so
revert it for now

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 12:39:02 +08:00
congqixia
e3114b6a4d
enhance: [2.3] Utilize partition key optimization in reQuery (#30255)
Partial cherry-pick from master due to code branching
pr: #30253 
See also #30250

This PR adds a reQuery flag to the query task. When the reQuery flag is
true, the query task skips partition name conversion and uses the
pre-calculated partitionIDs passed from the search task.
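
A minimal sketch of the flag's effect, assuming a simplified task type. `queryTask`, `resolvePartitions`, and the name-to-ID map are hypothetical stand-ins, not the actual proxy code.

```go
package main

import (
	"errors"
	"fmt"
)

// queryTask is a hypothetical, simplified stand-in for the proxy query task.
type queryTask struct {
	reQuery        bool
	partitionNames []string
	partitionIDs   []int64 // pre-calculated by the search task when reQuery is set
}

// resolvePartitions returns the partition IDs to query. With reQuery set it
// skips the name conversion and reuses the IDs it was handed.
func (t *queryTask) resolvePartitions(nameToID map[string]int64) ([]int64, error) {
	if t.reQuery {
		return t.partitionIDs, nil
	}
	ids := make([]int64, 0, len(t.partitionNames))
	for _, name := range t.partitionNames {
		id, ok := nameToID[name]
		if !ok {
			return nil, errors.New("partition not found: " + name)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	meta := map[string]int64{"p1": 101, "p2": 102}
	first := &queryTask{partitionNames: []string{"p1", "p2"}}
	fmt.Println(first.resolvePartitions(meta))

	again := &queryTask{reQuery: true, partitionIDs: []int64{102}}
	fmt.Println(again.resolvePartitions(nil)) // no metadata lookup needed
}
```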

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 11:05:07 +08:00
SimFG
95cd6f20d0
fix: [2.3] wrong format expr for the delete rest api (#30218)
/kind improvement
issue: https://github.com/milvus-io/milvus/issues/30092
pr: #30217

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-24 11:27:05 +08:00
cai.zhang
efea282111
feat: [Pick] Support tencent cloud object storage for milvus (#30210)
issue: #30162 
master pr: #30163

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 16:07:01 +08:00
congqixia
35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01
4d0a6dbc25
fix: written file size is over the int32 range and raises error (#30057) (#30207)
we sum the total data size in int32, which could lead to an overflow
error
related #30056
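
A small illustration of the failure mode, assuming chunk sizes arrive as `int64` values. `sumInt32` and `sumInt64` are hypothetical helpers, not the code touched by the fix.

```go
package main

import "fmt"

// sumInt32 accumulates sizes in an int32. Go wraps signed integers on
// overflow, so totals above 2 GiB - 1 silently become wrong.
func sumInt32(sizes []int64) int32 {
	var total int32
	for _, s := range sizes {
		total += int32(s)
	}
	return total
}

// sumInt64 accumulates the same sizes in an int64, which has ample range.
func sumInt64(sizes []int64) int64 {
	var total int64
	for _, s := range sizes {
		total += s
	}
	return total
}

func main() {
	sizes := []int64{1 << 30, 1 << 30, 1 << 30} // three 1 GiB chunks
	fmt.Println(sumInt32(sizes)) // -1073741824
	fmt.Println(sumInt64(sizes)) // 3221225472
}
```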

pr: #30057

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 13:50:56 +08:00
yah01
9bd94c4fab
fix: the system rejects all queries and never recovers if enabled read rate limit (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01
0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
yah01
c8a129756f
enhance: filter out the not needed collections while listing (#29690) (#30180)
this improves performance when many collections exist; resolves #29631
pr: #29690

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:52:55 +08:00
MrPresent-Han
6aaccdd5f4
feat: support general capacity restrict for cloud-side resource contro… (#30017)
related: #29844
pr: https://github.com/milvus-io/milvus/pull/29845

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 16:18:56 +08:00
SimFG
2465d86138
enhance: [2.3] support related privilege for grant api (#30154)
/kind improvement
pr: #30153

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-22 14:42:55 +08:00
yah01
ce318f3286
enhance: make the error of parsing expression to ParameterInvalid (#29681) (#29795)
before this, the error was reported as an unexpected error
pr: #29681

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:36:55 +08:00
yihao.dai
917a4d74f3
fix: Use channel cp as the dml&start position for import segments (#30107) (#30133)
This PR discontinues the subscription to the mq and instead uses the
channel checkpoint as the DML and start position for the import
segments.

issue: https://github.com/milvus-io/milvus/issues/30106

pr: https://github.com/milvus-io/milvus/pull/30107

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-22 13:32:55 +08:00
yah01
a8d9b0ccba
enhance: optimize the loading index performance (#29894) (#30018)
this utilizes concurrent loading
pr: #29894

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:12:56 +08:00
congqixia
bac1a1355b
fix: [Cherry-pick] collection properties not saved for alter collection (#30145) (#30156)
Cherry-pick from master
pr: #30145
Resolves: #30144

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-22 10:08:55 +08:00
yihao.dai
b95f0cc0a1
enhance: Add a counter monitoring for the rate-limit requests (#30109) (#30132)
Add a counter monitoring metric for the ratelimited rpc requests with
labels: proxy nodeID, rpc request type, and state.

issue: https://github.com/milvus-io/milvus/issues/30052

pr: https://github.com/milvus-io/milvus/pull/30109

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-21 14:44:59 +08:00
PowderLi
3dc2585d9b
enhance: support dataType: array & json (#30077)
issue: #30075 
master pr: #30076

deal with the array<?> field data correctly

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-21 14:00:56 +08:00
wei liu
b2997eb881
fix: Leader checker can't remove segment from leader view (#30152)
issue: #30150
pr: #30151

This PR fixes three problems:

1. the load request generated by the leader checker doesn't set the load scope
2. the leader checker uses the wrong node id when generating a release task,
which causes the release task to finish immediately
3. the release request generated by leader_checker doesn't set the force
flag, so the operation to clean the leader view on the delegator fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00
congqixia
079ddbfc01
enhance: [Cherry-pick] Shuffle candidates before channel assignment (#30066) (#30089)
Cherry-pick from master
pr: #30066

Shuffle candidates to reduce the chance that several channels are
allocated to the same node

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-19 12:06:54 +08:00
foxspy
0700434c58
fix: patching search cache param when index meta does not hold one (#30116)
patch search cache param from index configs when index meta could not
get the search cache size key

issue: #30113 
pr: #30119

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-19 11:50:56 +08:00
SimFG
be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
wei liu
71e24f0a7f
fix: Remove heartbeat lag logic during get shard leaders (#29999) (#30085)
issue: #29677 #29838
pr: #29999
during get shard leaders, if a querynode doesn't ack the heartbeat for more
than 10s, querycoord will treat it as unavailable and won't return the shard
leader on it. but when a querynode is at full cpu usage, it can easily be
stuck for more than 10s without acking the heartbeat, which leaves no shard
leader available for search/query.

This PR removes the heartbeat lag logic during get shard leaders

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 17:48:55 +08:00
congqixia
7f32576f36
enhance: [cherry-pick] replace magic number with ParamItem for dist handler (#30020) (#30070)
Cherry-pick from master
pr: #30020
See also #28817

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 15:58:54 +08:00
wei liu
7d73032582
enhance: refactor leader_observer to leader_checker (#29454) (#29984)
issue: #29453
pr: #29452
sync distribution by rpc also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency issues on the same segment, such as
concurrent load and release on one segment.
This PR adds leader_checker, which generates load/release tasks to correct
the leader view instead of calling sync distribution by rpc

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 14:08:54 +08:00
congqixia
ce1ba6808a
enhance: [cherry-pick] change some important request log level to Info (#30062) (#30071)
Cherry-pick from master
pr: #30062 
Logs for some important requests shall be at least Info level

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:44:55 +08:00
congqixia
14aa20b7f7
enhance: [cherry-pick] fix otel config param type & leak (#30068)
cherry pick from master
pr: #29810 #30055 

`SampleFraction` shall be float and all `C.CString` shall be freed

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:43:05 +08:00
zhenshan.cao
9aceff5a6e
fix: duplicate dynamic field data by mistake (#30043)
issue: #30000 
pr: https://github.com/milvus-io/milvus/pull/30042

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-17 00:20:55 +08:00
zhagnlu
9f6a19c56c
fix: increase expr recursion depth to avoid parse failed (#29860) (#30021)
pr: #29860

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-16 19:48:38 +08:00
cai.zhang
88c30b48ce
fix: [pick]Fix bug for read data from azure (#30006)
issue: #30005 
master pr: #30007

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-16 15:44:53 +08:00
PowderLi
ff93e8b489
fix: [CHERRY-PICK] CollectionSchema.autoID is deprecated (#30011)
issue: [#30000](https://github.com/milvus-io/milvus/issues/30000)
related to: [milvus-proto
#202](https://github.com/milvus-io/milvus-proto/pull/202)
master pr: #30002

1. replace collSchema.AutoID with primaryField.AutoID
2. show `enableDynamic` & `enableDynamicField` at the same time
3. avoid data race about the access to metacache

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-16 14:32:53 +08:00
congqixia
1dbc2ab8ee
enhance: [Cherry-pick] make compactor use actual buffer size to decide when to sync(#29945) (#29971)
Cherry-pick from master
pr: #29945
See also: #29657

Datanode Compactor uses the estimated row number from the schema to decide
when to sync a batch of data while executing compaction. This estimate
can drift far from the actual size when the schema contains variable-length
fields (say VarChar, JSON, etc.)

This PR makes the compactor check the actual buffer data size so that it
can sync when the buffer actually exceeds the max binlog size.
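
A minimal sketch of a size-based sync trigger, assuming rows are raw byte slices. `countSyncs` and the 1024-byte limit are hypothetical and are not the compactor's real types or configuration.

```go
package main

import "fmt"

// countSyncs feeds rows into a buffer and returns how many times the buffer
// is synced when the trigger is the actual number of buffered bytes rather
// than an estimated row count. Hypothetical helper for illustration only.
func countSyncs(rows [][]byte, maxBytes int) int {
	syncs, buffered := 0, 0
	for _, row := range rows {
		buffered += len(row)
		if buffered >= maxBytes {
			syncs++ // buffer reached the limit: sync and start over
			buffered = 0
		}
	}
	if buffered > 0 {
		syncs++ // flush whatever is left at the end
	}
	return syncs
}

func main() {
	rows := [][]byte{
		make([]byte, 16),
		make([]byte, 4096), // e.g. one large JSON or VarChar value
		make([]byte, 16),
	}
	fmt.Println(countSyncs(rows, 1024)) // 2
}
```

A row-count estimate would treat all three rows alike, while the byte counter syncs as soon as the large row pushes the buffer over the limit.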

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-16 12:22:52 +08:00
congqixia
7fc7e1a0d5
enhance: [Cherry-pick] Use newer checkpoint when packing LoadSegmentRequest (#29922) (#29978)
Cherry-pick from master
pr: #29922 
See also: #29650

Either the segment dml position or the channel checkpoint could be newer in
some cases. This PR makes PackLoadSegments use the newer one, improving load
performance in cases where there are lots of upserts.
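
A minimal sketch of picking the newer of two positions, assuming positions are compared by timestamp only. `position` and `newerPosition` are hypothetical simplifications of the real message position type.

```go
package main

import "fmt"

// position is a hypothetical, simplified message position.
type position struct {
	Channel   string
	Timestamp uint64
}

// newerPosition returns whichever position has the larger timestamp.
// A nil position is treated as older than any non-nil one.
func newerPosition(a, b *position) *position {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	if b.Timestamp > a.Timestamp {
		return b
	}
	return a
}

func main() {
	segmentDML := &position{Channel: "ch-0", Timestamp: 100}
	channelCP := &position{Channel: "ch-0", Timestamp: 250}
	fmt.Println(newerPosition(segmentDML, channelCP).Timestamp) // 250
}
```

Starting from the newer position means fewer messages have to be replayed after the segment is loaded.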

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-16 12:08:53 +08:00
wei liu
81fdb6f472
enhance: Skip generate load segment task (#29724) (#29982)
issue: #29814
pr: #29724
if the channel is not subscribed yet, the generated load segment task will
be removed from the task scheduler, because the load segment task needs to
be transferred to the worker node by the shard leader.

This PR skips generating the load segment task when the channel is not
subscribed yet.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-16 10:12:52 +08:00
chyezh
df9b3376dc
fix: Use determined order to lock in BlockAll to avoid deadlock (#29972)
issue: #29104
pr: #29246

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-15 14:32:51 +08:00
chyezh
072b11355d
fix: SealedIndexingEntry in SealedIndexingRecord may leak without smart pointer protected (#29966)
possibly related issue: #29828
pr: #29932

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-15 10:30:52 +08:00
cai.zhang
434ac1f6d0
fix: [Pick]Fix error message for indexing (#29906)
issue: #29897 

master pr: #29898

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-14 13:30:52 +08:00
chyezh
c8e3a48214
fix: querynode num entity metric is broken by illegal label (#29949)
issue: #29766
also see pr: #29825
pr: #29948

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:22:59 +08:00
congqixia
227071a754
enhance: [cherry-pick] reduce delete detail log to delete range (#29916) (#29930)
Cherry-pick from master
pr: #29916
The delete detail log is large and hard to read when the log level is
debug. This PR changes the log to a stringer that prints only the pk range
and number.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:18:51 +08:00
congqixia
c21229b7bb
enhance: [cherry-pick] add trace span for wait tsafe (#29911) (#29929)
Cherry-pick from master
pr: #29911 
Add tracing span for search/query operation waiting tsafe duration

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:17:01 +08:00
aoiasd
128f197797
enhance: [Cherry-Pick] support access log print cluster prefix (#29646) (#29831)
relate: https://github.com/milvus-io/milvus/issues/29645
pr: https://github.com/milvus-io/milvus/pull/29646

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-12 18:58:52 +08:00
wei liu
86cddd24b5
enhance: Add ctx for load index logs (#29686) (#29905)
pr: #29686
This PR adds ctx to load index logs

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 18:56:58 +08:00
SimFG
d573f0ec1a
fix: [2.3] the delete msg disorder issue (#29917)
/kind improvement
pr: #29915

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-12 18:04:50 +08:00