861 Commits

Author SHA1 Message Date
aoiasd
cf02c623ab
fix: fix invalid injection bug by adding inject task to handler inject when queue is empty (#31819)
relate: https://github.com/milvus-io/milvus/issues/31548

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-04-03 14:05:14 +08:00
cqy123456
47f767cf32
enhance: remove float16 in 2.3 branch (#31720)
issue: https://github.com/milvus-io/milvus/issues/31696

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-03-30 10:49:13 +08:00
XuanYang-cn
055dd7ea1d
fix: Clear compaction tasks when release channel (#31694)
See also: #31648
pr: #31666

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-29 10:53:12 +08:00
XuanYang-cn
69931a6e7f
fix: Skip changing meta if nodeID not match with channel (#31665)
See also: #31648
pr: #31666

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-28 16:05:11 +08:00
jaime
5ddb0b435f
fix: revoke session may be ignored due to server context cancellation in advance (#31213)
issue: #31219
pr: #31220

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-14 19:05:04 +08:00
aoiasd
e747f15c80
fix: flush insert data with nil buffer (#31159)
relate: https://github.com/milvus-io/milvus/issues/31165

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-11 17:43:03 +08:00
congqixia
3e7f2e8e7d
enhance: [Cherry-Pick] Use ListIndexes instead of DescribeIndex for qc broker (#31163)
Cherry pick from master 
pr: #31122

See also #31103

Since querycoord needs only index meta information from datacoord, the broker
shall use `ListIndexes` to skip the segment index building check logic in
datacoord.

This PR is also related to #30538, in which DescribeIndex caused heavy
memory usage and eventually led to OOM.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-11 14:41:02 +08:00
congqixia
383ff8b0b1
enhance: [2.3] Add flush trigger for channel cp updater (#31082)
See also #31024  #31058

Flush time increased from 2 seconds to 5 or more after the change to the
channel updater. This PR adds a manual trigger method to speed up the flush
procedure.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 15:15:01 +08:00
yihao.dai
91d17870d6
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941) (#31024)
This PR includes the following adjustments:

1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
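
Items 1 and 2 above can be illustrated with a minimal Go sketch. The names `checkpoint`, `cpUpdater`, `Submit`, and `Drain` are hypothetical and are not Milvus's actual types; only the batch size of 128 comes from the commit message. Keeping the newest timestamp when two tasks share a vchannel is also an assumption.

```go
package main

import "sync"

// checkpoint is a hypothetical stand-in for a vchannel checkpoint.
type checkpoint struct {
	VChannel  string
	Timestamp uint64
}

// cpUpdater retains at most one pending checkpoint per vchannel,
// so update tasks cannot pile up for a slow channel.
type cpUpdater struct {
	mu      sync.Mutex
	pending map[string]checkpoint
}

func newCPUpdater() *cpUpdater {
	return &cpUpdater{pending: make(map[string]checkpoint)}
}

// Submit replaces the pending checkpoint of the same vchannel.
// Assumption: an older timestamp never overwrites a newer one.
func (u *cpUpdater) Submit(cp checkpoint) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if old, ok := u.pending[cp.VChannel]; ok && old.Timestamp >= cp.Timestamp {
		return
	}
	u.pending[cp.VChannel] = cp
}

// Drain removes all pending checkpoints and groups them into batches
// of at most batchSize entries, one batch per (hypothetical) RPC.
func (u *cpUpdater) Drain(batchSize int) [][]checkpoint {
	u.mu.Lock()
	defer u.mu.Unlock()
	if batchSize <= 0 {
		batchSize = 1
	}
	var batches [][]checkpoint
	var cur []checkpoint
	for ch, cp := range u.pending {
		cur = append(cur, cp)
		delete(u.pending, ch) // deleting during range is safe in Go
		if len(cur) == batchSize {
			batches = append(batches, cur)
			cur = nil
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}
```

With a batch size of 128, 300 distinct vchannels would drain into three batches (128, 128, 44), while repeated submissions for one vchannel collapse into a single entry.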

issue: https://github.com/milvus-io/milvus/issues/30004

pr: https://github.com/milvus-io/milvus/pull/30941

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 14:27:01 +08:00
yihao.dai
a5350f64a5
enhance: Reduce the memory usage of the timeTickSender (#30968) (#30991)
In the cache of the timeTickSender, retain only the latest stats instead
of storing stats for every time tick.
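
A rough Go sketch of the idea, assuming a cache keyed by channel name. The types `segmentStats`, `tickStats`, and `statsCache` are hypothetical illustrations, not the real timeTickSender structures.

```go
package main

import "sync"

// segmentStats is a hypothetical stand-in for per-segment statistics.
type segmentStats struct {
	SegmentID int64
	NumRows   int64
}

// tickStats pairs a time tick with the stats reported at that tick.
type tickStats struct {
	Timestamp uint64
	Stats     []segmentStats
}

// statsCache keeps only the latest tickStats per channel, so its size
// is bounded by the number of channels, not the number of time ticks.
type statsCache struct {
	mu     sync.Mutex
	latest map[string]tickStats
}

func newStatsCache() *statsCache {
	return &statsCache{latest: make(map[string]tickStats)}
}

// Update overwrites the cached entry unless a newer tick is already stored.
func (c *statsCache) Update(channel string, ts uint64, stats []segmentStats) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.latest[channel]; ok && cur.Timestamp > ts {
		return
	}
	c.latest[channel] = tickStats{Timestamp: ts, Stats: stats}
}

// Latest returns the most recent stats for a channel, if any.
func (c *statsCache) Latest(channel string) (tickStats, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	ts, ok := c.latest[channel]
	return ts, ok
}

// Len reports how many entries the cache holds.
func (c *statsCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.latest)
}
```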

issue: https://github.com/milvus-io/milvus/issues/30967

pr: https://github.com/milvus-io/milvus/pull/30968

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 10:59:01 +08:00
XuanYang-cn
bb2de0d964
fix: [cherry-pick] Clear DN unknown compaction tasks (#30972)
If DC restarted, those unknown compaction tasks
will never get a callback in DN, so the segments in the compaction
task stay locked and can neither sync nor be compacted again, blocking cp
advance and compaction execution.

See also: #30137
pr: #30850

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-04 16:52:59 +08:00
chyezh
3e994242d6
fix: panic with datanode negative wait group counter (#30136)
issue: #29170
pr: #30135

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-30 18:07:03 +08:00
chyezh
77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. Set the coordinator and proxy graceful stop timeout to 5s.
2. Set the graceful stop timeout of other worker nodes to 900s; this
could potentially be lowered to 600s once graceful stop is smooth.
3. Change the stop order of datacoord components.
4. `LivenessCheck` no longer performs graceful shutdown.
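
A graceful stop bounded by a timeout might look like the following Go sketch. The helper `stopWithTimeout` and the error `errStopTimeout` are hypothetical names for illustration; only the timeout values (5s, 900s) come from the commit message.

```go
package main

import (
	"errors"
	"time"
)

// errStopTimeout is returned when the stop routine does not finish in time.
var errStopTimeout = errors.New("graceful stop timed out")

// stopWithTimeout runs stop in a goroutine and waits at most timeout for it.
// If the timeout fires, the caller can proceed with a forced shutdown; the
// stop goroutine is left running, which is acceptable when the process exits.
func stopWithTimeout(stop func(), timeout time.Duration) error {
	done := make(chan struct{})
	go func() {
		defer close(done)
		stop()
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-done:
		return nil
	case <-timer.C:
		return errStopTimeout
	}
}
```

A coordinator could call `stopWithTimeout(node.Stop, 5*time.Second)` and a worker node `stopWithTimeout(node.Stop, 900*time.Second)`, where `node.Stop` stands for whatever stop routine the component exposes.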

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai
917a4d74f3
fix: Use channel cp as the dml&start position for import segments (#30107) (#30133)
This PR discontinues the subscription to the mq and instead uses
the channel checkpoint as the DML and start position for the import
segments.

issue: https://github.com/milvus-io/milvus/issues/30106

pr: https://github.com/milvus-io/milvus/pull/30107

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-22 13:32:55 +08:00
congqixia
1dbc2ab8ee
enhance: [Cherry-pick] make compactor use actual buffer size to decide when to sync(#29945) (#29971)
Cherry-pick from master
pr: #29945
See also: #29657

Datanode Compactor uses the estimated row number from the schema to decide when
to sync the batch of data when executing compaction. This estimated value
could stray far from the actual size when the schema contains variable-length
fields (say VarChar, JSON, etc.).

This PR makes the compactor check the actual buffer data size so that
it can sync when the buffer is actually beyond the max binlog
size.
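
The difference between estimating from row count and measuring real bytes can be sketched in Go as follows. `sizedBuffer` and its methods are hypothetical, and treating "reaches the limit" as the sync condition is an assumption.

```go
package main

// sizedBuffer tracks the real byte size of buffered rows instead of
// relying on a per-row estimate derived from the schema.
type sizedBuffer struct {
	rows     [][]byte
	size     int // actual bytes currently buffered
	maxBytes int // hypothetical stand-in for the max binlog size
}

func newSizedBuffer(maxBytes int) *sizedBuffer {
	return &sizedBuffer{maxBytes: maxBytes}
}

// Append stores the row and reports whether the buffer has reached
// maxBytes and should be synced. Variable-length rows (VarChar, JSON)
// are accounted for by their real length.
func (b *sizedBuffer) Append(row []byte) bool {
	b.rows = append(b.rows, row)
	b.size += len(row)
	return b.size >= b.maxBytes
}

// Flush hands back the buffered rows and resets the size counter.
func (b *sizedBuffer) Flush() [][]byte {
	rows := b.rows
	b.rows = nil
	b.size = 0
	return rows
}
```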

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-16 12:22:52 +08:00
SimFG
d573f0ec1a
fix: [2.3] the delete msg disorder issue (#29917)
/kind improvement
pr: #29915

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-12 18:04:50 +08:00
congqixia
c56622dea7
enhance: move confusing warning log to error branch (#29891)
`flushInsertData` & `flushDeleteData` print a WARNING log even when no
error is returned, so move the log into the error branch of the if block.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 15:50:52 +08:00
XuanYang-cn
1128b1dd67
fix: [cherry-pick]Save lite WatchInfo into etcd in DataNode (#29751)
See also: #29689
pr: #29687

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-10 20:48:50 +08:00
congqixia
cf228c2f1c
fix: Print number of pks instead of delete pk val (#29653)
See also #29445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 10:02:46 +08:00
MrPresent-Han
757834602a
enhance: add param for bloomfilter(#29388) (#29614)
related: https://github.com/milvus-io/milvus/issues/29388
pr: https://github.com/milvus-io/milvus/pull/29490

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-02 18:58:47 +08:00
yah01
51cab791cf
fix: missing to support compact for Array type (#29505) (#29504)
The Array type can't be compacted: the system can continue with the
inserted segments, but these segments can never be compacted.

fix #29503
pr: #29505

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-27 08:15:51 +08:00
SimFG
74e72ce27e
enhance: [2.3] Support to get the param value in the runtime (#29298)
pr: #29297
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-12-21 20:36:43 +08:00
Xiaofan
8e13199da2
fix: frequent flush cause minio rate limit (#28625)
related to #28549
pr: #28626

1. avoid duplicated sync segments under syncing states
2. add jitter to avoid sync segments at the same time

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2023-12-20 15:02:55 +08:00
congqixia
9a80dc666b
fix: flushTs is never reset in channelMeta (#29244)
See also #29156
FlushTs needs to be reset to MaxUint64 once the channel checkpoint has
passed this timestamp. Otherwise, segments will be shattered and the flush
queue will fill up with tasks.
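
A minimal Go sketch of the reset rule, with `MaxUint64` acting as the "no flush pending" sentinel. The struct and method names here are hypothetical simplifications of channelMeta.

```go
package main

import "math"

// noFlushTs is the sentinel meaning "no flush is pending".
const noFlushTs uint64 = math.MaxUint64

// channelMeta is a hypothetical, stripped-down stand-in that only
// tracks the flush timestamp.
type channelMeta struct {
	flushTs uint64
}

func newChannelMeta() *channelMeta {
	return &channelMeta{flushTs: noFlushTs}
}

// setFlushTs records the timestamp up to which data must be flushed.
func (c *channelMeta) setFlushTs(ts uint64) {
	c.flushTs = ts
}

// flushPending reports whether a flush request is still outstanding.
func (c *channelMeta) flushPending() bool {
	return c.flushTs != noFlushTs
}

// updateCheckpoint resets flushTs once the checkpoint has passed it,
// so later segments are not flushed again and again.
func (c *channelMeta) updateCheckpoint(cpTs uint64) {
	if c.flushTs != noFlushTs && cpTs > c.flushTs {
		c.flushTs = noFlushTs
	}
}
```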

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-16 14:00:38 +08:00
XuanYang-cn
7b0599765f
fix: [cherry-pick]Skip updating checkpoint after dropcollection (#29221)
pr: #29220

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-15 16:04:45 +08:00
congqixia
a108bf7bc1
enhance: improve datanode channel checkpoint source log (#29180)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-14 14:34:38 +08:00
wayblink
e49860cb80
feat: Introduce channelCheckpointUpdater to reduce goroutine in ttNode (#29107)
pr: #28570

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-12-12 13:48:42 +08:00
MrPresent-Han
5f4ac437b2
enhance: [Cherry-pick] Moving etcd client into session (#27069) (#28996)
relate: #26694
pr: https://github.com/milvus-io/milvus/pull/27069

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
2023-12-07 16:22:34 +08:00
congqixia
2873be9264
fix: [2.3] Reject compaction task with growing segments (#28927)
See also #28924
The compaction task generated before datanode finishes the SaveBinlogPath grpc
call contains segments which are still in Growing state. DataNode shall
verify each non-levelzero segment before submitting the compaction task to the
executor.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-04 19:14:33 +08:00
yihao.dai
a78ea4fea0
fix: Check ErrSegmentNotFound in delete node (#28371) (#28638)
We already check ErrSegmentNotFound in insert_buffer_node in datanode;
we should also check it in delete_node.

issue: https://github.com/milvus-io/milvus/issues/27145

pr: https://github.com/milvus-io/milvus/pull/28371

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-11-29 17:50:27 +08:00
jaime
9378f78218
enhance: Add logs for each step during service initialization (#28687)
/kind improvement
pr: #28624

Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-11-27 17:54:26 +08:00
congqixia
6512b12fba
enhance: [cherry-pick] Make etcd kv request timeout configurable (#28661) (#28701)
Cherry-pick from master
pr: #28661
See also #28660
This PR adds a request timeout config item for etcd kv requests.
It also syncs the default timeout to the same value for the etcdKV & tikv configs.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-24 21:16:26 +08:00
smellthemoon
288844e3cf
enhance: Reduce the goroutine in flowgraph to 2 (#28233) (#28545)
Each node in the flow graph allocates a goroutine, but the nodes actually
execute sequentially and can be placed in one goroutine. InputNode consumes
msgs from the msgstream and is allocated one goroutine of its own.
issue: #24826 
pr: #28233
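
The two-goroutine layout described above can be sketched in Go as follows. `node` and `runPipeline` are hypothetical names, and messages are reduced to plain integers for brevity; this is not the actual flowgraph code.

```go
package main

// node is a hypothetical processing step of the flow graph.
type node func(msg int) int

// runPipeline uses exactly two goroutines: one plays the input node and
// feeds messages, the other runs every remaining node sequentially.
func runPipeline(input []int, nodes ...node) []int {
	in := make(chan int)

	// Goroutine 1: the input node, consuming from the message source.
	go func() {
		defer close(in)
		for _, m := range input {
			in <- m
		}
	}()

	// Goroutine 2: all other nodes, executed in order for each message.
	out := make(chan []int)
	go func() {
		var results []int
		for m := range in {
			for _, n := range nodes {
				m = n(m)
			}
			results = append(results, m)
		}
		out <- results
	}()

	return <-out
}
```

The goroutine count stays at two no matter how many nodes are chained, instead of growing with the number of nodes.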

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-11-24 15:00:26 +08:00
smellthemoon
d2ebbe2317
enhance: create goroutine only once(#28594) (#28609)
Create the goroutine only once in getOrCreateMergedTimeTickerSender.
pr: #28594
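
One common way to guarantee a single goroutine in a get-or-create function is `sync.Once`. The sketch below assumes that approach; the PR may do it differently, and `mergedSender`, `getOrCreateSender`, and `loopsStarted` are hypothetical names.

```go
package main

import "sync"

// mergedSender is a hypothetical stand-in for the merged time ticker sender.
type mergedSender struct {
	stop chan struct{}
}

// loop is the background worker; here it only waits for stop.
func (s *mergedSender) loop() {
	<-s.stop
}

var (
	senderOnce   sync.Once
	sender       *mergedSender
	loopsStarted int // counts how many worker goroutines were launched
)

// getOrCreateSender returns the shared sender. The instance and its
// goroutine are created on the first call only, even under concurrency.
func getOrCreateSender() *mergedSender {
	senderOnce.Do(func() {
		sender = &mergedSender{stop: make(chan struct{})}
		loopsStarted++
		go sender.loop()
	})
	return sender
}
```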

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-11-22 10:26:28 +08:00
smellthemoon
d724b07037
enhance: Use single instance for mergedTimeTickerSender (#27730) (#28546)
use single instance for mergedTimeTickerSender
issue: https://github.com/milvus-io/milvus/issues/24826
pr: https://github.com/milvus-io/milvus/pull/27730

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-11-21 16:04:23 +08:00
congqixia
fce0284881
[2.3] Refine datanode Timetick Sender (#28393) (#28430)
cherry pick from master
pr: #28393
- Use explicit lifetime control methods: `Start` and `Stop`
- Allow controlling the retry option
- Make sure the tt sender worker has exited once `Stop` returns
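
The first and third bullets can be sketched in Go with a `sync.WaitGroup`, so that `Stop` blocks until the worker has really exited. `ttSender` and its fields are hypothetical, and the retry option is left out of this sketch.

```go
package main

import (
	"sync"
	"time"
)

// ttSender is a hypothetical time tick sender with explicit lifetime control.
type ttSender struct {
	interval  time.Duration
	closeCh   chan struct{}
	wg        sync.WaitGroup
	startOnce sync.Once
	stopOnce  sync.Once
	ticks     int  // placeholder for real work done on each tick
	exited    bool // set by the worker right before it returns
}

func newTTSender(interval time.Duration) *ttSender {
	return &ttSender{interval: interval, closeCh: make(chan struct{})}
}

// Start launches the worker goroutine; repeated calls are no-ops.
func (s *ttSender) Start() {
	s.startOnce.Do(func() {
		s.wg.Add(1)
		go s.work()
	})
}

func (s *ttSender) work() {
	defer s.wg.Done()
	ticker := time.NewTicker(s.interval)
	defer ticker.Stop()
	for {
		select {
		case <-s.closeCh:
			s.exited = true
			return
		case <-ticker.C:
			s.ticks++ // a real sender would report time ticks here
		}
	}
}

// Stop signals the worker and waits for it, so the worker is guaranteed
// to have exited by the time Stop returns. It is safe to call twice.
func (s *ttSender) Stop() {
	s.stopOnce.Do(func() { close(s.closeCh) })
	s.wg.Wait()
}
```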

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-15 10:08:26 +08:00
yah01
e51ceaae3a
Not convert legacy error code to new merr (#28232) (#28274)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-08 19:18:20 +08:00
groot
29e66ed46b
Fix bulkinsert bug that segments are compacted after import (#28227)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2023-11-08 10:18:20 +08:00
SimFG
598788e6b8
Delay the cancellation of ctx when stopping the node (#28249)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-11-08 01:46:20 +08:00
yah01
d10a82dba4
Fix getting incorrect CPU num (#28178)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-11-07 11:52:22 +08:00
yihao.dai
5fae32f77e
Use merr to prevent datanode panic (#28122)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-11-04 00:58:21 +08:00
yah01
f79c7370f4
Fix panic while flushing dropped/compacted segment (#27927)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-25 22:10:13 +08:00
yihao.dai
b9d5ef3599
Fix datanode ttNode goroutine leak (#27878)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-10-24 18:22:10 +08:00
aoiasd
9091a27832
Add meta cache to datanode for L0 Delta (#27768)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-10-23 19:42:10 +08:00
SimFG
9b0ecbdca7
Support to replicate the mq message (#27240)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
smellthemoon
4b0ec156b3
Set channel work pool size in datanode (#27728)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-10-19 08:28:08 +08:00
XuanYang-cn
7358c3527b
Add iterators (#27643)
See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-18 19:34:08 +08:00
jaime
e386a62fae
Remove recollect segment stats during starting datacoord (#27410)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:26:09 +08:00
jaime
ec1fe3549e
Add a stop hook to clean session (#27564)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:24:10 +08:00
congqixia
82b2edc4bd
Replace manual composed grpc call with Broker methods (#27676)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-13 09:55:34 +08:00