Zhen Ye
38c804fb01
fix: more stable recovery graceful closing and stable unittest ( #42013 )
...
issue: #41544
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-23 17:52:26 +08:00
congqixia
244aa30076
fix: Lock before reading flusher cp sampling truncate cp ( #42019 )
...
Related to #42018
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-22 21:38:28 +08:00
Zhen Ye
c9b0748ff9
enhance: add delete rows into delete msg header and more metric ( #41952 )
...
issue: #41544
- add delete rows into delete message header
- add more insert/delete metrics
- fix non-broadcast message has broadcast header
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 20:28:26 +08:00
Zhen Ye
458ab86894
fix: stop retry if collection not found too much when get recovery from coord ( #41980 )
...
issue: #41966
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 16:22:24 +08:00
Zhen Ye
59ab274dbe
fix: use flusher and recovery checkpoint together to determine the truncate position ( #41934 )
...
issue: #41544
- unify the log field of message
- use the minimum one of flusher and recovery storage checkpoint as the
truncate position
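
The rule in the last item can be sketched as follows. This is a minimal illustration, not the code from the PR: `Checkpoint` and `truncatePosition` are hypothetical names, and the checkpoint is assumed to be a plain ordered integer, whereas a real WAL message ID is usually opaque and compared through its own method.

```go
package main

import "fmt"

// Checkpoint is a hypothetical stand-in for a WAL position.
// Assumption: positions are totally ordered and a smaller value is older.
type Checkpoint uint64

// truncatePosition returns the older of the two checkpoints, so that
// truncation never removes entries that either component still needs.
func truncatePosition(flusher, recovery Checkpoint) Checkpoint {
	if flusher < recovery {
		return flusher
	}
	return recovery
}

func main() {
	// The flusher has progressed to 120, but recovery storage has only
	// persisted up to 95, so the WAL may be truncated only up to 95.
	fmt.Println(truncatePosition(120, 95))
}
```

Taking the minimum is the conservative choice: whichever component lags behind decides how much of the log must be kept.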
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-20 16:10:24 +08:00
yihao.dai
65dd3982d8
fix: Fix ants.Pool goroutine leak ( #41892 )
...
1. Release the pool after it is no longer in use.
2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR
#287](https://github.com/panjf2000/ants/pull/287)).
issue: https://github.com/milvus-io/milvus/issues/41838
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-19 17:56:22 +08:00
Zhen Ye
59dff668dc
enhance: schema change without manual flush ( #41882 )
...
issue: #39718
- remove the manual flush message from schema change operation
- add flush segment id handle into schema change processes
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2025-05-19 10:14:22 +08:00
Zhen Ye
a3d5ad135e
fix: recover a dropped collection from wal if create collection message can be seen ( #41902 )
...
issue: #41654
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-17 07:38:21 +08:00
Zhen Ye
d3fff1769e
fix: streaming node panic when binary size is set as zero ( #41879 )
...
issue: #41853
- persist the estimated binary size for insert message into wal.
- add metric to record the total growing rows of channel.
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-16 11:12:22 +08:00
Zhen Ye
ae43230703
enhance: set jemalloc prof disable by default ( #41850 )
...
issue: #40730
- add assertion for insert message
- add more buffer for seal notifier
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-15 20:10:23 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor ( #41824 )
...
issue: #41544
- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye
21d6d1669e
fix: wal should be reopen if wal append receive the fence error ( #41807 )
...
issue: #41544
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 01:02:56 +08:00
Zhen Ye
7beafe99a7
enhance: implement wal garbage collector with truncate api ( #41770 )
...
issue: #41544
- add a truncator implementation into wal recovery storage.
- add metrics for recovery storage.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 22:08:56 +08:00
Zhen Ye
61b6ca5b73
enhance: add in mem shard manager ( #41749 )
...
issue: #41544
- Implement in-memory shard manager to maintain the shard state at write
ahead.
- Remove all rpc and meta operation at write ahead, make the segment
assignment logic only use wal and memory.
- Refactor global stats management, add node-level flush policy.
- Fix the recovery storage inconsistency bug when graceful close.
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 12:04:56 +08:00
Zhen Ye
e675da76e4
enhance: simplify the proto message, make segment assignment code more clean ( #41671 )
...
issue: #41544
- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-11 20:49:00 +08:00
Zhen Ye
452d6fb709
fix: write buffer leak if the wal flusher is cancelled when recovery ( #41719 )
...
issue: #41715
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-10 09:32:56 +08:00
Zhen Ye
3dd9a1147b
enhance: add lock interceptor and recoverable txn manager ( #41640 )
...
issue: #41544
- add a lock interceptor at vchannel granularity.
- make txn manager recoverable and add FailTxnAtVChannel operation.
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-09 11:14:53 +08:00
Zhen Ye
de8f0af20d
enhance: use dispatcher at delegator when enable streaming ( #41266 )
...
issue: #38399
- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.
Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-06 01:12:53 +08:00
Zhen Ye
dfbb02a5f7
enhance: make streaming message as a log field for easier coding ( #41545 )
...
issue: #41544
- implement message can be logged as a field by zap.
- fix too many slow log for woodpecker.
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 14:38:42 +08:00
Zhen Ye
6a15790799
enhance: add interface for message and fix write ahead buffer ( #41470 )
...
issue: #41439
- add IsPersisted and VChannel interface for message
- add WithNotPersisted() for message builder
- fix the persisted time tick lost at write ahead buffer
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-27 10:24:38 +08:00
Zhen Ye
a3d621cb5e
fix: remove the concurrent limits for streaming service ( #41484 )
...
issue: #41479
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 20:36:38 +08:00
Zhen Ye
ecfc868dcb
fix: write buffer not unregistered when datasyncservice is gone ( #41496 )
...
issue: #41495
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 19:38:38 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL ( #41373 )
...
Related to #39718
Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
Zhen Ye
7f5a9a6046
fix: unstable timeticksync unittest ( #41437 )
...
issue: #38399
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-22 10:53:29 +08:00
Zhen Ye
9339bccccc
enhance: move sent first timeticksync, make recovery easier ( #41405 )
...
issue: #38399
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-21 17:18:37 +08:00
Zhen Ye
ef4923e66b
fix: catchup scan never done if wal truncate ( #41345 )
...
issue: #41062
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:40:37 +08:00
tinswzy
6fa68c1f16
enhance: Support Woodpecker as a WAL storage option for Milvus ( #41095 )
...
#40916 Support Woodpecker as a WAL storage option for Milvus
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-04-20 22:22:42 +08:00
Zhen Ye
c893344289
fix: close of wal is block when recovery ( #41326 )
...
issue: #41307
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-18 16:14:35 +08:00
congqixia
1d564a2d95
fix: Make TestScannerAdaptorReadError stable ( #41303 )
...
Related to #41302
Previously, waiting for 200 milliseconds could cause unstable behavior in
this unittest. This PR makes the unittest wait for a certain function call
instead of waiting for a fixed time.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-15 14:54:34 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord ( #41006 )
...
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Zhen Ye
224728c2d2
fix: catchup cannot work if using StartAfter ( #41201 )
...
issue: #41062
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-10 19:04:27 +08:00
Ted Xu
1bcea2a775
fix: assigning the correct storage version in sync and index tasks ( #41093 )
...
See #39663 #40667
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-08 10:14:25 +08:00
Zhen Ye
7830dd3713
fix: keep the last persisted message in writeaheadbuffer to fix the never catch up ( #41109 )
...
issue: #41062
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-08 09:10:28 +08:00
Zhen Ye
b03e60558a
enhance: add proxy and datanode checker when wal balance startup ( #40877 )
...
issue: #40532
- balance should be enabled only when there is no proxy or datanode whose
version is lower than 2.6.0
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 11:24:22 +08:00
Zhen Ye
cef1d16454
fix: timetick interceptor panics when closing write ahead buffer ( #40970 )
...
issue: #40967
Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 10:44:22 +08:00
Zhen Ye
af80a4dac2
fix: auto flush all segment that is not created by streaming service ( #40767 )
...
issue: #40532
Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-26 16:32:22 +08:00
Zhen Ye
b119ac5d30
enhance: add wal access mode options ( #40617 )
...
issue: #40532
Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-19 14:02:11 +08:00
Zhen Ye
f6fb4bc442
fix: backoff will retry infinitely after reaching max elapse ( #40589 )
...
issue: #40588
Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 16:24:06 +08:00
Zhen Ye
5735c3ef19
fix: too many memory usage of streaming node ( #40606 )
...
issue: #40592
Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 07:10:07 +08:00
Zhen Ye
d9fe8f0dcf
fix: [skip e2e] wab unittest may fail ( #40470 )
...
issue: #38399
Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-11 11:34:06 +08:00
sthuang
63a7c4570e
feat: storage v2 sync ( #39663 )
...
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-05 11:22:15 +08:00
Zhen Ye
2ff657f2d9
fix: wal may panic when context canceled ( #40265 )
...
issue: #40264
- wal may panic when context canceled
- scanner may data race when closing
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-28 17:41:58 +08:00
Zhen Ye
84df80b5e4
enhance: refactor metrics of streaming ( #40031 )
...
issue: #38399
- add metrics for broadcaster component.
- add metrics for wal flusher component.
- add metrics for wal interceptors.
- add slow log for wal.
- add more labels for some wal metrics. (local or remote/catchup or
tailing...)
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-25 12:25:56 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module ( #39990 )
...
Related to #39095
https://go.dev/doc/modules/version-numbers
Update pkg version according to golang dep version convention
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Zhen Ye
fd701eca71
fix: local wal perform different with remote wal ( #39967 )
...
issue: #38399
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-19 19:12:51 +08:00
Zhen Ye
ae700e7519
enhance: make compatible with old msgstream for new streaming service ( #39943 )
...
issue: #38399
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-18 11:21:08 +08:00
Zhen Ye
21724ab52c
enhance: generate guaranteets at delegator if local wal ( #39799 )
...
issue: #38399 , #39892
- use mvcc timestamp of wal as guaranteets if wal and delegator are
located on the same node.
- fix: ignore growing option is lost at hybridsearch
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-17 15:22:15 +08:00
SimFG
047254665d
feat: support to replicate import msg ( #39171 )
...
- issue: #39849
---------
Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-02-16 00:08:13 +08:00
Zhen Ye
034575396f
fix: streaming consume checkpoint is always nil and limit resource of ci ( #39781 )
...
issue: #38399
- fix the nil pointer bug
- limit the resource usage for streaming e2e
- enhance the go test
- fix: rootcoord block when graceful stop
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-13 19:18:14 +08:00
Zhen Ye
0988807160
enhance: enable write ahead buffer for streaming service ( #39771 )
...
issue: #38399
- Make a timetick-commit-based write ahead buffer at write side.
- Add a switchable scanner at read side to transfer the state between
catchup and tailing read
Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-12 20:38:46 +08:00