78 Commits

Author SHA1 Message Date
cai.zhang
21b0e5ca9d
enhance: Don't seal segments when only alter collection properties (#46488)
### **PR Type**
Enhancement


___

### **Description**
- Only flush and fence segments for schema-changing alter collection
messages

- Skip segment sealing for collection property-only alterations

- Add conditional check using messageutil.IsSchemaChange utility
function


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["Alter Collection Message"] --> B{"Is Schema Change?"}
  B -->|Yes| C["Flush and Fence Segments"]
  B -->|No| D["Skip Segment Operations"]
  C --> E["Set Flushed Segment IDs"]
  D --> E
  E --> F["Append Operation"]
```



<details><summary><h3>File Walkthrough</h3></summary>

<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><table>
<tr>
  <td>
    <details>
<summary><strong>shard_interceptor.go</strong><dd><code>Conditional
segment sealing based on schema changes</code>&nbsp; &nbsp; &nbsp;
&nbsp; &nbsp; &nbsp; </dd></summary>
<hr>


internal/streamingnode/server/wal/interceptors/shard/shard_interceptor.go

<ul><li>Added import for <code>messageutil</code> package to access
schema change detection <br>utility<br> <li> Modified
<code>handleAlterCollection</code> to conditionally flush and fence
<br>segments only for schema-changing messages<br> <li> Wrapped segment
flushing logic in <code>if
</code><br><code>messageutil.IsSchemaChange(header)</code> check<br>
<li> Skips unnecessary segment sealing when only collection properties
are <br>altered</ul>


</details>


  </td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46488/files#diff-c1acf785e5b530e59137b21584cf567ccd9aeeb613fb3684294b439289e80beb">+9/-3</a>&nbsp;
&nbsp; &nbsp; </td>

</tr>
</table></td></tr></tbody></table>

</details>

___



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Optimized collection schema alteration to conditionally perform
segment allocation operations only when schema changes are detected,
reducing unnecessary overhead in unmodified collection scenarios.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-22 20:55:19 +08:00
sijie-ni-0214
f51de1a8ab
feat: support TruncateCollection api to clear collection data (#46167)
issue: https://github.com/milvus-io/milvus/issues/46166

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-12 10:31:14 +08:00
yihao.dai
f32f2694bc
enhance: Implement new FlushAllMessage and refactor flush all (#45920)
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.

issue: https://github.com/milvus-io/milvus/issues/45919

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-10 19:27:13 +08:00
Zhen Ye
10a781d22c
fix: write ahead buffer unittest failure (#45978)
issue: #45977

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-02 10:25:10 +08:00
Zhen Ye
8e0ae6433d
fix: LastConfirmedMessageID may be wrong if high concurrent writing (#45873)
issue: #45872

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-27 12:01:07 +08:00
Zhen Ye
576084fe86
enhance: support alter collection/database with WAL-based DDL framework (#45266)
issue: #43897

- Alter collection/database is implemented by WAL-based DDL framework
now.
- Support AlterCollection/AlterDatabase in wal now.
- Alter operation can be synced by new CDC now.
- Refactor some UT for alter DDL.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-04 09:59:33 +08:00
Zhen Ye
25e0485a56
fix: unrecoverable when replicate from old (#45224)
issue: #44962

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-03 15:07:36 +08:00
Zhen Ye
ce164db1f3
fix: wal state may be unconsistent after recovering from crash (#45092)
issue: #45088, #45086

- Message on control channel should trigger the checkpoint update.
- LastConfrimedMessageID should be recovered from the minimum of
checkpoint or the LastConfirmedMessageID of uncommitted txn.
- Add more log info for wal debugging.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-29 16:26:10 +08:00
Zhen Ye
2aa48bf4ca
fix: wrong execution order of DDL/DCL on secondary (#44886)
issue: #44697, #44696

- The DDL executing order of secondary keep same with order of control
channel timetick now.
- filtering the control channel operation on shard manager of
streamingnode to avoid wrong vchannel of create segment.
- fix that the immutable txn message lost replicate header.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-21 22:38:05 +08:00
Zhen Ye
19e5e9f910
enhance: broadcaster will lock resource until message acked (#44508)
issue: #43897

- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-24 20:58:05 +08:00
Zhen Ye
c171280f63
enhance: support replicate message in wal. (#44456)
issue: #44123

- support replicate message  in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-22 17:06:11 +08:00
yihao.dai
51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Zhen Ye
cbe4c3d231
enhance: get cchannel before build message (#44229)
issue: #43897

- support never expire txn message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-10 11:09:57 +08:00
Zhen Ye
a86b6f2a54
enhance: extend the stats manage at streaming shard manager for L0 (#43371)
issue: #42416

- Rename the InsertMetric into ModifiedMetric.
- Add L0 control configuration.
- Add some L0 current state collect.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 20:41:46 +08:00
Zhen Ye
7b005c48bf
enhance: support util template generation for messages (#43881)
issue: #43880

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 01:19:44 +08:00
Zhen Ye
8ff118a9ff
fix: call IntoMessageProto instead of Payload when rpc (#43678)
issue: #43677

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:45:40 +08:00
Zhen Ye
b142589942
enhance: support all partitions in shard manager for L0 segment (#43385)
issue: #42416

- change the key from partitionID into PartitionUniqueKey to support
AllPartitionsID

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:40:51 +08:00
Zhen Ye
e97e44d56e
enhance: limit the gc concurrency when cpu is high (#43059)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 09:22:43 +08:00
Zhen Ye
1f66b650e9
fix: pulsar cannot work properly if backlog exceed (#42653)
issue: #42649

- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 14:28:37 +08:00
Zhen Ye
fc010e44a8
fix: release memory after pop from heap (#42482)
issue: #42481

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 10:00:32 +08:00
Zhen Ye
e479467582
fix: panic when upgrading from old arch (#42422)
issue: #42405

- add delete rows into header when upsert.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-31 22:56:29 +08:00
Zhen Ye
c9b0748ff9
enhance: add delete rows into delete msg header and more metric (#41952)
issue: #41544

- add delete rows into delete messsage header
- add more insert/delete metrics
- fix non-broadcast message has broadcast header

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 20:28:26 +08:00
Zhen Ye
59dff668dc
enhance: schema change without manual flush (#41882)
issue: #39718

- remove the manual flush message from schema change operation
- add flush segment id handle into schema change processes

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2025-05-19 10:14:22 +08:00
Zhen Ye
d3fff1769e
fix: streaming node panic with when binary size is set as zero (#41879)
issue: #41853

- persist the estimated binary size for insert message into wal.
- add metric to record the total growing rows of channel.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-16 11:12:22 +08:00
Zhen Ye
ae43230703
enhance: set jemalloc prof disable by default (#41850)
issue: #40730

- add assertion for insert message
- add more buffer for seal notifier

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-15 20:10:23 +08:00
Zhen Ye
0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye
7beafe99a7
enhance: implement wal garbage collector with truncate api (#41770)
issue: #41544

- add a truncator implementation into wal recovery storage.
- add metrics for recovery storage.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 22:08:56 +08:00
Zhen Ye
61b6ca5b73
enhance: add in mem shard manager (#41749)
issue: #41544

- Implement in-memory shard manager to maintain the shard state at write
ahead.
- Remove all rpc and meta operation at write ahead, make the segment
assignment logic only use wal and memory.
- Refactor global stats management, add node-level flush policy.
- Fix the recovery storage inconsistency bug when graceful close.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 12:04:56 +08:00
Zhen Ye
e675da76e4
enhance: simplify the proto message, make segment assignment code more clean (#41671)
issue: #41544

- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-11 20:49:00 +08:00
Zhen Ye
3dd9a1147b
enhance: add lock interceptor and recoverable txn manager (#41640)
issue: #41544

- add a lock interceptor at vchannel granularity.
- make txn manager recoverable and add FailTxnAtVChannel operation.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-09 11:14:53 +08:00
Zhen Ye
dfbb02a5f7
enhance: make streaming message as a log field for easier coding (#41545)
issue: #41544

- implement message can be logged as a field by zap.
- fix too many slow log for woodpecker.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 14:38:42 +08:00
Zhen Ye
6a15790799
enhance: add interface for message and fix write ahead buffer (#41470)
issue: #41439

- add IsPersisted and VChannel interface for message
- add WithNotPersisted() for message builder
- fix the persisted time tick lost at write ahead buffer

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-27 10:24:38 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
Zhen Ye
7f5a9a6046
fix: unstable timeticksync unittest (#41437)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-22 10:53:29 +08:00
Zhen Ye
9339bccccc
enhance: move sent first timeticksync, make recovery more easier (#41405)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-21 17:18:37 +08:00
Zhen Ye
ef4923e66b
fix: catchup scan never done if wal truncate (#41345)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:40:37 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Zhen Ye
224728c2d2
fix: catchup cannot work if using StartAfter (#41201)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-10 19:04:27 +08:00
Zhen Ye
7830dd3713
fix: keep the last persisted message in writeaheadbuffer to fix the never catch up (#41109)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-08 09:10:28 +08:00
Zhen Ye
cef1d16454
fix: timetick interceptor panics when closing write ahead buffer (#40970)
issue: #40967

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 10:44:22 +08:00
Zhen Ye
af80a4dac2
fix: auto flush all segment that is not created by streaming service (#40767)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-26 16:32:22 +08:00
Zhen Ye
d9fe8f0dcf
fix: [skip e2e] wab unittest may failure (#40470)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-11 11:34:06 +08:00
sthuang
63a7c4570e
feat: storage v2 sync (#39663)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-05 11:22:15 +08:00
Zhen Ye
2ff657f2d9
fix: wal may panics when context canceled (#40265)
issue: #40264

- wal may panics when context canceled
- scanner may data race when closing

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-28 17:41:58 +08:00
Zhen Ye
84df80b5e4
enhance: refactor metrics of streaming (#40031)
issue: #38399

- add metrics for broadcaster component.
- add metrics for wal flusher component.
- add metrics for wal interceptors.
- add slow log for wal.
- add more label for some wal metrics. (local or remote/catcup or
tailing...)

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-25 12:25:56 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Zhen Ye
ae700e7519
enhance: make compatitle with old msgstream for new streaming service (#39943)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-18 11:21:08 +08:00
Zhen Ye
21724ab52c
enhance: generate guaranteets at delegator if local wal (#39799)
issue: #38399, #39892

- use mvcc timestamp of wal as guaranteets if wal and delegator is
located at same node.
- fix: ignore growing option is lost at hibridsearch

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-17 15:22:15 +08:00
Zhen Ye
0988807160
enhance: enable write ahead buffer for streaming service (#39771)
issue: #38399

- Make a timetick-commit-based write ahead buffer at write side.
- Add a switchable scanner at read side to transfer the state between
catchup and tailing read

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-12 20:38:46 +08:00
Zhen Ye
d3e32bb599
enhance: make pchannel level flusher (#39275)
issue: #38399

- Add a pchannel level checkpoint for flush processing
- Refactor the recovery of flushers of wal
- make a shared wal scanner first, then make multi datasyncservice on it

Signed-off-by: chyezh <chyezh@outlook.com>
2025-02-10 16:32:45 +08:00