331 Commits

Author SHA1 Message Date
Zhen Ye
30091a3bb7
enhance: remove redundant channel manager from datacoord (#44532)
issue: #41611

- After enabling streaming arch, channel manager of data coord is a
redundant component.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-09 11:01:57 +08:00
Zhen Ye
19e5e9f910
enhance: broadcaster will lock resource until message acked (#44508)
issue: #43897

- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-24 20:58:05 +08:00
Zhen Ye
ba289891c0
enhance: add all ddl message into messages (#44407)
issue: #43897

- add ddl messages proto and add some message utilities.
- support shard/exclusive resource-key-lock.
- add all ddl callbacks future into broadcast registry.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-18 10:08:00 +08:00
Zhen Ye
3327df72e4
enhance: make immutable message as the param of ack operation for cdc (#43900)
issue: #43897

- The original broadcast ack operation need to recover message from
etcd, which can not support cdc.
- immutable message will set as the ack parameter to fix it.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-01 10:21:52 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
wei liu
3e9e830074
enhance: Implement rewatch mechanism for etcd failure scenarios (#43829)
issue: #43828
Implement robust rewatch mechanism to handle etcd connection failures
and node reconnection scenarios in DataCoord and QueryCoord, along with
heartbeat lag monitoring capabilities.

Changes include:
- Implement rewatchDataNodes/rewatchQueryNodes callbacks for etcd
reconnection scenarios
- Add idempotent rewatchNodes method to handle etcd session recovery
gracefully
- Add QueryCoordLastHeartbeatTimeStamp metric for monitoring node
heartbeat lag
- Clean up heartbeat metrics when nodes go down to prevent metric leaks

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-14 10:31:44 +08:00
Zhen Ye
070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix the data lost when streaming resuming from old arch message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
cai.zhang
6989e18599
enhance: Move sort stats task to sort compaction (#42562)
issue: #42560

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 20:22:47 +08:00
cai.zhang
f6b2a71c95
enhance: Remove chunkmanager-related dependencies from datanode (#43021)
issue: #41611

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-03 14:44:45 +08:00
Bingyi Sun
1bf960b1a8
enhance: Check loaded segments before gc (#42639)
issue: https://github.com/milvus-io/milvus/issues/42412

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-13 17:44:38 +08:00
yihao.dai
e6da4a64b5
fix: Pre-check import message to prevent pipeline block indefinitely (#42415)
Pre-check import message to prevent pipeline block indefinitely.

issue: https://github.com/milvus-io/milvus/issues/42414

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-11 13:40:38 +08:00
Zhen Ye
66cc194ab2
enhance: add partition gc at streaming arch (#42179)
issue: #41976

- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:20:30 +08:00
yihao.dai
f71930e8db
enhance: Enhance import context (#42021)
Rename `imeta` to `importMeta` to improve readability, and enhance
import related context usage.

issue: https://github.com/milvus-io/milvus/issues/41123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-23 12:58:27 +08:00
yihao.dai
142bd2fc05
enhance: Pooling for data tasks (#41256)
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.

issue: https://github.com/milvus-io/milvus/issues/41123

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-20 21:06:24 +08:00
cai.zhang
38ded7364f
fix: Don't create index for unsorted importing segment when enable stats (#41864)
issue: #41863

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-19 10:52:23 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
Chun Han
016920b023
fix: solve incompitable problem for none-encoding index(#40838) (#41369)
related: #40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-20 22:56:44 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
cai.zhang
05e25431d9
enhance: Deprecate disk params about indexing (#41045)
issue: #40863

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:36:34 +08:00
yihao.dai
b2a8694686
enhance: Merge IndexNode and DataNode (#40272)
Merge DataNode and IndexNode into DataNode.

issue: https://github.com/milvus-io/milvus/issues/39115

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-13 14:26:11 +08:00
XuanYang-cn
4bebca6416
enhance: Replace currRows with NumOfRows (#40074)
See also: #40068

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-10 12:16:03 +08:00
cai.zhang
5a810400b5
enhance: Optimize Task Scheduling to Enable Concurrent Execution (#40251)
issue: #39101 

2.5 pr: #40104

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-03-02 18:38:00 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
SimFG
047254665d
feat: support to replicate import msg (#39171)
- issue: #39849

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-02-16 00:08:13 +08:00
cai.zhang
1d54ff157f
fix: Restore the compacting state for stats task during recovery (#39459)
issue: #39333

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2025-02-05 17:11:12 +08:00
yihao.dai
38f813bed3
enhance: Read metadata concurrently to accelerate recovery (#38403)
Read metadata such as segments, binlogs, and partitions concurrently at
the collection level.

issue: https://github.com/milvus-io/milvus/issues/37630

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-23 14:27:27 +08:00
wei liu
d2834a1812
enhance: Add logs for check health failed (#39208)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-01-15 17:31:00 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
jaime
f03a85725a
enhance: add db name in replica (#38672)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-01-09 19:40:59 +08:00
Zhen Ye
c5a7000a92
enhance: move streaming coord from datacoord to rootcoord (#39007)
issue: #38399

We want to support broadcast operation for both streaming and msgstream.
But msgstream can be only sent message from rootcoord and proxy.
So this pr move the streamingcoord to rootcoord to make easier
implementation.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-07 17:42:57 +08:00
cai.zhang
9672eee78a
enhance: Increase the buffer capacity of notifyIndexChan to support concurrency (#38957)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-01-05 18:53:01 +08:00
jaime
78438ef41e
fix: revert optimize CPU usage for CheckHealth requests (#35589) (#38555)
issue: #35563

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-19 00:38:45 +08:00
jaime
28fdbc4e30
enhance: optimize CPU usage for CheckHealth requests (#35589)
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy sides, which enhances cluster stability.
2. Each health check will assess all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-17 11:02:45 +08:00
tinswzy
27229f7907
enhance: refine exists log print with ctx (#38080)
issue: #35917 
Refines exists log print with ctx

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
Zhen Ye
18bef5e062
fix: fix crash when enable standby and streaming (#38239)
issue: #38125

- connect kv at standby mode.
- make balancer initialization lazy.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-06 11:52:40 +08:00
tinswzy
7944538ade
enhance: Add ctx param to KV operation interfaces (#38154)
issue: #35917 
Refine KV operation interfaces by adding a ctx param

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-05 15:16:41 +08:00
tinswzy
1dbb6cd7cb
enhance: refine the datacoord meta related interfaces (#37957)
issue: #35917 
This PR refines the meta-related APIs in datacoord to allow the ctx to
be passed down to the catalog operation interfaces

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-26 19:46:34 +08:00
Zhen Ye
b9a10a2f68
enhance: remove the rpc layer of coordinator when enabling standalone or mixcoord (#37815)
issue: #37764

- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-23 21:36:33 +08:00
jaime
7bbfe86bcd
enhance: add list index and segment index retrieval API for WebUI (#37861)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-22 16:58:34 +08:00
cai.zhang
4dc684126e
enhance: Handoff growing segment after sorted (#37385)
issue: #33744 

1. Segments generated from inserts will be loaded as growing until they
are sorted by primary key.
2. This PR may increase memory pressure on the delegator, but we need to
test the performance of stats. In local testing, the speed of stats is
greater than the insert speed.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 16:08:24 +08:00
jaime
f348bd9441
feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
Zhen Ye
cae9e1c732
fix: drop collection failed if enable streaming service (#37444)
issue: #36858

- Start channel manager on datacoord, but with empty assign policy in
streaming service.
- Make collection at dropping state can be recovered by flusher to make
sure that
 milvus consume the dropCollection message.
- Add backoff for flusher lifetime.
- remove the proxy watcher from timetick at rootcoord in streaming
service.

Also see the better fixup: #37176

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:26:26 +08:00
jaime
9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
  - sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
wei liu
d51a808851
fix: Rootcoord stuck at graceful stop progress (#36880)
issue: #34553
when rootcoord trigger graceful stop progress, it will block until all
rpc finished. for create collection request, rootcoord need to block
until datacoord finish to watch all channels, but datacoord need to call
`rootcoord.Alloc` during watch channel, and rootcoord doesn't respond to
new request anymore. which cause create collection stucks, and graceful
stop progress stucks.

This PR remove the func call `rootcoord.Alloc` to solve the logic dead
lock during graceful stop progress.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-17 12:15:25 +08:00
yihao.dai
1f47d5510b
fix: Fix import segments leak in segment manager (#36602)
Directly add import segments from the meta, eliminating the dependency
on the segment manager.

issue: https://github.com/milvus-io/milvus/issues/34648

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-08 10:11:22 +08:00
yihao.dai
9e8cafcbe2
enhance: Skip loading bf in datanode (#36367)
Skip loading bf in datanode:
1. When watching vchannels, skip loading bloom filters for segments.
2. Bypass bloom filter checks for delete messages, directly writing to
L0 segments.
3. Remove flushed segments proactively after flush.

issue: https://github.com/milvus-io/milvus/issues/34585

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-26 10:11:15 +08:00
yihao.dai
a61668c77e
feat: Introduce stats task for import (#35868)
This PR introduce stats task for import:
1. Define new `Stats` and `IndexBuilding` states for importJob
2. Add new stats step to the import process: trigger the stats task and
wait for its completion
3. Abort stats task if import job failed

issue: https://github.com/milvus-io/milvus/issues/33744

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-15 15:17:08 +08:00
cai.zhang
8395c8a8db
enhance: Update stats task to optional (#35947)
issue: #33744

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-12 20:37:08 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
Zhen Ye
99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tst when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00