90 Commits

Author SHA1 Message Date
wei liu
975c91df16
feat: Add comprehensive snapshot functionality for collections (#44361)
issue: #44358

Implement complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.

Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata

Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation

Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.

- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.

- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.

- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
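
The lifecycle and GC guard described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SnapshotState`, `Transition`, `RefIndex`, and `CanReclaim` are hypothetical names, and the strictly linear PENDING → COMMITTED → DELETING ordering is an assumption taken from the first bullet.

```go
package main

import "fmt"

// SnapshotState is a hypothetical enum mirroring the lifecycle named in the notes.
type SnapshotState int

const (
	Pending SnapshotState = iota
	Committed
	Deleting
)

// Transition returns the new state, or an error when the move is not one of
// the two forward steps assumed here (Pending->Committed, Committed->Deleting).
func Transition(from, to SnapshotState) (SnapshotState, error) {
	if (from == Pending && to == Committed) || (from == Committed && to == Deleting) {
		return to, nil
	}
	return from, fmt.Errorf("illegal snapshot transition %d -> %d", from, to)
}

// RefIndex is a hypothetical stand-in for the snapshot reference index.
type RefIndex struct {
	Loaded   bool           // assumption: set once all snapshot metadata has been read
	Segments map[int64]bool // segment IDs referenced by at least one snapshot
}

// CanReclaim reports whether GC may delete a segment's files. It refuses
// while the index is not loaded, because references may still be unknown.
func (r *RefIndex) CanReclaim(segmentID int64) bool {
	if !r.Loaded {
		return false
	}
	return !r.Segments[segmentID]
}
```

The point of the guard is that "not loaded" is treated the same as "referenced": GC only deletes when it can prove that no snapshot needs the file.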

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-01-06 10:15:24 +08:00
yihao.dai
512884524b
enhance: Maintain compatibility with the legacy FlushAll (#46564)
issue: https://github.com/milvus-io/milvus/issues/45919

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: FlushAll verification must accept both per-channel
FlushAllTss (new schema) and the legacy single FlushAllTs;
GetFlushAllState chooses the verification path based on which field is
present and treats a channel as flushed only if its channel checkpoint
timestamp >= the applicable threshold (per-channel timestamp or legacy
FlushAllTs).
- Logic removed/simplified: The previous mixed/ambiguous checks were
split into two focused
routines—verifyFlushAllStateByChannelFlushAllTs(logger, channel,
flushAllTss) and verifyFlushAllStateByLegacyFlushAllTs(logger, channel,
flushAllTs)—and GetFlushAllState now selects one path. This centralizes
compatibility logic, eliminates interleaved/duplicated checks, and
retains the outer-loop short-circuiting on the first unflushed channel.
- Why this does NOT cause data loss or regressions: Changes only affect
read-only verification paths (GetFlushAllState/GetFlushState) that
compare in-memory channel checkpoints (meta.GetChannelCheckpoint) to
provided thresholds; no writes to checkpoints or persisted state occur
and FlushAll enqueue/wait behavior is unchanged. Unit tests were added
to cover legacy FlushAllTs behavior and the new FlushAllMsgs→FlushAllTs
extraction, exercising both code paths.
- Enhancement scope and location: Adds backward-compatible support and
concrete FlushAllTs extraction from streaming FlushAllMsgs in Proxy
(internal/proxy/task_flush_all_streaming.go) and compatibility verifiers
in DataCoord (internal/datacoord/services.go), plus corresponding tests
(internal/datacoord/services_test.go, internal/proxy/*_test.go).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
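
The two verification paths described above can be illustrated with a small sketch. The request struct, the function names, and the use of plain maps for channel checkpoints are assumptions made for the example; only the rule itself (a channel counts as flushed when its checkpoint timestamp >= the applicable threshold) comes from the notes.

```go
package main

// FlushAllRequest is a hypothetical request shape: either the per-channel
// map (new schema) or the single legacy timestamp is populated.
type FlushAllRequest struct {
	FlushAllTss map[string]uint64 // channel name -> flush timestamp
	FlushAllTs  uint64            // legacy: one timestamp for every channel
}

// verifyByChannelTs checks each channel against its own threshold.
// A channel without a checkpoint reads as 0 and is treated as unflushed.
func verifyByChannelTs(checkpoints, flushAllTss map[string]uint64) bool {
	for channel, ts := range flushAllTss {
		if checkpoints[channel] < ts {
			return false // short-circuit on the first unflushed channel
		}
	}
	return true
}

// verifyByLegacyTs checks every known channel against the single timestamp.
func verifyByLegacyTs(checkpoints map[string]uint64, flushAllTs uint64) bool {
	for _, cp := range checkpoints {
		if cp < flushAllTs {
			return false
		}
	}
	return true
}

// IsFlushed selects exactly one path based on which field is present.
func IsFlushed(req FlushAllRequest, checkpoints map[string]uint64) bool {
	if len(req.FlushAllTss) > 0 {
		return verifyByChannelTs(checkpoints, req.FlushAllTss)
	}
	return verifyByLegacyTs(checkpoints, req.FlushAllTs)
}
```

Both paths only read checkpoints, which matches the claim that the change touches read-only verification.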

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-26 18:59:20 +08:00
yihao.dai
889505872a
enhance: Return FlushAllMsg in response (#46347)
issue: https://github.com/milvus-io/milvus/issues/45919

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-16 10:35:16 +08:00
sijie-ni-0214
f51de1a8ab
feat: support TruncateCollection api to clear collection data (#46167)
issue: https://github.com/milvus-io/milvus/issues/46166

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2025-12-12 10:31:14 +08:00
yihao.dai
f32f2694bc
enhance: Implement new FlushAllMessage and refactor flush all (#45920)
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.

issue: https://github.com/milvus-io/milvus/issues/45919

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-10 19:27:13 +08:00
aoiasd
354ab2f55e
enhance: sync file resource to querynode and datanode (#44480)
relate:https://github.com/milvus-io/milvus/issues/43687
Support using file resources in sync mode.
Automatically download file resources to local storage, or remove them,
when a user adds or removes a file resource.
Sync file resources to a node when a new node session is found.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-04 16:23:11 +08:00
aoiasd
ed69375f00
enhance: remove resource type from file resource config (#45103)
The file resource type has been unused so far; remove it before the new release.
relate: https://github.com/milvus-io/milvus/issues/43687

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-03 10:15:32 +08:00
Zhen Ye
309d564796
enhance: support collection and index with WAL-based DDL framework (#45033)
issue: #43897

- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support the following message types in the WAL: CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-30 14:24:08 +08:00
Zhen Ye
30091a3bb7
enhance: remove redundant channel manager from datacoord (#44532)
issue: #41611

- After enabling streaming arch, channel manager of data coord is a
redundant component.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-10-09 11:01:57 +08:00
wei liu
92d2fb6360
enhance: Add granular flush targets support for FlushAll operation (#44234)
issue: #44156
Enhance FlushAll functionality to support targeting specific collections
within databases instead of only database-level flushing.

Changes include:

- Add FlushAllTarget message in data_coord.proto for granular targeting
- Support collection-specific flush operations within databases
- Maintain backward compatibility with deprecated db_name field

This enhancement allows users to flush specific collections without
affecting other collections in the same database, providing more precise
control over data persistence operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-19 18:38:01 +08:00
aoiasd
eca51ed2c6
enhance: add file resource api (#43766)
relate: https://github.com/milvus-io/milvus/issues/43687

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-08-08 14:17:41 +08:00
wei liu
1fae8f5ae3
enhance: Optimize FlushAll performance for multi-table scenarios (#43339)
Replace multiple per-table flush RPC calls with single FlushAll RPC to
improve performance in multi-table scenarios.
issue: #43338
- Implement server-side FlushAll request processing in
DataCoord/MixCoord
- Add flushAllTask to handle unified flush operations across all tables
- Replace proxy-side per-table flush iteration with single RPC call
- Support both streaming and non-streaming service execution paths
- Add comprehensive unit tests for new FlushAll implementation

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-30 15:37:37 +08:00
Zhen Ye
cd38d65417
fix: make savebinlogpath idempotent at binlog level (#43615)
issue: #43574

- update all binlogs on every call to update savebinlogpath.
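
One way to picture binlog-level idempotence is an upsert keyed by log ID, so that replaying the same save request leaves the stored set unchanged. The sketch below is an assumption about the general technique, not the code from this fix; `Binlog` and `UpsertBinlogs` are hypothetical names.

```go
package main

// Binlog is a hypothetical, simplified binlog record identified by LogID.
type Binlog struct {
	LogID      int64
	EntriesNum int64
}

// UpsertBinlogs merges incoming binlogs into existing ones by LogID.
// A record with a known LogID is overwritten, a new one is appended, so
// applying the same incoming slice twice yields the same result.
func UpsertBinlogs(existing, incoming []Binlog) []Binlog {
	out := append([]Binlog(nil), existing...) // copy, do not mutate the input
	index := make(map[int64]int, len(out))
	for i, b := range out {
		index[b.LogID] = i
	}
	for _, b := range incoming {
		if i, ok := index[b.LogID]; ok {
			out[i] = b // update in place instead of appending a duplicate
			continue
		}
		index[b.LogID] = len(out)
		out = append(out, b)
	}
	return out
}
```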

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-29 19:47:36 +08:00
Zhen Ye
feb5db60f2
fix: make flush save binlog paths idempotent (#43579)
issue: #43574

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-27 23:14:55 +08:00
Zhen Ye
070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix the data loss when streaming resumes from an old arch message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
Zhen Ye
e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
yihao.dai
e6da4a64b5
fix: Pre-check import message to prevent pipeline block indefinitely (#42415)
Pre-check the import message to prevent the pipeline from blocking indefinitely.

issue: https://github.com/milvus-io/milvus/issues/42414

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-11 13:40:38 +08:00
Zhen Ye
b94cee2413
fix: growing segment from old arch is not flushed after upgrading (#42164)
issue: #42162

- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stuck
- fix: growing segment from old arch is not flushed after upgrading

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:00:28 +08:00
yihao.dai
142bd2fc05
enhance: Pooling for data tasks (#41256)
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.

issue: https://github.com/milvus-io/milvus/issues/41123

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-20 21:06:24 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Xianhui Lin
3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
zhenshan.cao
758cf29e77
fix: create multiple identical indexes by accident (#40179)
issue: https://github.com/milvus-io/milvus/issues/40163

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-04-08 15:06:25 +08:00
Zhen Ye
af80a4dac2
fix: auto flush all segment that is not created by streaming service (#40767)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-26 16:32:22 +08:00
yihao.dai
f65e6b7c6e
enhance: Optimize datacoord meta mutex (#40552)
Use a separate collection mutex.

issue: https://github.com/milvus-io/milvus/issues/40551

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-25 13:46:25 +08:00
XuanYang-cn
4bebca6416
enhance: Replace currRows with NumOfRows (#40074)
See also: #40068

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-10 12:16:03 +08:00
sthuang
63a7c4570e
feat: storage v2 sync (#39663)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-05 11:22:15 +08:00
Zhen Ye
f47ab31f23
enhance: remove redundant resource key watch operation, just keep consistency of wal (#40235)
issue: #38399
related PR: #39522

- Implement only an exclusive broadcaster between broadcast messages with
the same resource key, to keep the same order across different WALs.
- After simplifying the broadcast model, the original watch-based
broadcast is too complicated and redundant; remove it.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-03 14:40:05 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
SimFG
047254665d
feat: support to replicate import msg (#39171)
- issue: #39849

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-02-16 00:08:13 +08:00
Zhen Ye
3e788f0fbd
enhance: record memory size (uncompressed) item for index (#38770)
issue: #38715

- Milvus currently uses the serialized (compressed) index size to
estimate the resources needed for loading.
- Add a new field `MemSize` (size before compression) to the index for
resource estimation.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 10:33:06 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
yihao.dai
d4dab3c62f
enhance: Reduce segmentManager lock granularity (#37836)
Use a channel level key lock for segments in segmentManager.

issue: https://github.com/milvus-io/milvus/issues/37633,
https://github.com/milvus-io/milvus/issues/37630

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-17 14:12:52 +08:00
tinswzy
1dbb6cd7cb
enhance: refine the datacoord meta related interfaces (#37957)
issue: #35917 
This PR refines the meta-related APIs in datacoord to allow the ctx to
be passed down to the catalog operation interfaces

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-26 19:46:34 +08:00
jaime
1d06d4324b
fix: Int64 overflow in JSON encoding (#37657)
issue: #36621

- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."
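
The first bullet relies on the `string` option of Go's `encoding/json` struct tags, which encodes a numeric field as a JSON string so that consumers limited to 53-bit integers (for example JavaScript) do not lose precision. A minimal sketch, with a hypothetical `SegmentView` struct that is not taken from the PR:

```go
package main

import "encoding/json"

// SegmentView is a hypothetical struct used only for illustration.
// The ",string" option makes encoding/json emit each int64 as a quoted string.
type SegmentView struct {
	SegmentID int64 `json:"segment_id,string"`
	NumRows   int64 `json:"num_rows,string"`
}

// Encode marshals the view to its JSON text.
func Encode(v SegmentView) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// Decode parses JSON text produced by Encode back into a view.
func Decode(s string) (SegmentView, error) {
	var v SegmentView
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}
```

Encoding `SegmentView{SegmentID: 9007199254740993, NumRows: 10}` yields `{"segment_id":"9007199254740993","num_rows":"10"}`. The `string` option only applies to scalar fields (strings, numbers, booleans), which is presumably why the second bullet switches slices and maps to `string` elements instead.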

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-14 22:52:30 +08:00
Zhen Ye
1b6edd0b4b
enhance: refactor the consumer grpc proto for reusing grpc stream for multi-consumer (#37564)
issue: #33285

- Modify the proto of consumer of streaming service.
- Make VChannel as a required option for streaming

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-11 17:24:29 +08:00
Zhen Ye
49657c4690
enhance: add create segment message, enable empty segment flush (#37407)
issue: #37172

- add redo interceptor to implement append context refresh. (make new
timetick)
- add create segment handler for flusher.
- make empty segment flushable and directly change it into dropped.
- add create segment message into wal when creating new growing segment.
- order insert operations as the following sequence: createSegment ->
insert -> insert -> flushSegment.
- order a manual flush as the following sequence: flushTs ->
flushsegment -> flushsegment -> manualflush.
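
The per-segment ordering listed above (createSegment, then any number of inserts, then flushSegment) can be expressed as a small validator over a message log. The message types and `ValidateOrder` below are hypothetical and only model the ordering rule; they are not the WAL message definitions.

```go
package main

import "fmt"

// MsgType is a hypothetical, reduced set of WAL message kinds.
type MsgType int

const (
	CreateSegment MsgType = iota
	Insert
	FlushSegment
)

// Msg is a hypothetical message carrying only what the check needs.
type Msg struct {
	Type      MsgType
	SegmentID int64
}

// Per-segment phases tracked while scanning the log.
const (
	phaseNone = iota
	phaseGrowing
	phaseFlushed
)

// ValidateOrder checks that every segment sees createSegment first, inserts
// only while growing, and at most one flushSegment. A create followed
// directly by a flush (an empty segment) is accepted.
func ValidateOrder(msgs []Msg) error {
	phase := make(map[int64]int)
	for i, m := range msgs {
		p := phase[m.SegmentID]
		switch m.Type {
		case CreateSegment:
			if p != phaseNone {
				return fmt.Errorf("message %d: segment %d already created", i, m.SegmentID)
			}
			phase[m.SegmentID] = phaseGrowing
		case Insert:
			if p != phaseGrowing {
				return fmt.Errorf("message %d: insert into segment %d that is not growing", i, m.SegmentID)
			}
		case FlushSegment:
			if p != phaseGrowing {
				return fmt.Errorf("message %d: flush of segment %d that is not growing", i, m.SegmentID)
			}
			phase[m.SegmentID] = phaseFlushed
		}
	}
	return nil
}
```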

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-08 10:16:34 +08:00
yihao.dai
994f52fab8
fix: Revert "enhance: Support db for bulkinsert (#37012)" (#37420)
This reverts commit 6e90f9e8d90440716d596a7fe8fe1db465d529b7.

issue: https://github.com/milvus-io/milvus/issues/31273

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-07 17:02:25 +08:00
Zhen Ye
cae9e1c732
fix: drop collection failed if enable streaming service (#37444)
issue: #36858

- Start channel manager on datacoord, but with empty assign policy in
streaming service.
- Make a collection in the dropping state recoverable by the flusher, to
make sure that milvus consumes the dropCollection message.
- Add backoff for flusher lifetime.
- remove the proxy watcher from timetick at rootcoord in streaming
service.

Also see the better fixup: #37176

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:26:26 +08:00
yihao.dai
6e90f9e8d9
enhance: Support db for bulkinsert (#37012)
issue: https://github.com/milvus-io/milvus/issues/31273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 14:31:39 +08:00
yihao.dai
f0b3942a08
enhance: Limit import job number (#36891)
issue: https://github.com/milvus-io/milvus/issues/36890

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-23 16:01:28 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
congqixia
582d2eec79
enhance: Move datanode/indexnode manager to session pkg (#35634)
Related to #28861

Move session manager, worker manager to session package. Also rename
each manager after its corresponding node name (datanode, indexnode).

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 16:02:56 +08:00
congqixia
c992a61a23
enhance: Separate allocator pkg in datacoord (#35622)
Related to #28861

Move allocator interface and implementation into separate package. Also
update some unittest logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 10:06:56 +08:00
chyezh
9871966415
enhance: segment alloc interceptor (#34996)
#33285

- add segment alloc interceptor for streamingnode.
- add manual alloc segment rpc for datacoord.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-04 07:40:15 +08:00
yihao.dai
a4439cc911
enhance: Implement flusher in streamingNode (#34942)
- Implement flusher to:
  - Manage the pipelines (creation, deletion, etc.)
  - Manage the segment write buffer
  - Manage sync operation (including receive flushMsg and execute flush)
- Add a new `GetChannelRecoveryInfo` RPC in DataCoord.
- Reorganize packages: `flushcommon` and `datanode`.

issue: https://github.com/milvus-io/milvus/issues/33285

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-02 18:30:23 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
jaime
21fc5f5d46
enhance: Remove datanode reporting TT based on MQ implementation (#34421)
issue: #34420

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-05 15:48:09 +08:00
yihao.dai
eb5d4de390
fix: Check if the import job exists (#33672)
issue: https://github.com/milvus-io/milvus/issues/33671

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-10 21:51:55 +08:00
wayblink
a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
smellthemoon
c61fb1eff5
enhance: do check when add not empty logpath (#33640)
meta only store logid

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-07 10:19:51 +08:00