issue: #43897, #44123
pr: #45224
also pick pr: #45216,#45154,#45033,#45145,#45092,#45058,#45029
enhance: Close channel replicator more gracefully (#45029)
issue: https://github.com/milvus-io/milvus/issues/44123
enhance: Show create time for import job (#45058)
issue: https://github.com/milvus-io/milvus/issues/45056
fix: wal state may be inconsistent after recovering from crash (#45092)
issue: #45088, #45086
- Message on control channel should trigger the checkpoint update.
- LastConfirmedMessageID should be recovered from the minimum of the
checkpoint and the LastConfirmedMessageID of any uncommitted txn.
- Add more log info for wal debugging.
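The recovery rule for LastConfirmedMessageID can be sketched as below. This is a minimal illustration, assuming message IDs are comparable int64 values; `recoverLastConfirmed` and its parameters are hypothetical names, not the actual WAL code.

```go
package main

import "fmt"

// recoverLastConfirmed is a hypothetical helper: it returns the smallest ID
// among the checkpoint and the LastConfirmedMessageID of every uncommitted txn.
func recoverLastConfirmed(checkpoint int64, uncommittedTxns []int64) int64 {
	lastConfirmed := checkpoint
	for _, id := range uncommittedTxns {
		if id < lastConfirmed {
			lastConfirmed = id
		}
	}
	return lastConfirmed
}

func main() {
	// One uncommitted txn started before the checkpoint, one after it.
	fmt.Println(recoverLastConfirmed(100, []int64{120, 90}))
	// No uncommitted txn: the checkpoint itself is used.
	fmt.Println(recoverLastConfirmed(100, nil))
}
```

Taking the minimum means a consumer that resumes from the recovered ID still sees every message of a txn that was open at crash time.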
fix: make ack of broadcaster not cancelable by client (#45145)
issue: #45141
- make ack of broadcaster not cancelable by rpc.
- make clone for assignment snapshot of wal balancer.
- add server id for GetReplicateCheckpoint to avoid failure.
enhance: support collection and index with WAL-based DDL framework
(#45033)
issue: #43897
- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support the following message types in wal: CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.
enhance: support load/release collection/partition with WAL-based DDL
framework (#45154)
issue: #43897
- Load/Release collection/partition is implemented by WAL-based DDL
framework now.
- Support AlterLoadConfig/DropLoadConfig in wal now.
- Load/Release operation can be synced by new CDC now.
- Refactor some UT for load/release DDL.
enhance: Don't start cdc by default (#45216)
issue: https://github.com/milvus-io/milvus/issues/44123
fix: unrecoverable when replicate from old (#45224)
issue: #44962
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: yihao.dai <yihao.dai@zilliz.com>
issue: #43897, #44123
pr: #44898
related pr: #44607, #44642, #44792, #44809, #44564, #44560, #44735, #44822, #44865, #44850, #44942, #44874, #44963, #44886, #44898
enhance: remove redundant channel manager from datacoord (#44532)
issue: #41611
- After enabling streaming arch, channel manager of data coord is a
redundant component.
fix: Fix CDC OOM due to high buffer size (#44607)
Fix CDC OOM by:
1. free msg buffer manually.
2. limit max msg buffer size.
3. reduce scanner msg handler buffer size.
issue: https://github.com/milvus-io/milvus/issues/44123
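The first two points of the OOM fix (manual release and a size cap) can be illustrated with a byte-bounded FIFO buffer. This is only a sketch under assumptions: `boundedBuffer`, `errBufferFull`, and the 10-byte limit are hypothetical and do not reflect the real CDC types or limits.

```go
package main

import (
	"errors"
	"fmt"
)

// errBufferFull tells the producer to back off until the consumer drains.
var errBufferFull = errors.New("message buffer is full")

// boundedBuffer is a hypothetical FIFO buffer capped by total payload bytes.
type boundedBuffer struct {
	msgs     [][]byte
	size     int
	maxBytes int
}

// push rejects a message that would push the buffer past its byte limit.
func (b *boundedBuffer) push(msg []byte) error {
	if b.size+len(msg) > b.maxBytes {
		return errBufferFull
	}
	b.msgs = append(b.msgs, msg)
	b.size += len(msg)
	return nil
}

// pop removes the oldest message and clears its slot so the backing array
// no longer references the payload, letting the GC reclaim it.
func (b *boundedBuffer) pop() ([]byte, bool) {
	if len(b.msgs) == 0 {
		return nil, false
	}
	msg := b.msgs[0]
	b.msgs[0] = nil
	b.msgs = b.msgs[1:]
	b.size -= len(msg)
	return msg, true
}

func main() {
	buf := &boundedBuffer{maxBytes: 10}
	fmt.Println(buf.push(make([]byte, 6))) // fits within the limit
	fmt.Println(buf.push(make([]byte, 6))) // rejected: would exceed 10 bytes
	buf.pop()
	fmt.Println(buf.push(make([]byte, 6))) // fits again after draining
}
```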
fix: remove wrong start timetick to avoid filtering DML whose timetick
is less than it. (#44691)
issue: #41611
- introduced by #44532
enhance: support remove cluster from replicate topology (#44642)
issue: #44558, #44123
- On replicate topology (A->B, A->C), updating A and C to config(A->C)
and B to config(B) removes B from the replicate topology.
- Fix some metric errors of CDC
fix: check if qn is sqn with label and streamingnode list (#44792)
issue: #44014
- On standalone, the inner query node needs to load segments and watch
channels, so the querynode is not an embedded querynode in streamingnode
without `LabelStreamingNodeEmbeddedQueryNode`. The channel dist manager
cannot confirm that a standalone node is an embededStreamingNode.
The bug was introduced by #44099
enhance: Make GetReplicateInfo API work at the pchannel level (#44809)
issue: https://github.com/milvus-io/milvus/issues/44123
enhance: Speed up CDC scheduling (#44564)
Make CDC watch etcd replicate pchannel meta instead of listing them
periodically.
issue: https://github.com/milvus-io/milvus/issues/44123
enhance: refactor update replicate config operation using
wal-broadcast-based DDL/DCL framework (#44560)
issue: #43897
- UpdateReplicateConfig operation will broadcast AlterReplicateConfig
message into all pchannels with cluster-exclusive-lock.
- Begin txn message will use commit message timetick now (to avoid
timetick rollback when CDC replicates txn messages).
- If the current cluster is secondary, the UpdateReplicateConfig will wait
until the replicate configuration is consistent with the config
replicated from primary.
enhance: support rbac with WAL-based DDL framework (#44735)
issue: #43897
- RBAC(Roles/Users/Privileges/Privilege Groups) is implemented by
WAL-based DDL framework now.
- Support the following message types in wal: `AlterUser`, `DropUser`,
`AlterRole`, `DropRole`, `AlterUserRole`, `DropUserRole`,
`AlterPrivilege`, `DropPrivilege`, `AlterPrivilegeGroup`,
`DropPrivilegeGroup`, `RestoreRBAC`.
- RBAC can be synced by new CDC now.
- Refactor some UT for RBAC.
enhance: support database with WAL-based DDL framework (#44822)
issue: #43897
- Database related DDL is implemented by WAL-based DDL framework now.
- Support the following message types in wal: CreateDatabase, AlterDatabase,
DropDatabase.
- Database DDL can be synced by new CDC now.
- Refactor some UT for Database DDL.
enhance: support alias with WAL-based DDL framework (#44865)
issue: #43897
- Alias related DDL is implemented by WAL-based DDL framework now.
- Support the following message types in wal: AlterAlias, DropAlias.
- Alias DDL can be synced by new CDC now.
- Refactor some UT for Alias DDL.
enhance: Disable import for replicating cluster (#44850)
1. Import in replicating cluster is not supported yet, so disable it for
now.
2. Remove GetReplicateConfiguration wal API
issue: https://github.com/milvus-io/milvus/issues/44123
fix: use short debug string to avoid newline in debug logs (#44925)
issue: #44924
fix: rerank before requery if reranker didn't use field data (#44942)
issue: #44918
enhance: support resource group with WAL-based DDL framework (#44874)
issue: #43897
- Resource group related DDL is implemented by WAL-based DDL framework
now.
- Support the following message types in wal: AlterResourceGroup,
DropResourceGroup.
- Resource group DDL can be synced by new CDC now.
- Refactor some UT for resource group DDL.
fix: Fix replication txn data loss during chaos (#44963)
Only confirm CommitMsg for txn messages to prevent data loss.
issue: https://github.com/milvus-io/milvus/issues/44962,
https://github.com/milvus-io/milvus/issues/44123
fix: wrong execution order of DDL/DCL on secondary (#44886)
issue: #44697, #44696
- The DDL execution order on the secondary now matches the order of
control channel timeticks.
- Filter control channel operations on the shard manager of
streamingnode to avoid a wrong vchannel when creating segments.
- Fix the immutable txn message losing its replicate header.
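The ordering guarantee in the first point can be pictured as replaying operations sorted by their control channel timetick. The sketch below is illustrative only; `ddlOp` and `orderByTimeTick` are hypothetical names, and the real secondary applies messages through the WAL rather than sorting a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// ddlOp is a hypothetical record of a DDL/DCL operation and the timetick it
// was assigned on the control channel.
type ddlOp struct {
	name     string
	timeTick uint64
}

// orderByTimeTick returns a copy of ops sorted by ascending timetick.
// The stable sort keeps the arrival order of ops that share a timetick.
func orderByTimeTick(ops []ddlOp) []ddlOp {
	sorted := append([]ddlOp(nil), ops...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].timeTick < sorted[j].timeTick
	})
	return sorted
}

func main() {
	arrived := []ddlOp{
		{name: "CreateIndex", timeTick: 30},
		{name: "CreateCollection", timeTick: 10},
		{name: "CreatePartition", timeTick: 20},
	}
	for _, op := range orderByTimeTick(arrived) {
		fmt.Println(op.timeTick, op.name)
	}
}
```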
fix: Fix primary-secondary replication switch blocking (#44898)
1. Fix primary-secondary replication switchover blocking by deleting
replicate pchannel meta using modRevision.
2. Stop channel replicator(scanner) when cluster role changes to prevent
continued message consumption and replication.
3. Close Milvus client to prevent goroutine leak.
4. Create Milvus client once for a channel replicator.
5. Simplify CDC controller and resources.
issue: https://github.com/milvus-io/milvus/issues/44123
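Deleting by modRevision (point 1) is a compare-and-delete: the delete succeeds only if the key has not been modified since it was read. The sketch below models this with an in-memory store; `store`, `put`, and `deleteIfUnchanged` are hypothetical stand-ins for the etcd-backed meta, where the same check would be a transaction that compares the key's ModRevision.

```go
package main

import (
	"fmt"
	"sync"
)

// entry keeps a value together with the revision of its last modification.
type entry struct {
	value       string
	modRevision int64
}

// store is a hypothetical in-memory stand-in for a revisioned KV store.
type store struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

func newStore() *store { return &store{data: make(map[string]entry)} }

// put writes the value and returns the revision assigned to this write.
func (s *store) put(key, value string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.revision++
	s.data[key] = entry{value: value, modRevision: s.revision}
	return s.revision
}

// deleteIfUnchanged deletes key only when its modRevision still equals the
// revision the caller observed; otherwise it leaves the newer value intact.
func (s *store) deleteIfUnchanged(key string, observedRev int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.data[key]
	if !ok || e.modRevision != observedRev {
		return false
	}
	delete(s.data, key)
	s.revision++
	return true
}

func main() {
	s := newStore()
	staleRev := s.put("pchannel-0", "replicate to B")
	freshRev := s.put("pchannel-0", "replicate to C") // a later update
	fmt.Println(s.deleteIfUnchanged("pchannel-0", staleRev))
	fmt.Println(s.deleteIfUnchanged("pchannel-0", freshRev))
}
```

A stale deleter therefore cannot remove meta that was re-created during a role switch, which is one way such a delete can avoid blocking or clobbering the new configuration.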
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: yihao.dai <yihao.dai@zilliz.com>
Cherry-pick from master
pr: #45018, #45030
Related to #44761
Refactor proxy shard client management by creating a new
internal/proxy/shardclient package. This improves code organization and
modularity by:
- Moving load balancing logic (LookAsideBalancer, RoundRobinBalancer) to
shardclient package
- Extracting shard client manager and related interfaces into separate
package
- Relocating shard leader management and client lifecycle code
- Adding package documentation (README.md, OWNERS)
- Updating proxy code to use the new shardclient package interfaces
This change makes the shard client functionality more maintainable and
better encapsulated, reducing coupling in the proxy layer.
Also consolidates the proxy package's mockery generation to use a
centralized `.mockery.yaml` configuration file, aligning with the
pattern used by other packages like querycoordv2.
Changes
- **Makefile**: Replace multiple individual mockery commands with a
single config-based invocation for `generate-mockery-proxy` target
- **internal/proxy/.mockery.yaml**: Add mockery configuration defining
all mock interfaces for proxy and proxy/shardclient packages
- **Mock files**: Regenerate mocks using the new configuration:
- `mock_cache.go`: Clean up by removing unused interface methods
(credential, shard cache, policy methods)
- `shardclient/mock_lb_balancer.go`: Update type comments
(nodeInfo → NodeInfo)
- `shardclient/mock_lb_policy.go`: Update formatting
- `shardclient/mock_shardclient_manager.go`: Fix parameter naming
consistency (nodeInfo1 → nodeInfo)
- **task_search_test.go**: Remove obsolete mock expectations for
deprecated cache methods
Benefits
- Centralized mockery configuration for easier maintenance
- Consistent with other packages (querycoordv2, etc.)
- Cleaner mock interfaces by removing unused methods
- Better type consistency in generated mocks
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #44762
Related to #44761
This commit refactors the privilege management system in the proxy
component by:
1. **Separation of Concerns**: Extracts privilege-related functionality
from MetaCache into a dedicated `internal/proxy/privilege` package,
improving code organization and maintainability.
2. **New Package Structure**: Creates `internal/proxy/privilege/` with:
- `cache.go`: Core privilege cache implementation (PrivilegeCache)
- `result_cache.go`: Privilege enforcement result caching
- `model.go`: Casbin model and policy enforcement functions
- `meta_cache_adapter.go`: Casbin adapter for MetaCache integration
- Corresponding test files and mock implementations
3. **MetaCache Simplification**: Removes privilege and credential
management methods from MetaCache interface and implementation:
- Removed: GetCredentialInfo, RemoveCredential, UpdateCredential
- Removed: GetPrivilegeInfo, GetUserRole, RefreshPolicyInfo,
InitPolicyInfo
- Deleted: meta_cache_adapter.go, privilege_cache.go and their tests
4. **Updated References**: Updates all callsites to use the new
privilegeCache global:
- Authentication interceptor now uses privilegeCache for password
verification
- Credential cache operations (InvalidateCredentialCache,
UpdateCredentialCache, UpdateCredential) now use privilegeCache
- Policy refresh operations (RefreshPolicyInfoCache) now use
privilegeCache
- Privilege interceptor uses new privilege.GetEnforcer() and privilege
result cache
5. **Improved API**: Renames cache functions for clarity:
- GetPrivilegeCache → GetResultCache
- SetPrivilegeCache → SetResultCache
- CleanPrivilegeCache → CleanResultCache
This refactoring makes the codebase more modular, separates privilege
management concerns from general metadata caching, and provides a
clearer API for privilege enforcement operations.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
#44212
Also, record metrics only when storageUsageTracking is enabled.
Use MB for scanned_remote counter and scanned_total counter metrics to
avoid overflow.
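A rough sketch of both points: recording is skipped unless tracking is enabled, and byte counts are converted to MB before they are accumulated. `usageRecorder` and its fields are hypothetical, and treating 1 MB as 1024*1024 bytes is an assumption.

```go
package main

import "fmt"

// bytesPerMB assumes 1 MB = 1024 * 1024 bytes.
const bytesPerMB = 1024 * 1024

// usageRecorder is a hypothetical accumulator for scanned storage volume.
type usageRecorder struct {
	enabled        bool    // stands in for the storageUsageTracking switch
	scannedTotalMB float64 // accumulated in MB to keep the counter small
}

// record adds scannedBytes (converted to MB) only when tracking is enabled.
func (r *usageRecorder) record(scannedBytes int64) {
	if !r.enabled {
		return
	}
	r.scannedTotalMB += float64(scannedBytes) / bytesPerMB
}

func main() {
	on := &usageRecorder{enabled: true}
	on.record(3 * bytesPerMB)
	fmt.Println(on.scannedTotalMB)

	off := &usageRecorder{}
	off.record(3 * bytesPerMB)
	fmt.Println(off.scannedTotalMB)
}
```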
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
issue: #44212
Implement search/query storage usage statistics on the Go side (result
reduce). Storage usage is recorded only in the vector search C++ path for
now; the query C++ path needs to be implemented in follow-up PRs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
issue: #44156
Enhance FlushAll functionality to support targeting specific collections
within databases instead of only database-level flushing.
Changes include:
- Add FlushAllTarget message in data_coord.proto for granular targeting
- Support collection-specific flush operations within databases
- Maintain backward compatibility with deprecated db_name field
This enhancement allows users to flush specific collections without
affecting other collections in the same database, providing more precise
control over data persistence operations.
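One way to picture the backward compatibility: a request that only sets the deprecated database name is normalized into a single database-level target. The types below are hypothetical Go stand-ins, not the generated code for data_coord.proto, and the "no target means flush everything" rule is an assumption.

```go
package main

import "fmt"

// flushAllTarget is a hypothetical mirror of the FlushAllTarget message:
// a database plus an optional list of collections inside it.
type flushAllTarget struct {
	dbName          string
	collectionNames []string // assumption: empty means the whole database
}

// flushAllRequest is a hypothetical request carrying both the deprecated
// database name and the new targets.
type flushAllRequest struct {
	dbName  string // deprecated field, kept for old clients
	targets []flushAllTarget
}

// normalizeTargets prefers the new targets and falls back to the deprecated
// database name. A nil result is assumed to mean "flush everything".
func normalizeTargets(req flushAllRequest) []flushAllTarget {
	if len(req.targets) > 0 {
		return req.targets
	}
	if req.dbName != "" {
		return []flushAllTarget{{dbName: req.dbName}}
	}
	return nil
}

func main() {
	oldStyle := flushAllRequest{dbName: "default"}
	newStyle := flushAllRequest{targets: []flushAllTarget{
		{dbName: "default", collectionNames: []string{"docs"}},
	}}
	fmt.Printf("%+v\n", normalizeTargets(oldStyle))
	fmt.Printf("%+v\n", normalizeTargets(newStyle))
}
```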
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Related to #43966, #43809
This PR:
- Consolidates the distributed request metrics collection into one interceptor
- Adds `Retry` and `Reject` labels, representing retryable errors and
auth rejections respectively
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29735
Implement partial field update functionality for upsert operations,
supporting scalar, vector, and dynamic JSON fields without requiring all
collection fields.
Changes:
- Add queryPreExecute to retrieve existing records before upsert
- Implement UpdateFieldData function for merging data
- Add IDsChecker utility for efficient primary key lookups
- Fix JSON data creation in tests using proper map marshaling
- Add test cases for partial updates of different field types
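The merge step of a partial upsert can be sketched as overlaying the supplied fields on the existing record. This is a simplified row-based illustration with hypothetical names; the actual UpdateFieldData function is not reproduced here and may work on a different data layout.

```go
package main

import "fmt"

// mergeRow is a hypothetical helper: it starts from the existing record and
// overwrites only the fields present in the partial update.
func mergeRow(existing, partial map[string]any) map[string]any {
	merged := make(map[string]any, len(existing))
	for field, value := range existing {
		merged[field] = value
	}
	for field, value := range partial {
		merged[field] = value
	}
	return merged
}

func main() {
	// Record fetched by primary key before the upsert is applied.
	existing := map[string]any{"id": 1, "title": "old title", "price": 10}
	// The upsert request carries only the primary key and one changed field.
	partial := map[string]any{"id": 1, "price": 12}

	merged := mergeRow(existing, partial)
	fmt.Println(merged["title"], merged["price"])
}
```

Fields that are absent from the request keep their stored values, which is why the existing records have to be queried first.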
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Replace multiple per-table flush RPC calls with single FlushAll RPC to
improve performance in multi-table scenarios.
issue: #43338
- Implement server-side FlushAll request processing in
DataCoord/MixCoord
- Add flushAllTask to handle unified flush operations across all tables
- Replace proxy-side per-table flush iteration with single RPC call
- Support both streaming and non-streaming service execution paths
- Add comprehensive unit tests for new FlushAll implementation
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #40756
A large nq naturally increases query time, which causes many slow
logs when users' nq values are very large.
This PR makes the slow search check use the time span per nq (average
value) to decide whether a request is slow.
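The per-nq check amounts to dividing the total span by nq before comparing it with the threshold. A minimal sketch, assuming a hypothetical `isSlowSearch` helper and threshold:

```go
package main

import (
	"fmt"
	"time"
)

// isSlowSearch is a hypothetical helper: it compares the average time spent
// per query vector (total / nq) with the slow threshold.
func isSlowSearch(total time.Duration, nq int64, threshold time.Duration) bool {
	if nq < 1 {
		nq = 1 // guard against division by zero
	}
	return total/time.Duration(nq) > threshold
}

func main() {
	threshold := 5 * time.Second
	// 10s spread over 100 query vectors averages 100ms per nq.
	fmt.Println(isSlowSearch(10*time.Second, 100, threshold))
	// 10s for a single query vector exceeds the threshold.
	fmt.Println(isSlowSearch(10*time.Second, 1, threshold))
}
```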
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Merge RootCoord, DataCoord and QueryCoord into MixCoord
Merge their sessions into one
issue: https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
After this PR is merged, insert, upsert, build index, query, and search
are supported on the added field.
These operations can be performed on the added field only after the add
field request completes, which is a synchronous operation.
Compaction will be supported in the next PR.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #38399
related PR: #39522
- Implement an exclusive broadcaster between broadcast messages with the
same resource key to keep the same order in different wal.
- After simplifying the broadcast model, the original watch-based
broadcast is too complicated and redundant, so remove it.
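The exclusive broadcaster idea can be sketched with a per-key lock: broadcasts that share a resource key run one at a time, so every WAL receives them in the same order. `keyLocker` and `broadcastAll` are hypothetical names, and each WAL is modeled as a plain slice.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLocker hands out one mutex per resource key (hypothetical type).
type keyLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyLocker() *keyLocker {
	return &keyLocker{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the caller exclusively holds key, then returns the
// function that releases it.
func (k *keyLocker) lock(key string) (unlock func()) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	k.mu.Unlock()
	l.Lock()
	return l.Unlock
}

// broadcastAll runs n concurrent broadcasts on one resource key and appends
// each message ID to every simulated WAL while holding the key.
func broadcastAll(k *keyLocker, key string, n, walCount int) [][]int {
	wals := make([][]int, walCount)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			unlock := k.lock(key)
			defer unlock()
			for w := range wals {
				wals[w] = append(wals[w], id) // guarded by the per-key lock
			}
		}(i)
	}
	wg.Wait()
	return wals
}

func main() {
	wals := broadcastAll(newKeyLocker(), "collection-1", 5, 2)
	// The order varies between runs, but both WALs always agree.
	fmt.Println(wals[0])
	fmt.Println(wals[1])
}
```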
---------
Signed-off-by: chyezh <chyezh@outlook.com>
enhance: Add schema update time verification for insert and upsert to
use cache
issue: https://github.com/milvus-io/milvus/issues/39093
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: #36621, #39417
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.
Signed-off-by: jaime <yun.zhang@zilliz.com>