issue: #44358
Implement complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.
Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata
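The feature list above maps to a client-facing API. As a rough illustration only — the interface, method, and type names below are assumptions, not the SDK's actual signatures — a minimal Go sketch of that surface could look like:

```go
package snapshotclient

import "context"

// SnapshotInfo is an illustrative view of what a Describe call could return.
type SnapshotInfo struct {
	Name           string
	CollectionName string
	CreatedAt      int64 // creation timestamp of the point-in-time capture
}

// SnapshotAPI sketches the four management operations listed above; all names
// and signatures here are hypothetical.
type SnapshotAPI interface {
	CreateSnapshot(ctx context.Context, collection, name string) error
	DropSnapshot(ctx context.Context, collection, name string) error
	ListSnapshots(ctx context.Context, collection string) ([]string, error)
	DescribeSnapshot(ctx context.Context, collection, name string) (*SnapshotInfo, error)
}
```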
Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation
Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms
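To make the metadata layer above concrete, here is a hedged sketch of what a snapshot record stored in etcd might carry, including the lifecycle states referenced in the release notes below; all field and constant names are assumptions for illustration.

```go
package snapshotmeta

// SnapshotState mirrors the lifecycle described in this change: a snapshot is
// created as PENDING, becomes COMMITTED once its manifests are durably
// written, and moves to DELETING before cleanup.
type SnapshotState int32

const (
	SnapshotStatePending SnapshotState = iota
	SnapshotStateCommitted
	SnapshotStateDeleting
)

// SnapshotMeta is an illustrative etcd record; the real schema may differ.
type SnapshotMeta struct {
	SnapshotID   int64
	CollectionID int64
	Name         string
	State        SnapshotState
	ManifestPath string // per-segment manifest persisted in object storage
	CreatedTs    uint64
}
```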
Release notes (auto-generated by coderabbit.ai):
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.
- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.
- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.
- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
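The guarantees above hinge on two mechanisms: the two-phase create/drop lifecycle and the GC reference guard. The sketch below illustrates both under assumed interfaces; only IsRefIndexLoaded and GetSnapshotBySegment are names taken from the notes, everything else is illustrative.

```go
package snapshot

import (
	"context"
	"fmt"
)

// Stand-ins for the etcd-backed snapshot catalog, the SnapshotWriter, and the
// snapshot reference index described above.
type snapshotCatalog interface {
	SavePending(ctx context.Context, name string) error
	Commit(ctx context.Context, name string) error
	MarkDeleting(ctx context.Context, name string) error
}

type snapshotWriter interface {
	WriteManifests(ctx context.Context, name string) error
}

type refIndex interface {
	IsRefIndexLoaded() bool
	GetSnapshotBySegment(segmentID int64) []int64
}

// createSnapshot sketches the two-phase flow: record PENDING, persist
// manifests, then commit. A failure after the PENDING write marks the snapshot
// DELETING so later cleanup reclaims the partial state instead of leaking it.
func createSnapshot(ctx context.Context, c snapshotCatalog, w snapshotWriter, name string) error {
	if err := c.SavePending(ctx, name); err != nil {
		return fmt.Errorf("write pending snapshot meta: %w", err)
	}
	if err := w.WriteManifests(ctx, name); err != nil {
		_ = c.MarkDeleting(ctx, name) // rollback path
		return fmt.Errorf("write snapshot manifests: %w", err)
	}
	return c.Commit(ctx, name)
}

// canRecycleSegment sketches the GC guard: a segment file may be reclaimed
// only once the reference index is loaded and no snapshot still references it.
func canRecycleSegment(refs refIndex, segmentID int64) bool {
	if !refs.IsRefIndexLoaded() {
		return false
	}
	return len(refs.GetSnapshotBySegment(segmentID)) == 0
}
```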
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44761 #38339
This commit consolidates the proxy package's mockery generation to use a
centralized `.mockery.yaml` configuration file, aligning with the
pattern used by other packages like querycoordv2.
Changes:
- **Makefile**: Replace multiple individual mockery commands with a
single config-based invocation for `generate-mockery-proxy` target
- **internal/proxy/.mockery.yaml**: Add mockery configuration defining
all mock interfaces for proxy and proxy/shardclient packages
- **Mock files**: Regenerate mocks using the new configuration:
- `mock_cache.go`: Clean up by removing unused interface methods
(credential, shard cache, policy methods)
- `shardclient/mock_lb_balancer.go`: Update type comments (nodeInfo →
NodeInfo)
- `shardclient/mock_lb_policy.go`: Update formatting
- `shardclient/mock_shardclient_manager.go`: Fix parameter naming
consistency (nodeInfo1 → nodeInfo)
- **task_search_test.go**: Remove obsolete mock expectations for
deprecated cache methods
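For orientation, a hedged sketch of what a config-driven `.mockery.yaml` for this package could look like; the exact keys, templates, and interface list in the repository's file may differ.

```yaml
# Illustrative only; see internal/proxy/.mockery.yaml for the real configuration.
with-expecter: true
dir: "{{.InterfaceDir}}"
mockname: "Mock{{.InterfaceName}}"
filename: "mock_{{.InterfaceName}}.go"
packages:
  github.com/milvus-io/milvus/internal/proxy:
    interfaces:
      Cache:
  github.com/milvus-io/milvus/internal/proxy/shardclient:
    interfaces:
      LBBalancer:
      LBPolicy:
      ShardClientMgr:
```

With a file like this in place, the Makefile target can shrink to a single `mockery --config` invocation instead of one command per interface.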
Benefits:
- Centralized mockery configuration for easier maintenance
- Consistent with other packages (querycoordv2, etc.)
- Cleaner mock interfaces by removing unused methods
- Better type consistency in generated mocks
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44761
Refactor proxy shard client management by creating a new
internal/proxy/shardclient package. This improves code organization and
modularity by:
- Moving load balancing logic (LookAsideBalancer, RoundRobinBalancer) to
shardclient package
- Extracting shard client manager and related interfaces into separate
package
- Relocating shard leader management and client lifecycle code
- Adding package documentation (README.md, OWNERS)
- Updating proxy code to use the new shardclient package interfaces
This change makes the shard client functionality more maintainable and
better encapsulated, reducing coupling in the proxy layer.
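As a rough illustration of the new package's surface — the names below echo the mocks listed earlier (LBBalancer, the shard client manager) but the signatures are assumptions:

```go
package shardclient

import "context"

// nodeInfo identifies a query node hosting a shard leader; fields are illustrative.
type nodeInfo struct {
	nodeID  int64
	address string
}

// LBBalancer abstracts the balancing strategies moved into this package
// (LookAsideBalancer, RoundRobinBalancer); the signature is an assumption.
type LBBalancer interface {
	SelectNode(ctx context.Context, availableNodes []int64, cost int64) (int64, error)
}

// Manager sketches the shard client manager's responsibilities: hand out a
// client for a node, creating one lazily if needed, and release it when done.
type Manager interface {
	GetClient(ctx context.Context, node nodeInfo) (any, error) // any stands in for the query node client type
	ReleaseClientRef(nodeID int64)
	Close()
}
```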
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #42561
Move the exclude nodes clearing logic from ExecuteWithRetry to
selectNode after shard leader cache refresh to ensure proper retry
behavior:
- Remove premature exclude clearing in ExecuteWithRetry that happened
before shard leader cache update
- Add exclude clearing logic in selectNode after refreshing shard leader
cache when all replicas are excluded
- Ensure multiple retries can properly update shard leader cache and
clear exclude list when needed
- Add comprehensive tests for edge cases including empty shard leaders
and mixed serviceable node scenarios
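A hedged sketch of the ordering this change enforces — refresh the shard leader cache first, and clear the exclude set only if the refreshed view is still fully excluded. The helper names are illustrative, not the actual proxy code:

```go
package proxy

import (
	"context"
	"errors"
)

var errNoAvailableNode = errors.New("no available shard leader")

// selectNodeSketch clears the exclude set only after a shard leader cache
// refresh still yields no usable candidate, so a stale cache cannot hide
// healthy replicas and repeated retries keep making progress.
func selectNodeSketch(
	ctx context.Context,
	leaders []int64,
	excluded map[int64]struct{},
	refreshLeaders func(ctx context.Context) ([]int64, error),
	pick func(candidates []int64) (int64, error),
) (int64, error) {
	candidates := filterExcluded(leaders, excluded)
	if len(candidates) == 0 {
		// All replicas excluded: refresh the cached shard leaders first.
		fresh, err := refreshLeaders(ctx)
		if err != nil {
			return 0, err
		}
		candidates = filterExcluded(fresh, excluded)
		if len(candidates) == 0 {
			// Still fully excluded after refresh: reset excludes and retry with the fresh view.
			for k := range excluded {
				delete(excluded, k)
			}
			candidates = fresh
		}
	}
	if len(candidates) == 0 {
		return 0, errNoAvailableNode
	}
	return pick(candidates)
}

func filterExcluded(nodes []int64, excluded map[int64]struct{}) []int64 {
	out := make([]int64, 0, len(nodes))
	for _, n := range nodes {
		if _, ok := excluded[n]; !ok {
			out = append(out, n)
		}
	}
	return out
}
```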
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37115
The old implementation updated the shard cache and the shard client manager
at the same time, which caused many corner cases due to concurrent access
without locking.
This PR decouples the shard client manager from the shard cache, so only the
shard cache is updated when a delegator changes. The shard client manager now
always returns the right client, creating a new one if it does not exist. To
avoid client leaks, the shard client manager purges unused clients
asynchronously every 10 minutes, as sketched below.
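A minimal sketch, under assumed types, of the decoupled manager's behavior: GetClient always returns a usable client (creating one lazily under a lock), while a background loop purges idle clients on the 10-minute cadence mentioned above.

```go
package shardclient

import (
	"sync"
	"time"
)

type clientEntry struct {
	client   any // placeholder for the actual query node client type
	lastUsed time.Time
}

// manager is an illustrative shard client manager that no longer depends on
// shard cache updates for its lifecycle.
type manager struct {
	mu      sync.Mutex
	clients map[int64]*clientEntry
	dial    func(nodeID int64) (any, error) // assumed factory for new clients
}

// GetClient returns the cached client for a node, creating one if absent, so
// callers always receive a valid client even when the shard cache changes.
func (m *manager) GetClient(nodeID int64) (any, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if e, ok := m.clients[nodeID]; ok {
		e.lastUsed = time.Now()
		return e.client, nil
	}
	c, err := m.dial(nodeID)
	if err != nil {
		return nil, err
	}
	m.clients[nodeID] = &clientEntry{client: c, lastUsed: time.Now()}
	return c, nil
}

// purgeLoop asynchronously drops clients idle longer than ttl, checking on the
// 10-minute interval described above, to avoid leaking connections.
func (m *manager) purgeLoop(stop <-chan struct{}, ttl time.Duration) {
	ticker := time.NewTicker(10 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			m.mu.Lock()
			for id, e := range m.clients {
				if time.Since(e.lastUsed) > ttl {
					delete(m.clients, id)
				}
			}
			m.mu.Unlock()
		}
	}
}
```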
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32466
This PR ensures that when a shard's location changes, the proxy's shard
leader cache is updated, so that in a query node failover the proxy can find
the recovered replica.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29772
1. `DropPartition` only invalidates the cache related to the partition.
2. `CreateAlias` does not invalidate the cache.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #29772
The shardLeaders cache does not actively expire, so the cache is updated when
a search/query fails (sketched below).
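A small sketch of that passive expiry, with illustrative names: a failed search/query invalidates the cached shard leaders for the collection so the next attempt re-reads fresh leaders.

```go
package proxy

import "context"

// executeSketch invalidates the shardLeaders cache entry for a collection when
// a search/query attempt fails; names and signature are illustrative.
func executeSketch(
	ctx context.Context,
	run func(ctx context.Context) error,
	invalidateShardLeaders func(collectionID int64),
	collectionID int64,
) error {
	if err := run(ctx); err != nil {
		invalidateShardLeaders(collectionID)
		return err
	}
	return nil
}
```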
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
See also #29113
The collection schema is crucial when performing search/query, but some of
the derived information was being recalculated for every request.
This PR changes the schema field of the cached collection info into a utility
`schemaInfo` type that stores stable results, such as the pk field and
partitionKeyEnabled, and provides a field name to ID map for the search/query
services (see the sketch below).
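A hedged Go sketch of such a `schemaInfo` type: stable, per-collection results are computed once when the cache entry is built instead of per request. Only the type name comes from this change; the fields are assumptions.

```go
package proxy

// fieldSchema is a minimal stand-in for a field definition in the collection schema.
type fieldSchema struct {
	ID             int64
	Name           string
	IsPrimaryKey   bool
	IsPartitionKey bool
}

// schemaInfo caches request-independent facts derived from the schema so the
// search/query paths do not recompute them for every request.
type schemaInfo struct {
	pkFieldID           int64
	pkFieldName         string
	partitionKeyEnabled bool
	fieldNameToID       map[string]int64
}

// newSchemaInfoSketch derives the cached values once, when the collection info
// is loaded into the meta cache.
func newSchemaInfoSketch(fields []fieldSchema) *schemaInfo {
	info := &schemaInfo{fieldNameToID: make(map[string]int64, len(fields))}
	for _, f := range fields {
		info.fieldNameToID[f.Name] = f.ID
		if f.IsPrimaryKey {
			info.pkFieldID, info.pkFieldName = f.ID, f.Name
		}
		if f.IsPartitionKey {
			info.partitionKeyEnabled = true
		}
	}
	return info
}
```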
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #29113
- Unify partition info refresh logic
- Avoid parsing partition names for each partition-key search request
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #28781 #28329
1. There is no need to call `DescribeCollection` if the collection's schema
is found in the globalMetaCache (see the sketch below).
2. Call `GetProperties` to check access to the Azure Blob Service while
constructing the ChunkManager.
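An illustrative sketch of the cache-first lookup in point 1: the `DescribeCollection` RPC (passed in here as `describe`) runs only on a cache miss. The helper names and types are assumptions.

```go
package proxy

import "context"

// collectionSchema stands in for whatever the globalMetaCache stores per
// collection; it is only a placeholder for this sketch.
type collectionSchema struct{}

type schemaCache interface {
	Get(ctx context.Context, db, collection string) (*collectionSchema, bool)
	Put(ctx context.Context, db, collection string, s *collectionSchema)
}

// getSchemaSketch performs the cache-first lookup: only a miss triggers the
// DescribeCollection call, and the result is cached for later requests.
func getSchemaSketch(
	ctx context.Context,
	cache schemaCache,
	describe func(ctx context.Context, db, collection string) (*collectionSchema, error),
	db, collection string,
) (*collectionSchema, error) {
	if s, ok := cache.Get(ctx, db, collection); ok {
		return s, nil // cache hit: no RPC needed
	}
	s, err := describe(ctx, db, collection)
	if err != nil {
		return nil, err
	}
	cache.Put(ctx, db, collection, s)
	return s, nil
}
```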
Signed-off-by: PowderLi <min.li@zilliz.com>