
Milvus Go Client Test Framework

Overview

This is a test framework for the Milvus Go Client, designed to validate the functionality of the Milvus vector database through its Go SDK. The framework provides a structured approach to writing tests with reusable components and helper functions.

Framework Architecture

Directory Structure

/go_client/
├── testcases/           # Main test cases
│   ├── helper/          # Helper functions and utilities
│   │   ├── helper.go
│   │   ├── data_helper.go
│   │   └── collection_helper.go
│   ├── search_test.go   # Search functionality tests
│   ├── index_test.go    # Index management tests
│   └── ...
├── common/             # Common utilities and constants
└── base/               # Base infrastructure code

Key Components

  • Collection Preparation: Utilities for creating and managing collections
  • Data Generation: Tools for generating test data
  • Helper Functions: Common operations and validations
  • Test Cases: Organized by functionality

Writing Test Cases

Basic Test Structure

func TestYourFeature(t *testing.T) {
    // 1. Setup context and client
    ctx := hp.CreateContext(t, time.Second*common.DefaultTimeout)
    mc := createDefaultMilvusClient(ctx, t)

    // 2. Prepare collection
    prepare, schema := hp.CollPrepare.CreateCollection(
        ctx, t, mc,
        hp.NewCreateCollectionParams(hp.Int64Vec),
        hp.TNewFieldsOption(),
        hp.TNewSchemaOption(),
    )

    // 3. Insert test data
    prepare.InsertData(ctx, t, mc,
        hp.NewInsertParams(schema),
        hp.TNewDataOption(),
    )

    // 4. Execute test operations
    // ... your test logic here ...

    // 5. Validate results (err, expected and actual come from your test logic above)
    require.NoError(t, err)
    require.Equal(t, expected, actual)
}
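
For a read test, step 4 usually also flushes the inserted data, builds an index, and loads the collection before issuing the query that produces err and the values checked in step 5. The sketch below fills in those steps; the helper names not shown above (FlushData, CreateIndex, NewLoadParams, Load) and the query option methods follow the framework's and v2 client's usual naming, so treat them as assumptions to verify against testcases/helper and your client version.

// 3b. Flush, build an index, and load (helper names assumed, see note above)
prepare.FlushData(ctx, t, mc, schema.CollectionName)
prepare.CreateIndex(ctx, t, mc, hp.TNewIndexParams(schema))
prepare.Load(ctx, t, mc, hp.NewLoadParams(schema.CollectionName))

// 4. Execute test operations: query a slice of the inserted data
res, err := mc.Query(ctx, client.NewQueryOption(schema.CollectionName).
    WithFilter("int64 < 10").
    WithOutputFields("int64").
    WithConsistencyLevel(entity.ClStrong))

// 5. Validate results
require.NoError(t, err)
require.Positive(t, res.ResultCount)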

Using Custom Parameters

  1. Collection Creation Parameters
fieldsOption := hp.TNewFieldsOption().
    TWithEnableAnalyzer(true).
    TWithAnalyzerParams(map[string]any{
        "tokenizer": "standard",
    })

schemaOption := hp.TNewSchemaOption().
    TWithEnableDynamicField(true).
    TWithDescription("Custom schema").
    TWithAutoID(false)
  2. Data Insertion Options
insertOption := hp.TNewDataOption().
    TWithNb(1000).           // Number of records
    TWithDim(128).           // Vector dimension
    TWithStart(100).         // Starting ID
    TWithMaxLen(256).        // Maximum length
    TWithTextLang("en")      // Text language
  3. Index Parameters
indexParams := hp.TNewIndexParams(schema).
    TWithFieldIndex(map[string]index.Index{
        common.DefaultVectorFieldName: index.NewIVFSQIndex(
            &index.IVFSQConfig{
                MetricType: entity.L2,
                NList:     128,
            },
        ),
    })
  4. Search Parameters
searchOpt := client.NewSearchOption(schema.CollectionName, 100, vectors).
    WithOffset(0).
    WithLimit(100).
    WithConsistencyLevel(entity.ClStrong).
    WithFilter("int64 >= 100").
    WithOutputFields([]string{"*"}).
    WithSearchParams(map[string]any{
        "nprobe": 16,
        "ef":     64,
    })
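
These options replace the T-default ones in the preparation flow from the Basic Test Structure section. A minimal sketch of wiring them together follows; the FlushData, CreateIndex and Load helpers and the exact mc.Search signature are assumptions borrowed from the framework's usual pattern, so check them against testcases/helper before relying on them.

prepare, schema := hp.CollPrepare.CreateCollection(ctx, t, mc,
    hp.NewCreateCollectionParams(hp.Int64Vec), fieldsOption, schemaOption)
prepare.InsertData(ctx, t, mc, hp.NewInsertParams(schema), insertOption)
prepare.FlushData(ctx, t, mc, schema.CollectionName)              // assumed helper
prepare.CreateIndex(ctx, t, mc, indexParams)                      // assumed helper
prepare.Load(ctx, t, mc, hp.NewLoadParams(schema.CollectionName)) // assumed helper

// query vectors must match the dimension configured via TWithDim (128 above)
vectors := []entity.Vector{entity.FloatVector(make([]float32, 128))}
results, err := mc.Search(ctx, client.NewSearchOption(schema.CollectionName, 100, vectors).
    WithConsistencyLevel(entity.ClStrong).
    WithFilter("int64 >= 100"))
require.NoError(t, err)
require.Len(t, results, 1) // one result set per query vector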

Adding New Parameters

  1. Define New Option Type
// In helper/data_helper.go
type YourNewOption struct {
    newParam1 string
    newParam2 int
}
  2. Add Constructor
func TNewYourOption() *YourNewOption {
    return &YourNewOption{
        newParam1: "default",
        newParam2: 0,
    }
}
  3. Add Parameter Methods
func (opt *YourNewOption) TWithNewParam1(value string) *YourNewOption {
    opt.newParam1 = value
    return opt
}

func (opt *YourNewOption) TWithNewParam2(value int) *YourNewOption {
    opt.newParam2 = value
    return opt
}
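
Finally, consume the option in whichever helper it was designed for. The generator below is purely hypothetical (GenRowsWithYourOption is not part of the framework); it only illustrates how the fluent setters feed into data generation.

// In helper/data_helper.go (hypothetical consumer; requires "fmt")
func GenRowsWithYourOption(opt *YourNewOption) []string {
    // use the configured parameters when generating data
    rows := make([]string, 0, opt.newParam2)
    for i := 0; i < opt.newParam2; i++ {
        rows = append(rows, fmt.Sprintf("%s-%d", opt.newParam1, i))
    }
    return rows
}

// In a test case (helper package imported as hp)
rows := hp.GenRowsWithYourOption(hp.TNewYourOption().TWithNewParam1("demo").TWithNewParam2(3))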

Best Practices

  1. Test Organization

    • Group related tests in the same file
    • Use clear and descriptive test names
    • Add comments explaining test purpose
  2. Data Generation

    • Use helper functions for generating test data
    • Ensure data is appropriate for the test case
    • Clean up test data after use
  3. Error Handling

    • Use common.CheckErr for consistent error checking (see the sketch after this list)
    • Test both success and failure scenarios
    • Validate error messages when appropriate
  4. Performance Considerations

    • Use appropriate timeouts
    • Clean up resources after tests
    • Consider test execution time
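
A minimal sketch of the error-handling practice above, assuming common.CheckErr takes the testing handle, the returned error, whether the call is expected to succeed, and optional expected message fragments (verify the exact signature in the common package):

// success path: the call is expected to succeed
_, err := mc.Query(ctx, client.NewQueryOption(schema.CollectionName).WithFilter("int64 >= 0"))
common.CheckErr(t, err, true)

// failure path: the call is expected to fail with a known message fragment
_, err = mc.Query(ctx, client.NewQueryOption("not_existed").WithFilter("int64 >= 0"))
common.CheckErr(t, err, false, "collection not found") // expected fragment is illustrative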

Running Tests

# Run all tests
go test ./testcases/...

# Run specific test
go test -run TestYourFeature ./testcases/

# Run with verbose output
go test -v ./testcases/...

# gotestsum
We recommend using gotestsum: https://github.com/gotestyourself/gotestsum

# Run all default cases
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m

# Run a specified file
gotestsum --format testname --hide-summary=output ./testcases/collection_test.go ./testcases/main_test.go --addr=127.0.0.1:19530

# Run L3 rg cases
gotestsum --format testname --hide-summary=output -v ./testcases/advcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg

# Run advanced rg cases and default cases
# rg cases conflict with default cases, so -p=1 is required
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg -p 1

Contributing

  1. Follow the existing code structure
  2. Add comprehensive test cases
  3. Document new parameters and options
  4. Update this README for significant changes
  5. Ensure code quality standards:
    • Run golangci-lint run to check for style mistakes
    • Use gofmt -w your/code/path to format your code before submitting
    • CI will verify both golint and go format compliance