491 Commits

Author SHA1 Message Date
wei liu
975c91df16
feat: Add comprehensive snapshot functionality for collections (#44361)
issue: #44358

Implement complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.

Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata

Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation

Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.

- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.

- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.

- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
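
A minimal sketch of the lifecycle and GC guard described in the notes above, in Python for illustration only. The state names and the rule that the reference index must be loaded before reclaiming files come from the notes; the class `SnapshotMeta`, its methods, and the in-memory dictionaries are hypothetical stand-ins and are not the DataCoord implementation.

```python
from enum import Enum


class SnapshotState(Enum):
    PENDING = "PENDING"
    COMMITTED = "COMMITTED"
    DELETING = "DELETING"


# Transitions named in the notes: PENDING -> COMMITTED -> DELETING.
_ALLOWED = {
    (SnapshotState.PENDING, SnapshotState.COMMITTED),
    (SnapshotState.COMMITTED, SnapshotState.DELETING),
}


class SnapshotMeta:
    """Hypothetical in-memory stand-in for the etcd-backed snapshot metadata."""

    def __init__(self):
        self.states = {}        # snapshot name -> SnapshotState
        self.segment_refs = {}  # segment ID -> set of snapshot names
        self.ref_index_loaded = False

    def create_pending(self, name, segment_ids):
        # Phase one: record the snapshot and its segment references.
        self.states[name] = SnapshotState.PENDING
        for seg in segment_ids:
            self.segment_refs.setdefault(seg, set()).add(name)

    def transition(self, name, new_state):
        old = self.states[name]
        if (old, new_state) not in _ALLOWED:
            raise ValueError(f"illegal transition {old.name} -> {new_state.name}")
        self.states[name] = new_state

    def can_reclaim(self, segment_id):
        # GC must not act before the reference index is loaded, and must
        # skip any segment that a snapshot still references.
        if not self.ref_index_loaded:
            return False
        return not self.segment_refs.get(segment_id)
```

In this sketch a GC pass calls `can_reclaim` per segment, so a segment referenced by a PENDING snapshot is protected as soon as `create_pending` returns.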

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-01-06 10:15:24 +08:00
cai.zhang
a16d04f5d1
feat: Support ttl field for entity level expiration (#46342)
issue: #46033

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Entity-Level TTL Field Support

### Core Invariant and Design
This PR introduces **per-entity TTL (time-to-live) expiration** via a
dedicated TIMESTAMPTZ field as a fine-grained alternative to
collection-level TTL. The key invariant is **mutual exclusivity**:
collection-level TTL and entity-level TTL field cannot coexist on the
same collection. Validation is enforced at the proxy layer during
collection creation/alteration (`validateTTL()` prevents both being set
simultaneously).
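
The mutual-exclusivity rule can be sketched as follows, in Python for illustration. This is not the proxy's `validateTTL()` code; the function name `validate_ttl` is hypothetical, and treating `collection.ttl.seconds` as the collection-level property key is an assumption (only `ttl_field` is named in this summary).

```python
# Assumed property keys; only "ttl_field" appears in the PR summary.
COLLECTION_TTL_KEY = "collection.ttl.seconds"
TTL_FIELD_KEY = "ttl_field"


def validate_ttl(properties: dict) -> None:
    """Reject property sets that configure both TTL mechanisms at once."""
    if COLLECTION_TTL_KEY in properties and TTL_FIELD_KEY in properties:
        raise ValueError(
            "collection-level TTL and entity-level TTL field cannot coexist"
        )


validate_ttl({"ttl_field": "expire_at"})          # accepted
validate_ttl({"collection.ttl.seconds": "3600"})  # accepted
```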

### What Is Removed and Why
- **Global `EntityExpirationTTL` parameter** removed from config
(`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This
was the only mechanism for collection-level expiration. The removal is
safe because:
  - The collection-level TTL path (`isEntityExpired(ts)` check) remains
    intact in the codebase for backward compatibility
  - The TTL field check (`isEntityExpiredByTTLField()`) is a secondary
    path invoked only when a TTL field is configured
  - Existing deployments using collection TTL can continue without
    modification
  
The global parameter was removed specifically because per-entity control
via a TTL field makes a global setting redundant, and the PR chooses one
mechanism per collection rather than layering both.

### No Data Loss or Behavior Regression
**TTL filtering logic is additive and safe:**
1. **Collection-level TTL unaffected**: The `isEntityExpired(ts)` check
still applies when no TTL field is configured; callers of
`EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when
no field exists, causing `isEntityExpiredByTTLField()` to return false
immediately
2. **Null/invalid TTL values treated safely**: Rows with null TTL or TTL
≤ 0 are marked as "never expire" (using sentinel value `int64(^uint64(0)
>> 1)`) and are preserved across compactions; percentile calculations
only include positive TTL values
3. **Query-time filtering automatic**: TTL filtering is transparently
added to expression compilation via `AddTTLFieldFilterExpressions()`,
which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the
filter pipeline. Entities with null TTL always pass the filter
4. **Compaction triggering granular**: Percentile-based expiration (20%,
40%, 60%, 80%, 100%) allows configurable compaction thresholds via
`SingleCompactionRatioThreshold`, preventing premature data deletion

### Capability Added: Per-Entity Expiration with Data Distribution Awareness
Users can now set a collection property `ttl_field` that names a
TIMESTAMPTZ schema field. During data writes, TTL values are collected per
segment and percentile quantiles (5-value array) are computed and stored
in segment metadata. At query time, the TTL field is automatically
filtered. At compaction time, segment-level percentiles drive
expiration-based compaction decisions, enabling intelligent compaction
of segments where a configurable fraction of data has expired (e.g.,
compact when 40% of rows are expired, controlled by threshold ratio).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
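
A rough sketch of the percentile bookkeeping described above, in Python for illustration. The 20/40/60/80/100 cut points, the exclusion of null and non-positive TTLs, and the sentinel value come from the summary; the nearest-rank method, the function names, the behavior for segments without any positive TTL, and the exact comparison against the threshold are assumptions rather than the actual compaction code.

```python
NEVER_EXPIRE = (1 << 63) - 1  # same value as int64(^uint64(0) >> 1)
CUT_POINTS = (20, 40, 60, 80, 100)
FRACTIONS = (0.2, 0.4, 0.6, 0.8, 1.0)


def ttl_percentiles(ttl_values):
    """Return the 5-value percentile array for one segment.

    Only positive TTL values take part; null or <= 0 means "never expire".
    """
    positive = sorted(v for v in ttl_values if v is not None and v > 0)
    if not positive:
        # Assumption: a segment with no expiring rows reports the sentinel.
        return [NEVER_EXPIRE] * len(CUT_POINTS)
    n = len(positive)
    # Nearest-rank percentile: smallest value covering at least p% of rows.
    return [positive[(n * p + 99) // 100 - 1] for p in CUT_POINTS]


def should_compact(percentiles, now, ratio_threshold):
    """True when the expired share of TTL-bearing rows reaches the threshold."""
    expired_fraction = 0.0
    for fraction, ts in zip(FRACTIONS, percentiles):
        if ts <= now:
            expired_fraction = fraction
    return expired_fraction >= ratio_threshold


pcts = ttl_percentiles([10, 20, 30, 40, 50, None, 0])
print(pcts)                           # [10, 20, 30, 40, 50]
print(should_compact(pcts, 25, 0.4))  # True: 40% of TTL rows expired by t=25
```

Storing only five values per segment keeps the metadata small while still letting the scheduler estimate how much of a segment has expired without reading the data.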

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2026-01-05 10:27:24 +08:00
cai.zhang
de3050be54
doc: [skip e2e]Add design document for entity level ttl (#46406)
issue: #46033

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-21 19:13:17 +08:00
Buqian Zheng
846cf52a95
enhance: Remove unused vector plan node subclasses (#44453)
Remove redundant `VectorPlanNode` subclasses and simplify the visitor
pattern by consolidating to a single `VectorPlanNode`.

The previous design used distinct `VectorPlanNode` subclasses and a
templated `VectorVisitorImpl` for type-directed dispatch. However, the
template parameter was not functionally used to implement different
logic for each vector type, making the subclasses redundant for their
intended purpose.

This PR is created by Cursor Agent and manually moved from
https://github.com/zhengbuqian/milvus/pull/14.

Signed-off-by: zhengbuqian <zhengbuqian@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>
2025-09-22 18:00:27 +08:00
ZhuXi
d079947bdf
doc: [skip e2e] Add research reports on geographic systems (#43737)
issue: #43427
Upload research reports on geographic information systems.

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
2025-08-25 15:29:51 +08:00
Xiaofan
1025aaa47b
doc: design doc for row level security (#42652)
The new design doc for row-level security.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-06-11 08:06:23 +08:00
Xiaofan
ad45a56776
doc: add support for primary index (#41462)
Add a design doc discussing a memory-efficient primary key index.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-04-29 19:47:57 +08:00
Buqian Zheng
3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
Xiaofan
73bda08ec9
doc: json storage format (#40479)
The design doc for the JSON storage improvement.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-03-08 09:48:02 +08:00
junjiejiangjjj
097d167e96
doc: Update tools info (#39244)
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-01-14 16:53:00 +08:00
Yinzuo Jiang
3628593d20
feat: Implement custom function module in milvus expr (#36560)
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org

Solutions:

- parser (planparserv2)
    - add CallExpr in planparserv2/Plan.g4
    - update parser_visitor and show_visitor
- grpc protobuf
    - add CallExpr in plan.proto
- execution (`core/src/exec`)
    - add `CallExpr`, `ValueExpr`, and `ColumnExpr` (both logical and
      physical) for function calls and function parameters
- function factory (`core/src/exec/expression/function`)
    - create a global hashmap when starting milvus (see server.go)
    - the global hashmap stores function signatures and their function
      pointers; the CallExpr in the execution engine looks up the function
      pointer by function signature
- custom functions
    - empty(string)
    - starts_with(string, string)
- add cpp/go unittests and E2E tests
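
The function factory idea above can be sketched in Python for illustration: a global map keyed by function signature, filled once at startup and consulted when a call expression is evaluated. The real implementation lives in C++ under `core/src/exec/expression/function`; every name below (`register`, `call`, the signature tuple layout) is hypothetical.

```python
from typing import Callable, Dict, Tuple

# A signature is the function name plus its argument type names.
Signature = Tuple[str, Tuple[str, ...]]

# Global registry, populated once at startup in this sketch.
_REGISTRY: Dict[Signature, Callable[..., object]] = {}


def register(name: str, arg_types: Tuple[str, ...], fn: Callable[..., object]) -> None:
    _REGISTRY[(name, arg_types)] = fn


def call(name: str, arg_types: Tuple[str, ...], *args: object) -> object:
    """Resolve the function by signature, as a call expression would, then invoke it."""
    try:
        fn = _REGISTRY[(name, arg_types)]
    except KeyError:
        raise LookupError(f"no function {name}{arg_types}") from None
    return fn(*args)


# The two custom functions listed above.
register("empty", ("string",), lambda s: len(s) == 0)
register("starts_with", ("string", "string"), lambda s, prefix: s.startswith(prefix))

print(call("empty", ("string",), ""))                             # True
print(call("starts_with", ("string", "string"), "milvus", "mil"))  # True
```

Keying on the full signature rather than the name alone leaves room for overloads with different argument types.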

closes: #36559

Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-10-25 15:25:30 +08:00
Yinzuo Jiang
7d74edd6dd
fix: update clang-tidy and clang-format from 10 to 12 (#33141)
Default llvm toolchain version in Ubuntu 20.04 is 10, while Ubuntu 22.04
does not have `clang-tidy-10` or `clang-format-10` by default.

issue: #33142

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-06-13 15:27:58 +08:00
shaoting-huang
ca0cf9b3b1
doc: fix typos in design docs (#32885)
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-05-09 15:43:30 +08:00
Sheldon
351c64b606
fix some typos (#27851)
1. Fix some typos in md and yaml files #22893

Signed-off-by: Sheldon <chuanfeng.liu@zilliz.com>
2023-10-24 09:30:10 +08:00
congqixia
e02670eae9
[Design Doc] Remove runtime dependency of datacoord from datanode (#27183)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-26 14:05:26 +08:00
congqixia
baddf3d438
Add design doc for collection-level auto compaction switch (#24041)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-11 15:57:20 +08:00
smellthemoon
2afc982ce1
[MEP]Default Value (#23343)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-04-20 10:56:31 +08:00
congqixia
aca442e985
[DOC] Add QueryNodev2 design doc (#23478)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-18 11:44:30 +08:00
yah01
c855ea3171
MEP for search by primary keys (#23193)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-07 18:26:29 +08:00
Enwei Jiao
940ead200a
Update Development.md (#23207)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-04-04 13:38:28 +08:00
Enwei Jiao
66f50fd354
Add design doc for dynamic config (#23115)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-03-30 08:48:21 +08:00
Enwei Jiao
d2f95176e9
Organize design document directory (#22972)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-03-24 15:33:59 +08:00
jaime
d126f06946
Decouple mq module from internal proto definition (#22536)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-03-04 23:21:50 +08:00
jaime
58b79eb74c
Add based on timetravel GC for snapshot KV (#21417)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-01-04 21:37:35 +08:00
zhuwenxing
3b1030de2b
[skip e2e]Fix bad link in doc (#15525)
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2022-02-11 09:31:47 +08:00
min.tian
2b1625819d
[skip e2e] Check the grammar of segcore/timetravel.md (#15182)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-02-09 10:23:47 +08:00
min.tian
a0a88f1b48
[skip e2e] Check the grammar of segment_interface.md (#15174)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-02-09 10:21:52 +08:00
min.tian
06535eb7eb
[skip e2e] Check the grammar of segment_growing.md (#15134)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-02-09 10:16:10 +08:00
Xieql
17887de140
[skip e2e] Improve annotation (#15071)
Signed-off-by: Xieql <qianglong.xie@zilliz.com>
2022-01-10 13:26:24 +08:00
Bennu
513626c69d
[skip e2e] Fix grammar (#15066)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-10 13:20:40 +08:00
Bennu
0a0cf9bf94
[skip e2e] Fix grammar (#15065)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-10 13:18:42 +08:00
Bennu
65ea6b9172
[skip e2e] Fix grammar (#15064)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-10 13:16:48 +08:00
min.tian
0c621fa314
[skip e2e] Check the grammar of segcore/Search.md (#15047)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-01-10 09:47:36 +08:00
Bennu
a1c29d4709
[skip e2e] Fix grammar (#15010)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-07 19:03:49 +08:00
Bennu
3aaf65efa9
[skip e2e] Fix grammar (#15009)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-07 19:01:56 +08:00
Bennu
56eb069494
[skip e2e] Fix grammar (#15011)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-07 18:53:49 +08:00
min.tian
c666645629
[skip e2e] Check the syntax of index design doc (#14973)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-01-07 13:16:22 +08:00
Bennu
b2da0c67f4
[skip e2e] Fix grammar (#14940)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-06 18:25:26 +08:00
Bennu
1db9b1b04d
[skip e2e] Fix grammar (#14939)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-06 18:23:33 +08:00
Bennu
de676985f9
[skip e2e] Fix grammar (#14938)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-06 18:21:41 +08:00
min.tian
a40c5224bd
[skip e2e] Check the syntax of mep-template md (#14890)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-01-06 13:15:56 +08:00
groot
7670bcec36
[skip e2e] Fix typo for design doc (#14920)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2022-01-06 10:57:20 +08:00
groot
3db94a7d0c
[skip e2e] Fix typo for design doc (#14919)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2022-01-06 10:55:26 +08:00
groot
ab4efd1d3b
[skip e2e] Fix typo for design doc (#14917)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2022-01-06 10:53:30 +08:00
yanliang567
545bffa763
[skip e2e]Fix a grammar issue (#14846)
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
2022-01-05 14:35:19 +08:00
Bennu
b6ad963eb2
[skip e2e] Fix grammar (#14832)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-05 13:23:19 +08:00
Bennu
19e0b5099f
[skip e2e] Fix grammar (#14831)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-05 13:21:24 +08:00
Bennu
08fd1ae227
[skip e2e] Fix grammar (#14830)
Signed-off-by: Bennu <yunmei.li@zilliz.com>
2022-01-05 13:19:31 +08:00
min.tian
bdd32e57e4
[skip e2e] Check the syntax of query_boolean_expr doc (#14798)
Signed-off-by: min.tian <min.tian.cn@gmail.com>
2022-01-05 09:49:24 +08:00
Xieql
ee853ff7db
[skip e2e] Improve annotation (#14711)
Signed-off-by: Xieql <qianglong.xie@zilliz.com>
2022-01-04 19:57:34 +08:00