doc: [skip e2e]Add design document for entity level ttl (#46406)
issue: #46033 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
docs/design_docs/entitiy_level_ttl.md

# Entity-level TTL Design

## Background

Currently, Milvus supports **collection-level TTL** for data expiration, but does not support defining an independent expiration time for individual entities (rows). Application scenarios are becoming more diverse, for example:

* Data from different tenants or businesses stored in the same collection but with different lifecycles;
* Hot and cold data mixed together, where short-lived data should be cleaned up automatically while long-term data is retained;
* IoT / logging / MLOps data that requires record-level retention policies.

Collection-level TTL alone can no longer satisfy these requirements. If users want to retain only part of the data, they must periodically perform **upsert** operations to refresh the timestamps of those entities, which is unintuitive and increases operational and maintenance costs.

Therefore, **entity-level TTL** becomes a necessary feature.

Related issues:

* [milvus-io/milvus#45917](https://github.com/milvus-io/milvus/issues/45917)
* [milvus-io/milvus#45923](https://github.com/milvus-io/milvus/issues/45923)

---
## Design Principles

* Fully compatible with existing collection-level TTL behavior.
* Opt-in: users choose whether to enable entity-level TTL.
* User-controllable: supports explicit declaration in the schema or transparent system management.
* Minimal changes to compaction and query logic; expiration is determined only by the TTL column and the write timestamp.
* Supports dynamic upgrade of existing collections.

---
## Basic Approach

Milvus already supports the `TIMESTAMPTZ` data type, so entity TTL information is stored in a field of this type.

---
## Design Details

Entity-level TTL is implemented by letting users explicitly add a `TIMESTAMPTZ` column to the schema and mark it in the collection properties:

```text
"collection.ttl.field": "ttl"
```

Here, `ttl` is the name of the column that stores TTL information. This mechanism is **mutually exclusive** with collection-level TTL.

---
### Terminology and Conventions

* **TTL column / TTL field**: A field of type `TIMESTAMPTZ` declared in the schema and marked with `is_ttl = true`.
* **ExpireAt**: The value stored in the TTL field, representing the absolute expiration timestamp of an entity (UTC by default if no timezone is specified).
* **Collection-level TTL**: The existing mechanism where a retention duration is defined at the collection level (e.g., retain for 30 days).
* **insert_ts / mvcc_ts**: Existing Milvus write / MVCC timestamps, used as a fallback when needed.
* **expirationTimeByPercentile**: Per-segment timestamps at which given percentiles of the segment's data have expired, used to quickly decide whether compaction should be triggered.

Example, for a single segment:

* 20% of its data expires by time `t1`;
* 40% of its data expires by time `t2`; and so on for the higher percentiles.
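The percentile statistics above can be sketched in a few lines of Python. This is an illustrative model only, not Milvus code: given the expiration timestamps of a segment's rows (`None` meaning "never expires"), it derives the five recorded percentile timestamps; the real statistics are computed inside the compaction path.

```python
def expiration_percentiles(expire_ts):
    """Return the expiration timestamps at the 20/40/60/80/100% levels.

    expire_ts: iterable of epoch timestamps, None = never expires.
    Rows that never expire are ignored here for simplicity.
    """
    finite = sorted(t for t in expire_ts if t is not None)
    if not finite:
        return [None] * 5  # nothing in the segment ever expires
    result = []
    for pct in (0.2, 0.4, 0.6, 0.8, 1.0):
        # index of the last row inside the first `pct` fraction of rows
        idx = max(0, int(pct * len(finite)) - 1)
        result.append(finite[idx])
    return result
```

For example, a segment whose rows expire at `10, 20, 30, 40, 50` (plus one never-expiring row) would record `[10, 20, 30, 40, 50]`.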

---
### 1. Collection Properties and Constraints

* Only fields with `DataType == TIMESTAMPTZ` can be configured as a TTL field.
* Mutually exclusive with collection-level TTL:
  * If collection-level TTL is enabled, specifying a TTL field is not allowed.
  * Collection-level TTL must be disabled first.
* One TTL field per collection:
  * A collection may contain multiple `TIMESTAMPTZ` fields, but only one can be designated as the TTL field.

---
### 2. Storage Semantics

* Unified convention: the TTL field stores an **absolute expiration time** (`ExpireAt`).
* Duration-based TTL (a relative "expire after N seconds") is not supported.
* `NULL` value semantics: a `NULL` TTL value means the entity never expires.
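Because only absolute timestamps are stored, a client that wants duration-style behavior ("expire N days after write") must compute `ExpireAt` itself. A minimal sketch, assuming UTC RFC 3339 strings are an accepted input format for `TIMESTAMPTZ` values (as in the insert example later in this document):

```python
from datetime import datetime, timedelta, timezone

def expire_at_in(days, now=None):
    """Compute an absolute ExpireAt string N days from `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
```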

---
### 3. Compatibility Rules

#### Existing Collections

To enable entity-level TTL on an existing collection:

1. Disable collection-level TTL using `AlterCollection`.
2. Add a new `TIMESTAMPTZ` field using `AddField`.
3. Update the collection properties via `AlterCollection` to mark the new field as the TTL field.

If historical data should also have expiration times, users must perform an **upsert** operation to backfill the TTL field.
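The three steps above could be scripted from PyMilvus roughly as follows. This is a hedged sketch, not an official walkthrough: the `MilvusClient` method names (`alter_collection_properties`, `add_collection_field`) and the `collection.ttl.seconds` property key are assumptions based on recent PyMilvus versions and may differ in yours.

```python
def enable_entity_ttl(client, collection_name, ttl_field, ttl_data_type):
    # 1. Disable collection-level TTL first (mutually exclusive with a TTL field).
    client.alter_collection_properties(
        collection_name, properties={"collection.ttl.seconds": "0"}
    )
    # 2. Add a nullable TIMESTAMPTZ column; NULL entities never expire.
    client.add_collection_field(
        collection_name,
        field_name=ttl_field,
        data_type=ttl_data_type,  # DataType.TIMESTAMPTZ
        nullable=True,
    )
    # 3. Point collection properties at the new column.
    client.alter_collection_properties(
        collection_name, properties={"collection.ttl.field": ttl_field}
    )
```

Usage would look like `enable_entity_ttl(client, "my_collection", "ttl", DataType.TIMESTAMPTZ)`; historical rows still need the upsert backfill described above.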

---
### 4. SegmentInfo Extension and Compaction Trigger

#### 4.1 SegmentInfo Metadata Extension

A new field, `expirationTimeByPercentile`, is added to the segment metadata:
```proto
message SegmentInfo {
  int64 ID = 1;
  int64 collectionID = 2;
  int64 partitionID = 3;
  string insert_channel = 4;
  int64 num_of_rows = 5;
  common.SegmentState state = 6;
  int64 max_row_num = 7 [deprecated = true]; // deprecated: the binary size, not an estimated row count, controls the segment size
  uint64 last_expire_time = 8;
  msg.MsgPosition start_position = 9;
  msg.MsgPosition dml_position = 10;
  // binlogs consist of insert binlogs
  repeated FieldBinlog binlogs = 11;
  repeated FieldBinlog statslogs = 12;
  // deltalogs consist of delete binlogs. FieldID is not used yet since delete is always applied on the primary key
  repeated FieldBinlog deltalogs = 13;
  bool createdByCompaction = 14;
  repeated int64 compactionFrom = 15;
  uint64 dropped_at = 16; // timestamp when the segment was marked dropped
  // A flag indicating if:
  // (1) this segment is created by bulk insert, and
  // (2) the bulk insert task that creates this segment has not yet reached `ImportCompleted` state.
  bool is_importing = 17;
  bool is_fake = 18;

  // Denotes whether this segment has been compacted into another segment.
  // For compatibility reasons, this flag of an old compacted segment may still be false.
  // New fields added to the message are populated with their field types' default values.
  bool compacted = 19;

  // Segment level, indicating the compaction segment level.
  // Available values: Legacy, L0, L1, L2.
  // Legacy denotes old segments that predate segment levels,
  // so segments with the Legacy level shall be treated as L1 segments.
  SegmentLevel level = 20;
  int64 storage_version = 21;

  int64 partition_stats_version = 22;
  // Used in major compaction; if compaction fails, the segment level should revert to the last value.
  SegmentLevel last_level = 23;
  // Used in major compaction; if compaction fails, the partition stats version should revert to the last value.
  int64 last_partition_stats_version = 24;

  // Indicates whether the segment is sorted by primary key.
  bool is_sorted = 25;

  // textStatsLogs records the tokenization index for fields.
  map<int64, TextIndexStats> textStatsLogs = 26;
  repeated FieldBinlog bm25statslogs = 27;

  // Indicates that some intermediate-state segments should not be loaded,
  // e.g., segments that have been clustered but haven't undergone stats yet.
  bool is_invisible = 28;

  // jsonKeyStats records the JSON key index for fields.
  map<int64, JsonKeyStats> jsonKeyStats = 29;
  // Indicates that the segment was created by the streaming service.
  // Meaningful only when the segment state is growing.
  // True if the segment was created by the streaming service;
  // false for segments generated by the datacoord of the old architecture.
  // Once growing segments are fully managed by the streamingnode, the true value can never be seen at the coordinator.
  bool is_created_by_streaming = 30;
  bool is_partition_key_sorted = 31;

  // manifest_path stores the full path of the LOON manifest file of the segment's data files.
  // We can keep the full path since one segment shall have only one active manifest,
  // and this keeps the possibility that the manifest is stored outside the collection/partition/segment path.
  string manifest_path = 32;

  // expirationTimeByPercentile records the expiration timestamps of the segment
  // at the 20%, 40%, 60%, 80%, and 100% data distribution levels.
  repeated int64 expirationTimeByPercentile = 33;
}
```

Meaning:

* `expirationTimeByPercentile`: The expiration timestamps corresponding to the 20%, 40%, 60%, 80%, and 100% percentiles of data within the segment.

---
#### 4.2 Metadata Writing

* Statistics are collected **only during compaction**.
* `expirationTimeByPercentile` is computed during sort or mix compaction tasks.
* Streaming segments always go through sort compaction first, so this approach is sufficient.

---
#### 4.3 Compaction Trigger Strategy

* Based on the configured expired-data ratio, select the corresponding percentile from `expirationTimeByPercentile` (rounding the ratio down to a recorded percentile).
* Compare the selected expiration time with the current time.
* If the expiration condition is met, trigger a compaction task.
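As a concrete illustration of this strategy (not the actual trigger code), assuming `percentiles` holds the segment's epoch-second timestamps for the 20/40/60/80/100% levels and the configured ratio is at least 0.2:

```python
import time

def should_trigger_compaction(percentiles, expired_ratio, now=None):
    """Decide whether a segment's expired-data ratio warrants compaction."""
    now = time.time() if now is None else now
    levels = (0.2, 0.4, 0.6, 0.8, 1.0)
    # round the configured ratio DOWN to the nearest recorded percentile
    idx = max(i for i, lv in enumerate(levels) if lv <= expired_ratio)
    expire_at = percentiles[idx]
    if expire_at is None:  # segment treated as non-expiring
        return False
    return expire_at <= now
```

For instance, with `percentiles = [100, 200, 300, 400, 500]` and a configured ratio of 0.4, compaction is triggered once the current time passes 200.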

Special cases:

* If `expirationTimeByPercentile` is `NULL`, the segment is treated as non-expiring.
* For old segments without a TTL field, expiration logic is skipped.
* Subsequent upsert operations will trigger the corresponding L0 compaction.

---
### 5. Query / Search Logic

* Each query is executed with an MVCC timestamp assigned by Milvus.
* When loading a collection, the system records which field is configured as the TTL field.

During query execution, expired data is filtered out by comparing the TTL field value with the MVCC timestamp inside `mask_with_timestamps`.
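A simplified model of that filter (the real check lives inside `mask_with_timestamps` in the query path; the boundary treatment here is illustrative): an entity is visible iff its TTL is `NULL` or later than the query's MVCC timestamp.

```python
def visible_rows(rows, mvcc_ts):
    """rows: list of (pk, expire_at_or_None); returns visible primary keys."""
    return [pk for pk, expire_at in rows
            if expire_at is None or expire_at > mvcc_ts]
```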

---
## PyMilvus Example

### 1. Create Collection

Specify the TTL field in the schema:

```python
schema = client.create_schema(auto_id=False, description="test entity ttl")
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("ttl", DataType.TIMESTAMPTZ, nullable=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=dim)

prop = {"collection.ttl.field": "ttl"}
client.create_collection(
    collection_name,
    schema=schema,
    enable_dynamic_field=True,
    properties=prop,
)
```
---

### 2. Insert Data

Insert data the same way as for a normal `TIMESTAMPTZ` field:

```python
rows = [
    {"id": 0, "vector": [random.random() for _ in range(dim)], "ttl": None},
    {"id": 1, "vector": [random.random() for _ in range(dim)], "ttl": None},
    {"id": 2, "vector": [random.random() for _ in range(dim)], "ttl": None},
    {"id": 3, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T00:00:00Z"},
    {"id": 4, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T01:00:00Z"},
    {"id": 5, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T02:00:00Z"},
    {"id": 6, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T03:00:00Z"},
    {"id": 7, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T04:00:00Z"},
    {"id": 8, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T23:59:59Z"},
]

insert_result = client.insert(collection_name, rows)
client.flush(collection_name)
```
---

### 3. Index and Load

Index creation and loading are unaffected. Indexes can still be built on the TTL field if needed.

```python
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    index_name="vector",
    metric_type="L2",
    params={"nlist": 128},
)
client.create_index(collection_name, index_params=index_params)

client.load_collection(collection_name)
```
---

### 4. Search / Query

Use queries with different timestamps to validate expiration behavior:

```python
query_expr = "id > 0"
res = client.query(
    collection_name,
    query_expr,
    output_fields=["id", "ttl"],
    limit=100,
)
print(res)
```