# Entity-level TTL Design
## Background
Currently, Milvus supports **collection-level TTL** for data expiration but does not support defining an independent expiration time for individual entities (rows). Application scenarios are becoming increasingly diverse, for example:
* Data from different tenants or businesses stored in the same collection but with different lifecycles;
* Hot and cold data mixed together, where short-lived data should be cleaned automatically while long-term data is retained;
* IoT / logging / MLOps data that requires record-level retention policies;
In such cases, relying solely on collection-level TTL can no longer satisfy the requirements. If users want to retain only part of the data, they must periodically perform **upsert** operations to refresh the timestamps of those entities, which is unintuitive and increases operational and maintenance costs.
Therefore, **Entity-level TTL** becomes a necessary feature.
Related issues:
* [milvus-io/milvus#45917](https://github.com/milvus-io/milvus/issues/45917)
* [milvus-io/milvus#45923](https://github.com/milvus-io/milvus/issues/45923)
---
## Design Principles
* Fully compatible with existing collection-level TTL behavior.
* Allow users to choose whether to enable entity-level TTL.
* User-controllable: support explicit declaration in schema or transparent system management.
* Minimize changes to compaction and query logic; expiration is determined only by the TTL column and write timestamp.
* Support dynamic upgrade for existing collections.
---
## Basic Approach
Milvus already supports the `TIMESTAMPTZ` data type. Entity TTL information will therefore be stored in a field of this type.
---
## Design Details
Entity-level TTL is implemented by allowing users to explicitly add a `TIMESTAMPTZ` column in the schema and mark it in collection properties:
```text
"collection.ttl.field": "ttl"
```
Here, `ttl` is the name of the column that stores TTL information. This mechanism is **mutually exclusive** with collection-level TTL.
---
### Terminology and Conventions
* **TTL column / TTL field** : A field of type `TIMESTAMPTZ` declared in the schema and designated as the TTL field via the `collection.ttl.field` collection property (internally marked with `is_ttl = true`).
* **ExpireAt** : The value stored in the TTL field, representing the absolute expiration timestamp of an entity (UTC by default if no timezone is specified).
* **Collection-level TTL** : The existing mechanism where retention duration is defined at the collection level (e.g., retain 30 days).
* **insert_ts / mvcc_ts** : Existing Milvus write or MVCC timestamps, used as a fallback when needed.
* **expirationTimeByPercentile** : Time points corresponding to given percentiles of expired data within a segment, used to quickly determine whether compaction should be triggered.
Example: within a segment,
* 20% of the data expires by time `t1`
* 40% of the data expires by time `t2`
---
### 1. Collection Properties and Constraints
* Only fields with `DataType == TIMESTAMPTZ` can be configured as a TTL field.
* Mutually exclusive with collection-level TTL:
  * If collection-level TTL is enabled, specifying a TTL field is not allowed.
  * Collection-level TTL must be disabled first.
* One TTL field per collection:
  * A collection may contain multiple `TIMESTAMPTZ` fields, but only one can be designated as the TTL field.
---
### 2. Storage Semantics
* Unified convention: the TTL field stores an **absolute expiration time** (`ExpireAt`).
  * Duration-based TTL is not supported.
* `NULL` value semantics:
  * A `NULL` TTL value means the entity never expires.
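
Because the field stores only absolute timestamps, duration-style TTL (e.g., "expire 30 days from now") must be converted to an `ExpireAt` value on the client side. A minimal sketch in Python; the `retention` value and row contents are illustrative:
```python
from datetime import datetime, timedelta, timezone

# Duration-based TTL is not supported, so compute an absolute ExpireAt
# client-side and store it in the TIMESTAMPTZ TTL field.
retention = timedelta(days=30)  # illustrative per-entity retention period
expire_at = datetime.now(timezone.utc) + retention

row = {
    "id": 42,
    "vector": [0.0] * 8,
    "ttl": expire_at.strftime("%Y-%m-%dT%H:%M:%SZ"),  # absolute expiration time
}
```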
---
### 3. Compatibility Rules
#### Existing Collections
For an existing collection to enable entity-level TTL:
1. Disable collection-level TTL using `AlterCollection`.
2. Add a new `TIMESTAMPTZ` field using `AddField`.
3. Update collection properties via `AlterCollection` to mark the new field as the TTL field.
If historical data should also have expiration times, users must perform an **upsert** operation to backfill the TTL field.
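
Below is a sketch of these steps using PyMilvus's `MilvusClient`. The mapping of `AlterCollection`/`AddField` onto these particular client methods is an assumption about the client version, and `collection.ttl.seconds` refers to the existing collection-level TTL property:
```python
# Hypothetical migration sketch; method names assume a recent MilvusClient
# and may differ across PyMilvus versions.

# 1. Disable collection-level TTL (mutually exclusive with a TTL field).
client.drop_collection_properties(
    collection_name, property_keys=["collection.ttl.seconds"]
)

# 2. Add a nullable TIMESTAMPTZ field; existing rows get NULL, i.e. they never expire.
client.add_collection_field(
    collection_name,
    field_name="ttl",
    data_type=DataType.TIMESTAMPTZ,
    nullable=True,
)

# 3. Mark the new field as the TTL field.
client.alter_collection_properties(
    collection_name, properties={"collection.ttl.field": "ttl"}
)

# 4. Optionally backfill expiration times for historical data via upsert
#    (existing_vector stands in for the entity's current vector).
client.upsert(collection_name, [
    {"id": 0, "vector": existing_vector, "ttl": "2026-01-01T00:00:00Z"},
])
```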
---
### 4. SegmentInfo Extension and Compaction Trigger
#### 4.1 SegmentInfo Metadata Extension
A new field `expirationTimeByPercentile` is added to the segment metadata:
```proto
message SegmentInfo {
int64 ID = 1;
int64 collectionID = 2;
int64 partitionID = 3;
string insert_channel = 4;
int64 num_of_rows = 5;
common.SegmentState state = 6;
  int64 max_row_num = 7 [deprecated = true]; // deprecated: binary size, rather than an estimated row count, is used to control the segment size
uint64 last_expire_time = 8;
msg.MsgPosition start_position = 9;
msg.MsgPosition dml_position = 10;
// binlogs consist of insert binlogs
repeated FieldBinlog binlogs = 11;
repeated FieldBinlog statslogs = 12;
  // deltalogs consist of delete binlogs. FieldID is not used yet since deletes are always applied on the primary key
repeated FieldBinlog deltalogs = 13;
bool createdByCompaction = 14;
repeated int64 compactionFrom = 15;
uint64 dropped_at = 16; // timestamp when segment marked drop
// A flag indicating if:
// (1) this segment is created by bulk insert, and
// (2) the bulk insert task that creates this segment has not yet reached `ImportCompleted` state.
bool is_importing = 17;
bool is_fake = 18;
// denote if this segment is compacted to other segment.
// For compatibility reasons, this flag of an old compacted segment may still be False.
// As for new fields added in the message, they will be populated with their respective field types' default values.
bool compacted = 19;
// Segment level, indicating compaction segment level
// Available value: Legacy, L0, L1, L2
  // The Legacy level represents old segments created before segment levels were introduced,
  // so segments with the Legacy level shall be treated as L1 segments
SegmentLevel level = 20;
int64 storage_version = 21;
int64 partition_stats_version = 22;
// use in major compaction, if compaction fail, should revert segment level to last value
SegmentLevel last_level = 23;
// use in major compaction, if compaction fail, should revert partition stats version to last value
int64 last_partition_stats_version = 24;
// used to indicate whether the segment is sorted by primary key.
bool is_sorted = 25;
// textStatsLogs is used to record tokenization index for fields.
map<int64, TextIndexStats> textStatsLogs = 26;
repeated FieldBinlog bm25statslogs = 27;
// This field is used to indicate that some intermediate state segments should not be loaded.
// For example, segments that have been clustered but haven't undergone stats yet.
bool is_invisible = 28;
// jsonKeyStats is used to record json key index for fields.
map<int64, JsonKeyStats> jsonKeyStats = 29;
  // This field is used to indicate that the segment was created by the streaming service.
  // It is meaningful only when the segment state is growing.
  // If the segment was created by the streaming service, it will be true;
  // a segment generated by the datacoord of the old architecture will be false.
  // Once the growing segment is fully managed by the streamingnode, a true value can never be seen at the coordinator.
bool is_created_by_streaming = 30;
bool is_partition_key_sorted = 31;
  // manifest_path stores the full path of the LOON manifest file of the segment's data files.
  // We can keep the full path since one segment shall only have one active manifest,
  // and this keeps open the possibility that the manifest is stored outside of the collection/partition/segment path
string manifest_path = 32;
// expirationTimeByPercentile records the expiration timestamps of the segment
// at the 20%, 40%, 60%, 80%, and 100% data distribution levels
repeated int64 expirationTimeByPercentile = 33;
}
```
Meaning:
* `expirationTimeByPercentile`: The expiration timestamps corresponding to the 20%, 40%, 60%, 80%, and 100% percentiles of data within the segment.
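
For illustration, the statistics could be derived during compaction roughly as follows. This is a sketch of the idea, not the actual implementation; it assumes `NULL` TTL values (never-expiring rows) are excluded from the percentile calculation:
```python
def expiration_time_by_percentile(expire_ats):
    """Sketch: derive the 20/40/60/80/100% expiration timestamps of a segment.

    expire_ats: per-entity ExpireAt values (epoch seconds); None = never expires.
    Returns None if the segment has no TTL values at all.
    """
    ts = sorted(t for t in expire_ats if t is not None)
    if not ts:
        return None  # no expiring rows: segment is treated as non-expiring
    # Timestamp by which at least p% of the TTL-bearing rows have expired.
    return [ts[max(0, len(ts) * p // 100 - 1)] for p in (20, 40, 60, 80, 100)]
```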
---
#### 4.2 Metadata Writing
* Statistics are collected **only during compaction**.
* `expirationTimeByPercentile` is computed during sort or mix compaction tasks.
* For streaming segments, sort compaction is always the first step, so collecting the statistics during compaction is sufficient to cover them.
---
#### 4.3 Compaction Trigger Strategy
* Based on the configured expired-data ratio, select the corresponding entry from `expirationTimeByPercentile` (rounding the ratio down to the nearest recorded percentile).
* Compare the selected expiration time with the current time.
* If the expiration condition is met, trigger a compaction task.
Special cases:
* If `expirationTimeByPercentile` is `NULL`, the segment is treated as non-expiring.
* For old segments without a TTL field, expiration logic is skipped.
* Subsequent upsert operations will trigger the corresponding L0 compaction.
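
A sketch of the trigger check under these rules; `expired_ratio_threshold` is a hypothetical name for the configured expired-data ratio:
```python
import time

def should_trigger_ttl_compaction(expiration_time_by_percentile,
                                  expired_ratio_threshold, now=None):
    """Sketch: decide whether a segment's expired share warrants compaction."""
    if expiration_time_by_percentile is None:
        return False  # no TTL statistics: segment is treated as non-expiring
    now = now if now is not None else time.time()
    # Round the configured ratio down to a recorded percentile
    # (the lowest entry, 20%, acts as a floor).
    percentiles = (20, 40, 60, 80, 100)
    idx = 0
    for i, p in enumerate(percentiles):
        if p <= expired_ratio_threshold * 100:
            idx = i
    # If that percentile's expiration time has passed, at least that share
    # of the segment has expired, so trigger compaction.
    return expiration_time_by_percentile[idx] <= now
```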
---
### 5. Query / Search Logic
* Each query is executed with an MVCC timestamp assigned by Milvus.
* When loading a collection, the system records which field is configured as the TTL field.
During query execution, expired data is filtered by comparing the TTL field value with the MVCC timestamp inside `mask_with_timestamps`.
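
Conceptually, the per-entity visibility check reduces to the following predicate (a Python sketch, not the actual `mask_with_timestamps` implementation):
```python
def is_visible(expire_at, mvcc_ts):
    """Sketch: an entity is visible unless its ExpireAt is at or before
    the query's MVCC timestamp.

    expire_at: the entity's TTL field value (None = never expires).
    mvcc_ts:   the query's MVCC timestamp; both are comparable UTC instants.
    """
    return expire_at is None or expire_at > mvcc_ts
```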
---
## PyMilvus Example
### 1. Create Collection
Specify the TTL field in the schema:
```python
import random

from pymilvus import DataType, MilvusClient

# Example setup; the URI, collection name, and dimension are illustrative.
client = MilvusClient(uri="http://localhost:19530")
collection_name = "entity_ttl_demo"
dim = 8

schema = client.create_schema(auto_id=False, description="test entity ttl")
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("ttl", DataType.TIMESTAMPTZ, nullable=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=dim)

# Mark the TIMESTAMPTZ column "ttl" as the TTL field.
prop = {"collection.ttl.field": "ttl"}

client.create_collection(
    collection_name,
    schema=schema,
    enable_dynamic_field=True,
    properties=prop,
)
```
---
### 2. Insert Data
Insert TTL values the same way as any other `TIMESTAMPTZ` field:
```python
rows = [
{"id": 0, "vector": [random.random() for _ in range(dim)], "ttl": None},
{"id": 1, "vector": [random.random() for _ in range(dim)], "ttl": None},
{"id": 2, "vector": [random.random() for _ in range(dim)], "ttl": None},
{"id": 3, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T00:00:00Z"},
{"id": 4, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T01:00:00Z"},
{"id": 5, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T02:00:00Z"},
{"id": 6, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T03:00:00Z"},
{"id": 7, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T04:00:00Z"},
{"id": 8, "vector": [random.random() for _ in range(dim)], "ttl": "2025-12-31T23:59:59Z"},
]
insert_result = client.insert(collection_name, rows)
client.flush(collection_name)
```
---
### 3. Index and Load
Index creation and loading are unaffected. Indexes can still be built on the TTL field if needed.
```python
index_params = client.prepare_index_params()
index_params.add_index(
field_name="vector",
index_type="IVF_FLAT",
index_name="vector",
metric_type="L2",
params={"nlist": 128},
)
client.create_index(collection_name, index_params=index_params)
client.load_collection(collection_name)
```
---
### 4. Search / Query
Run the same query at different points in time to validate expiration behavior; entities whose `ttl` lies at or before the query's MVCC timestamp are filtered out:
```python
query_expr = "id > 0"
res = client.query(
collection_name,
query_expr,
output_fields=["id", "ttl"],
limit=100,
)
print(res)
```
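
For a self-contained check, one can insert an entity whose `ExpireAt` is only a few seconds away and run the same query before and after it passes. A sketch; how quickly expired rows disappear depends on query-time filtering and compaction:
```python
import time
from datetime import datetime, timedelta, timezone

# Insert an entity that expires ~10 seconds from now (id 100 is arbitrary).
soon = (datetime.now(timezone.utc) + timedelta(seconds=10)).strftime("%Y-%m-%dT%H:%M:%SZ")
client.insert(collection_name, [
    {"id": 100, "vector": [random.random() for _ in range(dim)], "ttl": soon},
])

print(client.query(collection_name, "id == 100", output_fields=["id", "ttl"]))  # visible

time.sleep(15)  # wait until after ExpireAt
print(client.query(collection_name, "id == 100", output_fields=["id", "ttl"]))  # expected: []
```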