14 Commits

Author SHA1 Message Date
cai.zhang
a16d04f5d1
feat: Support ttl field for entity level expiration (#46342)
issue: #46033

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Entity-Level TTL Field Support

### Core Invariant and Design
This PR introduces **per-entity TTL (time-to-live) expiration** via a
dedicated TIMESTAMPTZ field as a fine-grained alternative to
collection-level TTL. The key invariant is **mutual exclusivity**:
collection-level TTL and entity-level TTL field cannot coexist on the
same collection. Validation is enforced at the proxy layer during
collection creation/alteration (`validateTTL()` prevents both being set
simultaneously).
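
A minimal sketch of the mutual-exclusivity check described above, not the actual `validateTTL()` implementation. The function name `validateTTLSketch`, the `collection.ttl.seconds` key, and the error text are assumptions for illustration; only the `ttl_field` property name comes from this summary.

```go
package main

import "errors"

// Property keys used by this sketch. "ttl_field" is named in the summary;
// the collection-level key is an assumption and may differ in Milvus.
const (
	collectionTTLKey = "collection.ttl.seconds"
	ttlFieldKey      = "ttl_field"
)

// errTTLConflict is a hypothetical error for the conflicting configuration.
var errTTLConflict = errors.New("collection-level TTL and TTL field cannot both be set")

// validateTTLSketch rejects a property set that configures both TTL mechanisms.
// Setting either one alone, or neither, is accepted.
func validateTTLSketch(props map[string]string) error {
	_, hasCollectionTTL := props[collectionTTLKey]
	_, hasTTLField := props[ttlFieldKey]
	if hasCollectionTTL && hasTTLField {
		return errTTLConflict
	}
	return nil
}
```

Calling `validateTTLSketch` with only `ttl_field` set returns `nil`; adding `collection.ttl.seconds` to the same map returns the conflict error.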

### What Is Removed and Why
- **Global `EntityExpirationTTL` parameter** removed from config
  (`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This
  was the only mechanism for collection-level expiration. The removal is
  safe because:
  - The collection-level TTL path (`isEntityExpired(ts)` check) remains
    intact in the codebase for backward compatibility
  - TTL field check (`isEntityExpiredByTTLField()`) is a secondary path
    invoked only when a TTL field is configured
  - Existing deployments using collection TTL can continue without
    modification
  
The global parameter was removed because a collection-wide setting is
redundant once expiration can be controlled per entity, and the PR
allows one mechanism per collection rather than layering both.

### No Data Loss or Behavior Regression
**TTL filtering logic is additive and safe:**
1. **Collection-level TTL unaffected**: The `isEntityExpired(ts)` check
still applies when no TTL field is configured; callers of
`EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when
no field exists, causing `isEntityExpiredByTTLField()` to return false
immediately
2. **Null/invalid TTL values treated safely**: Rows with null TTL or TTL
≤ 0 are marked as "never expire" (using sentinel value `int64(^uint64(0)
>> 1)`) and are preserved across compactions; percentile calculations
only include positive TTL values
3. **Query-time filtering automatic**: TTL filtering is transparently
added to expression compilation via `AddTTLFieldFilterExpressions()`,
which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the
filter pipeline. Entities with null TTL always pass the filter
4. **Compaction triggering granular**: Percentile-based expiration (20%,
40%, 60%, 80%, 100%) allows configurable compaction thresholds via
`SingleCompactionRatioThreshold`, preventing premature data deletion
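
The row-level rules in items 1 to 3 can be sketched as below. This is an illustration under stated assumptions, not the code from the PR: the helper names `normalizeTTL` and `expiredByTTLField` are hypothetical, and the sketch assumes the TTL value and the current time share one unit. The sentinel expression and the `-1` convention are taken from the summary.

```go
package main

// neverExpire is the sentinel quoted in the summary; it equals the largest int64.
const neverExpire = int64(^uint64(0) >> 1)

// noTTLField is the value the summary says callers pass when no TTL field exists.
const noTTLField int64 = -1

// normalizeTTL maps a null or non-positive TTL value to the never-expire sentinel.
func normalizeTTL(ttl int64, isNull bool) int64 {
	if isNull || ttl <= 0 {
		return neverExpire
	}
	return ttl
}

// expiredByTTLField reports whether a row has expired at time now.
// It returns false immediately when no TTL field is configured (ttl == -1),
// and otherwise mirrors the query filter, which keeps rows with ttl > now.
func expiredByTTLField(ttl, now int64) bool {
	if ttl == noTTLField {
		return false
	}
	return ttl <= now
}
```

Under these assumptions, a row whose TTL was null is stored as `neverExpire` and is never reported as expired for any `now` below that sentinel.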

### Capability Added: Per-Entity Expiration with Data Distribution Awareness
Users can now set a collection property `ttl_field` that names a
TIMESTAMPTZ schema field. During data writes, TTL values are collected per
segment and percentile quantiles (5-value array) are computed and stored
in segment metadata. At query time, the TTL field is automatically
filtered. At compaction time, segment-level percentiles drive
expiration-based compaction decisions, enabling intelligent compaction
of segments where a configurable fraction of data has expired (e.g.,
compact when 40% of rows are expired, controlled by threshold ratio).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
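
As a rough illustration of the percentile mechanism in the summary above, the sketch below computes the five-value percentile array for a segment and uses it to decide whether enough rows have expired to justify compaction. All function names are hypothetical, the nearest-rank percentile method is an assumption, and the real trigger logic behind `SingleCompactionRatioThreshold` may differ.

```go
package main

import "sort"

// percentileLevels are the five levels named in the summary (20% .. 100%).
var percentileLevels = []int{20, 40, 60, 80, 100}

// ttlPercentiles returns the TTL value at each level, using only positive
// TTL values. It returns nil when the segment has no positive TTL values.
func ttlPercentiles(ttls []int64) []int64 {
	positive := make([]int64, 0, len(ttls))
	for _, t := range ttls {
		if t > 0 {
			positive = append(positive, t)
		}
	}
	if len(positive) == 0 {
		return nil
	}
	sort.Slice(positive, func(i, j int) bool { return positive[i] < positive[j] })

	out := make([]int64, len(percentileLevels))
	for i, pct := range percentileLevels {
		// Nearest-rank percentile: rank = ceil(pct/100 * n), 1-based.
		rank := (pct*len(positive) + 99) / 100
		out[i] = positive[rank-1]
	}
	return out
}

// expiredFraction estimates the share of TTL-bearing rows expired at now
// by counting how many percentile boundaries are already in the past.
func expiredFraction(percentiles []int64, now int64) float64 {
	if len(percentiles) == 0 {
		return 0
	}
	expired := 0
	for _, p := range percentiles {
		if p <= now {
			expired++
		}
	}
	return float64(expired) / float64(len(percentiles))
}

// shouldCompact reports whether the estimated expired share reaches the threshold.
func shouldCompact(percentiles []int64, now int64, ratioThreshold float64) bool {
	return expiredFraction(percentiles, now) >= ratioThreshold
}
```

Storing only five boundaries keeps segment metadata small while still letting the scheduler estimate the expired share in 20% steps without reading the segment's rows.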

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2026-01-05 10:27:24 +08:00
cai.zhang
76f6768ea1
enhance: Remove timeout for compaction task (#44277)
issue: #44272

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-15 11:03:58 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Ted Xu
e37cd19da2
enhance: enable storage v2 by default (#43652)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-08-01 08:59:36 +08:00
cai.zhang
e26a532504
enhance: Only download necessary fields during clustering analyze phase (#43322)
issue: #43310

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-22 16:40:52 +08:00
yihao.dai
b69e601fe1
fix: [StorageV2] Correct read and write buffer size (#43335)
Correct the read and write buffer sizes to 64MB to prevent OOM during
clustering compaction.

issue: https://github.com/milvus-io/milvus/issues/43310

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-16 14:28:52 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode tasks pass storage config via the storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
cai.zhang
6989e18599
enhance: Move sort stats task to sort compaction (#42562)
issue: #42560

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 20:22:47 +08:00
yihao.dai
dccfc69660
enhance: Get compaction params from request (#41125)
Make DataNode use compaction parameters from request instead of
configuration.

issue: https://github.com/milvus-io/milvus/issues/41123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-15 10:28:53 +08:00
yihao.dai
5b78ef0a49
fix: Fix delete data loss due to duplicate binlogID (#40960)
With concurrent L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.
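
A minimal sketch of the range-based allocation described above, not the code from the PR. The `logIDRange` type, the `reserveLogIDs` helper, the half-open range convention, and the simple counter standing in for the real ID allocator are all assumptions for illustration.

```go
package main

// logIDRange is a half-open range [Begin, End) of log IDs reserved for one task.
type logIDRange struct {
	Begin int64
	End   int64
}

// Len returns the number of IDs in the range.
func (r logIDRange) Len() int64 { return r.End - r.Begin }

// reserveLogIDs reserves binlogCount * expansionFactor IDs starting at *nextID
// and advances the counter, so later reservations cannot overlap this one.
func reserveLogIDs(nextID *int64, binlogCount int, expansionFactor int64) logIDRange {
	size := int64(binlogCount) * expansionFactor
	r := logIDRange{Begin: *nextID, End: *nextID + size}
	*nextID = r.End
	return r
}
```

Because each task owns a disjoint range, two concurrent compactions writing delta logs to the same L1 segment draw IDs from different ranges instead of both counting up from a shared begin ID.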

issue: https://github.com/milvus-io/milvus/issues/40207

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 10:36:22 +08:00
Ted Xu
128efaa3e3
enhance: simplify size calculation in file writers (#40808)
See: #40342

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-26 20:04:22 +08:00
sthuang
d7df78a6c9
feat: Storage v2 compaction (#40667)
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact with duplicate primary key test.
  - Compact with bm25 segments.
  - Compact with merge sort segments.
  - Compact with segments that have no expiration.
  - Compact with segments that lack binlogs.
  - Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact bm25 segments with v2 format.
  - Compact with memory limit.
- Enhance: Use serdeMap for serialization in the BuildRecord function to
support all Milvus data types.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-21 10:16:12 +08:00
Ted Xu
df4285c9ef
enhance: API integration with storage v2 in clustering-compactions (#40133)
See #39173

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-13 14:12:06 +08:00
XuanYang-cn
837ac295fa
enhance: Remove iterators in datanode (#40301)
Iterators have long been deprecated, but sort still uses them. This PR
unifies the stats task with the latest common compaction functions and
removes the usage of iterators.

1. Rename `datanode/compaction` to `datanode/compactor`
2. Add `internal/compaction` and move some compaction commons into it.
3. Replace `DeltalogIterators` with `ComposeDeleteFromDeltalogs`
4. Remove `datanode/iterators`

See also: #39242

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-04 12:14:00 +08:00