15 Commits

Author SHA1 Message Date
congqixia
fa2c3c404c
enhance: Forbid writing V1 format and always use StorageV2 (#46791)
Related to #46595

Remove the EnableStorageV2 config option and enforce StorageV2 format
across all write paths including compaction, import, write buffer, and
streaming segment allocation. V1 format write tests are now skipped as
writing V1 format is no longer supported.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2026-01-06 11:55:23 +08:00
cai.zhang
a16d04f5d1
feat: Support ttl field for entity level expiration (#46342)
issue: #46033

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Entity-Level TTL Field Support

### Core Invariant and Design
This PR introduces **per-entity TTL (time-to-live) expiration** via a
dedicated TIMESTAMPTZ field as a fine-grained alternative to
collection-level TTL. The key invariant is **mutual exclusivity**:
collection-level TTL and entity-level TTL field cannot coexist on the
same collection. Validation is enforced at the proxy layer during
collection creation/alteration (`validateTTL()` prevents both being set
simultaneously).

### What Is Removed and Why
- **Global `EntityExpirationTTL` parameter** removed from config
(`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This
was the only mechanism for collection-level expiration. The removal is
safe because:
- The collection-level TTL path (`isEntityExpired(ts)` check) remains
intact in the codebase for backward compatibility
- TTL field check (`isEntityExpiredByTTLField()`) is a secondary path
invoked only when a TTL field is configured
- Existing deployments using collection TTL can continue without
modification
  
The global parameter was removed specifically because entity-level TTL
makes per-entity control redundant with a collection-wide setting, and
the PR chooses one mechanism per collection rather than layering both.

### No Data Loss or Behavior Regression
**TTL filtering logic is additive and safe:**
1. **Collection-level TTL unaffected**: The `isEntityExpired(ts)` check
still applies when no TTL field is configured; callers of
`EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when
no field exists, causing `isEntityExpiredByTTLField()` to return false
immediately
2. **Null/invalid TTL values treated safely**: Rows with null TTL or TTL
≤ 0 are marked as "never expire" (using sentinel value `int64(^uint64(0)
>> 1)`) and are preserved across compactions; percentile calculations
only include positive TTL values
3. **Query-time filtering automatic**: TTL filtering is transparently
added to expression compilation via `AddTTLFieldFilterExpressions()`,
which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the
filter pipeline. Entities with null TTL always pass the filter
4. **Compaction triggering granular**: Percentile-based expiration (20%,
40%, 60%, 80%, 100%) allows configurable compaction thresholds via
`SingleCompactionRatioThreshold`, preventing premature data deletion

### Capability Added: Per-Entity Expiration with Data Distribution
Awareness
Users can now specify a TIMESTAMPTZ collection property `ttl_field`
naming a schema field. During data writes, TTL values are collected per
segment and percentile quantiles (5-value array) are computed and stored
in segment metadata. At query time, the TTL field is automatically
filtered. At compaction time, segment-level percentiles drive
expiration-based compaction decisions, enabling intelligent compaction
of segments where a configurable fraction of data has expired (e.g.,
compact when 40% of rows are expired, controlled by threshold ratio).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2026-01-05 10:27:24 +08:00
congqixia
55bfd610b6
enhance: [StorageV2] Integrate FFI interface for packed reader (#45132)
Related to #44956

Integrate the StorageV2 FFI interface as the unified storage layer for
reading packed columnar data, replacing the custom iterative reader with
a manifest-based approach using the milvus-storage library.

Changes:
- Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C
Stream interface
- Implement utility functions to convert CStorageConfig to
milvus-storage Properties
- Create ManifestReader in Go that generates manifests from binlogs
- Add FFI packed reader CGO bindings (packed_reader_ffi.go)
- Refactor NewBinlogRecordReader to use ManifestReader for V2 storage
- Support both manifest file paths and direct manifest content
- Enable configurable buffer sizes and column projection

Technical improvements:
- Zero-copy data exchange using Arrow C Data Interface
- Optimized I/O operations through milvus-storage library
- Simplified code path with manifest-based reading
- Better performance with batched streaming reads

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-05 19:57:34 +08:00
XuanYang-cn
a3bdabb328
enhance: Unify compaction executor task state management (#44721)
Remove stopTask.
Replace multiple task tracking maps with single unified taskState map.
Fix slot tracking, improve state transitions, and add comprehensive test

See also: #44714

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-10-11 17:53:57 +08:00
cai.zhang
76f6768ea1
enhance: Remove timeout for compaction task (#44277)
issue: #44272

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-15 11:03:58 +08:00
Bingyi Sun
08a2678f07
feat: add a namespace compactor (#44253)
#44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-10 14:45:56 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Ted Xu
e37cd19da2
enhance: enable storage v2 by default (#43652)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-08-01 08:59:36 +08:00
congqixia
5a9efb3f81
enhance: [StorageV2] Refine storage rw option usage & validation (#43175)
Related to #39173

This PR:
- Make all datanode task passes storage config via storage config option
- Remove legacy comments, rootPath & bucketName parameters
- Fix clustering compaction option behavior
- Add validation logic for `rwOptions`
- Use correct storageType from storageConfig
- Add storage config in sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-11 01:14:48 +08:00
cai.zhang
6989e18599
enhance: Move sort stats task to sort compaction (#42562)
issue: #42560

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 20:22:47 +08:00
yihao.dai
dccfc69660
enhance: Get compaction params from request (#41125)
Make DataNode use compaction parameters from request instead of
configuration.

issue: https://github.com/milvus-io/milvus/issues/41123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-15 10:28:53 +08:00
yihao.dai
5b78ef0a49
fix: Fix delete data loss due to duplicate binlogID (#40960)
With concurrenct L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.

issue: https://github.com/milvus-io/milvus/issues/40207

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 10:36:22 +08:00
sthuang
d7df78a6c9
feat: Storage v2 compaction (#40667)
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact with duplicate primary key test.
  - Compact with bm25 segments.
  - Compact with merge sort segments.
  - Compact with no expiration segments.
  - Compact with lack binlog segments.
  - Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
  - Read v1 segments and compact with v2 format.
  - Read both v1 and v2 segments and compact with v2 format.
  - Read v2 segments and compact with v2 format.
  - Compact bm25 segments with v2 format.
  - Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-03-21 10:16:12 +08:00
Ted Xu
be86d31ea3
feat: compaction to support add field (#40415)
See: #39718

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-18 11:32:12 +08:00
XuanYang-cn
837ac295fa
enhance: Remove iterators in datanode (#40301)
Iterators are long deprecated, but sort are still using it. This PR
unifies stats task with the latest compaction common functions and
remove the usage of iterators.

1. Rename `datanode/compaction` to `datanode/compactor`
2. Add `internal/compaction` and move some compaction commons into it.
3. Replace `DeltalogIterators` with `ComposeDeleteFromDeltalogs`
4. Remove `datanode/iterators`

See also: #39242

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-04 12:14:00 +08:00