milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
cai.zhang	a16d04f5d1	feat: Support ttl field for entity level expiration (#46342)

issue: #46033

## Pull Request Summary: Entity-Level TTL Field Support

### Core Invariant and Design

This PR introduces per-entity TTL (time-to-live) expiration via a dedicated TIMESTAMPTZ field as a fine-grained alternative to collection-level TTL. The key invariant is mutual exclusivity: collection-level TTL and entity-level TTL field cannot coexist on the same collection. Validation is enforced at the proxy layer during collection creation/alteration (`validateTTL()` prevents both being set simultaneously).

### What Is Removed and Why

- Global `EntityExpirationTTL` parameter removed from config (`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This was the only mechanism for collection-level expiration. The removal is safe because:
  - The collection-level TTL path (`isEntityExpired(ts)` check) remains intact in the codebase for backward compatibility
  - TTL field check (`isEntityExpiredByTTLField()`) is a secondary path invoked only when a TTL field is configured
  - Existing deployments using collection TTL can continue without modification

The global parameter was removed specifically because entity-level TTL makes per-entity control redundant with a collection-wide setting, and the PR chooses one mechanism per collection rather than layering both.

### No Data Loss or Behavior Regression

TTL filtering logic is additive and safe:

1. Collection-level TTL unaffected: The `isEntityExpired(ts)` check still applies when no TTL field is configured; callers of `EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when no field exists, causing `isEntityExpiredByTTLField()` to return false immediately
2. Null/invalid TTL values treated safely: Rows with null TTL or TTL ≤ 0 are marked as "never expire" (using sentinel value `int64(^uint64(0) >> 1)`) and are preserved across compactions; percentile calculations only include positive TTL values
3. Query-time filtering automatic: TTL filtering is transparently added to expression compilation via `AddTTLFieldFilterExpressions()`, which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the filter pipeline. Entities with null TTL always pass the filter
4. Compaction triggering granular: Percentile-based expiration (20%, 40%, 60%, 80%, 100%) allows configurable compaction thresholds via `SingleCompactionRatioThreshold`, preventing premature data deletion

### Capability Added: Per-Entity Expiration with Data Distribution Awareness

Users can now set a collection property `ttl_field` that names a TIMESTAMPTZ schema field. During data writes, TTL values are collected per segment and percentile quantiles (5-value array) are computed and stored in segment metadata. At query time, the TTL field is automatically filtered. At compaction time, segment-level percentiles drive expiration-based compaction decisions, enabling intelligent compaction of segments where a configurable fraction of data has expired (e.g., compact when 40% of rows are expired, controlled by threshold ratio).

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2026-01-05 10:27:24 +08:00
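
The TTL rules described in the commit above can be illustrated with a minimal Go sketch. It is not the Milvus implementation: the function names (`normalizeTTL`, `isExpired`, `ttlPercentiles`, `expiredPercent`) and the nearest-rank percentile method are assumptions made for illustration. Only the sentinel value, the "null or ≤ 0 never expires" rule, the `ttl_field > current_time` predicate and the five percentile levels come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// neverExpire is the sentinel quoted in the commit message (the maximum int64).
const neverExpire = int64(^uint64(0) >> 1)

// normalizeTTL (hypothetical name) maps a null (nil) or non-positive TTL
// to neverExpire, so such rows are always kept.
func normalizeTTL(ttl *int64) int64 {
	if ttl == nil || *ttl <= 0 {
		return neverExpire
	}
	return *ttl
}

// isExpired (hypothetical name) is the negation of the query-time predicate
// "ttl_field IS NULL OR ttl_field > current_time".
func isExpired(expireAt, now int64) bool {
	return expireAt != neverExpire && expireAt <= now
}

// ttlPercentiles (hypothetical name) returns the TTL values at the 20th, 40th,
// 60th, 80th and 100th percentiles, counting only positive TTLs.
// Nearest-rank selection is an assumption of this sketch.
func ttlPercentiles(ttls []int64) []int64 {
	positive := make([]int64, 0, len(ttls))
	for _, t := range ttls {
		if t > 0 {
			positive = append(positive, t)
		}
	}
	if len(positive) == 0 {
		return nil
	}
	sort.Slice(positive, func(i, j int) bool { return positive[i] < positive[j] })
	out := make([]int64, 0, 5)
	for _, p := range []int{20, 40, 60, 80, 100} {
		idx := (p*len(positive)+99)/100 - 1 // ceil(p*n/100) - 1
		out = append(out, positive[idx])
	}
	return out
}

// expiredPercent (hypothetical name) estimates the share of TTL-bearing rows
// that has expired: each percentile value at or before now adds 20 points.
func expiredPercent(percentiles []int64, now int64) int {
	pct := 0
	for _, p := range percentiles {
		if p <= now {
			pct += 20
		}
	}
	return pct
}

func main() {
	ttls := []int64{500, 100, 0, 300, 200, 400} // 0 means "never expire"
	p := ttlPercentiles(ttls)
	fmt.Println(p, expiredPercent(p, 250))
	fmt.Println(isExpired(normalizeTTL(nil), 250))
}
```

A compaction policy could then compare the estimated percentage against a configured threshold such as the 40% example mentioned in the commit message.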
congqixia	92c0c38e24	fix: validate collection TTL property to prevent compaction stuck (#46717)

If collection TTL property is malformed (e.g., non-numeric value), compaction tasks would fail silently and get stuck.

This change:
- Add centralized GetCollectionTTL/GetCollectionTTLFromMap functions in pkg/common to handle TTL parsing with proper error handling
- Validate TTL property in createCollectionTask and alterCollectionTask PreExecute to reject invalid values early
- Refactor datacoord compaction policies to use the new common functions
- Remove duplicated getCollectionTTL from datacoord/util.go

issue: #46716

- Core invariant: collection.ttl.seconds must be a parseable int64 and validated at collection creation/alter time so malformed TTLs never reach compaction/execution codepaths.
- Bug fix (resolves #46716): malformed/non-numeric TTLs could silently cause compaction tasks to fail/stall; fixed by adding centralized parsing helpers pkg/common.GetCollectionTTL and GetCollectionTTLFromMap and validating TTL in createCollectionTask.PreExecute and alterCollectionTask.PreExecute (calls with default -1 and return parameter-invalid errors on parse failure).
- Simplification / removed redundancy: eliminated duplicated getCollectionTTL in internal/datacoord/util.go and replaced ad-hoc TTL parsing across datacoord (compaction policies, import_util, compaction triggers) and proxy util with the common helpers, centralizing error handling and defaulting logic.
- No data loss or behavior regression: valid TTL parsing semantics unchanged (helpers use identical int64 parsing and default fallback from paramtable/CommonCfg); validation occurs in PreExecute so existing valid collections proceed unchanged while malformed values are rejected early; compaction codepaths now receive only validated TTL values (or explicit defaults), preventing silent skips without altering valid execution flows.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2026-01-01 08:13:22 +08:00
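
The validation idea in the commit above (parse `collection.ttl.seconds` as an int64, fall back to a default when the property is absent, and return an error instead of silently ignoring a malformed value) might look roughly like the following Go sketch. The function name `parseCollectionTTL` and its signature are assumptions for illustration; the actual signatures of `GetCollectionTTL` and `GetCollectionTTLFromMap` in `pkg/common` are not shown in the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// ttlKey is the property name quoted in the commit message.
const ttlKey = "collection.ttl.seconds"

// parseCollectionTTL (hypothetical helper) returns def when the property is
// absent and an error when the value is not a parseable int64.
func parseCollectionTTL(props map[string]string, def time.Duration) (time.Duration, error) {
	raw, ok := props[ttlKey]
	if !ok {
		return def, nil
	}
	secs, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", ttlKey, raw, err)
	}
	return time.Duration(secs) * time.Second, nil
}

func main() {
	cases := []map[string]string{
		{ttlKey: "3600"}, // valid
		{},               // absent, falls back to the default
		{ttlKey: "1h"},   // malformed, rejected
	}
	for _, props := range cases {
		ttl, err := parseCollectionTTL(props, -1)
		fmt.Println(ttl, err)
	}
}
```

Calling such a helper during create/alter validation means a malformed value is rejected before it can reach any compaction code path.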
XuanYang-cn	0507db2015	feat: Add force merge (#45556 ) See also: #46043 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-12-19 18:03:18 +08:00
Xiaofan	ca2e27f576	enhance: remove unnecessary segment size estimation and make it configurable (#46302) fix #46300 remove unused segment size estimation, and make size estimation configurable Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-12-13 02:58:46 +08:00
cai.zhang	cc07be3c30	fix: Ignore compaction task when from segment is not healthy (#45534 ) issue: #45533 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-13 23:07:39 +08:00
Bingyi Sun	58277c8eb0	feat: Auto add namespace field data if namespace is enabled (#44933 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-24 18:40:05 +08:00
cai.zhang	76f6768ea1	enhance: Remove timeout for compaction task (#44277 ) issue: #44272 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-09-15 11:03:58 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
XuanYang-cn	4dcaa97682	fix: Use diskSegmentMaxSize for coll with sparse and dense vectors (#43194) Previous code uses diskSegmentMaxSize if and only if all of the collection's vector fields are indexed with the DiskANN index. Since sparse vectors cannot be indexed with DiskANN, collections with both dense and sparse vectors used maxSize instead. This PR changes the requirement for using diskSegmentMaxSize to: all dense vector fields are indexed with DiskANN indexes, ignoring sparse vector fields. See also: #43193 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-07-16 18:04:52 +08:00
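
The rule change in the commit above can be sketched in Go as follows. The `vectorField` type, the `useDiskSegmentMaxSize` function and the `"DISKANN"` index-type string are illustrative assumptions, not the actual datacoord code. Returning false for a collection with no dense vector field is also an assumption, since the commit message does not cover that case.

```go
package main

import "fmt"

// vectorField is a hypothetical, simplified view of a vector field and its index.
type vectorField struct {
	Name      string
	Sparse    bool
	IndexType string
}

// useDiskSegmentMaxSize reports whether every dense vector field is indexed
// with DiskANN. Sparse vector fields are skipped, as the commit describes.
func useDiskSegmentMaxSize(fields []vectorField) bool {
	dense := 0
	for _, f := range fields {
		if f.Sparse {
			continue // sparse vectors cannot use DiskANN, so they are ignored
		}
		dense++
		if f.IndexType != "DISKANN" {
			return false
		}
	}
	// Assumption: without any dense field, fall back to the regular max size.
	return dense > 0
}

func main() {
	mixed := []vectorField{
		{Name: "dense_vec", IndexType: "DISKANN"},
		{Name: "sparse_vec", Sparse: true, IndexType: "SPARSE_INVERTED_INDEX"},
	}
	fmt.Println(useDiskSegmentMaxSize(mixed))
}
```

Under the previous "all vector fields" rule the mixed collection in `main` would have fallen back to maxSize, because the sparse field can never carry a DiskANN index.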
cai.zhang	6989e18599	enhance: Move sort stats task to sort compaction (#42562 ) issue: #42560 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-08 20:22:47 +08:00
XuanYang-cn	0adf44e6f8	enhance: Check if segment has too many deletions together (#42668) This PR moves the deltalog file count check inside the hasTooManyDeletions check. It unifies the logic for checking whether a segment has too many deletions, covering delta log count, deleted rows ratio and deltalog size. This change removes several unnecessary traversals of the segment's binlogs and deltalogs, and adds clearer trigger logs. Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-06-24 16:30:49 +08:00
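
A unified check of the kind described above, evaluating the three criteria in one place and reporting which one fired so that it can be logged, could be sketched in Go like this. The struct names, field names and threshold values are hypothetical; only the three criteria (delta log count, deleted rows ratio, deltalog size) come from the commit message.

```go
package main

import "fmt"

// segmentStats is a hypothetical summary gathered in a single pass over a
// segment's binlogs and deltalogs.
type segmentStats struct {
	NumRows       int64
	DeletedRows   int64
	DeltaLogCount int
	DeltaLogSize  int64 // bytes
}

// deletionLimits holds hypothetical thresholds for the three criteria.
type deletionLimits struct {
	MaxDeltaLogCount int
	MaxDeletedRatio  float64
	MaxDeltaLogSize  int64 // bytes
}

// hasTooManyDeletions checks all three criteria together and returns the
// reason that triggered, which a caller can include in its trigger log.
func hasTooManyDeletions(s segmentStats, l deletionLimits) (bool, string) {
	if s.DeltaLogCount > l.MaxDeltaLogCount {
		return true, "delta log count"
	}
	if s.NumRows > 0 && float64(s.DeletedRows)/float64(s.NumRows) >= l.MaxDeletedRatio {
		return true, "deleted rows ratio"
	}
	if s.DeltaLogSize > l.MaxDeltaLogSize {
		return true, "delta log size"
	}
	return false, ""
}

func main() {
	limits := deletionLimits{MaxDeltaLogCount: 200, MaxDeletedRatio: 0.2, MaxDeltaLogSize: 16 << 20}
	seg := segmentStats{NumRows: 1000, DeletedRows: 300, DeltaLogCount: 2, DeltaLogSize: 1 << 20}
	fmt.Println(hasTooManyDeletions(seg, limits))
}
```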
Xianhui Lin	98067f5fc6	fix: datacoord stop gets stuck after upgrading from 2.5 to 2.6 (#42674) issue: https://github.com/milvus-io/milvus/issues/42656 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-12 16:56:36 +08:00
cai.zhang	43c99a2c49	fix: Only mark segment compacting for sort stats task (#42516 ) issue: #42506 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-04 22:46:32 +08:00
Chun Han	e9b5d9e8bc	enhance: refine compaction trigger to reduce read/write amplification (#41336) (#41728) related: #41336 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-04 11:24:38 +08:00
Chun Han	d1cfa58a0a	feature: support compact expiry data(#41336 ) (#42056 ) related: #41336 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-05-25 16:46:31 +08:00
yihao.dai	142bd2fc05	enhance: Pooling for data tasks (#41256 ) 1. Add global scheduler for datacoord. 2. Define and implement new CreateTask, QueryTask, DropTask interfaces. 3. Refine Import, Compaction, Stats, Index task. issue: https://github.com/milvus-io/milvus/issues/41123 Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>	2025-05-20 21:06:24 +08:00
foxspy	1d99f8bd67	enhance: add force rebuild index configuration (#41473 ) issue: #41431 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-04-29 16:20:56 +08:00
XuanYang-cn	793fdeafe1	enhance: Refine logs in compaction trigger (#41171 ) See also: #41118 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-04-10 18:08:26 +08:00
yihao.dai	5b78ef0a49	fix: Fix delete data loss due to duplicate binlogID (#40960) With concurrent L0 compaction (https://github.com/milvus-io/milvus/pull/36816), delta logs might be written to the same L1 segment, causing logID duplication when using the incremental beginLogID. This PR removes the beginLogID mechanism and instead passes a log ID range, where the number of IDs in the range equals the number of compaction segment binlogs multiplied by an expansion factor. issue: https://github.com/milvus-io/milvus/issues/40207 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-01 10:36:22 +08:00
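
The range-based scheme described above can be illustrated with a small Go sketch: each compaction task reserves a disjoint half-open ID range whose length is the binlog count times an expansion factor, so two concurrent tasks can never hand out the same log ID. The `idAllocator` and `logIDRange` types and the `logIDRangeFor` function are hypothetical names for illustration, not the Milvus allocator API.

```go
package main

import (
	"fmt"
	"sync"
)

// logIDRange is a half-open range [Begin, End) of pre-allocated log IDs.
type logIDRange struct {
	Begin, End int64
}

// idAllocator is a hypothetical monotonic ID source guarded by a mutex.
type idAllocator struct {
	mu   sync.Mutex
	next int64
}

// allocRange reserves n consecutive IDs and returns them as a range.
func (a *idAllocator) allocRange(n int64) logIDRange {
	a.mu.Lock()
	defer a.mu.Unlock()
	r := logIDRange{Begin: a.next, End: a.next + n}
	a.next += n
	return r
}

// logIDRangeFor sizes the range as binlogCount * expansionFactor, following
// the rule stated in the commit message.
func logIDRangeFor(a *idAllocator, binlogCount, expansionFactor int64) logIDRange {
	return a.allocRange(binlogCount * expansionFactor)
}

func main() {
	alloc := &idAllocator{next: 1000}
	taskA := logIDRangeFor(alloc, 3, 2)
	taskB := logIDRangeFor(alloc, 4, 2)
	fmt.Println(taskA, taskB) // the two ranges do not overlap
}
```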
congqixia	84e8e141ea	enhance: Support detailed manual compaction criterion (#40892 ) Related to #40866 This PR: - update go-api/v2 and support partition id/channel/segment level manual compaction - refines the compaction trigger implementation - unify the compaction signal usage --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-03-25 20:06:22 +08:00
Ted Xu	96952ad3c5	fix: compaction task cannot be generated if size greater than max size (#40348) See: #40343 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-03-05 14:40:01 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
yihao.dai	272d95ad79	enhance: Reduce mutex contention in datacoord meta (#38219 ) 1. Using secondary index to avoid retrieving all segments at `GetSegmentsChanPart`. 2. Perform batch SetAllocations to reduce the number of times the meta lock is acquired. issue: https://github.com/milvus-io/milvus/issues/37630 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-01-15 01:15:02 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080 ) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
Ted Xu	dc85d8e968	enhance: improve mix compaction performance by removing max segment limitations (#38344 ) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-11 20:38:42 +08:00
tinswzy	5768dbbb5d	enhance: refine pulsar related mq interfaces (#38007) issue: #35917 Refines the pulsar-related mq APIs to allow the ctx to be passed down Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-04 20:50:39 +08:00
Ted Xu	3a7a8c7944	enhance: try compact small segments first if they may compose a full segment (#37709 ) See #37234 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-02 16:12:38 +08:00
tinswzy	1dbb6cd7cb	enhance: refine the datacoord meta related interfaces (#37957 ) issue: #35917 This PR refines the meta-related APIs in datacoord to allow the ctx to be passed down to the catalog operation interfaces Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-26 19:46:34 +08:00
XuanYang-cn	a45a288a25	fix: Separate L0 and Mix trigger interval (#37190 ) See also: #37108 - Add MixCompactionTriggerInterval, default 60s - Add L0CompactionTriggerInterval, default 10s - Export Single related compaction configs - Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-11-12 10:56:37 +08:00
cai.zhang	ac8c5fcd5d	enhance: Remove pre-marking segments as L2 during clustering compaction (#36799)

issue: #36686

This PR removes pre-marking segments as L2 during clustering compaction in version 2.5, and ensures compatibility with version 2.4. The core of this change is to ensure that the many-to-many lineage derivation logic is correct, making sure that both the parent and child cannot simultaneously exist in the target segment view.

feature:
- Clustering compaction no longer marks the input segments as L2.
- Add a new field `is_invisible` to `segmentInfo`, and mark segments that have completed clustering but have not yet built indexes as `is_invisible` to prevent them from being loaded prematurely.
- Do not mark the input segment as `Dropped` before the clustering compaction is completed.
- After compaction fails, only the result segment needs to be marked as Dropped.

compatibility:
- If the upgraded task has not failed, there are no compatibility issues.
- If the status after the upgrade is `MetaSaved`, then skip the stats task based on whether TmpSegments is empty.
- If the failure occurs before `MetaSaved`:
  - there are no ResultSegments, and InputSegments have not been marked as dropped yet.
  - the level of input segments needs to revert to LastLevel
- If the failure occurs after `MetaSaved`:
  - ResultSegments have already been generated, and InputSegments have been marked as Dropped. At this point, simply make the ResultSegments visible.
  - the level of ResultSegments needs to be set to L1 (in order to participate in mixCompaction)

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-10-23 17:15:28 +08:00
yihao.dai	d230b91bd1	enhance: Add PreallocatedSegmentIDs for the compaction task (#36734 ) Add `PreallocatedSegmentIDs` field to the compaction task, allowing the `ResultSegments` in the compaction task to represent the final segments produced by the compaction. issue: https://github.com/milvus-io/milvus/issues/36733 also related: https://github.com/milvus-io/milvus/issues/36686 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-10-13 17:59:21 +08:00
jaime	ef1832ff9c	enhance: enable manual compaction for collections without indexes (#36577 ) issue: #36576 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-08 19:57:18 +08:00
cai.zhang	8395c8a8db	enhance: Update stats task to optional (#35947 ) issue: #33744 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-09-12 20:37:08 +08:00
cai.zhang	2c9bb4dfa3	feat: Support stats task to sort segment by PK (#35054 ) issue: #33744 This PR includes the following changes: 1. Added a new task type to the task scheduler in datacoord: stats task, which sorts segments by primary key. 2. Implemented segment sorting in indexnode. 3. Added a new field `FieldStatsLog` to SegmentInfo to store token index information. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-09-02 14:19:03 +08:00
XuanYang-cn	323400c190	enhance: Enable to write multiple segments in mix compactor (#35705) Prevent segments from being written larger than maxSize * expansionRate See also: #35584 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-08-30 11:29:01 +08:00
congqixia	c992a61a23	enhance: Separate allocator pkg in datacoord (#35622 ) Related to #28861 Move allocator interface and implementation into separate package. Also update some unittest logic. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-22 10:06:56 +08:00
yihao.dai	227ecd3901	enhance: Remove the check for channel cp lag when generating compaction plan (#35383 ) issue: https://github.com/milvus-io/milvus/issues/35382 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-08-19 19:40:55 +08:00
yihao.dai	678018d9ca	enhance: Avoid unnecessary compaction (#35148 ) Estimate the import segment size based on DiskSegmentMaxSize(2G) to avoid unnecessary compaction after import completed. issue: https://github.com/milvus-io/milvus/issues/35147 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-08-06 10:30:21 +08:00
wayblink	5bbb1c201c	enhance:support l2 single compaction (#34935 ) #34928 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-08-01 14:36:13 +08:00
yihao.dai	ca758c36cc	enhance: Pre-allocate ids for compaction (#34187 ) This PR removes the dependency of compaction on the ID allocator by pre-allocating the logID and segmentID. issue: https://github.com/milvus-io/milvus/issues/33957 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-07-17 13:23:42 +08:00
wayblink	358e9a10d2	enhance: Alter compactTo segments before compactFrom to avoid data loss if crash (#34513 ) #34512 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-07-12 00:55:34 +08:00
XuanYang-cn	e0b39d8bf4	fix: Milvus panic when compaction disabled and dropping a collection (#34103 ) See also: #31059 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-07-11 14:44:52 +08:00
cai.zhang	4cf1a358ba	fix: Sync the sealed and flushed segments to datanode (#34301 ) issue: #33696 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-07-01 22:42:08 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
zhenshan.cao	ac4f3997ce	enhance: Reconstructing Compaction to possess persistence capability (#33265 ) issue #33586 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-06-05 10:17:50 +08:00
yihao.dai	7730b910b9	enhance: Decouple compaction from shard (#33138 ) Decouple compaction from shard, remove dependencies on shards (e.g. SyncSegments, injection). issue: https://github.com/milvus-io/milvus/issues/32809 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-05-24 09:07:41 +08:00
congqixia	8cf2cf5c94	enhance: Add `go-deadlock` as unittest only dependency (#33063 ) See also #33062 This PR: - Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based on build flags - When build flags has `test` in it, use `go-deadlock` to detect possible deadlocks - Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-15 16:33:34 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
XuanYang-cn	6843d6d376	fix: Compaction trigger chooses 2 same segments (#32800) DataNode would get stuck when the compactor tried to lock the same segmentID See also: #32765 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-05-07 19:01:31 +08:00


130 Commits