If the collection TTL property is malformed (e.g., a non-numeric value),
compaction tasks fail silently and get stuck. This change:
- Adds centralized GetCollectionTTL/GetCollectionTTLFromMap functions in
pkg/common to parse TTL values with proper error handling (see the
sketch below)
- Validates the TTL property in createCollectionTask and
alterCollectionTask PreExecute to reject invalid values early
- Refactors the datacoord compaction policies to use the new common
functions
- Removes the duplicated getCollectionTTL from datacoord/util.go
issue: #46716
- Core invariant: collection.ttl.seconds must be a parseable int64,
validated at collection create/alter time, so that malformed TTLs never
reach the compaction/execution code paths.
- Bug fix (resolves #46716): malformed/non-numeric TTLs could silently
cause compaction tasks to fail or stall; fixed by adding the centralized
parsing helpers pkg/common.GetCollectionTTL and GetCollectionTTLFromMap
and validating the TTL in createCollectionTask.PreExecute and
alterCollectionTask.PreExecute (callers pass a default of -1, and parse
failures return parameter-invalid errors).
- Simplification / removed redundancy: eliminated the duplicated
getCollectionTTL in internal/datacoord/util.go and replaced ad-hoc TTL
parsing across datacoord (compaction policies, import_util, compaction
triggers) and the proxy util with the common helpers, centralizing the
error handling and defaulting logic.
- No data loss or behavior regression: valid-TTL parsing semantics are
unchanged (the helpers use identical int64 parsing and the default
fallback from paramtable/CommonCfg). Validation occurs in PreExecute, so
existing valid collections proceed unchanged while malformed values are
rejected early; compaction code paths now receive only validated TTL
values (or explicit defaults), preventing silent skips without altering
valid execution flows.
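A minimal sketch of the shape such a helper could take, using the
collection.ttl.seconds key and -1 default mentioned above; the actual
pkg/common signatures and defaulting details may differ:

```go
package common

import (
	"fmt"
	"strconv"
)

// CollectionTTLConfigKey is the collection property holding the TTL.
const CollectionTTLConfigKey = "collection.ttl.seconds"

// GetCollectionTTLFromMap parses the TTL from collection properties,
// returning defaultTTL (e.g. -1) when the property is absent and an
// explicit error when it is present but malformed. Sketch only: the
// real helper may differ in signature and defaulting behavior.
func GetCollectionTTLFromMap(props map[string]string, defaultTTL int64) (int64, error) {
	v, ok := props[CollectionTTLConfigKey]
	if !ok {
		return defaultTTL, nil
	}
	ttl, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		// Malformed values surface as errors instead of being silently skipped.
		return 0, fmt.Errorf("invalid %s value %q: %w", CollectionTTLConfigKey, v, err)
	}
	return ttl, nil
}
```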
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43897
also for issue: #46166
add an ack_sync_up flag to the broadcast message header, which indicates
whether the broadcast operation needs to be synced up between the
streaming node and the coordinator.
If ack_sync_up is false, the broadcast operation is acked once the
recovery storage sees the message on the current vchannel, so the fast
ack path can be applied to speed up the broadcast operation.
If ack_sync_up is true, the broadcast operation is acked only after the
checkpoint of the current vchannel reaches the current message. The fast
ack path cannot be applied in this case, because the ack needs to be
synced up with the streaming node.
e.g., if the truncate-collection operation wants its ack-once callback
invoked only after all segments are flushed on the current vchannel, it
should set ack_sync_up to true (see the sketch below).
TODO: the current implementation does not guarantee the ack-sync-up
semantics; it only guarantees that the FastAck path will not be applied.
The full ack-sync-up semantics will be implemented in 3.0. This is used
only by the truncate API for now.
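A minimal sketch of how the flag could gate the fast-ack path; the
header type and function here are illustrative, not the actual
streaming-service proto definitions:

```go
package broadcast

// BroadcastHeader is an illustrative stand-in for the real broadcast
// message header; only the field relevant here is shown.
type BroadcastHeader struct {
	BroadcastID uint64
	// AckSyncUp: when true, ack only after the vchannel checkpoint has
	// reached this message; when false, ack as soon as the recovery
	// storage observes the message on the vchannel.
	AckSyncUp bool
}

// canFastAck reports whether the fast-ack optimization may be applied
// to this broadcast message.
func canFastAck(h *BroadcastHeader) bool {
	// Fast ack is only safe when no coordinator/streaming-node sync is
	// required for the ack.
	return !h.AckSyncUp
}
```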
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #41611
- After the streaming architecture is enabled, the channel manager of
datacoord is a redundant component.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
After this PR is merged, insert, upsert, index building, query, and
search are supported on the added field.
These operations are only available on the added field once the
add-field request completes; adding a field is a synchronous operation.
Compaction will be supported in the next PR.
issue: #39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy side, which enhances cluster stability (see the sketch below).
2. Each health check assesses all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use the CheckHealth request instead of the heavy GetMetrics request
on the querynode and datanode.
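A minimal sketch of the checker pattern in item 1, with hypothetical
names (the actual Milvus implementation differs in detail): a background
loop caches the latest result, and CheckHealth just reads the cache
instead of triggering a heavy on-demand scan.

```go
package healthcheck

import (
	"sync"
	"time"
)

// Result is an illustrative health snapshot; the real state also
// carries per-collection/per-channel failure details.
type Result struct {
	IsHealthy bool
	Reasons   []string
}

// Checker periodically evaluates cluster health and stores the latest
// result on the coordinator side.
type Checker struct {
	mu     sync.RWMutex
	latest Result
}

// Start launches the background loop; check assesses all collections
// and channels on each tick.
func (c *Checker) Start(interval time.Duration, check func() Result) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for range ticker.C {
			r := check()
			c.mu.Lock()
			c.latest = r
			c.mu.Unlock()
		}
	}()
}

// Latest serves CheckHealth requests from the cached state.
func (c *Checker) Latest() Result {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest
}
```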
Signed-off-by: jaime <yun.zhang@zilliz.com>
Native support for Google Cloud Storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.
Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:
1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.
To address these limitations, this enhancement is needed.
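For illustration, a minimal example of native GCS access with the
official Go client, authenticating with a service-account credentials
JSON file; the bucket, object, and credentials path are placeholders,
and this is not the Milvus chunk-manager code itself:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Authenticate with GCS service-account credentials JSON.
	client, err := storage.NewClient(ctx,
		option.WithCredentialsFile("/path/to/service-account.json"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Read an object through the native GCS API, with no
	// S3-compatibility translation layer in between.
	rc, err := client.Bucket("my-bucket").Object("my-object").NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	data, err := io.ReadAll(rc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```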
Related Issue: #36212
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode (see the sketch below).
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
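A minimal sketch of the sort step in item 2, assuming an int64 primary
key and illustrative types; the real indexnode implementation works over
binlog columns rather than in-memory row structs:

```go
package stats

import "sort"

// Row is an illustrative record; real segment data lives in binlog
// columns, not in-memory structs.
type Row struct {
	PK     int64 // primary key
	Fields map[string]any
}

// sortSegmentByPK orders a segment's rows by primary key so downstream
// reads can rely on PK order.
func sortSegmentByPK(rows []Row) {
	sort.Slice(rows, func(i, j int) bool {
		return rows[i].PK < rows[j].PK
	})
}
```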
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #34545
Print a warning log instead of failing the health check when orphan
channel checkpoint meta is found during a health check request.
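A minimal sketch of the changed behavior, with hypothetical names
(`validateChannelCheckpoints`, `channelExists`) standing in for the real
datacoord meta code:

```go
package datacoord

import "log"

// validateChannelCheckpoints warns about orphan channel checkpoint meta
// instead of failing the health check.
func validateChannelCheckpoints(checkpoints map[string]uint64, channelExists func(string) bool) {
	for ch := range checkpoints {
		if !channelExists(ch) {
			// Previously this condition failed the health check.
			log.Printf("WARN: orphan channel checkpoint meta found, channel=%s", ch)
		}
	}
}
```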
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #33005
1. Add a `MemorySize` field for insert binlogs.
2. `LogSize` means the size of the binlog file in object storage.
3. `MemorySize` means the size of the data in memory.
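Illustrative shape only (the real field lives in the proto-generated
Binlog message, which carries many more fields):

```go
package datapb

// Binlog sketches the two size fields side by side.
type Binlog struct {
	LogSize    int64 // size of the binlog file as stored in the object store
	MemorySize int64 // size of the decoded data once loaded in memory
}
```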
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
Related to https://github.com/milvus-io/milvus/issues/32165
1. Node-ID-based channel store access should use map access instead of
iteration (see the sketch below).
2. The join-ish function calls become slow as the number of
collections/segments grows (e.g., 10k). For example,
getNumRowsOfCollectionUnsafe is O(num_segments), and
GetAllCollectionNumRows is O(num_collections * num_segments).
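A minimal sketch of point 1, with illustrative types (the real
ChannelStore differs): keying the store by node ID makes per-node
lookups O(1) instead of a scan over every entry.

```go
package datacoord

// NodeChannelInfo is an illustrative per-node record.
type NodeChannelInfo struct {
	NodeID   int64
	Channels []string
}

// ChannelStore keys its entries by node ID.
type ChannelStore struct {
	channels map[int64]*NodeChannelInfo
}

// GetNode uses direct map access instead of iterating all nodes.
func (s *ChannelStore) GetNode(nodeID int64) *NodeChannelInfo {
	return s.channels[nodeID]
}
```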
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
issue: #29892
This PR:
1. Moves the gathering of materialized search info to the point where
the search plan is created, before the plan is handed to each segment,
to avoid repeated work and concurrent access to the plan node from
multiple threads (see the sketch below).
2. Restricts the supported MV type to `VARCHAR`.
3. Adds an integration test.
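A minimal sketch of the compute-once pattern in item 1, with
illustrative types (the actual plan structures live in the query path,
not in this form):

```go
package search

// MaterializedViewSearchInfo is an illustrative stand-in for the
// materialized-view search info gathered from the request.
type MaterializedViewSearchInfo struct {
	PartitionKeyField string
	InvolvedValues    []string
}

// SearchPlan carries the info computed once at creation time; segments
// then read it without mutating the shared plan node.
type SearchPlan struct {
	MVInfo *MaterializedViewSearchInfo
}

// NewSearchPlan gathers the MV search info exactly once, instead of
// recomputing (and writing to) the plan node per segment.
func NewSearchPlan(gather func() *MaterializedViewSearchInfo) *SearchPlan {
	return &SearchPlan{MVInfo: gather()}
}
```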
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
issue: #31662 #31409
During FilterIndexedSegment in GetRecoveryInfo, the index meta's read
lock is acquired for every segment. When a collection has thousands of
segments, this may block for more than 10 seconds or even longer,
because AddSegmentIndex may also be triggered frequently and tries to
take the write lock.
This PR avoids acquiring the index meta's lock for each segment.
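A minimal sketch of the pattern, with illustrative types: take the read
lock once, snapshot the needed state, then filter segments without
holding the lock, so writers such as AddSegmentIndex are not starved.

```go
package datacoord

import "sync"

type indexMeta struct {
	mu sync.RWMutex
	// segmentID -> has a finished index
	indexed map[int64]bool
}

// snapshotIndexed copies the needed state under a single RLock.
func (m *indexMeta) snapshotIndexed() map[int64]bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make(map[int64]bool, len(m.indexed))
	for id, ok := range m.indexed {
		out[id] = ok
	}
	return out
}

// filterIndexedSegments no longer takes the lock per segment.
func filterIndexedSegments(m *indexMeta, segmentIDs []int64) []int64 {
	indexed := m.snapshotIndexed() // one lock acquisition in total
	var res []int64
	for _, id := range segmentIDs {
		if indexed[id] {
			res = append(res, id)
		}
	}
	return res
}
```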
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Patch the search cache param from the index configs when the search
cache size key cannot be found in the index meta.
issue: #30113
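A minimal sketch of the fallback, with a hypothetical key name and
function signature:

```go
package queryutil

// patchSearchCacheParam fills in the search cache size from the index
// configs when the index meta lacks the key; the key name here is
// illustrative.
func patchSearchCacheParam(indexParams map[string]string, configValue string) {
	const searchCacheBudgetKey = "search_cache_budget_gb"
	if _, ok := indexParams[searchCacheBudgetKey]; !ok && configValue != "" {
		indexParams[searchCacheBudgetKey] = configValue
	}
}
```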
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
A compaction plan result used to contain one segment per plan. Since L0
compaction can write to multiple segments, this PR expands the number of
segments in plan results and refactors some names for readability (see
the sketch below).
Name refactoring:
- CompactionStateResult -> CompactionPlanResult
- CompactionResult -> CompactionSegment
See also: #27606
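Illustrative shapes only (the real messages are proto-generated and
carry more fields); the point is that a plan result now holds a slice of
segments:

```go
package datapb

// CompactionSegment describes one output segment of a compaction
// (formerly CompactionResult).
type CompactionSegment struct {
	SegmentID int64
	NumOfRows int64
	// binlog/deltalog/statslog paths elided
}

// CompactionPlanResult (formerly CompactionStateResult) can now carry
// multiple output segments, as L0 compaction requires.
type CompactionPlanResult struct {
	PlanID   int64
	Segments []*CompactionSegment
}
```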
Signed-off-by: yangxuan <xuan.yang@zilliz.com>