405 Commits

Author SHA1 Message Date
congqixia
8f97eb355f
enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232)
Related to #39173

This PR:
- Bump milvus-storage commit to handle bucket name concatenation logic
in multipart s3 fs
- Remove all user-side bucket name concatenation code

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-08 10:15:55 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index (#44171)
issue: https://github.com/milvus-io/milvus/issues/42032

- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
congqixia
7721edf32a
enhance: Add mutex and range check preventing concurrent del (#44128)
This PR adds a mutex prevent concurrent applying delete on same segment
and check latestDeltaTimestamp to skip overlapping delete range

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-01 14:29:52 +08:00
zhagnlu
fc876639cf
enhance: support json stats with shredding design (#42534)
#42533

Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-01 10:49:52 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading (#43846)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
Buqian Zheng
6420d72391
enhance: print as storage size unit MB with 2 digits only, so the log is easier to read (#44085)
issue: https://github.com/milvus-io/milvus/issues/41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-08-27 19:47:50 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
zhagnlu
8934c18792
enhance: support cache result cache for expr (#43923)
issue: #43878

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-26 10:55:52 +08:00
wei liu
399f63300c
enhance: Implement dynamic interval updates for ticker components (#43865)
issue: #43858

Enable dynamic configuration updates for ticker intervals without
restart. This enhancement allows runtime configuration changes to take
effect immediately for better operational flexibility.

Changes include:
- Apply "drain+Reset only when interval changed" pattern across all
ticker components to preserve existing timing phases
- Fix goroutine variable capture issue in CheckerController.Start()
- Remove unnecessary ticker.Stop() in manual trigger paths
- Add dynamic interval checking in QueryCoordV2 components:
  * checkers/controller.go: Various checker intervals
  * dist/dist_handler.go: DistPullInterval, CheckExecutedFlagInterval
  * session/cluster.go: CheckNodeSessionInterval
  * server.go: CheckAutoBalanceConfigInterval
  * observers/target_observer.go: UpdateNextTargetInterval
  * observers/collection_observer.go: CollectionObserverInterval
- Add dynamic interval checking in QueryNodeV2 components:
  * segments/disk_usage_fetcher.go: DiskSizeFetchInterval
- Ensure thread safety by performing all ticker operations in same
goroutine with proper drain before Reset to avoid spurious triggers

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-21 10:07:47 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Xianhui Lin
c7d8dc100a
fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex (#43811)
fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex
issue:https://github.com/milvus-io/milvus/issues/43572

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-08-18 01:17:52 +08:00
sparknack
544c7c0600
enhance: update cachinglayer default cache ratio to 0.3 (#43723)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-05 01:35:39 +08:00
zhagnlu
f14c7d598c
fix: skip load raw data when loading index for storagev2 (#43720)
#43653

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-04 21:17:39 +08:00
Chun Han
d826d6ac91
fix: try to get span raw data for variable length data type(#43544) (#43705)
related: #43544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-04 11:15:38 +08:00
sparknack
bdd65871ea
enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323)
issue: #41435 

After introducing the caching layer's lazy loading and eviction
mechanisms, most parts of a segment won't be loaded into memory or disk
immediately, even if the segment is marked as LOADED. This means
physical resource usage may be very low. However, we still need to
reserve enough resources for the segments marked as LOADED. Thus, the
logic of resource usage estimation during segment loading, which based
on physcial resource usage only for now, should be changed.

To address this issue, we introduced the concept of logical resource
usage in this patch. This can be thought of as the base reserved
resource for each LOADED segment.

A segment’s logical resource usage is derived from its final evictable
and inevictable resource usage and calculated as follows:

```
SLR = SFPIER + evitable_cache_ratio * SFPER
```

it also equals to

```
SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER
```

`SLR`: The logical resource usage of a segment.
`SFPIER`: The final physical inevictable resource usage of a segment.
`SFPER`: The final physical evictable resource usage of a segment.
`evitable_cache_ratio`: The ratio of a segment's evictable resources
that can be cached locally. The higher the ratio, the more physical
memory is reserved for evictable memory.

When loading a segment, two types of resource usage are taken into
account.

First is the estimated maximum physical resource usage:

```
PPR = HPR + CPR + SMPR - SFPER
```

`PPR`: The predicted physical resource usage after the current segment
is allowed to load.
`HPR`: The physical resource usage obtained from hardware information.  
`CPR`: The total physical resource usage of segments that have been
committed but not yet loaded. When one new segment is allow to load,
`CPR' = CPR + (SMR - SER)`. When one of the committed segments is
loaded, `CPR' = CPR - (SMR - SER)`.
`SMPR`: The maximum physical resource usage of the current segment.
`SFPER`: The final physical evictable resource usage of the current
segment.

Second is the estimated logical resource usage, this check is only valid
when eviction is enabled:

```
PLR = LLR + CLR + SLR
```

`PLR`: The predicted logical resource usage after the current segment is
allowed to load.
`LLR`: The total logical resource usage of all loaded segments. When a
new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`.
`CLR`: The total logical resource usage of segments that have been
committed but not yet loaded. When one new segment is allow to load,
`CLR' = CLR + SLR`. When one of the committed segments is loaded, `CLR'
= CLR - SLR`.
`SLR`: The logical resource usage of the current segment.

Only when `PPR < PRL && PLR < PRL` (`PRL`: Physical resource limit of
the querynode), the segment is allowed to be loaded.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-01 21:31:37 +08:00
zhagnlu
2594250906
fix: fix miss loading index for storagev2 (#43674)
#43653

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-01 13:07:36 +08:00
Chun Han
d72c0357ff
fix: empty hybridsearch result due to one-sub-search empty(#43537) (#43647)
related: #43537

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-31 19:47:37 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
Zhen Ye
3aacd179f7
fix: balance channel before balance segment when upgrading (#43346)
issue: #43117, #42966, #43373

- also fix channel balance may not work at 2.6.
- fix error lost at delete path
- add mvcc into s/q log
- change the log level for TestCoordDownSearch

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-17 20:16:52 +08:00
Chun Han
07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
congqixia
f027eea545
enhance: [AddField] Add log for segcore segment schema change (#43215)
Related to #39178

This PR add logs for segment schema change operations.

Also fixes the nit comments from PR #42490

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 10:22:47 +08:00
congqixia
7bc7b18ed5
fix: [AddField] Prevent concurrent load during UpdateSchema (#43043)
Related to #43028

This PR:
- Add mutex prevent concurrent load segment & schema change
- Add schema verison field in load meta
- Update schema in PutOrRef if schema verison is larger

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 17:38:44 +08:00
wei liu
c381bf3e41
enhance: add logs for count(*) (#43001)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-01 19:36:43 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
sthuang
0d57acb13a
enhance: [StorageV2] field id as meta path for wide column when load (#42863)
related: #42862 #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-25 11:08:48 +08:00
congqixia
4ba177cd2c
enhance: [StorageV2] Handle narrow column group resource estimation (#42842)
Related to #39173

In storage v2, "narrow" column group could have group id not mapped
schema, which causing loading fails or resource estimation result
inaccurate.

This PR handles this case by mapping binlog from index instead of vice
versa.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-19 14:44:39 +08:00
Spade A
e2c85eec81
fix: load stats index based on mmap config (#42788)
ref https://github.com/milvus-io/milvus/issues/42626

This PR makes text match index and json key stats index be loaded based
on mmap config.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-19 10:10:39 +08:00
Chun Han
001619aef9
feat: supporing load priority for loading (#42413)
related: #40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-06-17 15:22:38 +08:00
congqixia
c9bc70f272
fix: [AddField] Use shared_ptr of schema in plan fixing dangling ref (#42693)
Related to #42640

The search/query plan holded a reference to schema, which could be
destructed after schema change. This PR make plan hold a shared ptr to
it fixing dangling reference problem under concurrent read & schema
change.

This PR also remove field binlog check for loading index for old segment
with old schema may have binlog lack.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-12 20:46:36 +08:00
XuanYang-cn
83877b9faf
enhance: remove extra get collection (#42042)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-10 18:34:35 +08:00
Zhen Ye
508264f953
fix: querynode upgrade from 2.5 get stucked (#42502)
issue: #42492

- consider the old RO query node (not streaming node) when balancing
channel.
- querynode graceful stop can be done if there's only L0 segment exists.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 11:20:30 +08:00
congqixia
b76478378a
feat: [Tiered] Make load list work as warmup hint (#42490)
Related to #42489
See also #41435

This PR's main target is to make partial load field list work as caching
layer warmup policy hint. If user specify load field list, the fields
not included in the list shall use `disabled` warmup policy and be able
to lazily loaded if any read op uses them.

The major changes are listed here:
- Pass load list to segcore and creating collection&schema
- Add util functions to check field shall be proactively loaded
- Adapt storage v2 column group, which may lead to hint fail if columns
share same group

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-04 10:28:32 +08:00
congqixia
cc42d49769
fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186)
Related to #39173 #39718

In storage v2, the `lack_bin_rows` cannot be used since field id is not
column group id, which will not be matched forever.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-31 02:44:30 +08:00
congqixia
6d2ad519b1
enhance:[StorageV2] Adapt local storage & other minor issue (#42167)
Related to #39173

This PR
- Handle storage v2 log path in local storage mode on querynode
- Ignore field info check when append index for loaded sealed segment
when using storage v2

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-30 10:22:29 +08:00
cqy123456
5fe7015f63
enhance: InterimIndex support more index type and data type (#41021)
issue: https://github.com/milvus-io/milvus/issues/27678
cherry pick from : https://github.com/milvus-io/milvus/pull/39180,
https://github.com/milvus-io/milvus/pull/40429

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-05-28 08:40:28 +08:00
wei liu
dad43a3894
fix: cost metrics collection logic for replica selection (#41965)
issue: #41621
- Deprecate EnableWorkerSQCostMetrics parameter
- Always collect cost metrics from all search and retrieve results
- Update code with comments explaining the changes rationale

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-22 10:20:25 +08:00
Ted Xu
ae32203d3a
fix: support group by with nullable grouping keys (#41797)
See #36264

In this PR:
- Enhanced error handling in parse of grouping field.
- Fixed null handling in reduce tasks in proxy nodes. 
- Updated tests to reflect changes in error handling and data processing
logic.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-05-17 20:54:22 +08:00
Buqian Zheng
b0260d8676
feat: manual evict cache after built interim index (#41836)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR also makes HasRawData of ChunkedSegmentSealedImpl to return
based on metadata, without needing to load the cache just to answer this
simple question.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-16 16:34:23 +08:00
congqixia
a6d09ff4cd
enhance: [StorageV2] fix issues integrating basic RW operations (#41834)
Related to #39173

This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-15 09:52:23 +08:00
congqixia
c45c1fadb2
enhance: [AddField] Keep all binlog when loading (#41809)
Related to #41726 #41736

The load field list blocks the new field from being loaded.

`load_fields` shall work as hint after tiered storage support API to
specifiy this behavior.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-14 17:30:21 +08:00
Buqian Zheng
ff5c2770e5
feat: cachinglayer: various improvements (#41546)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR is based on https://github.com/milvus-io/milvus/pull/41436. 

Improvements include:

- Lazy Load support for Storage v1
- Use Low/High watermark to control eviction
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator to store PinWrapper when searching on
ChunkedColumn, and removed unused contrustor.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-10 09:19:16 +08:00
Buqian Zheng
3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
congqixia
b5443ddbd0
enhance: [AddField] Reopen loaded segments after AddField (#41529)
Related to #39718

This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when schema version increases
- Add FinishLoad api for loading progress

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-26 08:48:39 +08:00
Zhen Ye
5fd47c3c89
fix: mockery too unavailable after upgrade golang version (#41481)
issue: #41291
pr: #41318

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 10:46:43 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00