216 Commits

Author SHA1 Message Date
Buqian Zheng
389104d200
enhance: rename PanicInfo to ThrowInfo (#43384)
issue: #41435

this is to prevent AI from thinking of our exception throwing as a
dangerous PANIC operation that terminates the program.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 20:22:52 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
congqixia
ae48f0e484
fix: [StorageV2] Handle missing column creating index (#43292)
Related to #43250

Use FieldIDList to check missing field. If column is missing, return
empty resultset

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-14 17:06:50 +08:00
sthuang
276c52490d
fix: [StorageV2] missing arrow fs when building index (#43162)
fix: https://github.com/milvus-io/milvus/issues/43150,
https://github.com/milvus-io/milvus/issues/43149

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:26:46 +08:00
congqixia
1d9a9a993d
fix: [StorageV2] Use correct template typename for cache_raw_data_to_disk_common (#43104)
Related to #43099

Previously `cache_raw_data_to_disk_common` used `milvus::DataType`
template typename, which shall be `knowhere::bf16` or other actual
datatype.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-03 18:50:46 +08:00
Zhen Ye
bbbc7d4517
enhance: collect all cgo calling into metric and log slow cgo call (#43035)
issue: #42833

- also fix the error metric for async cgo.
- also make sure the roles can be seen when node startup, #43041.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-03 15:00:44 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Below is its parameters:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
XuanYang-cn
0dfe5308e1
enhance: Tidy Download and decode in segcore storage (#42902)
1. Unify calling from GetObjectData
2. Move SetData inside Deserialize

See also: #40013

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-25 11:10:43 +08:00
sthuang
0d57acb13a
enhance: [StorageV2] field id as meta path for wide column when load (#42863)
related: #42862 #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-25 11:08:48 +08:00
Xianhui Lin
b902960057
fix: revert remote jsonstats path (#42882)
fix: revert remote jsonstats path
relate-pr:https://github.com/milvus-io/milvus/pull/42676
issue:https://github.com/milvus-io/milvus/issues/42872

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-06-21 13:24:39 +08:00
sthuang
4a0a2441f2
enhance: [StorageV2] field id as meta path for wide column (#42787)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-19 15:00:38 +08:00
Spade A
e2c85eec81
fix: load stats index based on mmap config (#42788)
ref https://github.com/milvus-io/milvus/issues/42626

This PR makes text match index and json key stats index be loaded based
on mmap config.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-19 10:10:39 +08:00
Spade A
80f1d707f7
fix: tidy up path for scalar index (#42676)
Ref #42626

This path tidy up path for scalar index including path for loading index
from remote storage and temporary path for buliding index.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-18 00:42:38 +08:00
Chun Han
001619aef9
feat: supporing load priority for loading (#42413)
related: #40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-06-17 15:22:38 +08:00
congqixia
f01ff57f3f
fix: [StorageV2] Use correct offset filling null bitmap (#42774)
Related to #39173

`null_bitmap_data()` returns raw pointer of null bitmap of Array. While
after slicing, this bitmap is not rewritten due to zero copy
implementation, so the current start pos maybe non-zero while
FillFieldData generating column `valid_data` array.

This PR add `offset` param for `FillFieldData` method, and force all
invocation pass correct offset of `null_bitmap_data` ptr.

Also update milvus-storage commit fixing reader failed to return data
when buffer size smaller than row group size problem.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 10:08:38 +08:00
Spade A
9873e0ee78
fix: fix text match index / json key stats index leak when segment released (#42655)
Ref https://github.com/milvus-io/milvus/issues/42626

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-13 04:28:37 +08:00
congqixia
c9bc70f272
fix: [AddField] Use shared_ptr of schema in plan fixing dangling ref (#42693)
Related to #42640

The search/query plan holded a reference to schema, which could be
destructed after schema change. This PR make plan hold a shared ptr to
it fixing dangling reference problem under concurrent read & schema
change.

This PR also remove field binlog check for loading index for old segment
with old schema may have binlog lack.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-12 20:46:36 +08:00
Spade A
911a8df17c
feat: impl StructArray -- data storage support in segcore (#42406)
Ref https://github.com/milvus-io/milvus/issues/42148
This PR mainly enables segcore to support array of vector (read and
write, but not indexing). Now only float vector as the element type is
supported.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-06-12 14:38:35 +08:00
Buqian Zheng
8511ede5f8
feat: add back queryNode.cache.warmup for compatibility (#42621)
issue: https://github.com/milvus-io/milvus/issues/41435

also make ChunkTranslator to load in parallel

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-06-12 10:56:40 +08:00
congqixia
499e9a0a73
fix: [AddField] Use corresponding datatype for int8/int16 def val (#42633)
Related to #42629

This PR handles converting default value to int8/int18 scalar with int32
default value definition

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-11 11:54:34 +08:00
Bingyi Sun
fbf5cb4e62
feat: Add json flat index (#39917)
issue: https://github.com/milvus-io/milvus/issues/35528

This PR introduces a JSON flat index that allows indexing JSON fields
and dynamic fields in the same way as other field types.

In a previous PR (#36750), we implemented a JSON index that requires
specifying a JSON path and casting a type. The only distinction lies in
the json_cast_type parameter. When json_cast_type is set to JSON type,
Milvus automatically creates a JSON flat index.

For details on how Tantivy interprets JSON data, refer to the [tantivy
documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases).

Limitations
Array handling: Arrays do not function as nested objects. See the
[limitations
section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object)
for more details.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-10 19:14:35 +08:00
sthuang
89c3afb12e
fix: [StorageV2] index/stats task level storage v2 fs (#42191)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-10 11:06:35 +08:00
congqixia
f1188b6781
enhance: [storagev2] Support partition key isolation index (#42574)
Related to #39173

This patch make storage v2 support partition key isolation index feature

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-09 14:02:33 +08:00
congqixia
b50c4a7973
enhance: Make segcore thread name set correctly (#42497)
Previous PR: #42017 did not work due to following updated points by this
PR:

- Initialize the `name_map`, which not touched at all before
- Trim the thread name under 15 characters to fit syscall limit

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-06 16:26:32 +08:00
sthuang
490827974d
enhance: avoid shutdown sdk api in minio cm destructor (#42459)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-04 09:58:39 +08:00
cqy123456
727f4ec24b
enhance:mmapchunkmanager allocates MmapChunkDescriptor itself (#42150)
issue: https://github.com/milvus-io/milvus/issues/42157

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-06-03 14:42:31 +08:00
congqixia
cc42d49769
fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186)
Related to #39173 #39718

In storage v2, the `lack_bin_rows` cannot be used since field id is not
column group id, which will not be matched forever.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-31 02:44:30 +08:00
Chun Han
ed0df38605
enhance: resize high priority wqthreadpool dynamically(#40838) (#41549) (#41929)
related: #40838
pr: https://github.com/milvus-io/milvus/pull/41549

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
2025-05-30 10:18:36 +08:00
cqy123456
5fe7015f63
enhance: InterimIndex support more index type and data type (#41021)
issue: https://github.com/milvus-io/milvus/issues/27678
cherry pick from : https://github.com/milvus-io/milvus/pull/39180,
https://github.com/milvus-io/milvus/pull/40429

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-05-28 08:40:28 +08:00
sthuang
b9b554676c
fix: storage v2 get field data with correct column group files (#42107)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-05-27 15:26:28 +08:00
congqixia
9fb0257bfa
enhance: Set thread name for segcore thread pool (#42017)
Thread name could be helpful when debugging thread explosion issues

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-22 19:06:27 +08:00
Buqian Zheng
8a85bc4213
fix: fixes async warmup deadlock (#41995)
issue: https://github.com/milvus-io/milvus/issues/41993

also updated cachinglayer metrics

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-22 09:54:24 +08:00
congqixia
f2a8330f87
fix: [StorageV2] Use correct group building index (#41925)
Related to #39173 #41534

This pr fixes an issue that building mem index may report datatype not
match error when collection split fields into multiple groups

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-20 13:26:23 +08:00
Buqian Zheng
ff5c2770e5
feat: cachinglayer: various improvements (#41546)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR is based on https://github.com/milvus-io/milvus/pull/41436. 

Improvements include:

- Lazy Load support for Storage v1
- Use Low/High watermark to control eviction
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator to store PinWrapper when searching on
ChunkedColumn, and removed unused contrustor.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-10 09:19:16 +08:00
congqixia
bcf94a0754
fix: Remove noexcept from CacheIndexToDiskInternal (#41725)
Related to #41219

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-09 14:16:53 +08:00
congqixia
b1f3fe1f07
fix: Use sum of num_rows instead of last one (#41685)
Related to #41656

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-07 19:40:53 +08:00
sthuang
6c377b6e86
feat: Storage v2 index and stats raw data (#41534)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-30 08:48:54 +08:00
Buqian Zheng
3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
Chun Han
016920b023
fix: solve incompitable problem for none-encoding index(#40838) (#41369)
related: #40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-20 22:56:44 +08:00
sthuang
1f1c836fb9
feat: Storage v2 growing segment load (#41001)
support parallel loading sealed and growing segments with storage v2
format by async reading row groups.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:14:33 +08:00
Chun Han
59b14d38f5
enhance: Optimize index format for improved load performance(#40838) (#40839)
related: https://github.com/milvus-io/milvus/issues/40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-15 03:10:30 +08:00
Bingyi Sun
bf617115ca
enhance: Remove single chunk segment related codes (#39249)
https://github.com/milvus-io/milvus/issues/39112

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 18:56:29 +08:00
Xianhui Lin
3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
zhagnlu
10a63b3f2e
enhance: add formatter for serveral types to remove compile warning (#41094)
#41091

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:54:24 +08:00
Spade A
216be1494b
fix: add log for object storage operation fail (#40666)
fix: #40665

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 01:26:21 +08:00
Bingyi Sun
8fbacf3583
fix: Null expr does not work for json field (#40456)
issue: https://github.com/milvus-io/milvus/issues/40455

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-14 16:06:08 +08:00
Chun Han
259f9106ad
enhance: refine variable-length-type memory usage(#38736) (#39578)
related: #38736

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-02-27 21:13:58 +08:00
Patrick Weizhi Xu
04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00