2020 Commits

Author SHA1 Message Date
congqixia
f027eea545
enhance: [AddField] Add log for segcore segment schema change (#43215)
Related to #39178

This PR adds logs for segment schema change operations.

Also fixes the nit comments from PR #42490

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 10:22:47 +08:00
zhagnlu
21d1fb2aa3
fix: fix move cursor bug for chunk segment with index (#43095)
#42974

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-09 17:38:47 +08:00
Spade A
d41eec6f10
fix: avoid copy when getting json chunk (#43183)
fix: https://github.com/milvus-io/milvus/issues/43182

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-08 15:28:46 +08:00
sthuang
a0ae5bccc9
fix: [StorageV2] load growing segment get dim datatype check (#43168)
related: https://github.com/milvus-io/milvus/issues/43072

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:46:47 +08:00
sthuang
276c52490d
fix: [StorageV2] missing arrow fs when building index (#43162)
fix: https://github.com/milvus-io/milvus/issues/43150,
https://github.com/milvus-io/milvus/issues/43149

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:26:46 +08:00
sthuang
9f361a228e
enhance: storage v2 chunked column memory size from meta (#43130)
Use meta to get the chunked column memory size, which avoids actually
fetching cells from storage.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 14:24:46 +08:00
Spade A
fce0bbe2ae
fix: remove redundant locks for null_offset (#43103)
Ref: https://github.com/milvus-io/milvus/issues/40308
https://github.com/milvus-io/milvus/pull/40363 added a lock to protect
concurrent reads/writes of the null offset, but this is not needed for
sealed segments.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-04 10:10:45 +08:00
congqixia
1d9a9a993d
fix: [StorageV2] Use correct template typename for cache_raw_data_to_disk_common (#43104)
Related to #43099

Previously, `cache_raw_data_to_disk_common` used `milvus::DataType` as its
template typename, whereas it should be `knowhere::bf16` or another actual
data type.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-03 18:50:46 +08:00
Zhen Ye
bbbc7d4517
enhance: collect all cgo calling into metric and log slow cgo call (#43035)
issue: #42833

- also fix the error metric for async cgo.
- also make sure the roles can be seen when node startup, #43041.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-03 15:00:44 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Its parameters are as follows:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.
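
The validation rule described for `common.diskWriteBufferSizeKb` can be illustrated with a minimal Python sketch. The function name and constants are hypothetical and are not taken from the Milvus code base; rejecting out-of-range values with an exception is an assumption, since the message only states the valid range.

```python
# Hypothetical helper illustrating the documented rule for
# common.diskWriteBufferSizeKb; not the actual Milvus implementation.
ALIGNMENT_KB = 4      # values are aligned to 4KB
MIN_KB = 4            # lower bound of the documented valid range
MAX_KB = 65536        # upper bound of the documented valid range


def normalize_buffer_size_kb(value_kb: int) -> int:
    """Round a buffer size in KB up to the nearest multiple of 4KB."""
    # Assumption: out-of-range values are rejected outright.
    if not MIN_KB <= value_kb <= MAX_KB:
        raise ValueError(f"buffer size {value_kb}KB is outside [{MIN_KB}, {MAX_KB}]")
    # Ceiling division, then scale back up to a multiple of the alignment.
    return -(-value_kb // ALIGNMENT_KB) * ALIGNMENT_KB


print(normalize_buffer_size_kb(64))  # already aligned, stays 64
print(normalize_buffer_size_kb(5))   # rounded up to 8
```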

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram control how documents are tokenized. For
example, with min_gram=2 and max_gram=4, each document is tokenized
into 2-grams, 3-grams, and 4-grams.
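
To make the effect of min_gram and max_gram concrete, here is a minimal sketch of n-gram generation in plain Python. The helper `ngrams` is hypothetical, and character-level splitting is an assumption; the actual index tokenizer may differ in details.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list:
    """Return every substring of length min_gram..max_gram, shortest first."""
    tokens = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n over the text.
        for start in range(len(text) - n + 1):
            tokens.append(text[start:start + n])
    return tokens


print(ngrams("milvus", 2, 4))
```

With min_gram=2 and max_gram=4, the six-character string "milvus" yields five 2-grams, four 3-grams, and three 4-grams.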

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
Bingyi Sun
23c784cf69
fix: Fix querynode crash caused by json index (#42982)
issue: https://github.com/milvus-io/milvus/issues/42978

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-27 16:44:41 +08:00
XuanYang-cn
17f1ab71bb
enhance: Remove unused BuildIndexInfo (#42926)
1. Removed unused cgo methods in index_c.h/cpp
2. Removed indexcgowrapper/build_index_info.go

See also: #39242

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-27 15:00:42 +08:00
congqixia
9b06ecb72f
enhance: [StorageV2] Release record and close reader (#42983)
Related to #39173

This PR
- Close packed reader after sort
- Release arrow.Record preventing memory leakage
- Invoke `pack_reader->Close()` for CloseReader

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-27 14:46:43 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
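
The field ID rule in item 3 can be sketched as follows. This is an illustration only: the function name, its arguments, and the dictionary used to stand in for the metadata are all hypothetical and do not reflect the actual segcore data structures.

```python
def field_ids_for_column_group(group_id: int, num_columns: int, metadata: dict) -> list:
    """Resolve the field IDs stored in one column group."""
    if num_columns > 1:
        # Multi-column group: the field ID list comes from metadata.
        return list(metadata[group_id])
    # Single-column group: the column group ID is itself the field ID.
    return [group_id]


# Hypothetical metadata mapping a column group ID to its field IDs.
meta = {0: [100, 101, 102]}
print(field_ids_for_column_group(0, 3, meta))
print(field_ids_for_column_group(103, 1, meta))
```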

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
foxspy
be05b653c1
enhance: update knowhere version (#42938)
issue: #42937

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-06-26 01:22:41 +08:00
congqixia
336e743b55
fix: [AddField] Respect growing mmap setting adding empty field (#42933)
Related to #42856

Data in an mmapped growing segment must be handled according to the
growingMmap setting. Otherwise, varchar data could be processed with
incorrect logic.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-25 21:10:42 +08:00
zhagnlu
69872f45ad
fix: fix is_not_in for trie index (#42716)
#42604

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-06-25 16:52:42 +08:00
XuanYang-cn
0dfe5308e1
enhance: Tidy Download and decode in segcore storage (#42902)
1. Unify calling from GetObjectData
2. Move SetData inside Deserialize

See also: #40013

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-25 11:10:43 +08:00
sthuang
0d57acb13a
enhance: [StorageV2] field id as meta path for wide column when load (#42863)
related: #42862 #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-25 11:08:48 +08:00
sthuang
ad6d620e9f
fix: [StorageV2] Compiling debug mode throw DCHECK s3 initialize error (#42922)
related: https://github.com/milvus-io/milvus/issues/42844

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-24 19:30:41 +08:00
Spade A
50f7579d8f
fix: fix some bugs discovered by chaos tests (#42906)
fix: https://github.com/milvus-io/milvus/issues/42870

This PR fixes:
1. SetBitset fn should consider growing segments with concurrent writes
2. avoid using from_raw_parts directly

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-24 16:32:42 +08:00
Bingyi Sun
669ea51ce5
enhance: Make json index compatible with caching layer (#42484)
issue: https://github.com/milvus-io/milvus/issues/42483

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-24 15:16:41 +08:00
zhagnlu
1024121ad9
fix: fix incorrect use of class member (#42885)
#39173

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-06-23 20:36:46 +08:00
cai.zhang
59b003adac
enhance: Skip modify field meta when rename collection or rename dbName (#42875)
issue: #42873

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-23 17:04:41 +08:00
Bingyi Sun
24e24caf14
fix: Remove cached null expr result (#42818)
issue: #42698
The cached result may be changed by the caller, so there is no need to cache it.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-23 10:44:40 +08:00
Xianhui Lin
b902960057
fix: revert remote jsonstats path (#42882)
fix: revert remote jsonstats path
relate-pr:https://github.com/milvus-io/milvus/pull/42676
issue:https://github.com/milvus-io/milvus/issues/42872

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-06-21 13:24:39 +08:00
Spade A
e15926b40c
enhance: optimize tantivy cargo config (#42880)
fix: https://github.com/milvus-io/milvus/issues/42879

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-20 16:17:49 +08:00
aoiasd
43a9f7a79e
enhance: Add and run rust format command in makefile (#42807)
relate: https://github.com/milvus-io/milvus/issues/42806

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-20 10:22:39 +08:00
sthuang
4a0a2441f2
enhance: [StorageV2] field id as meta path for wide column (#42787)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-19 15:00:38 +08:00
Spade A
e2c85eec81
fix: load stats index based on mmap config (#42788)
ref https://github.com/milvus-io/milvus/issues/42626

This PR makes text match index and json key stats index be loaded based
on mmap config.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-19 10:10:39 +08:00
aoiasd
d49989345b
enhance: forbid regex filter clone regex for each streamer (#42781)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-18 16:10:39 +08:00
Spade A
80f1d707f7
fix: tidy up path for scalar index (#42676)
Ref #42626

This patch tidies up the paths for scalar indexes, including the path for loading
an index from remote storage and the temporary path for building an index.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-18 00:42:38 +08:00
congqixia
f9caad95b9
fix: [AddField] Check field empty instead of existence (#42789)
Related to #42773

A growing segment fills all known meta into `InsertRecord` data, so
field data still exists even when the field is missing.

This PR updates the logic run when a growing segment finishes loading to
check whether the field is empty instead of whether it exists.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 17:22:39 +08:00
Chun Han
001619aef9
feat: support load priority for loading (#42413)
related: #40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-06-17 15:22:38 +08:00
zhagnlu
9c31a47c0f
fix: fix arith mod bug for big int (#42699)
#42624

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-06-17 10:36:38 +08:00
congqixia
f01ff57f3f
fix: [StorageV2] Use correct offset filling null bitmap (#42774)
Related to #39173

`null_bitmap_data()` returns a raw pointer to the Array's null bitmap.
After slicing, this bitmap is not rewritten because of the zero-copy
implementation, so the current start position may be non-zero when
`FillFieldData` generates the column `valid_data` array.

This PR adds an `offset` param to the `FillFieldData` method and requires
every invocation to pass the correct offset into the `null_bitmap_data` ptr.

Also updates the milvus-storage commit, fixing the reader failing to return
data when the buffer size is smaller than the row group size.
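
The offset problem described above can be illustrated with a small Python sketch. It assumes the Arrow validity bitmap layout (least-significant bit first within each byte); the function `valid_flags` is hypothetical and only mimics the idea of reading a sliced array's validity bits, not the actual `FillFieldData` code.

```python
def valid_flags(bitmap: bytes, offset: int, length: int) -> list:
    """Read `length` validity bits starting at bit `offset` (LSB-first bytes)."""
    flags = []
    for i in range(length):
        bit = offset + i
        # Byte index is bit // 8, position inside the byte is bit % 8.
        flags.append(bool((bitmap[bit >> 3] >> (bit & 7)) & 1))
    return flags


# One bitmap byte shared by the parent array and a zero-copy slice.
bitmap = bytes([0b00110101])

# Reading the slice as if it started at bit 0 returns the parent's bits.
print(valid_flags(bitmap, 0, 4))
# Passing the slice's real start position (2) returns the correct bits.
print(valid_flags(bitmap, 2, 4))
```

Because slicing does not rewrite the bitmap, the same bytes give different results depending on the offset, which is why the offset has to be passed explicitly.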

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 10:08:38 +08:00
zhagnlu
d35c33da9f
fix: fix wrong assign to chunk object (#42672)
#39173

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-06-15 21:18:37 +08:00
Spade A
9873e0ee78
fix: fix text match index / json key stats index leak when segment released (#42655)
Ref https://github.com/milvus-io/milvus/issues/42626

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-13 04:28:37 +08:00
congqixia
c9bc70f272
fix: [AddField] Use shared_ptr of schema in plan fixing dangling ref (#42693)
Related to #42640

The search/query plan held a reference to the schema, which could be
destructed after a schema change. This PR makes the plan hold a shared ptr
to it, fixing the dangling reference problem under concurrent read & schema
change.

This PR also removes the field binlog check when loading an index, because
old segments with an old schema may lack binlogs.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-12 20:46:36 +08:00
Spade A
911a8df17c
feat: impl StructArray -- data storage support in segcore (#42406)
Ref https://github.com/milvus-io/milvus/issues/42148
This PR mainly enables segcore to support array of vector (read and
write, but not indexing). Now only float vector as the element type is
supported.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-06-12 14:38:35 +08:00
Buqian Zheng
8511ede5f8
feat: add back queryNode.cache.warmup for compatibility (#42621)
issue: https://github.com/milvus-io/milvus/issues/41435

Also makes ChunkTranslator load in parallel.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-06-12 10:56:40 +08:00
Bingyi Sun
6c16d3dbee
enhance: Add bulk api for json data (#42407)
issue: https://github.com/milvus-io/milvus/issues/42409

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-12 10:40:39 +08:00
foxspy
58f9278db7
fix: fix build interim index failures (#42679)
issue: #42028 

W20250522 09:52:55.785657 12779 ChunkedSegmentSealedImpl.cpp:1752]
[SERVER][generate_interim_index][CGO_LOAD][]fail to generate binlog
index, because bad optional access

After the cache layer was added, num_rows_ cannot be obtained before the
interim index is generated, so it must be passed in as an external parameter.

Signed-off-by: foxspy <xianliang.li@zilliz.com>
2025-06-12 05:12:39 +08:00
congqixia
499e9a0a73
fix: [AddField] Use corresponding datatype for int8/int16 def val (#42633)
Related to #42629

This PR handles converting the default value to an int8/int16 scalar when the
default value is defined as int32.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-11 11:54:34 +08:00
Bingyi Sun
fbf5cb4e62
feat: Add json flat index (#39917)
issue: https://github.com/milvus-io/milvus/issues/35528

This PR introduces a JSON flat index that allows indexing JSON fields
and dynamic fields in the same way as other field types.

In a previous PR (#36750), we implemented a JSON index that requires
specifying a JSON path and casting a type. The only distinction lies in
the json_cast_type parameter. When json_cast_type is set to JSON type,
Milvus automatically creates a JSON flat index.

For details on how Tantivy interprets JSON data, refer to the [tantivy
documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases).

Limitations
Array handling: Arrays do not function as nested objects. See the
[limitations
section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object)
for more details.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-10 19:14:35 +08:00
cqy123456
c9680a5b56
fix: avoid load index or create interim index in ChunkedSegmentSealedImpl::HasRawData() (#42622)
issue: https://github.com/milvus-io/milvus/issues/42526

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-06-10 14:54:34 +08:00
cqy123456
317bbfbf81
enhance: milvus support minhash vector and mhjaccard metric (#42036)
issue:
https://github.com/issues/assigned?issue=milvus-io%7Cmilvus%7C41746

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-06-10 14:38:34 +08:00
Bingyi Sun
b3ecf77a66
fix: Fix the bug of valid data write corruption (#42556)
issue: https://github.com/milvus-io/milvus/issues/42554

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-10 14:22:34 +08:00
zhagnlu
2861096734
fix: Add explicit move semantics to get_batch_view interface (#42403)
#42401

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-06-10 13:06:35 +08:00