2055 Commits

Author SHA1 Message Date
Buqian Zheng
d367770649
enhance: greatly reduce the loading memory overhead - by up to 25% (#43533)
issue: #43088
issue: #43038

The current loading process:

* When loading an index, we first download the index files into a list
of buffers, say A
* then constructing(copying) them into a vector of FieldDatas(each file
is a FieldData), say B
* assembles them together as a huge BinarySet, say C
* lastly, copy into the actual index data structure, say D

The problem:

* We can see that, after each step, we don't need the data in previous
step.
* But currently, we release the memory of A, B, C only after we have
finished constructing D
* This leads to a up to 4x peak memory usage comparing with the raw
index size, during the loading process
* This PR allows timely releasing of B after we assembled C. So after
this PR, the peak memory usage during loading will be up to 3x of the
raw index size.

I will create another PR to release A after we created B, that seems
more complicated and need more work.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-24 11:26:54 +08:00
zhagnlu
d64dceea47
fix:add convert int to float function to array_contains related expr (#43468)
#43281

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-23 15:20:53 +08:00
Buqian Zheng
7ced9fc5d9
fix: fix loading resource estimation (#43509)
currently we multiplied the requesting size when adding to loading, but
did not do so when estimating projected usage.

issue: #43088

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-23 10:36:53 +08:00
congqixia
cc1034fe96
fix: [AddField] Resolve FieldIndexing dangling reference (#43499)
Related to #43113

This PR:
- Change member of FieldIndex from `FieldMeta &` to needed `DataType`
and dim member resolving dangling reference after schema change
- Add double check after acquiring lock to reduce multiple assignment
- Change `auto schema` to `auto& schema` to reduce schema copy

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-23 00:14:52 +08:00
sthuang
59bbdd93f5
fix: [StorageV2] fill the correct group chunk into cell (#43486)
The root cause of the issue lies in the fact that when a sealed segment
contains multiple row groups, the get_cells function may receive
unordered cids. This can result in row groups being written into
incorrect cells during data retrieval.

Previously, this issue was hard to reproduce because the old Storage V2
writer had a bug that caused it to write row groups larger than 1MB.
These large row groups could lead to uncontrolled memory usage and
eventually an OOM (Out of Memory) error. Additionally, compaction
typically produced a single large row group, which avoided the incorrect
cell-filling issue during query execution.

related: https://github.com/milvus-io/milvus/issues/43388,
https://github.com/milvus-io/milvus/issues/43372,
https://github.com/milvus-io/milvus/issues/43464, #43446, #43453

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-22 22:22:53 +08:00
Buqian Zheng
0599113a4b
enhance: add timeout to resource reservation (#43441)
issue: https://github.com/milvus-io/milvus/issues/41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-22 15:24:53 +08:00
Chun Han
5a1092304c
fix: refine judgement for batch views(#38736) (#43481)
related: #38736

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-22 14:20:53 +08:00
sthuang
f77571d5c1
fix: [StorageV2] file writer write row group split to default size (#43471)
Bumped milvus storage version.
related: https://github.com/milvus-io/milvus/issues/43310

* https://github.com/milvus-io/milvus-storage/pull/213
* https://github.com/milvus-io/milvus-storage/pull/217
* https://github.com/milvus-io/milvus-storage/pull/220

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-22 09:52:52 +08:00
sthuang
6c5f5f1e32
enhance: [StorageV2] refactor group chunk translator (#43406)
related: #43372

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-21 19:46:53 +08:00
sparknack
81694739ef
fix: revert ska::flat_hash_set to std::unordered_set to address an un… (#43428)
issue: #43388

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-21 17:39:40 +08:00
aoiasd
e9fc140eaf
fix: jieba tokenizer cause panic when dict word was empty string (#43337)
relate: https://github.com/milvus-io/milvus/issues/42779

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-21 16:34:53 +08:00
aoiasd
c7b53ed43b
enhance: run rust format (#43447)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-21 14:12:53 +08:00
Bingyi Sun
21e71f6eb2
fix: Check json nested path before validating data type (#43329)
issue: #43279

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-21 10:30:54 +08:00
Xianhui Lin
c13393418c
fix: invalid string error when enabled json stats (#43380)
fix: invalid string error when enabled json stats
issue: https://github.com/milvus-io/milvus/issues/43151

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-07-20 23:38:53 +08:00
aoiasd
f7e1f1c382
enhance: support download lindera system dictionary online (#43121)
relate: https://github.com/milvus-io/milvus/issues/43120

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-20 23:24:52 +08:00
Buqian Zheng
389104d200
enhance: rename PanicInfo to ThrowInfo (#43384)
issue: #41435

this is to prevent AI from thinking of our exception throwing as a
dangerous PANIC operation that terminates the program.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 20:22:52 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
Spade A
42ad786f75
fix: update tantivy for fixing dir removing race condition (#43399)
fix: https://github.com/milvus-io/milvus/issues/43258

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-18 15:44:56 +08:00
Buqian Zheng
d793def47c
feat: impose a physical memory limit when loading cells (#43222)
issue: #41435 

issue: https://github.com/milvus-io/milvus/issues/43038

This PR also:


1. removed ERROR state from ListNode
2. CacheSlot will do reserveMemory once for all requested cells after
updating the state to LOADING, so now we transit a cell to LOADING
before its resource reservation
3. reject resource reservation directly if size >= max_size

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-18 11:18:52 +08:00
Spade A
8612a2c946
enhance: optimize in by batch-in (#43268)
fix: https://github.com/milvus-io/milvus/issues/43267

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-17 19:40:52 +08:00
sparknack
9b4081e110
enhance: cachinglayer: some performance optimization (#42858)
issue: #41435

We compared the performance using the modified test_sealed.cpp, which
randomly accesses all rows in all chunks and counts the number of runs
within 3s.

## performance data comparison (ops/second)

chunk config: 1x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer |
w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 82428 | -63.6% (29983) | +2.7% (84675) |
| Int8 field | 82228 | -63.3% (30166) | +2.4% (84163) |
| Int16 field | 82572 | -63.8% (29867) | +1.8% (84036) |
| Int32 field | 82797 | -63.7% (30031) | +1.5% (84043) |
| Int64 field | 81077 | -62.9% (30107) | +0.6% (81604) |
| Float field | 82678 | -63.4% (30266) | +1.8% (84146) |
| Double field | 81925 | -63.4% (29974) | +0.2% (82097) |
| Varchar field | 19933 | -19.6% (16027) | +18.9% (23690) |
| JSON field | 16519 | -96.8% (533) | +2.5% (16927) |
| Int array field | 7325 | -13.7% (6321) | -1.4% (7220) |
| Long array field | 6347 | -8.9% (5781) | -0.1% (6344) |
| Bool array field | 8275 | -14.0% (7116) | +0.4% (8311) |
| String array field | 2281 | -5.0% (2168) | +0.2% (2287) |
| Double array field | 6427 | -13.3% (5574) | -2.0% (6301) |
| Float array field | 7291 | -13.0% (6346) | -1.5% (7183) |
| Vector field | 27487 | -40.4% (16371) | -4.7% (26192) |
| Float16 vector field | 49773 | -54.6% (22601) | -5.9% (46834) |
| BFloat16 vector field | 49783 | -53.1% (23350) | -5.7% (46934) |
| Int8 vector field | 63871 | -59.0% (26179) | -6.2% (59926) |

---

chunk config: 10x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer |
w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 3659 | -48.6% (1879) | +110.1% (7686) |
| Int8 field | 3410 | -45.3% (1864) | +123.9% (7636) |
| Int16 field | 3647 | -48.6% (1874) | +110.1% (7661) |
| Int32 field | 3647 | -48.8% (1866) | +109.6% (7645) |
| Int64 field | 3645 | -48.9% (1863) | +107.8% (7573) |
| Float field | 3647 | -49.0% (1861) | +109.5% (7639) |
| Double field | 3640 | -45.1% (1998) | +108.4% (7586) |
| Varchar field | 1594 | -23.9% (1213) | +20.6% (1922) |
| JSON field | 1202 | -26.5% (884) | +16.1% (1396) |
| Int array field | 602 | -12.3% (528) | +12.7% (678) |
| Long array field | 529 | -12.2% (465) | +7.5% (569) |
| Double array field | 537 | -13.0% (467) | +6.4% (571) |
| Vector field | 1520 | -37.9% (943) | -5.5% (1437) |
| Float16 vector field | 2607 | -47.0% (1382) | +6.4% (2774) |
| BFloat16 vector field | 2586 | -46.5% (1383) | +8.8% (2813) |
| Int8 vector field | 3101 | -47.3% (1633) | +41.9% (4400) |

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-17 11:20:51 +08:00
zhagnlu
ee43954534
fix:fix text_match bug because of not adapting to multi-chunk model (#43303)
https://github.com/milvus-io/milvus/issues/43296

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-17 10:32:51 +08:00
Spade A
d750816ba0
fix: remove std::string support for stlsort index (#43355)
fix: https://github.com/milvus-io/milvus/issues/43354

The current implementation of stdsort index is not supported for
std::string. Remove the code.

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-16 17:46:51 +08:00
foxspy
58a9e49066
enhance: update knowhere version (#43331)
issue: #42937 #43294

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-16 15:04:50 +08:00
Bingyi Sun
1b8c958cff
enhance: fix tantivy wrapper is freed after json flat executor is destructed (#43233)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-16 10:58:50 +08:00
congqixia
fe8de016d5
fix: [StorageV2] Align null bitmap offset when loading multi-chunk (#43321)
Related to #43262

This patch fixes following logic bug:
- When multiple chunks are loaded and size cannot be divided by 8, just
appending uint8_t as bitmap will cause null bitmap dislocation
- `null_bitmap_data()` points to start of whole row group, which may not
stand for current `arrow::Array`

The current solutions is:
- Reorganize the null_bitmap with currect size & offset
- Pass `array->offset()` in tuple to info the current offset

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-15 19:22:50 +08:00
Bingyi Sun
ccfaa7bee8
fix: Fix the bug when offsets is nullptr in bulk api (#43127)
issue: https://github.com/milvus-io/milvus/issues/42978

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-15 17:54:50 +08:00
Spade A
db91d85dbc
feat: more types of matches for ngram (#43081)
Ref https://github.com/milvus-io/milvus/issues/42053

This PR enable ngram to support more kinds of matches such as prefix and
postfix match.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-14 20:34:50 +08:00
Spade A
e14a52721e
enhance: use stl sort with high cardinality for data_type int (#43305)
fix: https://github.com/milvus-io/milvus/issues/43304

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-14 18:40:50 +08:00
congqixia
ae48f0e484
fix: [StorageV2] Handle missing column creating index (#43292)
Related to #43250

Use FieldIDList to check missing field. If column is missing, return
empty resultset

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-14 17:06:50 +08:00
foxspy
8171a2a0b5
enhance: update knowhere version (#43246)
issue: #42937

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-14 11:06:49 +08:00
Alexander Guzhva
a848c4a8c5
fix: fix incorrect bitset for the division comparison when the right is < 0 (#43179)
issue: https://github.com/milvus-io/milvus/issues/42900
@sunby Unfortunately, it is not that easy to fix as it was thought in
#43177

Upd: also handles `Inf` and `NaN` values, and the division by zero case
for `fp32` and `fp64`

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2025-07-11 19:04:49 +08:00
congqixia
6bbed3b019
fix: [AddField] Add shared_lock for insert prevent race (#43229)
Related to #43113

When schema change happens, insert shall not happen, otherwise:
- Data race may happen causing insertion failure
- Inconsistent data schema

This PR add shared_lock prevent this data race.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 21:26:48 +08:00
PjJinchen
a90694165b
feat: Supports tracing services that require header-based authentication. (#43211)
issue: https://github.com/milvus-io/milvus/issues/43082

support tracing services that require header-based authentication.
for example: aliyun SLS, volcengine LogService etc...

[aliyun
SLS](https://help.aliyun.com/zh/sls/import-trace-data-from-golang-applications-to-log-service-by-using-opentelemetry-sdk-for-golang?spm=a2c4g.11186623.help-menu-search-28958.d_1#section-ktk-xxz-8om)

Add a headers config in trace config

```
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  milvus-cn-beijing-pre.cn-beijing.log.aliyuncs.com:10010
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

it is encoded as base64, raw data is json
```
{
    "x-sls-otel-project": "milvus-cn-beijing-pre",
    "x-sls-otel-instance-id": "milvus-cn-beijing-pre",
    "x-sls-otel-ak-id": "xxx",
    "x-sls-otel-ak-secret": "xxx"
}
```

[volcengine
tls](https://www.volcengine.com/docs/6470/812322#grpc-%E5%8D%8F%E8%AE%AE%E5%88%9D%E5%A7%8B%E5%8C%96%E7%A4%BA%E4%BE%8B)

Add a headers config in trace config

```
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  xxx
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

it is encoded as base64, raw data is json
```
{
    "x-tls-otel-region": "cn-beijing",
    "x-tls-otel-tracetopic": "milvus-cn-beijing-pre",
    "x-tls-otel-ak": "xxx",
    "x-tls-otel-sk": "xxx"
}
```

Signed-off-by: PjJinchen <6268414+pj1987111@users.noreply.github.com>
2025-07-10 17:32:48 +08:00
Chun Han
07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
congqixia
f027eea545
enhance: [AddField] Add log for segcore segment schema change (#43215)
Related to #39178

This PR add logs for segment schema change operations.

Also fixes the nit comments from PR #42490

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 10:22:47 +08:00
zhagnlu
21d1fb2aa3
fix: fix move cursor bug for chunk segment with index (#43095)
#42974

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-09 17:38:47 +08:00
Spade A
d41eec6f10
fix: void copy when getting json chunk (#43183)
fix: https://github.com/milvus-io/milvus/issues/43182

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-08 15:28:46 +08:00
sthuang
a0ae5bccc9
fix: [StorageV2] load growing segment get dim datatype check (#43168)
related: https://github.com/milvus-io/milvus/issues/43072

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:46:47 +08:00
sthuang
276c52490d
fix: [StorageV2] missing arrow fs when building index (#43162)
fix: https://github.com/milvus-io/milvus/issues/43150,
https://github.com/milvus-io/milvus/issues/43149

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:26:46 +08:00
sthuang
9f361a228e
enhance: storage v2 chunked column memory size from meta (#43130)
use meta to get chunked column memory size to avoid getting cells
actually from storage.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 14:24:46 +08:00
Spade A
fce0bbe2ae
fix: remove redundant locks for null_offset (#43103)
Ref: https://github.com/milvus-io/milvus/issues/40308
https://github.com/milvus-io/milvus/pull/40363 add lock for protecting
concurrent read/write for null offset. But we don't need this for sealed
segment.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-04 10:10:45 +08:00
congqixia
1d9a9a993d
fix: [StorageV2] Use correct template typename for cache_raw_data_to_disk_common (#43104)
Related to #43099

Previously `cache_raw_data_to_disk_common` used `milvus::DataType`
template typename, which shall be `knowhere::bf16` or other actual
datatype.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-03 18:50:46 +08:00
Zhen Ye
bbbc7d4517
enhance: collect all cgo calling into metric and log slow cgo call (#43035)
issue: #42833

- also fix the error metric for async cgo.
- also make sure the roles can be seen when node startup, #43041.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-03 15:00:44 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Below is its parameters:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
Bingyi Sun
23c784cf69
fix: Fix querynode crash caused by json index (#42982)
issue: https://github.com/milvus-io/milvus/issues/42978

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-27 16:44:41 +08:00
XuanYang-cn
17f1ab71bb
enhance: Remove not inused BuildIndexInfo (#42926)
1. removed not inuse cgo methods in index_c.h/cpp
2. removed indexcogowrapper/build_index_info.go

See also: #39242

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-27 15:00:42 +08:00
congqixia
9b06ecb72f
enhance: [StorageV2] Release record and close reader (#42983)
Related to #39173

This PR
- Close packed reader after sort
- Release arrow.Record preventing memory leakage
- Invoke `pack_reader->Close()` for CloseReader

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-27 14:46:43 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00