congqixia
8f97eb355f
enhance: [StorageV2] Make bucket name concatenation transparent to user ( #44232 )
Related to #39173
This PR:
- Bump milvus-storage commit to handle bucket name concatenation logic
in multipart s3 fs
- Remove all user-side bucket name concatenation code
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-08 10:15:55 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index ( #44171 )
issue: https://github.com/milvus-io/milvus/issues/42032
- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
foxspy
d55bf49bf1
enhance: update knowhere version ( #44144 )
issue: #42937
---------
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-09-03 01:31:53 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading ( #43846 )
issue: #41435
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin ( #43722 )
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unit tests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
90a7e63665
enhance: collect doc_id from posting list directly for text match ( #43899 )
issue: https://github.com/milvus-io/milvus/issues/43898
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-27 10:39:52 +08:00
Gao
e97a618630
enhance: support readAt interface for remote input stream ( #43997 )
#42032
Also, fix the cacheoptfield method to work in StorageV2.
Also, change the sparse-related interface for the knowhere version bump
(#43974).
Also, include https://github.com/milvus-io/milvus/pull/44046 to fix lost
metrics.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-26 11:19:58 +08:00
Gao
b602b4187d
enhance: upgrade aws-sdk from 1.9.234 to 1.11.352 ( #43916 )
issue: #43908
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-08-19 11:11:45 +08:00
foxspy
647c2bca2d
enhance: Support streaming read and write of vector index files ( #43824 )
issue: #42032
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-15 23:41:43 +08:00
sthuang
5e4eb4a6e0
enhance: [StorageV2] bump storage version ( #43871 )
related: https://github.com/milvus-io/milvus/issues/43869
Bump storage version, including the following features:
* https://github.com/milvus-io/milvus-storage/pull/231
* https://github.com/milvus-io/milvus-storage/pull/232
* https://github.com/milvus-io/milvus-storage/pull/233
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-08-15 17:59:43 +08:00
Gao
81a0915c29
enhance: add milvus-common module to decouple knowhere & segcore ( #43624 )
issue: https://github.com/milvus-io/milvus/issues/42032
https://github.com/milvus-io/milvus/issues/41435
based on pr: https://github.com/milvus-io/milvus/pull/42124
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-11 14:09:42 +08:00
congqixia
1561a4ae8c
enhance: [StorageV2] Avoid creating local parent dir if fs is remote ( #43790 )
Related to #43752
milvus-storage pr: milvus-io/milvus-storage#230
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-08 10:19:40 +08:00
aoiasd
4f02b06abc
enhance: support setting lindera dict build dir and download url in yaml ( #43541 )
relate: https://github.com/milvus-io/milvus/issues/43120
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-08-04 09:47:38 +08:00
sparknack
4aabe23a45
enhance: update flat_hash_map.hpp to a modified version ( #43506 )
issue: #41435
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-31 20:09:36 +08:00
sthuang
a2c7ed2780
fix: [StorageV2] sort field binlogs paths for packed reader and writer ( #43585 )
key changes:
* fix unstable storage v2 compaction unit test by guaranteeing the order
of paths during sync.
* bump milvus-storage version, including:
https://github.com/milvus-io/milvus-storage/pull/222
https://github.com/milvus-io/milvus-storage/pull/223
https://github.com/milvus-io/milvus-storage/pull/224
https://github.com/milvus-io/milvus-storage/pull/225
https://github.com/milvus-io/milvus-storage/pull/226
* also fix the related OOM issue below.
related: https://github.com/milvus-io/milvus/issues/43310
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-30 08:09:36 +08:00
foxspy
d57890449f
enhance: update knowhere version ( #43528 )
issue: #42937
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-29 17:21:36 +08:00
aoiasd
c9412434c8
enhance: add char group tokenizer ( #42793 )
relate: https://github.com/milvus-io/milvus/issues/42792
Add a char group tokenizer, which supports using custom char groups or some
built-in char groups as delimiters.
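A minimal sketch of the splitting idea, assuming delimiters are the union of user-supplied characters and named built-in groups. The group names (`whitespace`, `punctuation`) and the function `char_group_tokenize` are hypothetical and do not reflect the tokenizer's real configuration keys:

```python
import string

# Hypothetical built-in delimiter groups (names are illustrative only).
BUILTIN_GROUPS = {
    "whitespace": set(string.whitespace),
    "punctuation": set(string.punctuation),
}


def char_group_tokenize(text, custom_chars="", groups=()):
    """Split text on any char in custom_chars or in the named built-in groups.

    Runs of consecutive delimiters do not produce empty tokens.
    """
    delimiters = set(custom_chars)
    for name in groups:
        delimiters |= BUILTIN_GROUPS[name]

    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:  # close the token that was being built
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# '|' is a custom delimiter; spaces come from the built-in "whitespace" group.
print(char_group_tokenize("a|b c||d", custom_chars="|", groups=["whitespace"]))
# -> ['a', 'b', 'c', 'd']
```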
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-29 11:11:35 +08:00
sthuang
f77571d5c1
fix: [StorageV2] file writer write row group split to default size ( #43471 )
Bumped milvus storage version.
related: https://github.com/milvus-io/milvus/issues/43310
* https://github.com/milvus-io/milvus-storage/pull/213
* https://github.com/milvus-io/milvus-storage/pull/217
* https://github.com/milvus-io/milvus-storage/pull/220
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-22 09:52:52 +08:00
aoiasd
e9fc140eaf
fix: jieba tokenizer causes panic when dict word is an empty string ( #43337 )
relate: https://github.com/milvus-io/milvus/issues/42779
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-21 16:34:53 +08:00
aoiasd
c7b53ed43b
enhance: run rust format ( #43447 )
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-21 14:12:53 +08:00
aoiasd
f7e1f1c382
enhance: support download lindera system dictionary online ( #43121 )
relate: https://github.com/milvus-io/milvus/issues/43120
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-20 23:24:52 +08:00
Spade A
42ad786f75
fix: update tantivy for fixing dir removing race condition ( #43399 )
fix: https://github.com/milvus-io/milvus/issues/43258
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-18 15:44:56 +08:00
Spade A
8612a2c946
enhance: optimize in by batch-in ( #43268 )
fix: https://github.com/milvus-io/milvus/issues/43267
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-17 19:40:52 +08:00
sparknack
9b4081e110
enhance: cachinglayer: some performance optimization ( #42858 )
issue: #41435
We compared the performance using the modified test_sealed.cpp, which
randomly accesses all rows in all chunks and counts the number of runs
within 3s.
## performance data comparison (ops/second)
chunk config: 1x1000
| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer | w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 82428 | -63.6% (29983) | +2.7% (84675) |
| Int8 field | 82228 | -63.3% (30166) | +2.4% (84163) |
| Int16 field | 82572 | -63.8% (29867) | +1.8% (84036) |
| Int32 field | 82797 | -63.7% (30031) | +1.5% (84043) |
| Int64 field | 81077 | -62.9% (30107) | +0.6% (81604) |
| Float field | 82678 | -63.4% (30266) | +1.8% (84146) |
| Double field | 81925 | -63.4% (29974) | +0.2% (82097) |
| Varchar field | 19933 | -19.6% (16027) | +18.9% (23690) |
| JSON field | 16519 | -96.8% (533) | +2.5% (16927) |
| Int array field | 7325 | -13.7% (6321) | -1.4% (7220) |
| Long array field | 6347 | -8.9% (5781) | -0.1% (6344) |
| Bool array field | 8275 | -14.0% (7116) | +0.4% (8311) |
| String array field | 2281 | -5.0% (2168) | +0.2% (2287) |
| Double array field | 6427 | -13.3% (5574) | -2.0% (6301) |
| Float array field | 7291 | -13.0% (6346) | -1.5% (7183) |
| Vector field | 27487 | -40.4% (16371) | -4.7% (26192) |
| Float16 vector field | 49773 | -54.6% (22601) | -5.9% (46834) |
| BFloat16 vector field | 49783 | -53.1% (23350) | -5.7% (46934) |
| Int8 vector field | 63871 | -59.0% (26179) | -6.2% (59926) |
---
chunk config: 10x1000
| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer | w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 3659 | -48.6% (1879) | +110.1% (7686) |
| Int8 field | 3410 | -45.3% (1864) | +123.9% (7636) |
| Int16 field | 3647 | -48.6% (1874) | +110.1% (7661) |
| Int32 field | 3647 | -48.8% (1866) | +109.6% (7645) |
| Int64 field | 3645 | -48.9% (1863) | +107.8% (7573) |
| Float field | 3647 | -49.0% (1861) | +109.5% (7639) |
| Double field | 3640 | -45.1% (1998) | +108.4% (7586) |
| Varchar field | 1594 | -23.9% (1213) | +20.6% (1922) |
| JSON field | 1202 | -26.5% (884) | +16.1% (1396) |
| Int array field | 602 | -12.3% (528) | +12.7% (678) |
| Long array field | 529 | -12.2% (465) | +7.5% (569) |
| Double array field | 537 | -13.0% (467) | +6.4% (571) |
| Vector field | 1520 | -37.9% (943) | -5.5% (1437) |
| Float16 vector field | 2607 | -47.0% (1382) | +6.4% (2774) |
| BFloat16 vector field | 2586 | -46.5% (1383) | +8.8% (2813) |
| Int8 vector field | 3101 | -47.3% (1633) | +41.9% (4400) |
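Reading the tables: both percentage columns appear to be relative to the `w/o cachinglayer` baseline, not to each other (e.g. Bool field 84675 ops/s is +2.7% over 82428, not over 29983). A small check that recomputes a few rows under that reading; the helper name is illustrative:

```python
def relative_change_pct(baseline_ops, ops):
    """Percent change of ops relative to baseline_ops, rounded to one decimal."""
    return round((ops - baseline_ops) / baseline_ops * 100, 1)


# Rows copied from the tables: (w/o cachinglayer, w/ cachinglayer, w/ cachinglayer + opt)
rows = {
    "Bool field (1x1000)": (82428, 29983, 84675),
    "JSON field (1x1000)": (16519, 533, 16927),
    "Bool field (10x1000)": (3659, 1879, 7686),
}

for name, (base, cached, opt) in rows.items():
    # Both columns are compared against the same baseline.
    print(name, relative_change_pct(base, cached), relative_change_pct(base, opt))
# Bool field (1x1000) -63.6 2.7
# JSON field (1x1000) -96.8 2.5
# Bool field (10x1000) -48.6 110.1
```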
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-17 11:20:51 +08:00
foxspy
58a9e49066
enhance: update knowhere version ( #43331 )
issue: #42937 #43294
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-16 15:04:50 +08:00
Spade A
db91d85dbc
feat: more types of matches for ngram ( #43081 )
Ref https://github.com/milvus-io/milvus/issues/42053
This PR enables ngram to support more kinds of matches, such as prefix and
postfix matches.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-14 20:34:50 +08:00
foxspy
8171a2a0b5
enhance: update knowhere version ( #43246 )
issue: #42937
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-14 11:06:49 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram ( #41803 )
Ref #42053
This is the first PR for optimizing `LIKE` with an ngram inverted index.
Currently, only the VARCHAR data type and only InnerMatch LIKE
(`%xxx%`) queries are supported.
How to use it:
```python
from pymilvus import MilvusClient, DataType

milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
# VARCHAR field to be indexed with n-grams
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
# NGRAM index: each value is tokenized into grams of length min_gram..max_gram
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```
`min_gram` and `max_gram` control how documents are tokenized. For
example, with min_gram=2 and max_gram=4, each document is tokenized into
2-grams, 3-grams and 4-grams.
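A minimal sketch of the general n-gram technique: generating grams for a `[min_gram, max_gram]` range, then using them to prefilter an InnerMatch `LIKE '%xxx%'` query before an exact substring check. This illustrates the idea only; it is not the Milvus/tantivy implementation, and `ngrams`/`inner_match` are hypothetical names:

```python
def ngrams(text, min_gram, max_gram):
    """Return the set of all substrings of text with length in [min_gram, max_gram]."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def inner_match(docs, pattern, min_gram, max_gram):
    """Evaluate `LIKE '%pattern%'` using n-grams as a candidate filter.

    If pattern occurs in a doc, every length-n substring of pattern
    (n = min(len(pattern), max_gram)) is among that doc's indexed grams,
    so docs missing any of them are pruned. Survivors are verified exactly,
    because sharing all grams does not guarantee a match.
    """
    if len(pattern) < min_gram:
        # Pattern shorter than any indexed gram: fall back to a full scan.
        return [d for d in docs if pattern in d]
    n = min(len(pattern), max_gram)
    query_grams = ngrams(pattern, n, n)
    # For brevity the "index" is rebuilt on every call.
    index = {i: ngrams(d, min_gram, max_gram) for i, d in enumerate(docs)}
    candidates = [i for i, grams in index.items() if query_grams <= grams]
    return [docs[i] for i in candidates if pattern in docs[i]]


print(sorted(ngrams("milvus", 2, 3)))
# -> ['il', 'ilv', 'lv', 'lvu', 'mi', 'mil', 'us', 'vu', 'vus']
print(inner_match(["vector database", "milvus", "database"], "tab", 2, 3))
# -> ['vector database', 'database']
```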
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
foxspy
be05b653c1
enhance: update knowhere version ( #42938 )
issue: #42937
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-06-26 01:22:41 +08:00
sthuang
ad6d620e9f
fix: [StorageV2] Compiling in debug mode throws DCHECK s3 initialize error ( #42922 )
related: https://github.com/milvus-io/milvus/issues/42844
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-24 19:30:41 +08:00
Spade A
50f7579d8f
fix: fix some bugs discovered by chaos tests ( #42906 )
fix: https://github.com/milvus-io/milvus/issues/42870
This PR fixes:
1. SetBitset fn should consider growing segments with concurrent writes
2. avoid using from_raw_parts directly
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-24 16:32:42 +08:00
Spade A
e15926b40c
enhance: optimize tantivy cargo config ( #42880 )
fix: https://github.com/milvus-io/milvus/issues/42879
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-20 16:17:49 +08:00
aoiasd
43a9f7a79e
enhance: Add and run rust format command in makefile ( #42807 )
relate: https://github.com/milvus-io/milvus/issues/42806
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-20 10:22:39 +08:00
Spade A
e2c85eec81
fix: load stats index based on mmap config ( #42788 )
ref https://github.com/milvus-io/milvus/issues/42626
This PR makes the text match index and the JSON key stats index load
according to the mmap config.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-19 10:10:39 +08:00
aoiasd
d49989345b
enhance: forbid regex filter clone regex for each streamer ( #42781 )
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-18 16:10:39 +08:00
congqixia
f01ff57f3f
fix: [StorageV2] Use correct offset filling null bitmap ( #42774 )
Related to #39173
`null_bitmap_data()` returns a raw pointer to the null bitmap of an Array.
After slicing, this bitmap is not rewritten because of the zero-copy
implementation, so the current start position may be non-zero when
FillFieldData generates the column's `valid_data` array.
This PR adds an `offset` param to the `FillFieldData` method and forces
all invocations to pass the correct offset of the `null_bitmap_data` ptr.
Also, update the milvus-storage commit to fix the reader failing to return
data when the buffer size is smaller than the row group size.
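The offset problem can be illustrated with an Arrow-style validity bitmap (assumed here: least-significant-bit order, 1 = valid). A zero-copy slice shares its parent's bitmap buffer and only records a starting offset, so reading bits from position 0 returns the parent's leading validity bits instead of the slice's. `fill_valid_data` is a hypothetical stand-in for the `FillFieldData` logic, not its actual signature:

```python
def fill_valid_data(bitmap, offset, length):
    """Expand an LSB-ordered validity bitmap into a list of bools.

    `offset` is the bit position where the (possibly sliced) array starts
    inside the shared bitmap buffer.
    """
    valid = []
    for j in range(length):
        bit = offset + j
        valid.append(bool((bitmap[bit // 8] >> (bit % 8)) & 1))
    return valid


# Parent array of 8 values; validity bits, LSB first: 1,1,0,1,0,0,1,1 -> 0b11001011
parent_bitmap = bytes([0b11001011])

# Zero-copy slice of elements [3, 6): the buffer is shared, only the offset differs.
print(fill_valid_data(parent_bitmap, 3, 3))  # with offset:    [True, False, False]
print(fill_valid_data(parent_bitmap, 0, 3))  # offset ignored: [True, True, False]
```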
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-17 10:08:38 +08:00
Bingyi Sun
fbf5cb4e62
feat: Add json flat index ( #39917 )
issue: https://github.com/milvus-io/milvus/issues/35528
This PR introduces a JSON flat index that allows indexing JSON fields
and dynamic fields in the same way as other field types.
In a previous PR (#36750), we implemented a JSON index that requires
specifying a JSON path and a cast type. The JSON flat index differs only in
the json_cast_type parameter: when json_cast_type is set to JSON type,
Milvus automatically creates a JSON flat index.
For details on how Tantivy interprets JSON data, refer to the [tantivy documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases).
Limitations
Array handling: Arrays do not function as nested objects. See the
[limitations section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object)
for more details.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-10 19:14:35 +08:00
cqy123456
317bbfbf81
enhance: milvus support minhash vector and mhjaccard metric ( #42036 )
issue: https://github.com/milvus-io/milvus/issues/41746
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-06-10 14:38:34 +08:00
aoiasd
fd6e2b52ff
enhance: use english name as language name for all type language identifier ( #42600 )
Make whatlang detection return the English name as the language name,
consistent with lingua.
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-10 10:24:35 +08:00
aoiasd
6e16653597
fix: update tantivy commit version to fix stemmer panic ( #42171 )
relate: https://github.com/milvus-io/milvus/issues/42168
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-09 10:34:33 +08:00
foxspy
3dbad0306a
fix: Add bypass thread pool mode to avoid growing indexes blocking insert/load ( #41012 )
issue: #40825
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-20 14:30:24 +08:00
congqixia
a22088a380
enhance: [StorageV2] Make packed reader use correct path ( #41919 )
Related to #39173
This PR
- Use updated path with bucketName for packedReader
- Update milvus-storage commit to report reader/writer initialization
failure, see also milvus-io/milvus-storage#192
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-20 10:36:23 +08:00
congqixia
3bbc0fa560
enhance: [StorageV2] update storage to pass endpoint as-is ( #41889 )
Related to milvus-io/milvus-storage#190
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-16 18:06:21 +08:00
Buqian Zheng
b0260d8676
feat: manually evict cache after building interim index ( #41836 )
issue: https://github.com/milvus-io/milvus/issues/41435
This PR also makes `HasRawData` of `ChunkedSegmentSealedImpl` return
based on metadata, without needing to load the cache just to answer this
simple question.
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-16 16:34:23 +08:00
congqixia
a6d09ff4cd
enhance: [StorageV2] fix issues integrating basic RW operations ( #41834 )
Related to #39173
This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initialize arrow fs on querynode startup
- Fix timestamp access when loading sealed segment
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-15 09:52:23 +08:00
foxspy
358bc150df
enhance: add force rebuild index configuration ( #41473 )
issue: #41431
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-14 10:52:21 +08:00
foxspy
e2ddbe4962
feat: add cachinglayer to index ( #41653 )
issue: #41435
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-08 10:12:54 +08:00
Bingyi Sun
0dee3ccfd7
enhance: Make user specified doc id selectable for tantivy index writer ( #41528 )
issue: https://github.com/milvus-io/milvus/issues/41527
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-05-07 10:48:53 +08:00
foxspy
1d99f8bd67
enhance: add force rebuild index configuration ( #41473 )
issue: #41431
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-04-29 16:20:56 +08:00
Spade A
910f68c986
fix: update tantivy to fix tantivy doc out of order when merge ( #41596 )
issue: #41597
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-29 13:46:49 +08:00