651 Commits

Author SHA1 Message Date
Patrick Weizhi Xu
04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00
congqixia
59881a7f73
fix: Remove load field & schema column size check (#39833)
Related to #39788

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-18 16:24:51 +08:00
zhagnlu
316534e065
enhance: optimize delete init construct code (#39327)
#39326

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-02-17 21:05:26 +08:00
Bingyi Sun
b59555057d
feat: support json index (#36750)
https://github.com/milvus-io/milvus/issues/35528

This PR adds json index support for json and dynamic fields. Now you can
only do unary query like 'a["b"] > 1' using this index. We will support
more filter type later.

basic usage:
```
collection.create_index("json_field", {"index_type": "INVERTED",
    "params": {"json_cast_type": DataType.STRING, "json_path":
'json_field["a"]["b"]'}})
```

There are some limits to use this index:
1. If a record does not have the json path you specify, it will be
ignored and there will not be an error.
2. If a value of the json path fails to be cast to the type you specify,
it will be ignored and there will not be an error.
3. A specific json path can have only one json index.
4. If you try to create more than one json indexes for one json field,
sdk(pymilvus<=2.4.7) may return immediately because of internal
implementation. This will be fixed in a later version.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-15 14:06:15 +08:00
aoiasd
24d2bbc441
enhance: unmashall ts msg in dispatcher instead in msgstream (#38656)
relate: https://github.com/milvus-io/milvus/issues/38655

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-02-14 12:04:13 +08:00
sparknack
2d9bef44d4
fix: sparse: add inverted_index_algo and dim_max_score_ratio config (#39358)
issue: #39332

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-02-07 16:40:44 +08:00
sthuang
c4ae9f4ece
feat: introduce third-party milvus-storage (#39418)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-01-24 17:21:13 +08:00
Cai Yudong
5730b69e56
feat: Enable more VECTOR_INT8 unittest (#39569)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-24 17:03:07 +08:00
congqixia
844df76cc0
enhance: Rectify run_clang_format grep command (#39534)
Previously the grep with regex does not work and failed to match lots of
.cpp files

This PR:
- use "-E" flag to use regex match
- commit the fixed result of current cpp code

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-23 17:07:05 +08:00
Cai Yudong
341d6c1eb7
feat: Update segcore for VECTOR_INT8 (#39415)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-21 11:03:03 +08:00
congqixia
45d49df89b
fix: Skip load extra indexes for sorted segment pk field (#39389)
Related to #39339

Extra indexes can be ignored for most cases since sorted pk column
already provided indexing features

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-20 18:13:15 +08:00
congqixia
7cac87caca
fix: Skip erase field if index build on PK field (#39370)
Related to #39339

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-17 20:31:02 +08:00
congqixia
eb63334312
enhance: Add try-catch and return CStatus for NewCollection (#39279)
Related to #28795

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-15 19:17:01 +08:00
Cai Yudong
5bf1b2b929
feat: Support Int8Vector in go (#38990)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-14 20:43:06 +08:00
congqixia
da1b786ef8
enhance: Utilize "find0" in segment.find_first (#39229)
Related to #39003

Previous PR #39004 has to clone & flip bitset due to bitset does not
support find0 operator. #39176 added this feature so clone & flip could
be removed now.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-14 14:14:58 +08:00
Zhen Ye
3e788f0fbd
enhance: record memory size (uncompressed) item for index (#38770)
issue: #38715

- Current milvus use a serialized index size(compressed) for estimate
resource for loading.
- Add a new field `MemSize` (before compressing) for index to estimate
resource.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 10:33:06 +08:00
Bingyi Sun
a00ba861a4
fix: Fix in filter search result is empty if pk type is varchar (#39106)
https://github.com/milvus-io/milvus/issues/39107

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-01-13 16:14:58 +08:00
Zhen Ye
5f94954bb4
fix: data race when accessing field_ when retrieving (#39151)
issue: #39148

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-13 11:23:04 +08:00
Spade A
8abf6c9149
fix: build text index when loading field data (#39070)
fix: https://github.com/milvus-io/milvus/issues/39053
may fix https://github.com/milvus-io/milvus/issues/38644 which could be
caused by https://github.com/milvus-io/milvus/issues/39053

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-09 15:24:56 +08:00
Ted Xu
3dc95153b7
fix: build break under debug mode (#38790)
See #38435

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-01-07 17:36:56 +08:00
Chun Han
3739446a33
enhance: refine array view to optimize memory usage(#38736) (#38808)
related: #38736

700m data, array_length=10
non-mmap_offsets_uint64: 2.0G
mmap_offsets_uint64: 1.1G
mmap_offsets_uint32: 880MB

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-01-07 13:26:55 +08:00
congqixia
72f5b85c05
enhance: Accelerate find_first by utilizing bitset simd methods (#39004)
Related to #39003

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-07 10:34:54 +08:00
zhagnlu
8165044b6d
fix: fix query incorrect in case of concurrent delete (#38991)
#38961

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-01-06 15:14:54 +08:00
Spade A
4245c5bed1
fix: text match panics when enable_match is set be false (#38950)
fix: https://github.com/milvus-io/milvus/issues/38949

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-03 14:20:55 +08:00
aoiasd
bc15ad24f2
fix: sealed segment get empty index params when brute force search for bm25 (#38707)
relate: https://github.com/milvus-io/milvus/issues/38236

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-12-25 19:06:51 +08:00
Ted Xu
acc8fb7af6
enhance: eliminate compile warnings (part2) (#38535)
See #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-25 15:30:50 +08:00
zhagnlu
8fcb33c21d
fix:fix delete record assert failed (#38580)
#38472

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-19 18:22:47 +08:00
zhagnlu
87056be748
fix: fix snapshot or size when query (#38549)
#38472

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-18 16:42:45 +08:00
Chun Han
decdfdae10
fix: growing-groupby-crush(#38533) (#38538)
related: #38533

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-12-17 21:05:12 +08:00
zhagnlu
9afcc5bc5c
fix:fix incorrect dir operations when create or load inverted index (#38359)
#37944

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-17 20:06:45 +08:00
zhagnlu
d0a7e98a27
fix:remove incorrect assert for delete query (#38509)
#38472

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-17 17:48:44 +08:00
Bingyi Sun
dd4f33ae19
fix: Fix chunked segment can not warmup using mmap (#38492)
issue: #38410

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-12-17 13:42:45 +08:00
Ted Xu
4919ccf543
enhance: eliminate compile warnings (#38420)
See: #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-16 09:58:43 +08:00
zhagnlu
01de0afc4e
enhance: refactor delete mvcc function (#38066)
#37413

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-15 18:02:43 +08:00
Chun Han
c1f9158996
fix: search-group-by failed to get data from multi-chunked-segment(##… (#38383)
related: #38343

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-12-13 16:54:43 +08:00
Gao
994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
Zhen Ye
c6dcef7b84
enhance: move segcore codes of segment into one package (#37722)
issue: #33285

- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-29 10:22:36 +08:00
congqixia
cb6542339e
enhance: Mark cgo thread with tag name (#38000)
Related to #37999

This PR add `SetThreadName` API for marking cgo thread and utilize it
when initializing cgo worker.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-26 11:22:35 +08:00
Bingyi Sun
6b82320953
fix: Fix using wrong upperbound when searching by pk (#37769)
issue: https://github.com/milvus-io/milvus/issues/37649

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-19 10:16:31 +08:00
aoiasd
e9391acf80
fix: bm25 brute force search need index params k1 and b (#37721)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-18 15:44:31 +08:00
zhagnlu
e4b6773d0a
fix: fix create text index dir conflict bug (#37693)
#37623

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-11-15 18:26:30 +08:00
Buqian Zheng
0565300b7f
fix: Sparse to use CC index as growing/temp index (#37591)
relate: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-11-14 10:54:31 +08:00
smellthemoon
3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
Zhen Ye
3c225e5c94
fix: data race when using fields_ (#37612)
issue: #37609

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-13 04:06:30 +08:00
Bingyi Sun
c1eccce2fa
enhance: enable multiple chunked segment by default (#37570)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-12 09:20:28 +08:00
aoiasd
12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
Bingyi Sun
40ba5a3414
fix: fix chunked segment term filter expression and add ut (#37392)
issue: https://github.com/milvus-io/milvus/issues/37143

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-07 11:04:19 -08:00
aoiasd
d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
cai.zhang
625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00
Bingyi Sun
cd2655c861
fix: fix wrong method is called to fetch variable valid data (#37304)
issue: https://github.com/milvus-io/milvus/issues/37147

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-01 01:52:20 +08:00