122 Commits

Author SHA1 Message Date
Jiquan Long
e549148a19
enhance: full-support for wildcard pattern matching (#30288)
issue: #29988 
This pr adds full-support for wildcard pattern matching from end to end.
Before this pr, the users can only use prefix match in their expression,
for example, "like 'prefix%'". With this pr, more flexible syntax can be
combined.

To do so, this pr makes these changes:
- 1. support regex query both on index and raw data;
- 2. translate the pattern matching to regex query, so that it can be
handled by the regex query logic;
- 3. loose the limit of the expression parsing, which allows general
pattern matching syntax;

With the support of regex query in segcore backend, we can also add
mysql-like `REGEXP` syntax later easily.

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-01 12:37:04 +08:00
xige-16
e9fdd2475d
fix: fix searchPlan metricType modified concurrently (#30227)
issue: #30225
/kind bug
Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-26 14:03:09 +08:00
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
congqixia
d6429933a7
enhance: make Load process traceable in querynode & segcore (#29858)
See also #29803

This PR:
- Add trace span for `LoadIndex` & `LoadFieldData` in segment loader
- Add `TraceCtx` parameter for `Index.Load` in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-10 21:58:51 +08:00
zhenshan.cao
60e88fb833
fix: Restore the MVCC functionality. (#29749)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.

issue: #29656

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-09 11:38:48 +08:00
Jiquan Long
e9f3df3626
fix: inverted index file not found (#29695)
issue: https://github.com/milvus-io/milvus/issues/29654

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-07 20:26:49 +08:00
MrPresent-Han
9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:

1. Temporarliy, the whole groupby is implemented separated from
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This pr includes some unrelated mocked interface regarding alterIndex
due to some unworth-to-mention reasons. All these un-associated content
will be removed before the final pr is merged. This version of pr is
only for review
4. All other related details were commented in the files comparison

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
zhagnlu
a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
Xu Tong
8ec85f5f4c
Add template for VectorMemIndex (#28324)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-11-11 13:20:22 +08:00
yah01
267c67dfee
enhance: reduce 1x copy while retrieving data from growing segment (#28323)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-10 15:44:22 +08:00
yah01
dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
Enwei Jiao
f8dd589755
Refactor collection's cgo call (#28055)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-02 13:02:13 +08:00
Gao
7a65b6fb85
Limit faiss ivf index build thread num and fix ut (#27567)
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-10-11 10:33:33 +08:00
Enwei Jiao
b80a3e19d3
Add code for PanicInfo (#27364)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-27 12:01:28 +08:00
foxspy
5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
foxspy
370b6fde58
milvus support multi index engine (#27178)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-22 09:59:26 +08:00
cai.zhang
a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
yihao.dai
bb6711f28c
Add ChunkCache: support get vector from storage (#26142)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-15 10:21:20 +08:00
Enwei Jiao
0afdfdb9af
Remove other Exceptions, keeps SegcoreError only (#27017)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-14 14:05:20 +08:00
cqy123456
0ff4ddc76c
remove VectorMemNMIndex (#27000)
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-09-12 17:13:18 +08:00
Enwei Jiao
c3f15c6b95
Refactor duplicate error class into one place (#26985)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-11 20:43:17 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
cai.zhang
c073aa0dc3
Fix bug for json_contains_all has multiple array elements (#26446)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-18 22:36:19 +08:00
xige-16
5b8d716cbc
Add ut for growing segment load binlog (#26268)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-08-13 20:41:31 +08:00
cai.zhang
a0198ce8ae
Support json contains feature (#25384)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-11 17:09:30 +08:00
zhagnlu
411f9ac823
Upgrade minio-go and add region and virtual host config for segcore chunk manager (#26194)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-11 10:37:36 +08:00
yah01
dd5f896dc8
Load batch by batch (#25212)
This will significantly reduce the memory usage while loading
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
xige-16
04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00
zhagnlu
f60b839127
Support element in json array in segcore part(#24677) (#24829)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-06-14 14:38:37 +08:00
zhagnlu
113f9a0ebc
Support SIMD of several Expr (#23715) (#23717)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-05-12 14:11:20 +08:00
cai.zhang
9715a850fa
Support expr with json field (#23804)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-05-10 10:19:19 +08:00
yah01
62eea5286f
Support to filter with json expr (#23739)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-30 20:36:39 +08:00
yihao.dai
092d743917
Add support for getting vectors by ids (#23450)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-04-23 09:00:32 +08:00
yah01
546080dcdd
Support to retrieve json (#23563)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-21 11:46:32 +08:00
Cai Yudong
ef63e64ded
Remove ANNOY index type (#23189)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-04-04 16:30:27 +08:00
Cai Yudong
0e9a4478e3
Remove useless index mode (#22934)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-23 21:39:59 +08:00
Cai Yudong
ab3cbdfc61
Partial change to prepare for GPU index type support (#22591)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-14 23:21:56 +08:00
Jiquan Long
a36fefb009
Fix cpplint (#22657)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-03-10 09:47:54 +08:00
yah01
7478e44911
Support using mmap to load data (#22052)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-01 18:07:49 +08:00
smellthemoon
9e0ec15436
Support range search (#21652)
Signed-off-by: smellthemoon <xinguo.li@zilliz.com>
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
2023-02-21 09:48:32 +08:00
presburger
9950cacd10
support knowhere 2.0 (#21857)
Signed-off-by: Yusheng.Ma <Yusheng.Ma@zilliz.com>
2023-02-10 14:24:32 +08:00
Jiquan Long
d7156812c1
Try using ASAN in ci ut (#21089)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2022-12-29 15:29:31 +08:00
Enwei Jiao
958e94f6f0
Use Conan as c++ package manager (#19920)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>

Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-11-23 10:39:11 +08:00
xige-16
4a66965df4
Delete RAW_DATA copy when load IVF_FLAT index data (#20274)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-11-05 17:33:05 +08:00
xige-16
158787811e
Move assemble/disassemble func to core (#19420)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-16 21:17:25 +08:00
xige-16
a1db9038fb
Move disk index params to config file (#19714)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 17:51:24 +08:00
xige-16
8c9c1672ae
Assign different storage config for indexes (#19517)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 14:45:23 +08:00
Cai Yudong
87d78a4a85
Ignore cases when comparing metric type in segcore (#19437)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-09-26 17:58:52 +08:00
xige-16
428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00