116 Commits

Author SHA1 Message Date
MrPresent-Han
9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

The Search GroupBy function aggregates result entities based on a
specific scalar column.
Several points to mention:

1. Temporarily, the whole groupby is implemented separately from the
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework: https://github.com/milvus-io/milvus/pull/28166
3. This PR includes some unrelated mocked interfaces regarding alterIndex
for reasons not worth mentioning. All of this unrelated content
will be removed before the final PR is merged. This version of the PR is
for review only
4. All other related details are given as comments in the file diffs
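
As a rough illustration of what grouping search results by a scalar column can look like, here is a minimal Python sketch. It is not the segcore implementation from this PR. The `Hit` type, the `group_by_search` helper, and the "keep the closest hit per group" policy are assumptions made for the example.

```python
from typing import Hashable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit; not a Milvus type."""
    pk: int
    distance: float         # smaller means closer in this sketch
    group_value: Hashable   # value of the scalar column used for grouping


def group_by_search(hits, topk):
    """Keep the closest hit per distinct group value, up to `topk` groups."""
    seen = set()
    result = []
    for hit in sorted(hits, key=lambda h: h.distance):
        if hit.group_value in seen:
            continue  # a closer hit already represents this group
        seen.add(hit.group_value)
        result.append(hit)
        if len(result) == topk:
            break
    return result


hits = [
    Hit(pk=1, distance=0.10, group_value="red"),
    Hit(pk=2, distance=0.20, group_value="red"),
    Hit(pk=3, distance=0.30, group_value="blue"),
    Hit(pk=4, distance=0.40, group_value="green"),
]
print([h.pk for h in group_by_search(hits, topk=2)])  # [1, 3]
```

Without grouping, the two closest hits would both come from the `"red"` group. With grouping, the second slot goes to the closest hit from a different group.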

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search for now. Every row is treated as a single keyword instead of
being tokenized into multiple terms.
- The inverted index doesn't support retrieval well, so if you create an
inverted index on a field, operations that depend on the raw data
will fall back to chunk storage, which brings some performance
loss. Examples are comparisons between two columns and retrieval of
output fields.
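
To make the idea concrete, the following toy sketch shows how an inverted index can answer term and range queries without scanning every row. It is only a conceptual model in plain Python, not the index added by this PR. The class `ToyInvertedIndex` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class ToyInvertedIndex:
    """Maps each distinct value to the sorted row ids that contain it."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            postings[value].append(row_id)
        self.postings = dict(postings)
        self.sorted_terms = sorted(self.postings)

    def term_query(self, terms):
        """Rows whose value is one of `terms` (like `field in [...]`)."""
        rows = []
        for term in set(terms):
            rows.extend(self.postings.get(term, []))
        return sorted(rows)

    def range_query(self, low, high):
        """Rows with low < value < high, located by binary search on the terms."""
        start = bisect.bisect_right(self.sorted_terms, low)
        end = bisect.bisect_left(self.sorted_terms, high)
        rows = []
        for term in self.sorted_terms[start:end]:
            rows.extend(self.postings[term])
        return sorted(rows)


int64_index = ToyInvertedIndex([3, 1, 4, 1, 5, 9, 2, 6])
print(int64_index.term_query([1, 2, 3]))  # [0, 1, 3, 6]
print(int64_index.range_query(1, 5))      # [0, 2, 6]

# Each VARCHAR value is one keyword, so "hello" does not match "hello world".
varchar_index = ToyInvertedIndex(["hello world", "hello"])
print(varchar_index.term_query(["hello"]))  # [1]
```

The sketch also hints at the retrieval limitation noted above: the structure maps values to rows, so fetching the raw value of a given row needs a different storage path.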

The inverted index is easy to use.

Take the collection below as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Assumptions: a connection was already opened via connections.connect(),
# and 128 is just an example vector dimension.
dim = 128

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can create an inverted index on each field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term queries and range queries on these fields are sped up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
zhagnlu
a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
Xu Tong
8ec85f5f4c
Add template for VectorMemIndex (#28324)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-11-11 13:20:22 +08:00
yah01
267c67dfee
enhance: reduce 1x copy while retrieving data from growing segment (#28323)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-10 15:44:22 +08:00
yah01
dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
Enwei Jiao
f8dd589755
Refactor collection's cgo call (#28055)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-02 13:02:13 +08:00
Gao
7a65b6fb85
Limit faiss ivf index build thread num and fix ut (#27567)
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-10-11 10:33:33 +08:00
Enwei Jiao
b80a3e19d3
Add code for PanicInfo (#27364)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-27 12:01:28 +08:00
foxspy
5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
foxspy
370b6fde58
milvus support multi index engine (#27178)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-22 09:59:26 +08:00
cai.zhang
a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
yihao.dai
bb6711f28c
Add ChunkCache: support get vector from storage (#26142)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-15 10:21:20 +08:00
Enwei Jiao
0afdfdb9af
Remove other Exceptions, keeps SegcoreError only (#27017)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-14 14:05:20 +08:00
cqy123456
0ff4ddc76c
remove VectorMemNMIndex (#27000)
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-09-12 17:13:18 +08:00
Enwei Jiao
c3f15c6b95
Refactor duplicate error class into one place (#26985)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-11 20:43:17 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
cai.zhang
c073aa0dc3
Fix bug for json_contains_all has multiple array elements (#26446)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-18 22:36:19 +08:00
xige-16
5b8d716cbc
Add ut for growing segment load binlog (#26268)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-08-13 20:41:31 +08:00
cai.zhang
a0198ce8ae
Support json contains feature (#25384)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-11 17:09:30 +08:00
zhagnlu
411f9ac823
Upgrade minio-go and add region and virtual host config for segcore chunk manager (#26194)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-11 10:37:36 +08:00
yah01
dd5f896dc8
Load batch by batch (#25212)
This will significantly reduce the memory usage while loading
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)
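
The memory-mode claim (one full copy plus a small buffer) can be pictured with a short Python sketch. This is an illustration only, not the C++ loader changed by this commit. The helper `load_in_batches` and its parameters are hypothetical.

```python
import io


def load_in_batches(src, total_size, batch_size=4):
    """Copy `total_size` bytes from `src` into one preallocated buffer."""
    dest = bytearray(total_size)  # the single full-size copy (1x memory)
    buf = bytearray(batch_size)   # small reusable read buffer (the overhead)
    offset = 0
    while offset < total_size:
        n = src.readinto(buf)
        if n == 0:
            raise EOFError("source ended before total_size bytes were read")
        n = min(n, total_size - offset)
        dest[offset:offset + n] = buf[:n]
        offset += n
    return dest


data = b"0123456789"
loaded = load_in_batches(io.BytesIO(data), len(data), batch_size=4)
print(bytes(loaded) == data)  # True
```

In an mmap-style variant, `dest` would be a memory-mapped file rather than heap memory, leaving only the read buffer resident.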

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
xige-16
04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00
zhagnlu
f60b839127
Support element in json array in segcore part(#24677) (#24829)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-06-14 14:38:37 +08:00
zhagnlu
113f9a0ebc
Support SIMD of several Expr (#23715) (#23717)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-05-12 14:11:20 +08:00
cai.zhang
9715a850fa
Support expr with json field (#23804)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-05-10 10:19:19 +08:00
yah01
62eea5286f
Support to filter with json expr (#23739)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-30 20:36:39 +08:00
yihao.dai
092d743917
Add support for getting vectors by ids (#23450)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-04-23 09:00:32 +08:00
yah01
546080dcdd
Support to retrieve json (#23563)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-21 11:46:32 +08:00
Cai Yudong
ef63e64ded
Remove ANNOY index type (#23189)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-04-04 16:30:27 +08:00
Cai Yudong
0e9a4478e3
Remove useless index mode (#22934)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-23 21:39:59 +08:00
Cai Yudong
ab3cbdfc61
Partial change to prepare for GPU index type support (#22591)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-14 23:21:56 +08:00
Jiquan Long
a36fefb009
Fix cpplint (#22657)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-03-10 09:47:54 +08:00
yah01
7478e44911
Support using mmap to load data (#22052)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-01 18:07:49 +08:00
smellthemoon
9e0ec15436
Support range search (#21652)
Signed-off-by: smellthemoon <xinguo.li@zilliz.com>
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
2023-02-21 09:48:32 +08:00
presburger
9950cacd10
support knowhere 2.0 (#21857)
Signed-off-by: Yusheng.Ma <Yusheng.Ma@zilliz.com>
2023-02-10 14:24:32 +08:00
Jiquan Long
d7156812c1
Try using ASAN in ci ut (#21089)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2022-12-29 15:29:31 +08:00
Enwei Jiao
958e94f6f0
Use Conan as c++ package manager (#19920)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-11-23 10:39:11 +08:00
xige-16
4a66965df4
Delete RAW_DATA copy when load IVF_FLAT index data (#20274)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-11-05 17:33:05 +08:00
xige-16
158787811e
Move assemble/disassemble func to core (#19420)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-16 21:17:25 +08:00
xige-16
a1db9038fb
Move disk index params to config file (#19714)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 17:51:24 +08:00
xige-16
8c9c1672ae
Assign different storage config for indexes (#19517)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 14:45:23 +08:00
Cai Yudong
87d78a4a85
Ignore cases when comparing metric type in segcore (#19437)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-09-26 17:58:52 +08:00
xige-16
428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00
Cai Yudong
686b0ce796
Upgrade to knowhere-v1.3.0, remove following index support: (#18935)
- IVF_SQ8H
- RHNSW_FLAT/RHNSW_PQ/RHNSW_SQ
- NGT
- NSG
- SPTAG

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-09-05 10:41:11 +08:00
zhenshan.cao
a287a2b3fd
Return empty result in advance if all data filtered out (#18329) (#18438)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2022-07-28 12:36:30 +08:00
Jeng.Gwan
638f6c36e9
Support to get real row count of segment (#18115)
Signed-off-by: xaxys <zheng.guan@zilliz.com>
2022-07-18 09:58:28 +08:00
zhagnlu
257da153ce
Fix core dump when nq has no topk result (#17923) (#18051)
Signed-off-by: zhagnlu <lu.zhang@zilliz.com>

Co-authored-by: zhagnlu <lu.zhang@zilliz.com>
2022-07-05 19:48:20 +08:00
Jiquan Long
6954a5ba3e
Fix search successfully with invalid metric type (#17977)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2022-07-01 22:28:23 +08:00
Cai Yudong
a001412e12
Replace faiss::MetricType with knowhere::MetricType (#17891)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-06-29 14:20:19 +08:00