52 Commits

Author SHA1 Message Date
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
yah01
a0e1a1eb31
feat: support enable/disable mmap for index (#29005)
support enable/disable mmap for index, the user could alter the index's
mode by `AlterIndex` method
related: https://github.com/milvus-io/milvus/issues/21866

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-21 18:07:24 +08:00
cai.zhang
1b4d2674b3
fix: Set the default index name to the name of the existing index (#29275)
issue: #29269

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-12-20 17:22:41 +08:00
cai.zhang
2f7252b44e
enhance: Set default index name as field name (#29218)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-12-15 12:06:39 +08:00
SimFG
9b0ecbdca7
Support to replicate the mq message (#27240)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
xige-16
6cbb67832f
Compatible with scalar index types marisa-trie and Ascending (#27638)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-10-15 13:52:06 +08:00
yah01
be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00
yah01
41495ed266
Improve the error message for getting all indexes of collection (#27389)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-08 21:23:32 +08:00
yah01
8394b3a1ec
Block creating new error from status reason (#27426)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-07 11:29:32 +08:00
cai.zhang
dedb90f85f
Fix error message for creating scalar index (#27382)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-09-27 10:33:27 +08:00
jaime
7f7c71ea7d
Decoupling client and server API in types interface (#27186)
Co-authored-by:: aoiasd <zhicheng.yue@zilliz.com>

Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-09-26 09:57:25 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
yah01
168e82ee10
Fix panic while handling with the nil status (#27040)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-15 10:09:21 +08:00
yah01
00c65fa0d7
Refine QueryNode errors (#27013)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-12 16:07:18 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
yah01
3349db4aa7
Refine errors to remove changes breaking design (#26521)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-04 09:57:09 +08:00
cai.zhang
760a2d9aa7
Support AllocTimestamp api for Milvus (#25784)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-07-25 10:05:00 +08:00
Enwei Jiao
66fdc71479
Refactor logs in DataCoord & DataNode (#25574)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-14 15:56:31 +08:00
smellthemoon
948d04cdd8
Check field type when create index (#25248)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-07-03 15:26:26 +08:00
jaime
18df2ba6fd
[Cherry-Pick] Support Database (#24769)
Support Database(#23742)
Fix db nonexists error for FlushAll (#24222)
Fix check collection limits fails (#24235)
backward compatibility with empty DB name (#24317)
Fix GetFlushAllState with DB (#24347)
Remove db from global meta cache after drop database (#24474)
Fix db name is empty for describe collection response (#24603)
Add RBAC for Database API (#24653)
Fix miss load the same name collection during recover stage (#24941)

RBAC supports Database validation (#23609)
Fix to list grant with db return empty (#23922)
Optimize PrivilegeAll permission check (#23972)
Add the default db value for the rbac request (#24307)

Signed-off-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-06-25 17:20:43 +08:00
cai.zhang
c81bdda55a
Hide autoindex params details (#24851) (#24935)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-06-16 14:30:40 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
cai.zhang
93ea9c4925
Support return pending index rows when describe index (#24588)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-06-01 18:14:32 +08:00
Jiquan Long
29ae1229b6
Support AutoIndex (#24387)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-05-29 20:35:28 +08:00
cai.zhang
6209d5d717
Remove unused code for creating index (#24442)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-05-26 16:49:26 +08:00
cai.zhang
008285f849
Support dynamic schema for create collection (#24176)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-05-18 09:33:24 +08:00
congqixia
73a181d226
Fix get vector it timeout and improve some string const usage (#24141)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-16 17:41:22 +08:00
wayblink
e13d900398
Add some log in getIndexStatisticsTask (#23897)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-05-08 10:02:39 +08:00
Jiquan Long
7be7e6f360
Refactor check logic of index parameters (#23856)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-05-06 10:40:39 +08:00
wayblink
899702f13c
Implement GetIndexStatistics interface (#23603)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-05-06 10:34:39 +08:00
Gao
c1e8406c44
Override index type as AUTOINDEX when autoindex is enabled (#23744)
Signed-off-by: chasingegg <gaoc96@qq.com>
2023-04-30 11:50:40 +08:00
cai.zhang
bd65ef8e95
Fix bug for type params when indexing (#23242) (#23250)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-04-09 16:18:29 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
cai.zhang
977943463e
Check index params and type params (#22988)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-03-26 22:15:59 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
cai.zhang
e127cf7b99
Reset indexpb for upgrade (#21620)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-01-11 14:35:40 +08:00
cai.zhang
e5f408dceb
Merge IndexCoord and DataCoord (#21267)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-01-04 19:37:36 +08:00
Enwei Jiao
89b810a4db
Refactor all params into ParamItem (#20987)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>

Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-12-07 18:01:19 +08:00
Enwei Jiao
956c5e1b9d
Make Params singleton (#20088)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>

Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-11-04 14:25:38 +08:00
cai.zhang
bcffb3c389
Check indexes in indexcoord instead of proxy (#20193)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-31 11:39:33 +08:00
cai.zhang
26d26aa8ab
Set index name in proxy when drop index (#20160)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-28 16:41:32 +08:00
cai.zhang
4848edda8d
Append field indexID for index name (#20046)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-27 20:23:38 +08:00
bigsheeper
b074f530e6
Forbid createIndex if collection loaded before (#20100)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-10-27 13:05:31 +08:00
cai.zhang
c551de8f72
Catch errors on gloabl meta cache (#20023)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-25 11:29:30 +08:00
cai.zhang
94c15a49e9
Can't drop loaded partition (#19938)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-21 14:41:28 +08:00
cai.zhang
836081c8c7
Dropping index must happen without collection being loaded (#19886)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-19 23:17:44 +08:00
smellthemoon
108e51b2f0
[test]Create index before load and fix error message (#19874) (#19857)
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Co-authored-by: zhuwenxing <wenxing.zhu@zilliz.com>
2022-10-19 10:01:26 +08:00
cai.zhang
9d43947f1c
Must create index before load (#19516)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-17 18:01:25 +08:00
cai.zhang
9e8a59ac4c
Add nameing rules for index name (#19803)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>

Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2022-10-16 21:05:25 +08:00
SimFG
a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>

Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00