56 Commits

Author SHA1 Message Date
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue: https://github.com/milvus-io/milvus/issues/22837

Add the bfloat16 vector type and the index part of the float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add an inverted index for several scalar data types in Milvus. Compared
with loading all raw data into RAM, this index type can save a lot of
memory and speed up term queries and range queries.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.
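
As a rough illustration of why term and range queries get faster, the sketch below models the general idea of an inverted index on a scalar column: each distinct value maps to the sorted row offsets that hold it, and keeping the distinct values sorted lets a range predicate be answered with binary search. `ToyInvertedIndex` is a hypothetical, in-memory class written for this example; it is not the Milvus `INVERTED` implementation or API.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Toy index: each distinct value maps to the sorted row offsets holding it."""

    def __init__(self, column):
        postings = defaultdict(list)
        for offset, value in enumerate(column):
            postings[value].append(offset)  # offsets arrive in ascending order
        self.keys = sorted(postings)        # sorted distinct values for range lookups
        self.postings = dict(postings)

    def term_query(self, values):
        """Offsets of rows whose value is in `values`, like `int64 in [1, 2, 3]`."""
        hits = []
        for value in set(values):
            hits.extend(self.postings.get(value, []))
        return sorted(hits)

    def range_query(self, low, high):
        """Offsets of rows with low < value < high, like `1 < int64 < 5`."""
        start = bisect_right(self.keys, low)  # first key strictly greater than low
        end = bisect_left(self.keys, high)    # first key greater than or equal to high
        hits = []
        for key in self.keys[start:end]:
            hits.extend(self.postings[key])
        return sorted(hits)


index = ToyInvertedIndex([7, 3, 1, 3, 9, 4, 1])
print(index.term_query([1, 3]))  # [1, 2, 3, 6]
print(index.range_query(1, 5))   # [1, 3, 5]
```

Both queries touch only the postings of matching keys instead of scanning every row, which is where the speed-up for selective predicates comes from.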

Note:
- The inverted index for `VARCHAR` is not currently designed for full-text
search. Every row is treated as a single keyword rather than being
tokenized into multiple terms.
- The inverted index does not support retrieval of raw values well. If you
create an inverted index on a field, operations that depend on the raw
data, such as comparisons between two columns and retrieval of output
fields, fall back to chunk storage, which incurs some performance loss.
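
The first note can be pictured with a small, hypothetical comparison: `build_keyword_index` keys each `VARCHAR` value as one whole string, matching the behavior the note describes, while `build_token_index` splits on whitespace the way a full-text index might. Both helpers are illustrative only and are not Milvus functions.

```python
from collections import defaultdict


def build_keyword_index(rows):
    """Index each VARCHAR value as one whole keyword (no tokenization)."""
    postings = defaultdict(list)
    for offset, text in enumerate(rows):
        postings[text].append(offset)
    return dict(postings)


def build_token_index(rows):
    """For contrast only: a full-text style index that splits on whitespace."""
    postings = defaultdict(list)
    for offset, text in enumerate(rows):
        for token in set(text.split()):
            postings[token].append(offset)
    return dict(postings)


rows = ["red apple", "green apple", "red"]
keyword_index = build_keyword_index(rows)
token_index = build_token_index(rows)

print(keyword_index.get("red apple", []))  # [0]: exact whole-value match
print(keyword_index.get("apple", []))      # []: "apple" alone was never indexed
print(token_index.get("apple", []))        # [0, 1]: what full-text search would return
```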

The inverted index is easy to use.

Take the following collection as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# Assumptions: a Milvus server at localhost:19530 and a 128-dim vector field.
connections.connect("default", host="localhost", port="19530")
dim = 128

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then, create an inverted index on each scalar field:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Term queries and range queries on these fields are then accelerated
automatically by the inverted index:

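```python
# Loading also requires an index on the vector field; FLAT with L2 is an arbitrary choice here.
collection.create_index("embeddings", {"index_type": "FLAT", "metric_type": "L2", "params": {}})
collection.load()

result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```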

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
congqixia
8a9ab69369
fix: Skip statslog generation flushing empty L0 segment (#28733)
See also #27675

When an L0 segment contains only delta data, the merged statslog should
be skipped when performing the sync task.
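
The rule above amounts to a single guard in the sync path. The sketch below expresses it in Python with a hypothetical `SegmentToSync` record and `should_write_merged_statslog` helper; the real change lives in the Milvus Go code, and its names and structure may differ.

```python
from dataclasses import dataclass


@dataclass
class SegmentToSync:
    """Hypothetical summary of one segment handed to a sync task."""
    level: str        # "L0" for level-zero (delete-only) segments, otherwise e.g. "L1"
    insert_rows: int  # buffered insert rows
    delete_rows: int  # buffered delete (delta) rows


def should_write_merged_statslog(seg: SegmentToSync) -> bool:
    # Assumption: the merged statslog summarizes inserted rows, so an L0
    # segment with no inserted rows has nothing to merge and is skipped.
    return not (seg.level == "L0" and seg.insert_rows == 0)


print(should_write_merged_statslog(SegmentToSync("L0", 0, 128)))   # False
print(should_write_merged_statslog(SegmentToSync("L1", 1000, 0)))  # True
```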

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-25 15:10:25 +08:00
yah01
ece592a42f
Deliver L0 segments delete records (#27722)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-07 01:44:18 +08:00
XuanYang-cn
7358c3527b
Add iterators (#27643)
See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-18 19:34:08 +08:00
XuanYang-cn
2f16339aac
Enhance InsertData and FieldData (#27436)
1. Add NewInsertData
2. Add GetRowNum(), GetMemorySize(), and Append() for InsertData
3. Add AppendRow() for FieldData for compaction
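
The InsertData and FieldData methods listed above suggest a column-oriented buffer whose columns are kept row-aligned. The following Python analogue sketches that shape under stated assumptions: the class and method names mirror the commit message only loosely, `append` is assumed to take one row as a field-ID-to-value map, and `get_memory_size` is a crude `sys.getsizeof` estimate. It is not a translation of the Go implementation.

```python
import sys
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldData:
    """Hypothetical column of values for one field (illustration only)."""
    values: List[Any] = field(default_factory=list)

    def append_row(self, value: Any) -> None:
        self.values.append(value)

    def row_num(self) -> int:
        return len(self.values)


@dataclass
class InsertData:
    """Hypothetical row-aligned set of columns keyed by field ID."""
    data: Dict[int, FieldData] = field(default_factory=dict)

    @classmethod
    def new(cls, field_ids: List[int]) -> "InsertData":
        # Analogue of NewInsertData: pre-create an empty column per field.
        return cls({fid: FieldData() for fid in field_ids})

    def append(self, row: Dict[int, Any]) -> None:
        # Analogue of Append: add one value to every column so they stay aligned.
        if row.keys() != self.data.keys():
            raise ValueError("row must provide a value for every field")
        for fid, value in row.items():
            self.data[fid].append_row(value)

    def get_row_num(self) -> int:
        # All columns share the same length, so any column gives the row count.
        return next(iter(self.data.values())).row_num() if self.data else 0

    def get_memory_size(self) -> int:
        # Rough estimate only: sums sys.getsizeof over every stored value.
        return sum(sys.getsizeof(v) for col in self.data.values() for v in col.values)


buf = InsertData.new([100, 101])
buf.append({100: 1, 101: "a"})
buf.append({100: 2, 101: "b"})
print(buf.get_row_num())  # 2
```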

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-17 17:36:11 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
yah01
ebd0279d3f
Check error by Error() and NoError() for better report message (#24736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
aoiasd
c84bdcea49
merge stats log when segment flushing or compacting (#23570)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-05-29 10:21:28 +08:00
Enwei Jiao
967a97b9bd
Support json & array types (#23408)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: yah01 <yang.cen@zilliz.com>
2023-04-20 11:32:31 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01
081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
Xiaofan
949d5d078f
Fix memory calculation in dataCodec (#21800)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-01-28 11:09:52 +08:00
Xiaofan
633a749880
Reduce IndexCodec Load Memory (#20621)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-11-18 10:47:08 +08:00
SimFG
a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
SimFG
d7f38a803d
Separate some proto files (#19218)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-09-16 16:56:49 +08:00
xige-16
4de1bfe5bc
Add cpp data codec (#18538)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: zhagnlu <lu.zhang@zilliz.com>
2022-09-09 22:12:34 +08:00
xige-16
99984b88e1
Support delete varChar value (#16229)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-04-02 17:43:29 +08:00
XuanYang-cn
bccf65ec67
[skip e2e]Update license for storage datacodec (#14039)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-12-23 12:01:36 +08:00
godchen
7e56f08747
Add payload bytes interface. (#13467)
Signed-off-by: godchen0212 <qingxiang.chen@zilliz.com>
2021-12-16 16:35:42 +08:00
XuanYang-cn
48b45d82e5
Add ut for binlog_io to 100 coverage (#12283)
Raise DN unit test coverage to 90%
Resolves: #8058

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-11-26 17:43:17 +08:00
godchen
9d5bcd3e3a
Close event and binlog reader (#12173)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-11-22 17:27:14 +08:00
godchen
863f1bb34e
Fix multi delete data not effect (#11422)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-11-09 15:01:17 +08:00
XuanYang-cn
cd06f50645
Remove schema in delete codec (#10517)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-10-24 09:59:10 +08:00
godchen
ffc0c07610
Change delete data primary key to int64 (#10438)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-10-22 15:37:12 +08:00
XuanYang-cn
2255fe0b45
Change deserialize deltalog from 1 blob to blobs (#10085)
See also: #9530

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-10-19 10:28:35 +08:00
godchen
59ab0e441c
Add bloom filter for stats (#9630)
* Add bloom filter for stats

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* trigger GitHub actions

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-10-13 10:22:33 +08:00
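
For context on the "Add bloom filter for stats" commit above: a bloom filter answers "definitely not present" or "possibly present" for a key, with no false negatives, which makes it a cheap membership check for primary keys kept in segment statistics. The commit message does not describe the filter's parameters or hash functions, so the sketch below is a generic bloom filter with hypothetical sizes and a SHA-256-based hashing scheme chosen for simplicity; it is not the library Milvus uses.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter; sizes here are illustrative, not Milvus defaults."""

    def __init__(self, num_bits: int = 2048, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        data = str(key).encode()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False: key was definitely never added. True: key may have been added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


segment_pks = BloomFilter()
for pk in range(100):
    segment_pks.add(pk)
print(segment_pks.might_contain(42))  # True (no false negatives)
```

A key that was never added can still report `True` (a false positive); more bits per key lowers that rate at the cost of memory.
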
dragondriver
dedf745b76
Rename IndexParamsFile to IndexParamsKey (#9563)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-10-09 19:27:02 +08:00
dragondriver
818cf3ffa0
Split blob into several string rows when index file is large (#8919)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-09-30 17:57:01 +08:00
dragondriver
cf8600077f
Refactor the index file format (#8514)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-09-29 09:52:12 +08:00
godchen
af173dd2a0
Add delete codec (#8736)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-09-28 14:30:02 +08:00
godchen
db94d7771f
Read vector from disk (#6707)
* Read vector from disk

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* go fmt

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix git action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix test error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix calculation error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change var name

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix len error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused code

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change bytes to float method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change float to bytes method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-07-24 09:25:22 +08:00
Cai Yudong
a992dcf6a8
Support query return vector output field (#6570)
* improve code readability

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add offset in RetrieveResults

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add VectorFieldInfo into Segment struct

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add new interface for query vector

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update load vector field logic

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update load vector field logic

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fill in field name in query result

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add FieldId into FieldData

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add fillVectorOutputFieldsIfNeeded

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update data_codec_test.go

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add DeserializeFieldData

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* realize query return vector output field

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix static-check

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* disable query vector case

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-07-16 17:19:55 +08:00
godchen
1c6786f85c
Add blob info (#5792)
* Add blob info

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-06-16 12:03:57 +08:00
Xiangyu Wang
23c4de0eb8
Flush statistics for all int64 fields (#5318)
Resolves: #5262

Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>
2021-05-20 10:38:45 +00:00
Xiangyu Wang
82ccd4cec0
Rename module (#4988)
* Rename module

Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>
2021-04-22 14:45:57 +08:00
godchen
0dfcb90881
Add storage copyright
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-04-19 11:32:24 +08:00
godchen
a5ad70a5ab
Add unittest for storage
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-04-19 10:36:19 +08:00
godchen
f3649f0419
Refactor interface and proto
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-03-12 14:22:09 +08:00
bigsheeper
01e9dc8e3f
Remove collection name
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2021-02-03 11:52:19 +08:00
bigsheeper
5e781b9370
Remove field name in query node and segCore
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2021-02-03 10:10:07 +08:00
sunby
45b99c0cf3
Save indexName and indexID in IndexCodec
Signed-off-by: sunby <bingyi.sun@zilliz.com>
2021-02-02 19:56:11 +08:00
bigsheeper
73d2b6a101
Get index param from minio and filter by vector fields
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2021-01-29 15:22:24 +08:00
neza2017
3a866dab3d
Update proto
Signed-off-by: neza2017 <yefu.chen@zilliz.com>
2021-01-28 17:25:43 +08:00
neza2017
2ebeade25e
Fix get component states
Signed-off-by: neza2017 <yefu.chen@zilliz.com>
2021-01-28 17:13:00 +08:00
sunby
2be8cc1c4b
Add index params serde to IndexCodec
Signed-off-by: sunby <bingyi.sun@zilliz.com>
2021-01-28 16:41:24 +08:00
XuanYang-cn
7ce0f27ebc
Add buffer to minIO for binlogs
Signed-off-by: XuanYang-cn <xuan.yang@zilliz.com>
2020-12-23 18:06:04 +08:00
sunby
e7ebfcb05a
Save index meta to meta table
Signed-off-by: sunby <bingyi.sun@zilliz.com>
2020-12-23 15:13:45 +08:00