1527 Commits

Author SHA1 Message Date
smellthemoon
1c1f2a1371
enhance:change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
yihao.dai
3561586edf
feat: Add import reader for binlog (#28910)
This PR defines the new import reader interfaces and implement a binlog
reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 11:48:47 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
cai.zhang
c45f8a2946
fix: Import data from parquet file in streaming way (#29514)
issue: #29292

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-12-27 15:30:46 +08:00
XuanYang-cn
7a6aa8552a
fix: add back existing datanode metrics (#29360)
See also: #29204

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-22 14:20:43 +08:00
congqixia
f699be79f7
fix: grpc client check session skipped due to role not match (#29356)
Related to #28815

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-21 10:12:51 +08:00
wei liu
e41fd6fbde
enhance: Move proxy client manager to util package (#28955)
issue:  #28898

This PR move the `ProxyClientManager` to util package, in case of
reusing it's implementation in querycoord

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-20 19:22:42 +08:00
wayblink
2274aa3b50
fix: bulkinsert binlog didn't consider ts order when processing delta data (#29163)
#29162

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-12-14 14:36:40 +08:00
Bingyi Sun
ad866d2889
feat: integrate storagev2 into index build process (#28995)
issue: https://github.com/milvus-io/milvus/issues/28994

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-13 17:24:38 +08:00
wei liu
fe1eeae2aa
enhance: Use mockery to replace manual mock code (#29074)
issue: #29043
This PR remove mannul mock code for proxy and data coord

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-13 10:46:44 +08:00
cai.zhang
49b8657f95
enhance: Support implicit type conversion for parquet (#29046)
issue: #29019

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-12-12 16:14:44 +08:00
congqixia
1fe5f12bd5
enhance: Add client connect wrapper to keep connection alive (#29058)
See also #29057
Add wrapper to maintain client&connection
When reset operation is needed, `Close` method shall wait until all
on-going request return

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-11 17:20:38 +08:00
cai.zhang
2b05460ef9
enhance: Make import-related error message clearer (#28978)
issue: #28976

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-12-08 10:12:38 +08:00
wayblink
6736f65345
feat: skip some empty ttMsg in Datanode flowgraph (#28756)
/kind feature

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-12-07 01:00:37 +08:00
yihao.dai
d26b563a8b
feat: Define import API and metadata (#28731)
Define the new rpc and metadata for ImportV2.

see also: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-12-04 19:56:35 +08:00
Bingyi Sun
45e6801ce4
feat: Add checker activation service interfaces (#28850)
issue: #28610

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-04 17:38:37 +08:00
cai.zhang
f5f4f0872e
enhance: Support importing data with parquet file (#28608)
issue: #28272

Numpy does not support array type import. 
Array type data is imported through parquet.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-11-29 20:52:27 +08:00
cai.zhang
1b7a503f89
enhance: Revert import support csv format (#28760)
Revert import support csv format.
issue: #28778

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-11-28 14:32:27 +08:00
cai.zhang
c29b60e18e
enhance: Support Array DataType for bulk_insert (#28341)
issue: #28272 
Support array DataType for bulk_insert with json, binlog files.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-11-27 13:50:27 +08:00
MrPresent-Han
fc30d291be
fix createCollection failed occasionally (#28592) (#28712)
fix: create collection seldom failure #28592

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-11-27 11:10:25 +08:00
wayblink
da339535d5
enhance: Merge flowgraph goroutines into 1 (#28654)
/kind enhancement
#24826

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-11-23 19:52:25 +08:00
smellthemoon
73f2bab454
enhance:add some log when create client and get component states (#28160)
/kind improvement

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-11-22 09:12:22 +08:00
PowderLi
c238bff9fb
fix: symbol 'GetStorageMetrics' and 'enableDynamicField' (#28580)
/kind bug
to #28579 #28504

1. replace enableDynamic with enableDynamicField
2. cgo directly link to milvus_storage

Signed-off-by: PowderLi <min.li@zilliz.com>
2023-11-21 10:20:22 +08:00
Bingyi Sun
d7145e2c06
enhance: Update golangci_lint version (#28535)
Update golangci lint and fix some warnings

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-11-21 10:04:21 +08:00
PowderLi
a1c505dbd5
add internal storage metrics (#28278)
/kind improvement
issue: #28277

Signed-off-by: PowderLi <min.li@zilliz.com>
2023-11-19 17:22:25 +08:00
XuanYang-cn
40d5c902b6
Enable getting multiple segments in plan result (#28350)
Compaction plan result contained one segment for one plan. For l0
compaction would write to multiple segments, this PR expand the segments
number in plan results and refactor some names for readibility.

- Name refactory: - CompactionStateResult -> CompactionPlanResult -
CompactionResult -> CompactionSegment

See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-14 15:56:19 +08:00
smellthemoon
0aa90de141
Reduce the goroutine in flowgraph to 2 (#28233)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-11-13 10:50:17 +08:00
wei liu
bce1054f92
Fix retry when proxy stopped (#28264)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-09 18:58:21 +08:00
groot
3f6b203018
Fix bulkinsert bug that segments are compacted after import (#28192)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2023-11-07 15:14:26 +08:00
wei liu
5b45a138b1
disable auto balance when old node exists (#28191)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
wei liu
da41a5b51e
fix check grpc error logic (#28182)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 11:54:18 +08:00
Xiaofan
da19e49daf
Support purge old session for standalone (#28184)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-11-06 21:21:42 +08:00
wei liu
68a86471ba
fix grpc client retry on node server not match error (#28169)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 23:42:16 +08:00
wayblink
00ae019ff0
Use go 1.20 csv_reader to keep milvus go=1.18 limitation (#28080)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-11-03 10:40:16 +08:00
wei liu
ecec5dfcfd
fix retry on offline node (#28079)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 10:14:16 +08:00
groot
abd5b199cc
Bulkinsert support pure list json (#27990)
Signed-off-by: yhmo <yihua.mo@zilliz.com>
2023-11-01 19:02:13 +08:00
Enwei Jiao
8ae9c947ae
Use OpenDAL to access object store (#25642)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-01 09:00:14 +08:00
KumaJie
e88212ba4b
Add CSV file import function (#27149)
Signed-off-by: kuma <675613722@qq.com>
Co-authored-by: kuma <675613722@qq.com>
2023-10-31 22:47:23 +08:00
yah01
9658367a3c
Refine chunk manager errors (#27590)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-31 12:18:15 +08:00
Enwei Jiao
4a33391b8f
rename createindex (#27903)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-10-27 10:12:14 +08:00
Filip Haltmayer
6b1a106a31
Moving etcd client into session (#27069)
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-10-27 07:36:12 +08:00
zhagnlu
6060dd7ea8
Add chunk manager request timeout (#27692)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-10-23 20:08:08 +08:00
SimFG
9b0ecbdca7
Support to replicate the mq message (#27240)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
congqixia
bcbe98aba1
Add querynode client wrapper and avoid grpc in standalone mode (#27781)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-19 11:10:07 +08:00
jaime
ac2d1bb5c2
Support receive signals from parent process (#27756)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-18 20:20:11 +08:00
yihao.dai
49b3a12804
Return newly defined merr instead of grpc unimplemented err (#27751)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-10-18 15:32:11 +08:00
congqixia
2f201c25e2
Remove deprecated io/ioutil usage (#27747)
`io/ioutil` package is deprecated, use `io`,`os` package replacement
also added golangci-lint rule to block future reference

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: guoguangwu <guoguangwu@magic-shield.com>
2023-10-17 20:32:09 +08:00
wei liu
7aa862c0ea
fix retry on unimplemented rpc error (#27639)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-16 10:50:09 +08:00
jaime
ec1fe3549e
Add a stop hook to clean session (#27564)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:24:10 +08:00
yah01
be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00