related: #25324
Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:
1. Temporarliy, the whole groupby is implemented separated from
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This pr includes some unrelated mocked interface regarding alterIndex
due to some unworth-to-mention reasons. All these un-associated content
will be removed before the final pr is merged. This version of pr is
only for review
4. All other related details were commented in the files comparison
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/29230
this pr do these things:
1. add gpu brute force;
2. limit gpu index only support l2 / ip;
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
issue: #29672
the storage account need privileges of actions
`Microsoft.Storage/storageAccounts/blobServices/containers/blobs/*` at
least
Signed-off-by: PowderLi <min.li@zilliz.com>
The tests need to call a private method, Milvus uses `#define` to
replace private with public, the hack trick works but would be broken if
the including order changed.
This uses friend to make all things work well
Signed-off-by: yah01 <yang.cen@zilliz.com>
Signed-off-by: yah01 <yah2er0ne@outlook.com>
issue: #29494
1. link with install path's libblob-chunk-manager
2. performance of `ShouldBindWith` is better than `ShouldBindBodyWith`
3. the middleware shouldn't read the unrefreshed parameter repeatly
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/27704
Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.
Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.
Not supported: `ARRAY` and `JSON`.
Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.
The inverted index is very easy to be used.
Taking below collection as an example:
```python
fields = [
FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
FieldSchema(name="int8", dtype=DataType.INT8),
FieldSchema(name="int16", dtype=DataType.INT16),
FieldSchema(name="int32", dtype=DataType.INT32),
FieldSchema(name="int64", dtype=DataType.INT64),
FieldSchema(name="float", dtype=DataType.FLOAT),
FieldSchema(name="double", dtype=DataType.DOUBLE),
FieldSchema(name="bool", dtype=DataType.BOOL),
FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
FieldSchema(name="random", dtype=DataType.DOUBLE),
FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```
Then we can simply create inverted index for field via:
```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```
Then, term query and range query on the field can be speed up
automatically by the inverted index:
```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
issue:https://github.com/milvus-io/milvus/issues/29230
this pr do two things about cagra index:
a.milvus yaml config support gpu memory settings
b.add cagra-params check
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
Co-authored-by: yusheng.ma <yusheng.ma@zilliz.com>
Do not fill the invalid ids for the empty results, it will incur useless
memory overhead and reduce overhead when nq and topk is large.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
See also #28795
Orignal `C.NewSegment` may panic if some condition is not met, this pr
changes response struct to `CNewSegmentResult`, which contains
`C.CStatus` and may return catched exception
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #28781#28329
1. There is no need to call `DescribeCollection`, if the collection's
schema is found in the globalMetaCache
2. did `GetProperties` to check the access to Azure Blob Service while
construct the ChunkManager
Signed-off-by: PowderLi <min.li@zilliz.com>
for now the assert method in segcore could accept a string information,
too many codes don't print the value they assert.
make it happy
related #28811
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Now segcore load system field info as well, the growing segment
assertion shall not pass with "+ 2" value
This will cause all growing segments load failure
Fix#28801
Related to #28478
See also #28524
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #27678
when interimIndex = true, memory predict should be update with the
memory usage of binlog index build process.
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>