issue: https://github.com/milvus-io/milvus/issues/42148
Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray → Memory
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
#42032
Also, fix the cacheoptfield method to work in storagev2.
Also, change the sparse related interface for knowhere version bump
#43974 .
Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric
lost.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.
The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR mainly enables segcore to support array of vector (read and
write, but not indexing). Now only float vector as the element type is
supported.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
support parallel loading sealed and growing segments with storage v2
format by async reading row groups.
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue:https://github.com/milvus-io/milvus/issues/27576
# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.
---------
Signed-off-by: tasty-gumi <1021989072@qq.com>
#34778#34849
fix two problems:
1. count(*) incorrect, if growing insert duplicated (pk, timestamp)
pairs that pk and timestamp all same, need to keep just one pair.
2. count(*) may core dump, if get_real_count interface get snapshot and
do mvcc at not consistency status, mainly happens under concurrency.
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
1. support read and write null in segcore
will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3. retrieve valid_data.
parse valid_data in search/query.
#31728
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
This PR adds the ability to search/get sparse float vectors in segcore,
and added unit tests by modifying lots of existing tests into
parameterized ones.
https://github.com/milvus-io/milvus/issues/29419
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
- Reduce 1x copy for varchar/string/JSON/array types while retrieving
- Reduce 1x copy for int8/int16 while retrieving
Signed-off-by: yah01 <yah2er0ne@outlook.com>