milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-06 19:02:18 +08:00

Author	SHA1	Message	Date
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 into milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Currently only brute-force search is supported. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. See the corresponding modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171 ) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports creating an index for vector arrays (for now, only `DataType.FLOAT_VECTOR`) and searching on it. The only index type supported in this PR is `EMB_LIST_HNSW`, and the only metric type is `MAX_SIM`. Usage: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey the offsets of each vector array to knowhere, where the vectors of multiple vector arrays are concatenated, so offsets are needed to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
Spade A	10fe53ff59	feat: support json for ngram (#43170 ) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enables ngram to support the json data type. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-25 10:28:54 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803 ) Ref #42053 This is the first PR for optimizing `LIKE` with an ngram inverted index. For now, only the VARCHAR data type and InnerMatch LIKE (%xxx%) queries are supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram control how we tokenize the documents. For example, for min_gram=2 and max_gram=4, we will tokenize each document into 2-grams, 3-grams and 4-grams. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
Spade A	911a8df17c	feat: impl StructArray -- data storage support in segcore (#42406 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR mainly enables segcore to support array of vector (read and write, but not indexing). Now only float vector as the element type is supported. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-06-12 14:38:35 +08:00
Bingyi Sun	cc5ac1c220	enhance: Support cast function for json index (#41949 ) issue: #41948 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-05 19:42:32 +08:00
foxspy	3dbad0306a	fix: Add bypass thread pool mode to avoid growing indexes blocking insert/load (#41012 ) issue: #40825 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-05-20 14:30:24 +08:00
foxspy	e2ddbe4962	feat: add cachinglayer to index (#41653 ) issue: #41435 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-05-08 10:12:54 +08:00
Bingyi Sun	9676365af9	fix: Fix json index not equal filter (#40647 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-27 23:06:23 +08:00
Bingyi Sun	b59555057d	feat: support json index (#36750 ) https://github.com/milvus-io/milvus/issues/35528 This PR adds json index support for json and dynamic fields. For now, this index only serves unary queries like 'a["b"] > 1'. We will support more filter types later. Basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` Some limits apply to this index: 1. If a record does not have the json path you specify, it will be ignored and there will not be an error. 2. If a value of the json path fails to be cast to the type you specify, it will be ignored and there will not be an error. 3. A specific json path can have only one json index. 4. If you try to create more than one json index for one json field, the sdk (pymilvus<=2.4.7) may return immediately because of its internal implementation. This will be fixed in a later version. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-15 14:06:15 +08:00
Spade A	8c4ba70a4c	fix: enable to build index with single segment (#39233 ) fix https://github.com/milvus-io/milvus/issues/39232 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-16 11:01:06 +08:00
SimFG	130a923dec	enhance: the estimate method when loading the collection (#36307 ) - issue: #36530 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Signed-off-by: xianliang.li <xianliang.li@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2024-10-09 17:35:19 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
zhagnlu	d43ec4db0b	enhance: support array bitmap index (#33527 ) #32900 --------- Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-16 21:51:58 +08:00
Jiquan Long	0c5d8660aa	feat: support inverted index for array (#33452 ) issue: https://github.com/milvus-io/milvus/issues/27704 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-05-31 09:47:47 +08:00
yah01	99e0f1e65a	enhance: unable to compile C++ tests (#29616 ) The tests need to call a private method. Milvus uses `#define` to replace private with public; the hack works but would break if the include order changed. This PR uses friend instead so that everything works. Signed-off-by: yah01 <yang.cen@zilliz.com> Signed-off-by: yah01 <yah2er0ne@outlook.com>	2024-01-04 13:20:46 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM, and it speeds up term queries and range queries. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`. Note: - The inverted index for `VARCHAR` is not designed to serve full-text search now. We will treat every row as a whole keyword instead of tokenizing it into multiple terms. - The inverted index doesn't support retrieval well, so if you create an inverted index for a field, operations that depend on the raw data will fall back to chunk storage, which brings some performance loss. For example, comparisons between two columns and retrieval of output fields. The inverted index is easy to use. Taking the collection below as an example: ```python fields = [ FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100), FieldSchema(name="int8", dtype=DataType.INT8), FieldSchema(name="int16", dtype=DataType.INT16), FieldSchema(name="int32", dtype=DataType.INT32), FieldSchema(name="int64", dtype=DataType.INT64), FieldSchema(name="float", dtype=DataType.FLOAT), FieldSchema(name="double", dtype=DataType.DOUBLE), FieldSchema(name="bool", dtype=DataType.BOOL), FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000), FieldSchema(name="random", dtype=DataType.DOUBLE), FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim), ] schema = CollectionSchema(fields) collection = Collection("demo", schema) ``` Then we can simply create an inverted index for each field via: ```python index_type = "INVERTED" collection.create_index("int8", {"index_type": index_type}) collection.create_index("int16", {"index_type": index_type}) collection.create_index("int32", {"index_type": index_type}) collection.create_index("int64", {"index_type": index_type}) collection.create_index("float", {"index_type": index_type}) collection.create_index("double", {"index_type": index_type}) collection.create_index("bool", {"index_type": index_type}) collection.create_index("varchar", {"index_type": index_type}) ``` Then, term queries and range queries on the field are sped up automatically by the inverted index: ```python result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"]) result = collection.query(expr='int64 < 5', output_fields=["pk"]) result = collection.query(expr='int64 > 2997', output_fields=["pk"]) result = collection.query(expr='1 < int64 < 5', output_fields=["pk"]) ``` --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00
Bingyi Sun	36f69ea031	feat: integrate storagev2 in building index of segcore (#28768 ) issue: https://github.com/milvus-io/milvus/issues/28655 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2023-12-05 16:48:54 +08:00
Bingyi Sun	e5ce385ffd	enhance: remove -inl.h files (#28674 ) issue: https://github.com/milvus-io/milvus/issues/28673 Move template implementations from -inl.h to .cpp file and make explicit instantiation Signed-off-by: sunby <sunbingyi1992@gmail.com>	2023-11-23 17:20:25 +08:00
foxspy	370b6fde58	milvus support multi index engine (#27178 ) Co-authored-by: longjiquan <jiquan.long@zilliz.com>	2023-09-22 09:59:26 +08:00
xige-16	04082b3de2	Migrate the ability to upload and download binlog to cpp (#22984 ) Signed-off-by: xige-16 <xi.ge@zilliz.com>	2023-06-25 14:38:44 +08:00
Cai Yudong	0e9a4478e3	Remove useless index mode (#22934 ) Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2023-03-23 21:39:59 +08:00
yah01	bdd6bc7695	Re-format cpp code (#22513 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-03-02 15:55:49 +08:00
xige-16	428840178c	Support diskann index for vector field (#19093 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-09-21 20:16:51 +08:00
Cai Yudong	d5db4ae463	Merge utils/Types.h with common/Types.h (#16445 ) Signed-off-by: yudong.cai <yudong.cai@zilliz.com>	2022-05-02 22:21:51 +08:00
Jiquan Long	fd589baca7	Integrates marisa trie index (#16192 ) Signed-off-by: dragondriver <jiquan.long@zilliz.com>	2022-04-01 15:31:29 +08:00
Jiquan Long	48706f416f	Migrate scalar index from knowhere (#16174 ) Signed-off-by: dragondriver <jiquan.long@zilliz.com>	2022-03-24 14:57:26 +08:00

28 Commits
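
Illustration for 26ec841feb (#41803): the commit message says that min_gram and max_gram control how documents are tokenized. The sketch below shows what character n-gram tokenization over that range could look like, plus the usual way such grams can pre-filter an inner-match LIKE (%xxx%) query. It is a minimal standalone example, not the Milvus implementation; the function names `char_ngrams` and `could_match_inner_like` are hypothetical, and the pre-filter strategy is an assumption about how n-gram indexes are typically used.

```python
def char_ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Return every character n-gram of `text` for n in [min_gram, max_gram]."""
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    grams = []
    for n in range(min_gram, max_gram + 1):
        # A string of length L has L - n + 1 n-grams (none if L < n).
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


def could_match_inner_like(document: str, literal: str,
                           min_gram: int, max_gram: int) -> bool:
    """Necessary (not sufficient) test for `document LIKE '%literal%'`.

    Assumption for illustration: if `literal` is a substring of `document`,
    then every n-gram of `literal` is also an n-gram of `document`. A False
    result rules the document out; a True result still needs a real
    substring check.
    """
    if len(literal) < min_gram:
        # The literal is shorter than any indexed gram, so the grams
        # cannot rule anything out.
        return True
    n = min(len(literal), max_gram)
    doc_grams = set(char_ngrams(document, n, n))
    return all(gram in doc_grams for gram in char_ngrams(literal, n, n))


if __name__ == "__main__":
    print(char_ngrams("milvus", 2, 4))
    print(could_match_inner_like("hello milvus", "ilv", 2, 3))
    print(could_match_inner_like("hello milvus", "xyz", 2, 3))
```

With min_gram=2 and max_gram=4, as in the commit's example, each document yields its 2-grams, 3-grams and 4-grams; a wider range produces more grams per document in exchange for covering more literal lengths.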