milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
foxspy	8e64bf929c	enhance: add scalar filtering and vector search latency metrics (#34785 ) add scalar filtering and vector search latency metrics to distinguish the cost of scalar filtering. To add metrics in query chain, add a monitor module and move the metric files from original storage module. issue: #34780 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2024-07-19 14:01:39 +08:00
zhagnlu	f1b2f7b640	enhance: refactor bitmap index and internal hybrid index (#34450 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-07-18 10:39:42 +08:00
Patrick Weizhi Xu	104d0966b7	feat: support partition key isolation (#34336 ) issue: #34332 --------- Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-07-11 19:01:35 +08:00
zhagnlu	cc1bc07bfd	enhance: add log to bitmap index (#34197 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-30 20:02:06 +08:00
Cai Yudong	ad90360162	enhance: Update knowhere commit (#34223 ) Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-06-27 18:20:06 +08:00
zhagnlu	03a3f50892	enhance: add skip using array index when some situation (#33947 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-23 21:26:02 +08:00
zhagnlu	0d7ea8ec42	enhance: Enhance and correct exception module (#33705 ) #33704 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-23 21:22:01 +08:00
Gao	0d20303e54	fix: fix binary vector data size (#33750 ) issue: https://github.com/milvus-io/milvus/issues/22837 - fix byte size wrong for binary vectors - fix the expect/actual error msg Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-06-18 21:39:59 +08:00
zhagnlu	d43ec4db0b	enhance: support array bitmap index (#33527 ) #32900 --------- Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-16 21:51:58 +08:00
Buqian Zheng	47b04ea167	enhance: support sparse cardinal hnsw index (#33656 ) issue: #29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-06-12 16:57:55 +08:00
Jiquan Long	ecf2bcee42	enhance: speed up array-equal operator via inverted index (#33633 ) fix: #33632 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-06-11 14:13:54 +08:00
cqy123456	703fc73f71	enhance: disk index support binary vector (#33631 ) issue:https://github.com/milvus-io/milvus/issues/22837 related https://github.com/milvus-io/milvus/pull/33575 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-06-05 19:37:57 +08:00
Gao	545d4725fb	fix: correct get vector data size for bf16/fp16/binary vector (#33377 ) related #22837 Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-06-05 14:31:57 +08:00
Jiquan Long	0c5d8660aa	feat: support inverted index for array (#33452 ) issue: https://github.com/milvus-io/milvus/issues/27704 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-05-31 09:47:47 +08:00
zhagnlu	589d4dfd82	enhance: optimize bitmap index (#33358 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-05-30 13:09:43 +08:00
Buqian Zheng	c5918ffbdb	enhance: mark sparse inverted index as mmap-able (#33281 ) issue: #29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-05-23 14:11:42 +08:00
zhagnlu	d669fbcf46	enhance: support bitmap index for scalar type (#32902 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-05-19 21:49:38 +08:00
Bingyi Sun	fecd9c21ba	feat: LRU cache implementation (#32567 ) issue: https://github.com/milvus-io/milvus/issues/32783 This pr is the implementation of lru cache on branch lru-dev. Signed-off-by: sunby <sunbingyi1992@gmail.com> Co-authored-by: chyezh <chyezh@outlook.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com> Co-authored-by: Ted Xu <ted.xu@zilliz.com> Co-authored-by: jaime <yun.zhang@zilliz.com> Co-authored-by: wayblink <anyang.wang@zilliz.com>	2024-05-06 20:29:30 +08:00
Gao	0fab265eed	enhance: update knowhere and some header changes (#32468 ) Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-04-22 15:47:26 +08:00
Chun Han	337cc0756d	fix: lack good results for insufficient ef(#29883 ) (#32080 ) related: #29883 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-04-13 22:13:23 +08:00
Jiquan Long	4fb85be525	fix: put inverted index into local storage (#32209 ) issue: https://github.com/milvus-io/milvus/issues/32154 Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-04-13 21:57:19 +08:00
cai.zhang	1b767669a4	enhance: Throw error instead of crash when index cannot be built (#31844 ) issue: #27589 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-04-09 11:51:18 +08:00
cqy123456	aba4993c6c	fix: fix some fp16/bf16 code miss in segcore. (#31771 ) issue: https://github.com/milvus-io/milvus/issues/22837 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-04-07 14:13:16 +08:00
Cai Yudong	246586be27	enhance: Unify data type check APIs under internal/core (#31800 ) Issue: #22837 Move and rename following C++ APIs: datatype_sizeof() ==> GetDataTypeSize() datatype_name() ==> GetDataTypeName() datatype_is_vector() / IsVectorType() ==> IsVectorDataType() datatype_is_variable() ==> IsVariableDataType() datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType() datatype_is_string() / IsString() ==> IsDataTypeString() datatype_is_floating() / IsFloat() ==> IsDataTypeFloat() datatype_is_binary() ==> IsDataTypeBinary() datatype_is_json() ==> IsDataTypeJson() datatype_is_array() ==> IsDataTypeArray() datatype_is_variable() ==> IsDataTypeVariable() datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger() Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-02 19:15:14 +08:00
Cai Yudong	675a5dc822	fix: Save traceID and spanID as std::vector into search config (#31278 ) Issue: #30961 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-03-29 14:29:11 +08:00
Buqian Zheng	96cfae55a5	feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629 ) This PR adds the ability to search/get sparse float vectors in segcore, and added unit tests by modifying lots of existing tests into parameterized ones. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-12 09:16:30 -07:00
Buqian Zheng	070dfc77bf	feat: [Sparse Float Vector] segcore basics and index building (#30357 ) This commit adds sparse float vector support to segcore with the following: 1. data type enum declarations 2. Adds corresponding data structures for handling sparse float vectors in various scenarios, including: * FieldData as a bridge between the binlog and the in memory data structures * mmap::Column as the in memory representation of a sparse float vector column of a sealed segment; * ConcurrentVector as the in memory representation of a sparse float vector of a growing segment which supports inserts. 3. Adds logic in payload reader/writer to serialize/deserialize from/to binlog 4. Adds the ability to allow the index node to build sparse float vector index 5. Adds the ability to allow the query node to build growing index for growing segment and temp index for sealed segment without index built This commit also includes some code cleanness, comment improvement, and some unit tests for sparse vector. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-11 14:45:02 +08:00
Cai Yudong	a99143dd52	fix: Save traceID and spanID as hex string into search config (#31071 ) Issue: #30961 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-03-11 14:21:01 +08:00
Cai Yudong	122981aeb9	fix: Disable knowhere trace as a quick fix (#31055 ) Issue: #30961 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-03-08 15:27:01 +08:00
MrPresent-Han	29f44f840a	enhance: refine groupBy error msg(#29968 ) (#30920 ) related: #29968 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-03-01 18:53:03 +08:00
Cai Yudong	8a219e0102	feat: Support knowhere trace using OpenTelemetry (#30750 ) Issue: #21508 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-02-28 12:29:00 +08:00
MrPresent-Han	77eb6defb1	feat: support groupby on growing and non-indexed sealed segment(#30307 ) (#30644 ) related: #30308 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-02-21 14:02:53 +08:00
zhagnlu	976b6fc0e4	enhance: change opendal as compile configurable (#30384 ) #30373 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-20 19:16:52 +08:00
MrPresent-Han	92d1d744ae	fix: groupby results lack good results(#29883 ) (#30428 ) related: #29883 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-02-06 17:08:34 +08:00
Jiquan Long	e549148a19	enhance: full-support for wildcard pattern matching (#30288 ) issue: #29988 This pr adds full-support for wildcard pattern matching from end to end. Before this pr, the users can only use prefix match in their expression, for example, "like 'prefix%'". With this pr, more flexible syntax can be combined. To do so, this pr makes these changes: - 1. support regex query both on index and raw data; - 2. translate the pattern matching to regex query, so that it can be handled by the regex query logic; - 3. loosen the limit of the expression parsing, which allows general pattern matching syntax; With the support of regex query in segcore backend, we can also add mysql-like `REGEXP` syntax later easily. --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-01 12:37:04 +08:00
zhagnlu	8c58d9af67	enhance: optimize marisa trie range search for performance (#30079 ) #30078 #29986 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-01-25 10:07:00 +08:00
Patrick Weizhi Xu	0907d76253	enhance: pass partition key scalar info if enabled when build vector index (#29931 ) issue: #29892 Pass optional scalar IVF offsets to Cardinal Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-01-24 00:04:55 +08:00
cqy123456	42bb4e37e5	fix:diskann search crash when search list = 9999999999 (#30185 ) issue: https://github.com/milvus-io/milvus/issues/29020 JSON cannot pass a max_int32 value to int32_t, so let knowhere check value range by itself. After this fix, pymilvus will report: pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=fail to search on QueryNode 6: worker(6) query failed: => failed to search: arithmetic overflow: param search_list_size should be at most 2147483647)> Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-01-23 14:46:55 +08:00
Bingyi Sun	8030b90891	fix: correct file name when loading index (#29985 ) issue: #29973 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-01-16 10:24:52 +08:00
yah01	f2e36db488	enhance: optimize the loading index performance (#29894 ) this utilizes concurrent loading Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-12 17:44:51 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue: https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
yah01	031243fee7	feat: support mmap for marisa trie (#29613 ) this supports mmap for marisa trie index related https://github.com/milvus-io/milvus/issues/21866 Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-11 10:22:50 +08:00
congqixia	d6429933a7	enhance: make Load process traceable in querynode & segcore (#29858 ) See also #29803 This PR: - Add trace span for `LoadIndex` & `LoadFieldData` in segment loader - Add `TraceCtx` parameter for `Index.Load` in segcore - Add span for ReadFiles & Engine Load for Memory/Disk Vector index --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-10 21:58:51 +08:00
Jiquan Long	e9f3df3626	fix: inverted index file not found (#29695 ) issue: https://github.com/milvus-io/milvus/issues/29654 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-01-07 20:26:49 +08:00
foxspy	271edc6669	fix: throw exception when upload file failed for DiskIndex (#29627 ) related to : #29417 cardinal indexes upload index files in `Serialize` interface, and throw exception when the `Serialize` failed. Signed-off-by: xianliang <xianliang.li@zilliz.com>	2024-01-07 20:03:13 +08:00
MrPresent-Han	9e2e7157e9	feat: support search_group_by for milvus(#25324 ) (#28983 ) related: #25324 Search GroupBy function, used to aggregate result entities based on a specific scalar column. several points to mention: 1. Temporarily, the whole groupby is implemented separated from iterative expr framework for the first period 2. In the long term, the groupBy operation will be incorporated into the iterative expr framework:https://github.com/milvus-io/milvus/pull/28166 3. This pr includes some unrelated mocked interface regarding alterIndex due to some unworth-to-mention reasons. All these un-associated content will be removed before the final pr is merged. This version of pr is only for review 4. All other related details were commented in the files comparison Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-05 15:50:47 +08:00
yah01	99e0f1e65a	enhance: unable to compile C++ tests (#29616 ) The tests need to call a private method, Milvus uses `#define` to replace private with public, the hack trick works but would be broken if the including order changed. This uses friend to make all things work well Signed-off-by: yah01 <yang.cen@zilliz.com> Signed-off-by: yah01 <yah2er0ne@outlook.com>	2024-01-04 13:20:46 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM and speed up the term query and range query. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`. Note: - The inverted index for `VARCHAR` is not designed to serve full-text search now. We will treat every row as a whole keyword instead of tokenizing it into multiple terms. - The inverted index doesn't support retrieval well, so if you create inverted index for field, those operations which depend on the raw data will fall back to using chunk storage, which will bring some performance loss. For example, comparisons between two columns and retrieval of output fields. The inverted index is easy to use. Taking below collection as an example: ```python fields = [ FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100), FieldSchema(name="int8", dtype=DataType.INT8), FieldSchema(name="int16", dtype=DataType.INT16), FieldSchema(name="int32", dtype=DataType.INT32), FieldSchema(name="int64", dtype=DataType.INT64), FieldSchema(name="float", dtype=DataType.FLOAT), FieldSchema(name="double", dtype=DataType.DOUBLE), FieldSchema(name="bool", dtype=DataType.BOOL), FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000), FieldSchema(name="random", dtype=DataType.DOUBLE), FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim), ] schema = CollectionSchema(fields) collection = Collection("demo", schema) ``` Then we can simply create inverted index for field via: ```python index_type = "INVERTED" collection.create_index("int8", {"index_type": index_type}) collection.create_index("int16", {"index_type": index_type}) collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type}) collection.create_index("float", {"index_type": index_type}) collection.create_index("double", {"index_type": index_type}) collection.create_index("bool", {"index_type": index_type}) collection.create_index("varchar", {"index_type": index_type}) ``` Then, term query and range query on the field can be sped up automatically by the inverted index: ```python result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"]) result = collection.query(expr='int64 < 5', output_fields=["pk"]) result = collection.query(expr='int64 > 2997', output_fields=["pk"]) result = collection.query(expr='1 < int64 < 5', output_fields=["pk"]) ``` --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00
yah01	aef483806d	enhance: improve the segcore logs (#29372 ) - remove the streaming logging - refine existing logs fix #29366 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-12-23 21:52:43 +08:00
yah01	04b2518ae7	enhance: fix the incorrect init parameter (#29357 ) as the `driver_` field is not used so this doesn't matter for now Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-12-20 20:50:43 +08:00

228 Commits