milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
MrPresent-Han	9e2e7157e9	feat: support search_group_by for milvus(#25324 ) (#28983 ) related: #25324 Search GroupBy function, used to aggregate result entities based on a specific scalar column. Several points to mention: 1. Temporarily, for the first period, the whole groupBy is implemented separately from the iterative expr framework. 2. In the long term, the groupBy operation will be incorporated into the iterative expr framework: https://github.com/milvus-io/milvus/pull/28166 3. This PR includes some unrelated mocked interfaces regarding alterIndex, for reasons not worth mentioning. All of this unrelated content will be removed before the final PR is merged. This version of the PR is only for review. 4. All other related details are commented in the file comparison. Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-05 15:50:47 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM, and it speeds up term queries and range queries. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text search now. Every row is treated as a single keyword instead of being tokenized into multiple terms.
- The inverted index doesn't support retrieval well, so if you create an inverted index for a field, operations that depend on the raw data will fall back to chunk storage, which brings some performance loss. Examples are comparisons between two columns and retrieval of output fields.

The inverted index is easy to use. Take the collection below as an example:

```python
fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create an inverted index for each field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term queries and range queries on the field are sped up automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

--------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00
zhagnlu	a602171d06	enhance: Refactor runtime and expr framework (#28166 ) #28165 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-12-18 12:04:42 +08:00
Xu Tong	8ec85f5f4c	Add template for VectorMemIndex (#28324 ) Signed-off-by: Writer-X <1256866856@qq.com>	2023-11-11 13:20:22 +08:00
yah01	267c67dfee	enhance: reduce 1x copy while retrieving data from growing segment (#28323 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-10 15:44:22 +08:00
yah01	dc89730a50	Support collection-level mmap control (#26901 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-02 23:52:16 +08:00
Enwei Jiao	f8dd589755	Refactor collection's cgo call (#28055 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-11-02 13:02:13 +08:00
Gao	7a65b6fb85	Limit faiss ivf index build thread num and fix ut (#27567 ) Signed-off-by: chasingegg <chao.gao@zilliz.com>	2023-10-11 10:33:33 +08:00
Enwei Jiao	b80a3e19d3	Add code for PanicInfo (#27364 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-09-27 12:01:28 +08:00
foxspy	5db4a0489e	dynamic index version control (#27335 ) Co-authored-by: longjiquan <jiquan.long@zilliz.com>	2023-09-25 21:39:27 +08:00
foxspy	370b6fde58	milvus support multi index engine (#27178 ) Co-authored-by: longjiquan <jiquan.long@zilliz.com>	2023-09-22 09:59:26 +08:00
cai.zhang	a362bb1457	Support array datatype (#26369 ) Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2023-09-19 14:23:23 +08:00
yihao.dai	bb6711f28c	Add ChunkCache: support get vector from storage (#26142 ) Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-09-15 10:21:20 +08:00
Enwei Jiao	0afdfdb9af	Remove other Exceptions, keeps SegcoreError only (#27017 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-09-14 14:05:20 +08:00
cqy123456	0ff4ddc76c	remove VectorMemNMIndex (#27000 ) Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2023-09-12 17:13:18 +08:00
Enwei Jiao	c3f15c6b95	Refactor duplicate error class into one place (#26985 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2023-09-11 20:43:17 +08:00
Xu Tong	9166011c4a	Add float16 vector (#25852 ) Signed-off-by: Writer-X <1256866856@qq.com>	2023-09-08 10:03:16 +08:00
cai.zhang	c073aa0dc3	Fix bug for json_contains_all has multiple array elements (#26446 ) Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2023-08-18 22:36:19 +08:00
xige-16	5b8d716cbc	Add ut for growing segment load binlog (#26268 ) Signed-off-by: xige-16 <xi.ge@zilliz.com>	2023-08-13 20:41:31 +08:00
cai.zhang	a0198ce8ae	Support json contains feature (#25384 ) Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2023-08-11 17:09:30 +08:00
zhagnlu	411f9ac823	Upgrade minio-go and add region and virtual host config for segcore chunk manager (#26194 ) Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-08-11 10:37:36 +08:00
yah01	dd5f896dc8	Load batch by batch (#25212 ) This will significantly reduce the memory usage while loading - 1x memory usage and MBs overhead for buffer (memory mode) - only MBs overhead for buffer (mmap mode) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-07-06 13:58:27 +08:00
xige-16	04082b3de2	Migrate the ability to upload and download binlog to cpp (#22984 ) Signed-off-by: xige-16 <xi.ge@zilliz.com>	2023-06-25 14:38:44 +08:00
zhagnlu	f60b839127	Support element in json array in segcore part(#24677 ) (#24829 ) Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-06-14 14:38:37 +08:00
zhagnlu	113f9a0ebc	Support SIMD of several Expr (#23715 ) (#23717 ) Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-05-12 14:11:20 +08:00
cai.zhang	9715a850fa	Support expr with json field (#23804 ) Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2023-05-10 10:19:19 +08:00
yah01	62eea5286f	Support to filter with json expr (#23739 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-04-30 20:36:39 +08:00
yihao.dai	092d743917	Add support for getting vectors by ids (#23450 ) Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-04-23 09:00:32 +08:00
yah01	546080dcdd	Support to retrieve json (#23563 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-04-21 11:46:32 +08:00
Cai Yudong	ef63e64ded	Remove ANNOY index type (#23189 ) Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2023-04-04 16:30:27 +08:00
Cai Yudong	0e9a4478e3	Remove useless index mode (#22934 ) Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2023-03-23 21:39:59 +08:00
Cai Yudong	ab3cbdfc61	Partial change to prepare for GPU index type support (#22591 ) Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2023-03-14 23:21:56 +08:00
Jiquan Long	a36fefb009	Fix cpplint (#22657 ) Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-03-10 09:47:54 +08:00
yah01	7478e44911	Support using mmap to load data (#22052 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-03-01 18:07:49 +08:00
smellthemoon	9e0ec15436	Support range search (#21652 ) Signed-off-by: smellthemoon <xinguo.li@zilliz.com> Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: jaime <yun.zhang@zilliz.com>	2023-02-21 09:48:32 +08:00
presburger	9950cacd10	support knowhere 2.0 (#21857 ) Signed-off-by: Yusheng.Ma <Yusheng.Ma@zilliz.com>	2023-02-10 14:24:32 +08:00
Jiquan Long	d7156812c1	Try using ASAN in ci ut (#21089 ) Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2022-12-29 15:29:31 +08:00
Enwei Jiao	958e94f6f0	Use Conan as c++ package manager (#19920 ) Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com> Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>	2022-11-23 10:39:11 +08:00
xige-16	4a66965df4	Delete RAW_DATA copy when load IVF_FLAT index data (#20274 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-11-05 17:33:05 +08:00
xige-16	158787811e	Move assemble/disassemble func to core (#19420 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-10-16 21:17:25 +08:00
xige-16	a1db9038fb	Move disk index params to config file (#19714 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-10-14 17:51:24 +08:00
xige-16	8c9c1672ae	Assign different storage config for indexes (#19517 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-10-14 14:45:23 +08:00
Cai Yudong	87d78a4a85	Ignore cases when comparing metric type in segcore (#19437 ) Signed-off-by: yudong.cai <yudong.cai@zilliz.com> Signed-off-by: yudong.cai <yudong.cai@zilliz.com>	2022-09-26 17:58:52 +08:00
xige-16	428840178c	Support diskann index for vector field (#19093 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-09-21 20:16:51 +08:00
Cai Yudong	686b0ce796	Upgrade to knowhere-v1.3.0, remove following index support: (#18935 ) - IVF_SQ8H - RHNSW_FLAT/RHNSW_PQ/RHNSW_SQ - NGT - NSG - SPTAG Signed-off-by: yudong.cai <yudong.cai@zilliz.com> Signed-off-by: yudong.cai <yudong.cai@zilliz.com>	2022-09-05 10:41:11 +08:00
zhenshan.cao	a287a2b3fd	Return empty result in advance if all data filtered out (#18329 ) (#18438 ) Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2022-07-28 12:36:30 +08:00
Jeng.Gwan	638f6c36e9	Support to get real row count of segment (#18115 ) Signed-off-by: xaxys <zheng.guan@zilliz.com>	2022-07-18 09:58:28 +08:00
zhagnlu	257da153ce	Fix core dump when nq has no topk result (#17923 ) (#18051 ) Signed-off-by: zhagnlu <lu.zhang@zilliz.com> Co-authored-by: zhagnlu <lu.zhang@zilliz.com>	2022-07-05 19:48:20 +08:00
Jiquan Long	6954a5ba3e	Fix search successfully with invalid metric type (#17977 ) Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2022-07-01 22:28:23 +08:00
Cai Yudong	a001412e12	Replace faiss::MetricType with knowhere::MetricType (#17891 ) Signed-off-by: yudong.cai <yudong.cai@zilliz.com>	2022-06-29 14:20:19 +08:00

116 Commits