99 Commits

Author SHA1 Message Date
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() == IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
Buqian Zheng
7fc3094a42
fix: fix growing index data race and properly handle build error (#31170)
issue: https://github.com/milvus-io/milvus/issues/31169

also properly handling index build error by re-create a new index so
that nothing will be left in the previous failed index build attempt.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-13 20:19:04 +08:00
Buqian Zheng
96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and added unit tests by modifying lots of existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
Buqian Zheng
070dfc77bf
feat: [Sparse Float Vector] segcore basics and index building (#30357)
This commit adds sparse float vector support to segcore with the
following:

1. data type enum declarations
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in memory data
structures
* mmap::Column as the in memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in memory representation of a sparse float
vector of a growing segment which supports inserts.
3. Adds logic in payload reader/writer to serialize/deserialize from/to
binlog
4. Adds the ability to allow the index node to build sparse float vector
index
5. Adds the ability to allow the query node to build growing index for
growing segment and temp index for sealed segment without index built

This commit also includes some code cleanness, comment improvement, and
some unit tests for sparse vector.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-11 14:45:02 +08:00
MrPresent-Han
77eb6defb1
feat: support groupby on growing and non-indexed sealed egment(#30307) (#30644)
related: #30308

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-02-21 14:02:53 +08:00
yah01
f542bdbf3c
enhance: calc the accurate mem size of segment (#30093)
this stats the real memory size of segment, also reduces the memory
usage in mmap mode
resolve #30095

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-19 12:32:53 +08:00
yah01
6c477ce3a7
enhance: optimize the loading strategy (#29910)
as we have the pool size limit so we don't need to limit the concurrency
manually

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 14:26:50 +08:00
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
MrPresent-Han
9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:

1. Temporarliy, the whole groupby is implemented separated from
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This pr includes some unrelated mocked interface regarding alterIndex
due to some unworth-to-mention reasons. All these un-associated content
will be removed before the final pr is merged. This version of pr is
only for review
4. All other related details were commented in the files comparison

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
yah01
aef483806d
enhance: improve the segcore logs (#29372)
- remove the streaming logging
- refine existing logs

fix #29366

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-23 21:52:43 +08:00
zhagnlu
a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
Bingyi Sun
36f69ea031
feat: integrate storagev2 in building index of segcore (#28768)
issue: https://github.com/milvus-io/milvus/issues/28655

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-05 16:48:54 +08:00
congqixia
1dc086496f
fix: schema->size() check logic with system field (#28802)
Now segcore load system field info as well, the growing segment
assertion shall not pass with "+ 2" value
This will cause all growing segments load failure
Fix #28801
Related to #28478
See also #28524

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-29 22:40:28 +08:00
yah01
02c5a649cf
enhance: store system fields in segcore (#28524)
we need the system fields info for some usacase
fix: #28523

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-21 09:28:22 +08:00
yah01
f7d2ab6677
enhance: reduce 1x copy for variable length field while retrieving (#28345)
- Reduce 1x copy for varchar/string/JSON/array types while retrieving
- Reduce 1x copy for int8/int16 while retrieving

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-15 18:08:20 +08:00
yah01
267c67dfee
enhance: reduce 1x copy while retrieving data from growing segment (#28323)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-10 15:44:22 +08:00
cqy123456
4fbe3c9142
replace loaded binlog with binlog index for search performance (#27673)
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-11-01 02:20:15 +08:00
yah01
f212158d61
Fix delete records timestamp may be reordered (#27941)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-27 10:24:10 +08:00
Enwei Jiao
b80a3e19d3
Add code for PanicInfo (#27364)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-27 12:01:28 +08:00
yah01
93e2eb78c9
Delete only if primary keys exist (#25292)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-09-20 19:03:25 +08:00
cai.zhang
a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
Enwei Jiao
0afdfdb9af
Remove other Exceptions, keeps SegcoreError only (#27017)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-14 14:05:20 +08:00
yah01
3203ce1654
Reduce copy while retrieving primary keys (#26616)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-09-11 21:31:18 +08:00
Enwei Jiao
c3f15c6b95
Refactor duplicate error class into one place (#26985)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-11 20:43:17 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
yah01
b475f25042
Remove invalid offset check while filling data (#26666)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-30 09:52:27 +08:00
yah01
ba882b49b6
Optimize query/search on growing segment while output vector field (#26542)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-24 09:46:24 +08:00
xige-16
1055c90456
Add default retrieve limit (#24782)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-08-10 14:11:15 +08:00
smellthemoon
28cea5e763
Fix func typo (#26167)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-08-07 19:23:08 +08:00
MrPresent-Han
5634ba777d
add new threadpool with various priority to avoid deadlock(#25781) (#26028)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-08-03 09:31:07 +08:00
Jiquan Long
5c1f79dc54
Push down the limit operator to segcore (#25959)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-08-01 20:29:05 +08:00
yihao.dai
2d1ed6af45
Use PK index for string data type (#25390)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-07-07 15:26:26 +08:00
yah01
dd5f896dc8
Load batch by batch (#25212)
This will significantly reduce the memory usage while loading
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
foxspy
31173727b2
growing segment index memory opt & get vector bugfix (#25272)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-07-05 00:04:25 +08:00
xige-16
04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00
yah01
a413842e38
Fix deleted data is still visible (#24849)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-16 17:16:41 +08:00
zhagnlu
c5b1533fdc
Change segcore search_id from travelling all bits to select true bits(#24659) (#24800)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-06-16 16:48:44 +08:00
zhagnlu
f3f3f8a849
Segcore retrieve by pk optimazation (#24659) (#24660)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-06-15 15:04:39 +08:00
yah01
60fdd7e4f4
Introduce simdjson (#23644)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-26 10:30:34 +08:00
foxspy
6f4ed517de
add growing segment index (#23615)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-04-26 10:14:41 +08:00
yah01
546080dcdd
Support to retrieve json (#23563)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-21 11:46:32 +08:00
yah01
ab1ac69712
Fix inserting into growing segment copies data (#22792)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-16 20:11:55 +08:00
yah01
bdd6bc7695
Re-format cpp code (#22513)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-02 15:55:49 +08:00
smellthemoon
f5ab719f21
timestamp decided if the pks were the same (#20166)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2022-11-21 10:55:10 +08:00
xige-16
428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00
Cai Yudong
305601ad25
Add API SearchOnSealedChunk() to search on sealed segment without index (#18830)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-08-25 18:16:55 +08:00
Cai Yudong
dcf45df029
Optimize API vector_search parameter in segcore (#18827)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-08-25 16:16:54 +08:00
xige-16
54d17bc5da
Fix query too slow when insert multi repeated pk data (#18231)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-07-13 10:22:26 +08:00
Cai Yudong
e78269f450
Optimize search related interface in segcore (#17568)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-06-23 18:16:13 +08:00
xige-16
36ad989590
Fix segOffset grater than insert barrier when mark delete (#17444)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-06-10 20:02:08 +08:00