30 Commits

Author SHA1 Message Date
cqy123456
32f685ff12
enhance: growing segment support mmap (#32633)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 14:42:00 +08:00
Buqian Zheng
8cb350598c
enhance: Improve GetVectorById of Sparse Float Vector (#33209)
issue: #29419

* sparse float vector to support raw data mmap

For getting vectors from the chunk cache, I added a unit test but marked it as
skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-06-12 10:09:55 +08:00
zhagnlu
6ce9df913f
fix: clean vector memory (#33692)
#33533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-07 10:37:54 +08:00
zhagnlu
c6f8a73bb2
enhance: optimize some cache to reduce memory usage (#33534)
#33533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-04 14:09:47 +08:00
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() ==> IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
chyezh
5655ec4fc0
enhance: add mmap usage metrics (#31708)
issue: #31707

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-01 11:35:12 +08:00
zhagnlu
cf5109ec17
fix: fix mmap failed when string field all value is empty (#31406)
#31162

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-21 10:41:07 +08:00
Buqian Zheng
96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and adds unit tests by converting many existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
Buqian Zheng
070dfc77bf
feat: [Sparse Float Vector] segcore basics and index building (#30357)
This commit adds sparse float vector support to segcore with the
following:

1. data type enum declarations
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in-memory data
structures;
* mmap::Column as the in-memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in-memory representation of a sparse float
vector of a growing segment, which supports inserts.
3. Adds logic in payload reader/writer to serialize/deserialize to/from
binlog
4. Adds the ability to allow the index node to build sparse float vector
index
5. Adds the ability to allow the query node to build growing index for
growing segment and temp index for sealed segment without index built

This commit also includes some code cleanup, comment improvements, and
some unit tests for sparse vectors.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-11 14:45:02 +08:00
Jiquan Long
16b785e149
enhance: optimize the memory usage and speed up loading variable length data (#30787)
/kind improvement
this removes one extra copy while loading variable-length data and also
avoids constructing std::string objects, which could lead to memory
fragmentation
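
A minimal sketch of the idea, not the code from this PR: variable-length values are packed into one contiguous buffer with an offsets array, and readers get non-owning `std::string_view` slices instead of one heap-allocated `std::string` per row. The `VarColumn` class and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical column type: all rows share one contiguous byte buffer.
class VarColumn {
 public:
    // Append one row's bytes; no per-row std::string is created.
    void Append(std::string_view value) {
        offsets_.push_back(data_.size());
        data_.insert(data_.end(), value.begin(), value.end());
    }

    std::size_t Rows() const { return offsets_.size(); }

    // Return a non-owning view of row i; it stays valid until the next Append.
    std::string_view View(std::size_t i) const {
        assert(i < offsets_.size());
        std::size_t begin = offsets_[i];
        std::size_t end =
            (i + 1 < offsets_.size()) ? offsets_[i + 1] : data_.size();
        return std::string_view(data_.data() + begin, end - begin);
    }

 private:
    std::vector<char> data_;            // concatenated row bytes
    std::vector<std::size_t> offsets_;  // start offset of each row
};
```

With this layout the number of allocations depends on buffer growth rather than on the row count, which is the fragmentation concern the message mentions.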

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-02-28 16:45:00 +08:00
Jiquan Long
4459078e0b
fix: wrong num_entities used when mmap variable length data (#30848)
https://github.com/milvus-io/milvus/issues/30728

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-28 16:38:56 +08:00
yah01
1b7f1d7067
enhance: mmap data corrupted after seal the column (#29422)
this bug was introduced in recent changes

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-23 15:20:43 +08:00
yah01
7a2374e698
enhance: reduce the memory usage of variable length data (#29387)
append all loaded data to a buffer and then copy it into an
exactly sized allocation
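
A rough sketch of that two-step load, under the assumption that the staging buffer is a growable `std::vector<char>`; `FitBuffer` and `LoadFit` are hypothetical names, not identifiers from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical result type: an allocation sized exactly to the payload.
struct FitBuffer {
    std::unique_ptr<char[]> data;
    std::size_t size = 0;
};

// Stage all rows in a growable buffer, whose capacity may overshoot,
// then copy once into an allocation of exactly the needed size.
inline FitBuffer LoadFit(const std::vector<std::string_view>& rows) {
    std::vector<char> staging;
    for (std::string_view row : rows) {
        staging.insert(staging.end(), row.begin(), row.end());
    }

    FitBuffer out;
    out.size = staging.size();
    out.data = std::make_unique<char[]>(out.size);
    if (out.size > 0) {
        std::memcpy(out.data.get(), staging.data(), out.size);
    }
    // The long-lived copy never holds more than the staging buffer reserved.
    assert(out.size <= staging.capacity());
    return out;  // staging is freed here; only the exact-size copy survives
}
```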

---------

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-21 18:02:42 +08:00
yah01
04b2518ae7
enhance: fix the incorrect init parameter (#29357)
the `driver_` field is not used, so this doesn't matter for now

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-20 20:50:43 +08:00
yah01
8f89e9cf75
enhance: remove all unnecessary string formatting (#29323)
done with two regular expressions:
- `PanicInfo\((.+),[. \n]+fmt::format\(([.\s\S]+?)\)\)`
- `AssertInfo\((.+),[. \n]+fmt::format\(([.\s\S]+?)\)\)`
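
As an illustration of how the first pattern could be applied, here is a small sketch using `std::regex`. The pattern is the one quoted above; the replacement string `PanicInfo($1, $2)` and the function name `StripFmtFormat` are assumptions, since the message does not state the replacement that was used. The `AssertInfo` pattern would work the same way.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrite PanicInfo(code, fmt::format(args...)) to PanicInfo(code, args...).
// Group 1 captures the first argument, group 2 the fmt::format arguments.
inline std::string StripFmtFormat(const std::string& source) {
    static const std::regex pattern(
        R"re(PanicInfo\((.+),[. \n]+fmt::format\(([.\s\S]+?)\)\))re");
    // The replacement below relies on exactly two capture groups.
    assert(pattern.mark_count() == 2);
    // Assumed replacement: keep both captured groups, drop the wrapper.
    return std::regex_replace(source, pattern, "PanicInfo($1, $2)");
}
```

Because `[. \n]+` accepts newlines and spaces after the comma, the pattern also matches calls whose `fmt::format` argument was wrapped onto the next line.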

related: #28811

---------

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-20 10:04:43 +08:00
zhagnlu
a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
yah01
342635ed61
enhance: enable assert method to format arguments (#28812)
for now, the assert method in segcore accepts only a plain string message, so
much of the code doesn't print the values it asserts on.

this change lets the assert method format its arguments
related: #28811

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-12-01 18:04:33 +08:00
yah01
c96d07682e
enhance: enhance the accuracy of memory usage (#28554)
before this, Milvus used the container/system memory info to get the memory
usage, which could be inaccurate.

we allocate memory with private anonymous mmap,
so `rss - shared` gives the accurate memory usage
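
A minimal sketch of computing `rss - shared` on Linux, assuming the values are read from `/proc/self/statm` (whose fields are page counts in the order size, resident, shared, ...). The function names are hypothetical and this is not necessarily how the PR obtains the numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Parse one line in /proc/<pid>/statm format and return
// (resident - shared) converted from pages to bytes.
inline std::uint64_t PrivateBytesFromStatm(const std::string& statm,
                                           std::uint64_t page_size) {
    std::istringstream in(statm);
    std::uint64_t size = 0, resident = 0, shared = 0;
    in >> size >> resident >> shared;
    assert(resident >= shared);
    return (resident - shared) * page_size;
}

// Linux-only convenience wrapper; returns 0 if the file cannot be read.
inline std::uint64_t CurrentPrivateBytes(std::uint64_t page_size) {
    std::ifstream file("/proc/self/statm");
    std::string line;
    if (!std::getline(file, line)) {
        return 0;
    }
    return PrivateBytesFromStatm(line, page_size);
}
```

The page size would typically come from `sysconf(_SC_PAGESIZE)`. Pages backed by files or shared memory are counted in the shared field, so subtracting it leaves the anonymous memory that private anonymous mappings contribute.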

resolve #28553

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-23 15:12:23 +08:00
Enwei Jiao
b80a3e19d3
Add code for PanicInfo (#27364)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-27 12:01:28 +08:00
cai.zhang
a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
yah01
0459a662e4
use MADV_WILLNEED for scalar column data (#27170)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-18 18:05:22 +08:00
yihao.dai
bb6711f28c
Add ChunkCache: support get vector from storage (#26142)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-15 10:21:20 +08:00
Enwei Jiao
c3f15c6b95
Refactor duplicate error class into one place (#26985)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-11 20:43:17 +08:00
yah01
9605c03c3c
Fix the number of rows of column not correct (#26347)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-16 13:35:33 +08:00
yah01
127c23d999
Check data consistency after loading (#26312)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-14 09:01:32 +08:00
yah01
a173486d2e
Fix calculation of memory usage prediction for mmap mode (#26264)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-12 17:19:31 +08:00
yah01
300fef446b
Enable mmap for vector index (#25877)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-10 13:59:15 +08:00
yah01
dd5f896dc8
Load batch by batch (#25212)
This will significantly reduce the memory usage while loading
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)
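
The batching idea can be sketched as follows, assuming the destination is either a heap allocation (memory mode) or a mapped file region (mmap mode). `LoadInBatches` is a hypothetical helper, not the loader from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <vector>

// Copy up to `total` bytes from `in` into `dest` through a fixed-size
// buffer, so the extra memory held while loading is bounded by batch_size.
inline std::size_t LoadInBatches(std::istream& in,
                                 char* dest,
                                 std::size_t total,
                                 std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<char> buffer(batch_size);
    std::size_t copied = 0;
    while (copied < total) {
        std::size_t want = std::min(batch_size, total - copied);
        in.read(buffer.data(), static_cast<std::streamsize>(want));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) {
            break;  // source exhausted early
        }
        std::memcpy(dest + copied, buffer.data(), got);
        copied += got;
    }
    return copied;  // number of bytes actually written to dest
}
```

For example, calling it with a `std::istringstream` source and a batch size of a few bytes copies the whole payload while never holding more than one batch in the intermediate buffer.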

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
yah01
cb4b88d5cf
Refactor the column type (#25147)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-27 19:50:45 +08:00
xige-16
04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00