147 Commits

Author SHA1 Message Date
Bingyi Sun
a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
Rijin-N
a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

To address these limitations, This enhancement is needed.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
Buqian Zheng
8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
yihao.dai
8cda48a96a
enhance: Use mmap.scalarIndex config for text index (#36400)
issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-24 12:21:13 +08:00
jaime
2ff3765058
enhance: catch std::stoi exception and improve error msg (#36267)
issue: #36255

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-14 16:17:08 +08:00
Jiquan Long
89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00
cqy123456
560e8e70b0
enhance: reduce mmap_rss after chunkcache warmup (#35974)
related pr: https://github.com/milvus-io/milvus/pull/35965

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-09-05 18:07:05 +08:00
foxspy
9da86529a7
enhance: Add disk filemananger parallel load control to reduce the memory consumption (#35281)
issue: #35280 
add parallel control to limit the memory consumption during index file
loading

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-09-03 18:01:03 +08:00
smellthemoon
a3f2f044d6
fix: not set nullable when stream writer write headers (#35799)
#35802

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-29 20:59:00 +08:00
zhagnlu
4d2f96c760
enhance: support bitmap mmap (#35399)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-27 16:34:59 +08:00
yihao.dai
f2b83d316b
enhance: Support memory mode chunk cache (#35347)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-25 15:42:58 +08:00
Zhen Ye
a773836b89
enhance: optimize milvus core building (#35610)
issue: #35549,#35611,#35633

- remove milvus_segcore milvus_indexbuilder..., add libmilvus_core
- core building only link once
- move opendal compilation into cmake
- fix odr

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-23 12:35:02 +08:00
Chun Han
337e065902
fix: querynode hang when failing to allocate disk space for mmap(#35184) (#35187)
related: #35184

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-08-19 15:30:55 +08:00
smellthemoon
80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
Jiquan Long
91df03afe8
feat: put inverted index into ram (#35222)
fix: https://github.com/milvus-io/milvus/issues/35224

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-08-06 11:54:16 +08:00
zhenshan.cao
aa247f192d
enhance: remove unused code for StorageV2 (#35132)
issue: https://github.com/milvus-io/milvus/issues/34168

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-01 12:08:13 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
smellthemoon
5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support read and write null in segcore
    will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3.  retrieve valid_data.
    parse valid_data in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
foxspy
8e64bf929c
enhance: add scalar filtering and vector search latency metrics (#34785)
add scalar filtering and vector search latency metrics to distinguish
the cost of scalar filtering.
To add metrics in query chain, add a monitor module and move the metric
files from original storage module.
issue: #34780

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-07-19 14:01:39 +08:00
zhagnlu
f1b2f7b640
enhance: refactor bitmap index and internal hybrid index (#34450)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-18 10:39:42 +08:00
smellthemoon
ef3ced8138
fix: descriptor event in previous version not has nullable to parse error (#34235)
#34176

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-01 16:38:06 +08:00
congqixia
14e827dc6c
fix: Implement singleflight for segcore ChunkCache (#34250)
See also #34249

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-01 11:46:06 +08:00
Patrick Weizhi Xu
b961767005
enhance: support integral type for MV and skip MV if there is only one category (#33161)
issue: #29892

---------

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-06-24 10:20:01 +08:00
zhagnlu
0d7ea8ec42
enhance: Enhance and correct exception module (#33705)
#33704

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-23 21:22:01 +08:00
cqy123456
dc4437ff82
enhance: use segment id and type to register in MmapChunkManager and opt malloc in variableChunk (#33993)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-20 17:42:02 +08:00
smellthemoon
2a1356985d
enhance: support null in go payload (#32296)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-19 17:08:00 +08:00
Gao
0d20303e54
fix: fix binary vector data size (#33750)
issue: https://github.com/milvus-io/milvus/issues/22837

- fix byte size wrong for binary vectors
- fix the expect/actual error msg

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-06-18 21:39:59 +08:00
cqy123456
32f685ff12
enhance: growing segment support mmap (#32633)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 14:42:00 +08:00
zhagnlu
d43ec4db0b
enhance: support array bitmap index (#33527)
#32900

---------

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-16 21:51:58 +08:00
Buqian Zheng
47b04ea167
enhance: support sparse cardinal hnsw index (#33656)
issue: #29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-06-12 16:57:55 +08:00
Buqian Zheng
8cb350598c
enhance: Improve GetVectorById of Sparse Float Vector (#33209)
issue: #29419

* sparse float vector to support raw data mmap

For get vector from chunk cache, I added a unit test but marking it as
skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-06-12 10:09:55 +08:00
cai.zhang
27cc9f2630
enhance: Support analyze data (#33651)
issue: #30633

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>
2024-06-06 17:37:51 +08:00
wei liu
b69740c8f3
enhance: Remove unnecessary log info during load segment (#33663)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-06 14:13:50 +08:00
cqy123456
703fc73f71
enhance: disk index support binary vector (#33631)
issue:https://github.com/milvus-io/milvus/issues/22837
related https://github.com/milvus-io/milvus/pull/33575

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-05 19:37:57 +08:00
Jiquan Long
0c5d8660aa
feat: support inverted index for array (#33452)
issue: https://github.com/milvus-io/milvus/issues/27704

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-05-31 09:47:47 +08:00
cai.zhang
32d3e22d7d
fix: Throw an exception after all the threads in thread pool finished (#32810)
issue: #32487

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-05-23 11:47:40 +08:00
Bingyi Sun
fecd9c21ba
feat: LRU cache implementation (#32567)
issue: https://github.com/milvus-io/milvus/issues/32783
This pr is the implementation of lru cache on branch lru-dev.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
Co-authored-by: chyezh <chyezh@outlook.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Ted Xu <ted.xu@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: wayblink <anyang.wang@zilliz.com>
2024-05-06 20:29:30 +08:00
PowderLi
6289f3a9eb
fix: build milvus in rockylinux8 (#32619)
issue: #32299

1. xz utils recovers
2. forget to install ninja

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-04-29 14:53:26 +08:00
smellthemoon
4fb8044a27
enhance: delete some no lint code (#32182)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-04-26 14:15:26 +08:00
cqy123456
aba4993c6c
fix: fix some fp16/bf16 code miss in segcore. (#31771)
issue:https://github.com/milvus-io/milvus/issues/22837

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-04-07 14:13:16 +08:00
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() == IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
PowderLi
d299fa502e
fix: use milvus-io/vcpkg (#31770)
issue: #31769
GitHub Disables The XZ Repository because of CVE-2024-3094

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-04-01 15:01:13 +08:00
chyezh
5655ec4fc0
enhance: add mmap usage metrics (#31708)
issue: #31707

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-01 11:35:12 +08:00
groot
5be395354c
fix: minio ssl compatible issue (#31607)
issue: https://github.com/milvus-io/milvus/issues/30709

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-03-27 14:41:20 +08:00
groot
c81909bfab
enhance: Support MinIO TLS connection (#31311)
issue: https://github.com/milvus-io/milvus/issues/30709
pr: #31292

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
Buqian Zheng
070dfc77bf
feat: [Sparse Float Vector] segcore basics and index building (#30357)
This commit adds sparse float vector support to segcore with the
following:

1. data type enum declarations
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in memory data
structures
* mmap::Column as the in memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in memory representation of a sparse float
vector of a growing segment which supports inserts.
3. Adds logic in payload reader/writer to serialize/deserialize from/to
binlog
4. Adds the ability to allow the index node to build sparse float vector
index
5. Adds the ability to allow the query node to build growing index for
growing segment and temp index for sealed segment without index built

This commit also includes some code cleanness, comment improvement, and
some unit tests for sparse vector.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-11 14:45:02 +08:00
zhagnlu
976b6fc0e4
enhance: change opendal as compile configurable (#30384)
#30373

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-20 19:16:52 +08:00
congqixia
18c351efa6
fix: Prevent ChunkCache use absolute path in All-in-one mode (#30666)
See also #30651

Append operator of `std::filesystem::path` will replace whole path when
the param of "/" operation is an absolute path.

In "All-in-one" mode, this shall cause ChunkCache removing the original
vector data file when building chunk cache during/after load procedure.

This PR changes the ChunkCache path generation logic to a separate
function in which will check whether the file path is absolute or not.
If the file path is absolute, it removes the root path prefix and return
concatenated file path.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-19 20:58:51 +08:00
PowderLi
5cf9bb236e
enhance: restful support import jobs (#30343)
issue: #28521 #29732

include
1. list collection's import jobs
2. create a new import job
3. get the progress of an import job

fix:
1. mix the order of dbName & collectionName #29728
2. trace log keep the same as v1
3. support traceID
4. azure precheck, blob name cannot end with / #29703

---------

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-31 17:57:04 +08:00
yah01
878c4c9463
enhance: limit the max pool size to 16 (#30371)
according to our benchmark, concurrency level 16 is enough to fully
utilize the object storage network bandwidth

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-31 14:13:06 +08:00