71 Commits

Author SHA1 Message Date
Buqian Zheng
640a49ffb6
fix: fix chunk cache madvise when sparse raw data is mmaped (#39145)
instead of marking as not supported,
`ChunkedSparseFloatColumn::DataByteSize` can simply use the impl of
super class.

issue: https://github.com/milvus-io/milvus/issues/39158

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-01-13 10:34:57 +08:00
Cai Yudong
d6206ad2de
fix: Remove duplicated Macro definition (#39076)
Issue: #39102

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-09 15:26:56 +08:00
Chun Han
3739446a33
enhance: refine array view to optimize memory usage(#38736) (#38808)
related: #38736

700m data, array_length=10
non-mmap_offsets_uint64: 2.0G
mmap_offsets_uint64: 1.1G
mmap_offsets_uint32: 880MB

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-01-07 13:26:55 +08:00
congqixia
19052ef3e5
enhance: Add buffered writer to reduce fwrite syscall (#38570)
Related to previous PR #38157

If mmapped row is too small, frequent fwrite call still cost too much
cpu time for context switching. This PR add buffered write to avoid this
bad case with extra buffer per variable field.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-27 12:20:50 +08:00
Ted Xu
acc8fb7af6
enhance: eliminate compile warnings (part2) (#38535)
See #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-25 15:30:50 +08:00
Chun Han
decdfdae10
fix: growing-groupby-crush(#38533) (#38538)
related: #38533

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-12-17 21:05:12 +08:00
Ted Xu
4919ccf543
enhance: eliminate compile warnings (#38420)
See: #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-16 09:58:43 +08:00
Gao
994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
tinswzy
262f6db3d8
enhance: Add mmap file usage metric (#38193)
issue: #38156  Add mmap file usage metric

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-04 16:12:47 +08:00
congqixia
767b7e6218
enhance: Use fdopen, fwrite to reduce direct syscall (#38157)
`File.Write` and `File.WriteInt` use `write`, which may be just direct
syscall in some systems. When mappding field data and write line by
line, this could cost lost of CPU time when the row number is large.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-03 15:24:39 +08:00
smellthemoon
3d28d99411
fix: to use the correct offset in span (#37780)
#37734

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-18 21:56:30 +08:00
Bingyi Sun
65d3c6622a
enhance: Optimize GetChunkIDByOffset and add ut (#37704)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-15 14:16:31 +08:00
Bingyi Sun
40ba5a3414
fix: fix chunked segment term filter expression and add ut (#37392)
issue: https://github.com/milvus-io/milvus/issues/37143

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-07 11:04:19 -08:00
cai.zhang
625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00
Bingyi Sun
bd04cac4b3
fix: fix group by on chunked segment (#37292)
https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-05 17:12:23 +08:00
zhenshan.cao
63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
Gao
2092dc0ba1
enhance: reserve vector space to reduce reallocate cost in Views() and StringViews() (#37182)
issue: #37152

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-10-31 10:02:21 +08:00
Bingyi Sun
90948e9444
fix: add SearchOnSealed unit test and fix a bug (#37241)
issue: https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-30 10:26:19 +08:00
Bingyi Sun
b81f162f6a
fix: fix several bugs and refactor some codes related with chunked segment (#37168)
issue: https://github.com/milvus-io/milvus/issues/37147

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-28 14:17:30 +08:00
Bingyi Sun
90b3907a92
fix: fix missing return value in chunked column (#37064)
issue: https://github.com/milvus-io/milvus/issues/36834

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-22 10:29:19 -07:00
Bingyi Sun
b2037c95a8
fix: use chunk_row_nums to iterate (#36882)
Fix segmentation fault error and remove useless codes.
https://github.com/milvus-io/milvus/issues/36834

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-16 11:15:25 +08:00
cqy123456
aa904be6ec
enhance: support sparse vector mmap in growing segment type (#36566)
issue: https://github.com/milvus-io/milvus/issues/32984
related pr: https://github.com/milvus-io/milvus/pull/36565

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-15 10:59:23 +08:00
Bingyi Sun
3a09b438c2
fix: fix macos code checker (#36817)
https://github.com/milvus-io/milvus/issues/36829

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-14 11:11:51 +08:00
Bingyi Sun
a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
Buqian Zheng
94005b7198
fix: Sparse float vector incorrectly ExpandData at mmap mode (#36603)
issue: #36561

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-30 10:39:16 +08:00
Buqian Zheng
8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
Bingyi Sun
53a8a24554
fix: fix empty indices of sparse float (#35403)
https://github.com/milvus-io/milvus/issues/35401

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-09-10 14:23:07 +08:00
Zhen Ye
98866205fa
fix: munmap deallocate too much memory (#35725)
issue: #35693

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-27 17:18:59 +08:00
smellthemoon
80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
smellthemoon
475c333fa2
enhance: add valid_data in span (#35030)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-02 15:40:14 +08:00
zhagnlu
dd0c26cf58
enhance: redefine variable column block size (#35040)
#35013

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-30 19:23:50 +08:00
smellthemoon
5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support read and write null in segcore
    will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3.  retrieve valid_data.
    parse valid_data in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
foxspy
8e64bf929c
enhance: add scalar filtering and vector search latency metrics (#34785)
add scalar filtering and vector search latency metrics to distinguish
the cost of scalar filtering.
To add metrics in query chain, add a monitor module and move the metric
files from original storage module.
issue: #34780

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-07-19 14:01:39 +08:00
zhagnlu
bd9727a1f7
fix: fix bug that set incorrect info to columnbase (#34428)
#34427

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-14 22:27:46 +08:00
zhagnlu
18c83c6466
fix: fix auto merge error (#34661)
#33704

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-13 13:07:37 +08:00
congqixia
4850336ca3
fix: Write padding at end of mmap file not chunk (#34529)
Related to #34508

The padding bytes shall be written only at the end of the mmap file not
the chunk of each field data file.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-10 11:12:14 +08:00
congqixia
6b4d977a10
fix: Write padding into mmap file in case of SIGBUS (#34443)
See also #34442

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-05 17:44:09 +08:00
zhagnlu
3030e4625e
enhance: refactor variable column to reduce memory cost (#33875)
#33874

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-30 20:16:06 +08:00
zhagnlu
0d7ea8ec42
enhance: Enhance and correct exception module (#33705)
#33704

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-23 21:22:01 +08:00
cqy123456
dc4437ff82
enhance: use segment id and type to register in MmapChunkManager and opt malloc in variableChunk (#33993)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-20 17:42:02 +08:00
cqy123456
32f685ff12
enhance: growing segment support mmap (#32633)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 14:42:00 +08:00
Buqian Zheng
8cb350598c
enhance: Improve GetVectorById of Sparse Float Vector (#33209)
issue: #29419

* sparse float vector to support raw data mmap

For get vector from chunk cache, I added a unit test but marking it as
skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-06-12 10:09:55 +08:00
zhagnlu
6ce9df913f
fix: clean vector memory (#33692)
#33533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-07 10:37:54 +08:00
zhagnlu
c6f8a73bb2
enhance: optimize some cache to reduce memory usage (#33534)
#33533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-04 14:09:47 +08:00
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() == IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
chyezh
5655ec4fc0
enhance: add mmap usage metrics (#31708)
issue: #31707

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-01 11:35:12 +08:00
zhagnlu
cf5109ec17
fix: fix mmap failed when string field all value is empty (#31406)
#31162

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-21 10:41:07 +08:00
Buqian Zheng
96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and added unit tests by modifying lots of existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
Buqian Zheng
070dfc77bf
feat: [Sparse Float Vector] segcore basics and index building (#30357)
This commit adds sparse float vector support to segcore with the
following:

1. data type enum declarations
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in memory data
structures
* mmap::Column as the in memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in memory representation of a sparse float
vector of a growing segment which supports inserts.
3. Adds logic in payload reader/writer to serialize/deserialize from/to
binlog
4. Adds the ability to allow the index node to build sparse float vector
index
5. Adds the ability to allow the query node to build growing index for
growing segment and temp index for sealed segment without index built

This commit also includes some code cleanness, comment improvement, and
some unit tests for sparse vector.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-11 14:45:02 +08:00