211 Commits

Author SHA1 Message Date
Patrick Weizhi Xu
04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00
zhagnlu
316534e065
enhance: optimize delete init construct code (#39327)
#39326

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-02-17 21:05:26 +08:00
Bingyi Sun
b59555057d
feat: support json index (#36750)
https://github.com/milvus-io/milvus/issues/35528

This PR adds json index support for json and dynamic fields. Now you can
only do unary query like 'a["b"] > 1' using this index. We will support
more filter type later.

basic usage:
```
collection.create_index("json_field", {"index_type": "INVERTED",
    "params": {"json_cast_type": DataType.STRING, "json_path":
'json_field["a"]["b"]'}})
```

There are some limits to use this index:
1. If a record does not have the json path you specify, it will be
ignored and there will not be an error.
2. If a value of the json path fails to be cast to the type you specify,
it will be ignored and there will not be an error.
3. A specific json path can have only one json index.
4. If you try to create more than one json indexes for one json field,
sdk(pymilvus<=2.4.7) may return immediately because of internal
implementation. This will be fixed in a later version.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-15 14:06:15 +08:00
Cai Yudong
341d6c1eb7
feat: Update segcore for VECTOR_INT8 (#39415)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-21 11:03:03 +08:00
congqixia
45d49df89b
fix: Skip load extra indexes for sorted segment pk field (#39389)
Related to #39339

Extra indexes can be ignored for most cases since sorted pk column
already provided indexing features

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-20 18:13:15 +08:00
congqixia
7cac87caca
fix: Skip erase field if index build on PK field (#39370)
Related to #39339

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-17 20:31:02 +08:00
congqixia
da1b786ef8
enhance: Utilize "find0" in segment.find_first (#39229)
Related to #39003

Previous PR #39004 has to clone & flip bitset due to bitset does not
support find0 operator. #39176 added this feature so clone & flip could
be removed now.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-14 14:14:58 +08:00
Zhen Ye
5f94954bb4
fix: data race when accessing field_ when retrieving (#39151)
issue: #39148

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-13 11:23:04 +08:00
Ted Xu
3dc95153b7
fix: build break under debug mode (#38790)
See #38435

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-01-07 17:36:56 +08:00
Chun Han
3739446a33
enhance: refine array view to optimize memory usage(#38736) (#38808)
related: #38736

700m data, array_length=10
non-mmap_offsets_uint64: 2.0G
mmap_offsets_uint64: 1.1G
mmap_offsets_uint32: 880MB

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-01-07 13:26:55 +08:00
congqixia
72f5b85c05
enhance: Accelerate find_first by utilizing bitset simd methods (#39004)
Related to #39003

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-07 10:34:54 +08:00
aoiasd
bc15ad24f2
fix: sealed segment get empty index params when brute force search for bm25 (#38707)
relate: https://github.com/milvus-io/milvus/issues/38236

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-12-25 19:06:51 +08:00
Ted Xu
acc8fb7af6
enhance: eliminate compile warnings (part2) (#38535)
See #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-25 15:30:50 +08:00
zhagnlu
01de0afc4e
enhance: refactor delete mvcc function (#38066)
#37413

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-15 18:02:43 +08:00
Gao
994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
aoiasd
e9391acf80
fix: bm25 brute force search need index params k1 and b (#37721)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-18 15:44:31 +08:00
zhagnlu
e4b6773d0a
fix: fix create text index dir conflict bug (#37693)
#37623

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-11-15 18:26:30 +08:00
smellthemoon
3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
aoiasd
12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
aoiasd
d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
cai.zhang
625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00
zhenshan.cao
63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
cai.zhang
86687bd8ed
enhance: Refine code for get_deleted_bitmap (#36819)
issue: #33744 

Check whether the PK is truly sorted in the debug model.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:19:30 +08:00
smellthemoon
eb3e4583ec
enhance: all op(Null) is false in expr (#35527)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 21:14:30 +08:00
cqy123456
b474374ea5
enhance: use growingMmapEnabled to control the behavior of interim index, not vectorField (#36500)
issue:https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36391

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:25:24 +08:00
Bingyi Sun
a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
SimFG
130a923dec
enhance: the estimate method when loading the collection (#36307)
- issue: #36530

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-09 17:35:19 +08:00
Buqian Zheng
8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
yihao.dai
8cda48a96a
enhance: Use mmap.scalarIndex config for text index (#36400)
issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-24 12:21:13 +08:00
zhagnlu
489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
congqixia
58d3200986
enhance: Filter out non-hit delete records during load delta (#36207)
Related to #35303

This PR utilizes pk index in segment to exclude non-hit delete record
during load delete records. This ability is crucial when l0/delete
forward policy only replies on segment itself(without BF filtering).

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-13 19:05:08 +08:00
Jiquan Long
89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00
cqy123456
560e8e70b0
enhance: reduce mmap_rss after chunkcache warmup (#35974)
related pr: https://github.com/milvus-io/milvus/pull/35965

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-09-05 18:07:05 +08:00
zhagnlu
74048ce34f
fix:rename mmap file path to avoid directory conflict (#35810)
#35784

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-03 16:05:03 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
yihao.dai
f2b83d316b
enhance: Support memory mode chunk cache (#35347)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-25 15:42:58 +08:00
zhagnlu
3107701fe8
enhance: optimize retrieve on dynamic field (#35580)
#35514

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-22 14:24:56 +08:00
smellthemoon
80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
zhagnlu
4b553b0333
enhance: revert remove duplicated pk function (#35103)
issue: #34778
 Revert "fix: fix query count(*) concurrently"
 Revert "enhance: mark duplicated pk as deleted "

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-05 10:48:17 +08:00
smellthemoon
475c333fa2
enhance: add valid_data in span (#35030)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-02 15:40:14 +08:00
zhenshan.cao
aa247f192d
enhance: remove unused code for StorageV2 (#35132)
issue: https://github.com/milvus-io/milvus/issues/34168

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-01 12:08:13 +08:00
zhagnlu
dd0c26cf58
enhance: redefine variable column block size (#35040)
#35013

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-30 19:23:50 +08:00
smellthemoon
5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support read and write null in segcore
    will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3.  retrieve valid_data.
    parse valid_data in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
zhagnlu
804dd5409a
enhance: mark duplicated pk as deleted (#34586)
fix #34247

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-16 14:25:39 +08:00
congqixia
4850336ca3
fix: Write padding at end of mmap file not chunk (#34529)
Related to #34508

The padding bytes shall be written only at the end of the mmap file not
the chunk of each field data file.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-10 11:12:14 +08:00
zhagnlu
3030e4625e
enhance: refactor variable column to reduce memory cost (#33875)
#33874

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-30 20:16:06 +08:00
zhagnlu
03a3f50892
enhance: add skip using array index when some situation (#33947)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-23 21:26:02 +08:00
cqy123456
dc4437ff82
enhance: use segment id and type to register in MmapChunkManager and opt malloc in variableChunk (#33993)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-20 17:42:02 +08:00
cqy123456
b460862537
fix: can't find Chunk struct after growing support mmap (#33951)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 18:37:58 +08:00