347 Commits

Author SHA1 Message Date
Gao
994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
congqixia
767b7e6218
enhance: Use fdopen, fwrite to reduce direct syscall (#38157)
`File.Write` and `File.WriteInt` use `write`, which may be just direct
syscall in some systems. When mappding field data and write line by
line, this could cost lost of CPU time when the row number is large.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-03 15:24:39 +08:00
cqy123456
8216345b07
enhance: reduce copy of bitset and id conversion of brurtforce search (#37675)
issue: https://github.com/milvus-io/milvus/issues/37798

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-11-19 15:48:40 +08:00
Zhen Ye
3f1614e9d9
enhance: add trace_id into segcore logs (#37656)
issue: #37655

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-18 10:20:30 +08:00
smellthemoon
7999367c0c
fix: use not retried err when get wrong parameter (#37707)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-15 19:14:30 +08:00
Chun Han
2d29dcd30c
enhance:refine group_strict_size parameter(#37482) (#37483)
related: #37482

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-11-12 09:56:28 +08:00
aoiasd
12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
aoiasd
d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
cai.zhang
625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00
Bingyi Sun
bd04cac4b3
fix: fix group by on chunked segment (#37292)
https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-05 17:12:23 +08:00
Zhen Ye
9a0e1c82bc
fix: repeated error code in milvus and segcore (#37359)
issue: #37357

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-05 16:28:23 +08:00
smellthemoon
51cb2fbf97
fix: parse fail in empty json (#37294)
#37200

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-03 19:00:21 +08:00
zhenshan.cao
63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
Bingyi Sun
b81f162f6a
fix: fix several bugs and refactor some codes related with chunked segment (#37168)
issue: https://github.com/milvus-io/milvus/issues/37147

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-28 14:17:30 +08:00
smellthemoon
44ddcb5a63
fix: not check has_value before get value in JSON (#37128)
https://github.com/milvus-io/milvus/issues/36236
also: https://github.com/milvus-io/milvus/issues/37113

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 17:19:28 +08:00
Yinzuo Jiang
3628593d20
feat: Implement custom function module in milvus expr (#36560)
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org

Solutions:

- parser (planparserv2)
    - add CallExpr in planparserv2/Plan.g4
    - update parser_visitor and show_visitor
- grpc protobuf
    - add CallExpr in plan.proto
- execution (`core/src/exec`)
- add `CallExpr` `ValueExpr` and `ColumnExpr` (both logical and
physical) for function call and function parameters
- function factory (`core/src/exec/expression/function`)
    - create a global hashmap when starting milvus (see server.go)
- the global hashmap stores function signatures and their function
pointers, the CallExpr in execution engine can get the function pointer
by function signature.
- custom functions
    - empty(string)
    - starts_with(string, string)
- add cpp/go unittests and E2E tests

closes: #36559

Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-10-25 15:25:30 +08:00
Bingyi Sun
bf956a3ec2
fix: fix string field has invalid utf-8 (#37104)
issue: https://github.com/milvus-io/milvus/issues/37083
We use vector of string_view to save data temporally but real string
data will be released after record batch is deconstructed.
Change it to vector of string to avoid memory corruption.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-24 18:33:47 -07:00
smellthemoon
eb3e4583ec
enhance: all op(Null) is false in expr (#35527)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 21:14:30 +08:00
cqy123456
aa904be6ec
enhance: support sparse vector mmap in growing segment type (#36566)
issue: https://github.com/milvus-io/milvus/issues/32984
related pr: https://github.com/milvus-io/milvus/pull/36565

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-15 10:59:23 +08:00
Bingyi Sun
3a09b438c2
fix: fix macos code checker (#36817)
https://github.com/milvus-io/milvus/issues/36829

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-14 11:11:51 +08:00
Bingyi Sun
a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
aoiasd
db34572c56
feat: support load and query with bm25 metric (#36071)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
Buqian Zheng
f7b811450d
feat: add enable_tokenizer params to VarChar field (#36480)
issue: #35922

add an enable_tokenizer param to varchar field: must be set to true so
that a varchar field can enable_match or used as input of BM25 function

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-10 20:33:21 +08:00
SimFG
130a923dec
enhance: the estimate method when loading the collection (#36307)
- issue: #36530

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-09 17:35:19 +08:00
Rijin-N
a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

To address these limitations, This enhancement is needed.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
Buqian Zheng
8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
yihao.dai
8cda48a96a
enhance: Use mmap.scalarIndex config for text index (#36400)
issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-24 12:21:13 +08:00
zhagnlu
489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
Jiquan Long
f0f2fb4cf0
enhance: span tracing of c++ part (#36205)
fix: https://github.com/milvus-io/milvus/issues/36204

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-13 11:19:09 +08:00
zhagnlu
5e5e87cc2f
enhance: rename some params and reduce default bitmapCardinalityLimit… (#36138)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-12 12:09:08 +08:00
Jiquan Long
89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00
congqixia
c61eea737b
enhance: Fix trace.cpp lint format issue (#36004)
Introduced by #35928

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-05 16:33:04 +08:00
congqixia
9e96ed4873
fix: Fix tracing config update logic (#35928)
Related to #35927

There are serveral issue this PR addresses:
- Use `ResetTraceConfig` method instead init one in update event handler
- Implement dynamic stats.Handler to receive tracing config update event
- Update `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` in case of data race

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-05 14:27:04 +08:00
Chun Han
4641fd9195
enhance: make search groupby stop when reaching topk groups (#35814)
related: #33544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-02 18:25:03 +08:00
zhagnlu
576ac2bbed
fix: Fix the reference to a variable after it has been moved (#35875)
#35607

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-02 10:05:02 +08:00
Patrick Weizhi Xu
b3089b5bdc
feat: support range search pagination retains order (#35738)
issue: #35464

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-08-29 14:09:00 +08:00
zhagnlu
4d2f96c760
enhance: support bitmap mmap (#35399)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-27 16:34:59 +08:00
Zhen Ye
a773836b89
enhance: optimize milvus core building (#35610)
issue: #35549,#35611,#35633

- remove milvus_segcore milvus_indexbuilder..., add libmilvus_core
- core building only link once
- move opendal compilation into cmake
- fix odr

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-23 12:35:02 +08:00
zhagnlu
3107701fe8
enhance: optimize retrieve on dynamic field (#35580)
#35514

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-22 14:24:56 +08:00
congqixia
3491608256
fix: Match int8_t and int16_t in Array::get_data (#35579)
Related to #35578

Previously int16/int8 bitmap index may read int32 array as int16, which
may cause build index with half of the data(if array is full) and half
zeros. This causes BITMAP index lost information.

This PR matches int8_t & int16_t while `get_data` when building index.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-20 16:10:56 +08:00
smellthemoon
80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
Buqian Zheng
f4a91e135b
enhance: Allow empty sparse row (#34700)
issue: #29419

* If a sparse vector with 0 non-zero value is inserted, no ANN search on
this sparse vector field will return it as a result. User may retrieve
this row via scalar query or ANN search on another vector field though.
* If the user uses an empty sparse vector as the query vector for a ANN
search, no neighbor will be returned.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-08-16 14:14:54 +08:00
Abdullah Ahmed
d20d6ea551
fix: Functional-notation casting vulnerability fix (#35252)
Fix for issue: https://github.com/milvus-io/milvus/issues/35200
2024-08-15 16:20:53 +08:00
Cai Yudong
3c9a47c8db
feat: Encode traceID and spanID as hex string (#34807)
Issue: https://github.com/zilliztech/knowhere/pull/714

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-08-06 15:20:16 +08:00
smellthemoon
475c333fa2
enhance: add valid_data in span (#35030)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-02 15:40:14 +08:00
congqixia
a642a26ed4
enhance: Resolve ChunkFileWriter lint issue (#35166)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-01 16:52:13 +08:00
Bingyi Sun
f229f244d2
enhance: add chunk basic impl (#34634)
https://github.com/milvus-io/milvus/issues/35112
This pr would not affect milvus functionality by now.
It implments a Chunk memory layout that looks like 
```
VariableColumn
|offset|offset|offset|
|data|data|data|

```
We maybe move offsets to the beginning and add null bitmaps later but
not in this PR.
And mmap test will also be added in another PR.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-08-01 10:29:51 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
congqixia
972752258a
enhance: Support otlp http exporter (#35053)
See also #35052

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-29 17:43:49 +08:00