204 Commits

Author SHA1 Message Date
Buqian Zheng
75e64b993f
enhance: add metrics for counting number of nun-zeros/tokens of sparse/FTS search (#38329)
sparse vectors may have arbitrary number of non zeros and it is hard to
optimize without knowing the actual distribution of nnz. this PR adds a
metric for analyzing that.

issue: https://github.com/milvus-io/milvus/issues/35853

comparing with https://github.com/milvus-io/milvus/pull/38328, this
includes also metric for FTS in query node delegator

also fixed a bug of sparse when searching by pk

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-12-12 16:22:43 +08:00
Gao
8977454311
enhance: support recall estimation (#38017)
issue: #37899 
Only `search` api will be supported

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 20:40:48 +08:00
jaime
1e8ea4a7e7
feat: add segment/channel/task/slow query render (#37561)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-12 17:44:29 +08:00
aoiasd
d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
zhenshan.cao
63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
cai.zhang
2ef6cbbf59
feat: The expression supports filling elements through templates (#37033)
issue: #36672

The expression supports filling elements through templates, which helps
to reduce the overhead of parsing the elements.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-31 14:20:22 +08:00
Patrick Weizhi Xu
43ad9af529
fix: use max MvccTs for iterator (#37247)
issue: #37158

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-10-30 13:58:20 +08:00
Patrick Weizhi Xu
fc69df44a1
fix: set guarantee ts for seach/query iterator (#37180)
issue: #37158

Return the GuaranteeTS so that the subsequent requests following the
correct TS.

BeginTS is the current timestamp when the task is created.
The GuaranteeTS is the one parsed based on both consistency level and
beginTS, in PreExecute of the task on Proxy.
The delegator will wait until GuaranteeTS is met.
In PostExecute of the task on Proxy, the TS of the first iterator
request will be returned to the SDK and add it to the subsequent
requests.
Hence, if the default consistency level is Eventually or Bounded, the
order of TS will be
> Guarantee TS < BeginTS

If it returns the BeginTS, the second request will need to catch up and
result in extra 200ms max of latency, which results in something like

| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 210ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |

where we expect

| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 10ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-10-28 15:57:35 +08:00
Gao
1d61b604e1
enhance: support retry search when topk is reduced and result not enough (#35645)
issue: #35576 

This pr is to cover those cases when queryHook optimize search params
and make the result size insufficient, add retry search mechanism and
add related metrics for alarming.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-10-23 19:19:30 +08:00
Chun Han
903450f5c6
enhance: add ts support for iterator(#22718) (#36572)
related: #22718

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-10-16 18:51:23 +08:00
Buqian Zheng
16b533cbf0
feat: Restful support for BM25 function (#36713)
issue: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-13 17:41:21 +08:00
aoiasd
db34572c56
feat: support load and query with bm25 metric (#36071)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
Chun Han
eb23e23cd2
enhance: refine parameter relationship for hybridsearch_group_by(#35096) (#36289)
related: #35096

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-20 14:55:11 +08:00
zhenshan.cao
9d8d332c88
fix: Fix improper use of offset in HybridSearch (#36244)
issue :https://github.com/milvus-io/milvus/issues/36243

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-09-13 22:05:15 +08:00
Ted Xu
e7ea1d7a04
enhance: improve log encoding performance on proxy nodes (#36123)
See #36122

This PR is designed to enhance log performance through two improvements:

1. Optimize JSON encoding by switching JSON serializer to
`json-iterator`.
2. Adding support of lazy initialization `WithLazy`.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-09-11 14:51:07 +08:00
Chun Han
e480b103bd
feat: supporing hybrid search group_by (#35982)
related: #35096

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-08 17:09:04 +08:00
Chun Han
bfd9d86fe9
feat: support groupby size on go-layer(#33544) (#33845)
related: #33544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-08-27 14:21:00 +08:00
zhagnlu
3107701fe8
enhance: optimize retrieve on dynamic field (#35580)
#35514

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-22 14:24:56 +08:00
Patrick Weizhi Xu
3c7f73137e
fix: disallow expr when partition key isolation is enabled (#35031)
issue: #34336 

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
(cherry picked from commit 2889481ce9a14230e9c7f1c8f9c3c1decde77e03)
2024-07-29 14:21:50 +08:00
wei liu
c45f38aa61
enhance: Update protobuf-go to protobuf-go v2 (#34394)
issue: #34252

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-29 11:31:51 +08:00
congqixia
e2f40fc2a8
fix: Check legacy guarantee ts when skipping alloc ts (#34981)
See also #34980

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-25 10:17:45 +08:00
Jiquan Long
a2ac84bd64
feat: record the duration waiting in the proxy queue (#34744)
fix: https://github.com/milvus-io/milvus/issues/34743

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-07-23 14:23:52 +08:00
Patrick Weizhi Xu
104d0966b7
feat: support partition key isolation (#34336)
issue: #34332

---------

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-07-11 19:01:35 +08:00
aoiasd
186757e622
enhance: support mark error as user error (#33498)
relate: https://github.com/milvus-io/milvus/issues/33492

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-07-01 14:56:12 +08:00
Jiquan Long
1f58cda957
enhance: add more trace for search & query (#32734)
issue: https://github.com/milvus-io/milvus/issues/32728

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-05-07 13:03:29 +08:00
smellthemoon
365e50b63e
fix: revert add range search params check in proxy (#32366)
no need to check params in empty segment.
#30365

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-04-23 17:41:23 +08:00
SimFG
8594b55ad5
enhance: add max insert request size and must use partition key configs (#32433)
issue: https://github.com/milvus-io/milvus/issues/30577
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-04-19 10:31:20 +08:00
zhenshan.cao
02f17b842a
fix: fix incomplete hybrid search result when nq > 1 (#32177)
issue: https://github.com/milvus-io/milvus/issues/25639

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-04-17 17:09:32 +08:00
zhenshan.cao
88c6828d6c
fix: failed to raise metric_type not match error (#32202)
issue: https://github.com/milvus-io/milvus/issues/32176

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-04-12 16:19:18 +08:00
Patrick Weizhi Xu
52ae47c850
enhance: gather materialized view search info once per request (#31996)
issue: #29892 

This PR:
1. Move the process of gathering materialized search info to when the
search plan is created, before it goes to each segment, to avoid
repeated work and access the plan node under multi-threaded
circumstances.
2. Enforce the supported MV type to `VARCHAR`
3. Add integration test

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-04-11 15:21:19 +08:00
SimFG
90bed1caf9
enhance: add the related data size for the read apis (#31816)
issue: #30436
origin pr: #30438
related pr: #31772

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-04-10 15:07:17 +08:00
Cai Yudong
a0a4ec8b67
enhance: make range search param check message more meaningful (#32006)
Issue: #31970

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-09 16:17:26 +08:00
zhenshan.cao
089c805e0a
enhance:Refactor hybrid search (#32020)
issue: https://github.com/milvus-io/milvus/issues/25639
https://github.com/milvus-io/milvus/issues/31368

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-04-09 14:21:18 +08:00
Chun Han
92c84ccc3f
enhance: ban groupby on hybrid-search(#29968) (#31810)
related: #29968

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-04-02 15:47:18 +08:00
SimFG
b1a1cca10b
feat: add more operation detail info for better allocation (#30438)
issue: #30436

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-28 06:33:11 +08:00
Patrick Weizhi Xu
982dd2834b
enhance: add materialized view search info (#30888)
issue: #29892 

This PR
1. Pass Materialized View (MV) search information obtained from the
expression parsing planning procedure to Knowhere. It only performs when
MV is enabled and the partition key is involved in the expression. The
search information includes:
1. Touched field_id and the count of related categories in the
expression. E.g., `color == red && color == blue` yields `field_id ->
2`.
2. Whether the expression only includes AND (&&) logical operator,
default `true`.
    3. Whether the expression has NOT (!) operator, default `false`.
4. Store if turning on MV on the proxy to eliminate reading from
paramtable for every search request.
5. Renames to MV.

## Rebuttals
1. Did not write in `ExtractInfoPlanNodeVisitor` since the new scalar
framework was introduced and this part might be removed in the future.
2. Currently only interested in `==` and `in` expression, `string` data
type, anything else is a bonus.
3. Leave handling expressions like `F == A || F == A` for future works
of the optimizer.

## Detailed MV Info

![image](https://github.com/milvus-io/milvus/assets/6563846/b27c08a0-9fd3-4474-8897-30a3d6d6b36f)

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-03-21 11:19:07 +08:00
wei liu
68af832405
fix: Hybrid search doesn't expire shard leader cache (#31380)
issue: #31351
This PR fixed that search/hybrid_search doesn't expire shard leader
cache when send request to query node failed, which make every request
keep trying to connect a offline query node

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-19 18:03:07 +08:00
yihao.dai
c4ace0f9d2
fix: Return specific error code when encountering incomplete requery results (#31343)
During requery, segments may change (e.g., due to compaction), so we
need to return specific error codes when encountering incomplete requery
results. Clients can then retry to avoid this issue.

issue: https://github.com/milvus-io/milvus/issues/29656

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-18 14:15:04 +08:00
smellthemoon
056e9f0cf5
fix: unmarshal nil when check search params (#31139)
#31020

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-03-15 09:15:04 +08:00
Chun Han
3574bdf858
enhance: ban range-search iteration for search-group-by (#30824)
related: #30033

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-03-08 14:17:00 +08:00
MrPresent-Han
d0eeea4b44
fix: reduce incorrectly for group-by with offset(#30828) (#30882)
related: #30828

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-03-06 16:47:00 +08:00
smellthemoon
a4f3e01a3a
fix: add the range search params check in proxy (#30423)
if check in Segcore, will not do the it when not insert data.
so, check "radius" and "range_filter" in proxy.
related with #30365

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-02-28 11:24:58 +08:00
cai.zhang
40ca98f57f
enhance: Skip timestamp allocation when search/query consistency level is eventually (#29773)
issue: #29772 
1. Skip timestamp allocation when search/query consistency level is
eventually.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-21 09:52:59 +08:00
xige-16
0a78b38bb8
fix: fix requery without partitionIDs in hybrid search (#30444)
issue: #30412 
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-02-02 16:47:13 +08:00
xige-16
060c8603a3
fix: Support mvcc with hybrid serach (#30114)
issue: https://github.com/milvus-io/milvus/issues/29656
/kind bug

Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-02-01 16:03:03 +08:00
congqixia
8e8ac213aa
enhance: Utilize partition key optimization in reQuery (#30253)
See also #30250

This PR add requery flag in query task. When reQuery flag is true, query
task shall skip partition name conversion and use pre-calculated
partitionIDs passed from search task.

TODO: hybrid search does not have partition id information. we shall
apply same logic for hybrid search later.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 11:05:07 +08:00
congqixia
8a6e1a4b27
enhance: pre-allocate result FieldData space to reduce copy & growslice (#29726)
See also: #29113

Add a new utitliy function in `pkg/util/typetuil` to pre-allocate field
data slice capacity acoording to search limit. This shall avoid copying
the data during `AppendFieldData` when previous slice is out of space.
And shall also save CPU time during high paylog.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-09 15:48:55 +08:00
yah01
f030f31d92
enhance: make the error of parsing expression to ParameterInvalid (#29681)
before this, the error is unexpected error

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-09 15:36:47 +08:00
zhenshan.cao
60e88fb833
fix: Restore the MVCC functionality. (#29749)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.

issue: #29656

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-09 11:38:48 +08:00