mirror of
https://gitee.com/milvus-io/milvus.git
synced 2025-12-08 01:58:34 +08:00
### Is there an existing issue for this?

- [x] I have searched the existing issues

---

Please see https://github.com/milvus-io/milvus/issues/44593 for the background.

This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant, so that PR can be closed. The review comments on the original implementation suggested an alternative, better approach; this new PR implements it.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)` and wires it through the parser, planner/visitor, index bindings, and client-level tests/examples so full-text queries can require a minimum number of tokens to match.

Motivation

- Provide a way to require an expression to match a minimum number of tokens in lexical search.

What changed

- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*` (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match` integer; propagate it as an extra value on the `TextMatch` expression so downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify that `text_match(..., minimum_should_match=...)` appears in the generated DSL and is accepted by client code: `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client testcase suite: `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage: `client/milvusclient/read_example_test.go` (example name conforms to godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs` that construct boolean queries with minimum required clauses.

Behavioral / compatibility notes

- This adds an optional argument to `text_match` only; default behavior (no `minimum_should_match`) is unchanged.
- Internal API change: the `TextMatchIndex::MatchQuery` signature changed (internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating ANTLR outputs.

Tests and verification

- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked Search request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test: `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3 and a large min).
  - C++ unit tests for index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$` (client package)
  - Go testcases: `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch` (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions (the change touches core index code).
  - Rust binding tests: `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
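
To make the intended semantics concrete, here is a minimal Python sketch of what "require a minimum number of tokens to match" means. It is a toy model, not the Milvus or Tantivy implementation: the whitespace tokenizer, the choice to count distinct query tokens, and the helper names `tokenize`, `matches`, and `build_filter` are all assumptions made for illustration, as is the field name `title`. Only the `text_match(..., minimum_should_match=N)` expression shape comes from the description above.

```python
def tokenize(text: str) -> set:
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return set(text.lower().split())


def matches(doc: str, query: str, minimum_should_match: int = 1) -> bool:
    """Return True if at least `minimum_should_match` distinct query tokens
    occur in the document. A threshold of 1 models plain "any token" matching."""
    hits = len(tokenize(query) & tokenize(doc))
    return hits >= minimum_should_match


def build_filter(field: str, query: str, minimum_should_match=None) -> str:
    """Build a filter expression string in the shape the PR describes.
    Hypothetical helper: it does no escaping of quotes inside `query`."""
    if minimum_should_match is None:
        return f'text_match({field}, "{query}")'
    return f'text_match({field}, "{query}", minimum_should_match={minimum_should_match})'


if __name__ == "__main__":
    doc = "Milvus is a vector database"
    query = "vector database search"
    for n in (1, 2, 3):
        print(n, matches(doc, query, n))
    print(build_filter("title", query, 2))
```

In this toy model the document contains two of the three query tokens, so it passes with a threshold of 1 or 2 and fails with 3; a threshold larger than the number of query tokens can never be satisfied.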
93 lines
916 B
Plaintext
T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
LBRACE=6
RBRACE=7
LT=8
LE=9
GT=10
GE=11
EQ=12
NE=13
LIKE=14
EXISTS=15
TEXTMATCH=16
PHRASEMATCH=17
RANDOMSAMPLE=18
INTERVAL=19
ISO=20
MINIMUM_SHOULD_MATCH=21
ASSIGN=22
ADD=23
SUB=24
MUL=25
DIV=26
MOD=27
POW=28
SHL=29
SHR=30
BAND=31
BOR=32
BXOR=33
AND=34
OR=35
ISNULL=36
ISNOTNULL=37
BNOT=38
NOT=39
IN=40
EmptyArray=41
JSONContains=42
JSONContainsAll=43
JSONContainsAny=44
ArrayContains=45
ArrayContainsAll=46
ArrayContainsAny=47
ArrayLength=48
STEuqals=49
STTouches=50
STOverlaps=51
STCrosses=52
STContains=53
STIntersects=54
STWithin=55
STDWithin=56
BooleanConstant=57
IntegerConstant=58
FloatingConstant=59
Identifier=60
Meta=61
StringLiteral=62
JSONIdentifier=63
Whitespace=64
Newline=65
'('=1
')'=2
'['=3
','=4
']'=5
'{'=6
'}'=7
'<'=8
'<='=9
'>'=10
'>='=11
'=='=12
'!='=13
'='=22
'+'=23
'-'=24
'*'=25
'/'=26
'%'=27
'**'=28
'<<'=29
'>>'=30
'&'=31
'|'=32
'^'=33
'~'=38
'$meta'=61
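
The listing above appears to follow the ANTLR `.tokens` layout, where each line maps a token name or a quoted literal to its integer token type. As a rough sketch of how such a file could be read, the Python snippet below splits each line on its last `=` so that literals such as `'='` and `'=='` are handled correctly. The function name `parse_tokens` is hypothetical and is not part of Milvus or the ANTLR tooling.

```python
def parse_tokens(text: str) -> dict:
    """Parse `name=type` lines into a dict of name -> integer token type.

    Splits on the LAST '=' in each line, because quoted literals such as
    '=' or '==' contain the separator character themselves.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.rpartition("=")
        if not sep or not name:
            raise ValueError(f"malformed line: {raw!r}")
        table[name] = int(value)
    return table


if __name__ == "__main__":
    sample = "MINIMUM_SHOULD_MATCH=21\nASSIGN=22\n'=='=12\n'='=22\n"
    table = parse_tokens(sample)
    # A named token and its literal share the same integer type.
    print(table["ASSIGN"] == table["'='"])
```

Splitting on the first `=` instead would break on the `'='=22` and `'=='=12` entries, which is why `rpartition` is used here.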