Amit Kumar 388d56fdc7
enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)
### Is there an existing issue for this?

- [x] I have searched the existing issues

---

Please see: https://github.com/milvus-io/milvus/issues/44593 for the
background

This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant,
so that one can be closed. Review comments on the original
implementation suggested a better alternative approach; this PR
implements it.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.
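
As a rough illustration of the resulting filter syntax, the sketch below builds a `text_match` expression string with and without the option. The helper `textMatchExpr`, the field name `text`, and the query string are hypothetical and exist only for this example; only the `text_match(..., minimum_should_match=...)` shape comes from this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// textMatchExpr is a hypothetical helper (not part of the Milvus client).
// It builds a text_match filter string; a minimum of 0 or less omits the
// optional argument, leaving the original two-argument form.
func textMatchExpr(field, query string, minShouldMatch int) string {
	quoted := strconv.Quote(query) // double-quoted string literal
	if minShouldMatch <= 0 {
		return fmt.Sprintf("text_match(%s, %s)", field, quoted)
	}
	return fmt.Sprintf("text_match(%s, %s, minimum_should_match=%d)",
		field, quoted, minShouldMatch)
}

func main() {
	// Field name and query are made up for the example.
	fmt.Println(textMatchExpr("text", "vector database search", 2))
	fmt.Println(textMatchExpr("text", "vector database search", 0))
}
```

The resulting string would be passed wherever a filter expression is accepted; how the client forwards it is not shown here.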

Motivation
- Provide a way to require an expression to match a minimum number of
tokens in lexical search.
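
The intended semantics can be sketched in a few lines. This is a simplified model, not the engine's implementation: it assumes whitespace tokenization, lowercasing, and counting each distinct query token once, whereas the real analyzer and Tantivy query behavior may differ. The function name `matchesAtLeast` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAtLeast reports whether at least `minimum` distinct query tokens
// occur in doc. Tokenization here is plain whitespace splitting on
// lowercased text, which is an assumption made for this sketch.
func matchesAtLeast(doc, query string, minimum int) bool {
	docTokens := make(map[string]struct{})
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		docTokens[t] = struct{}{}
	}

	seen := make(map[string]struct{})
	matched := 0
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if _, dup := seen[t]; dup {
			continue // count each distinct query token only once
		}
		seen[t] = struct{}{}
		if _, ok := docTokens[t]; ok {
			matched++
		}
	}
	return matched >= minimum
}

func main() {
	doc := "milvus is a vector database"
	query := "vector database search"
	for _, minimum := range []int{1, 2, 3} {
		fmt.Println(minimum, matchesAtLeast(doc, query, minimum))
	}
}
```

With this document and query, two of the three query tokens are present, so minimums of 1 and 2 match and a minimum of 3 does not.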

What changed
- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and
    `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs:
    `internal/parser/planparserv2/generated/*` (parser, lexer, visitor,
    etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match`
    integer; propagate as an extra value on the `TextMatch` expression
    so downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify
    `text_match(..., minimum_should_match=...)` appears in the generated
    DSL and is accepted by client code:
    `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client
    testcase suite: `tests/go_client/testcases/full_text_search_test.go`
    (exercise min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage:
    `client/milvusclient/read_example_test.go` (example name conforms to
    godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`.
  - Added/updated unit tests for the index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to
    `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
    that construct boolean queries with minimum required clauses.
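
The "parse and validate" step in the visitor could look roughly like the sketch below. The helper `parseMinimumShouldMatch` is hypothetical, and the rule it enforces (a base-10 integer that is not negative) is an assumption; the actual checks in `parser_visitor.go` are not reproduced here and may be stricter.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMinimumShouldMatch is a hypothetical helper for illustration only.
// It converts the raw option text into an integer and rejects negative
// values. The real validation rules may differ (assumption).
func parseMinimumShouldMatch(raw string) (int64, error) {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("minimum_should_match must be an integer: %w", err)
	}
	if n < 0 {
		return 0, fmt.Errorf("minimum_should_match must be non-negative, got %d", n)
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"2", "-1", "abc"} {
		n, err := parseMinimumShouldMatch(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", n)
	}
}
```

Validating at parse time means a malformed option fails early with a clear message instead of reaching the index layer.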



Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: the signature of `TextMatchIndex::MatchQuery`
(an internal component) changed. Callers in the repo were updated
accordingly.
- Parser changes required regenerating the ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked
    Search request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test:
    `tests/go_client/testcases/full_text_search_test.go` (exercises
    min=1, 3 and a large min).
  - C++ unit tests for index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$`
    (client package; note that `-run ^$` matches no test names, so this
    only checks that the package and its tests compile)
  - Go testcases:
    `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch`
    (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions
    (the change touches core index code).
  - Rust binding tests:
    `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test`
    (if developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
2025-11-07 16:07:11 +08:00


T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
LBRACE=6
RBRACE=7
LT=8
LE=9
GT=10
GE=11
EQ=12
NE=13
LIKE=14
EXISTS=15
TEXTMATCH=16
PHRASEMATCH=17
RANDOMSAMPLE=18
INTERVAL=19
ISO=20
MINIMUM_SHOULD_MATCH=21
ASSIGN=22
ADD=23
SUB=24
MUL=25
DIV=26
MOD=27
POW=28
SHL=29
SHR=30
BAND=31
BOR=32
BXOR=33
AND=34
OR=35
ISNULL=36
ISNOTNULL=37
BNOT=38
NOT=39
IN=40
EmptyArray=41
JSONContains=42
JSONContainsAll=43
JSONContainsAny=44
ArrayContains=45
ArrayContainsAll=46
ArrayContainsAny=47
ArrayLength=48
STEuqals=49
STTouches=50
STOverlaps=51
STCrosses=52
STContains=53
STIntersects=54
STWithin=55
STDWithin=56
BooleanConstant=57
IntegerConstant=58
FloatingConstant=59
Identifier=60
Meta=61
StringLiteral=62
JSONIdentifier=63
Whitespace=64
Newline=65
'('=1
')'=2
'['=3
','=4
']'=5
'{'=6
'}'=7
'<'=8
'<='=9
'>'=10
'>='=11
'=='=12
'!='=13
'='=22
'+'=23
'-'=24
'*'=25
'/'=26
'%'=27
'**'=28
'<<'=29
'>>'=30
'&'=31
'|'=32
'^'=33
'~'=38
'$meta'=61