Amit Kumar 388d56fdc7
enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)
### Is there an existing issue for this?

- [x] I have searched the existing issues

---

See https://github.com/milvus-io/milvus/issues/44593 for background.

This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant,
so that PR can be closed. Review comments on the original implementation
suggested a better alternative approach; this PR implements it.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.
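
As a rough illustration of the resulting filter syntax, the sketch below
builds a `text_match` expression string with and without the option. The
helper `textMatchExpr` and the field name `text` are hypothetical and not
part of the Milvus client; the use of Go's `%q` for quoting the query is
also an assumption about what the expression parser accepts.

```go
package main

import "fmt"

// textMatchExpr is a hypothetical helper (not a Milvus client API) that
// formats a text_match filter expression. A minMatch of 0 or less omits
// the option, which leaves the pre-existing two-argument form.
func textMatchExpr(field, query string, minMatch int) string {
	if minMatch <= 0 {
		return fmt.Sprintf("text_match(%s, %q)", field, query)
	}
	return fmt.Sprintf("text_match(%s, %q, minimum_should_match=%d)", field, query, minMatch)
}

func main() {
	// Field name and query text are made up for illustration.
	fmt.Println(textMatchExpr("text", "vector database search", 0))
	fmt.Println(textMatchExpr("text", "vector database search", 2))
}
```

The second call yields an expression of the form
`text_match(text, "...", minimum_should_match=2)`, which is the shape the
client unit test looks for in the generated DSL.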

Motivation
- Provide a way to require that a `text_match` expression match at least
a given number of query tokens in lexical search.

What changed
- Parser / grammar
  - Added the `MINIMUM_SHOULD_MATCH` token and the `textMatchOption`
    rule in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*`
    (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match`
    integer; propagate it as an extra value on the `TextMatch` expression
    so downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify `text_match(...,
    minimum_should_match=...)` appears in the generated DSL and is
    accepted by client code: `client/milvusclient/read_test.go` (new test
    coverage).
  - Added an integration-style test for the feature to the go-client
    testcase suite: `tests/go_client/testcases/full_text_search_test.go`
    (exercises min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage:
    `client/milvusclient/read_example_test.go` (example name conforms to
    godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to
    `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
    that construct boolean queries with minimum required clauses.
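
The intended matching semantics can be modeled in a few lines. This is a
simplified sketch, not the Tantivy binding's implementation: it assumes
lowercase whitespace tokenization and counts each distinct query token
once, both of which are assumptions made for illustration. The function
name `matchesAtLeast` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAtLeast reports whether doc contains at least minMatch distinct
// tokens from query. Tokenization here is a lowercase whitespace split,
// which is an assumption; the real analyzer is configured on the field.
func matchesAtLeast(doc, query string, minMatch int) bool {
	docTokens := make(map[string]struct{})
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		docTokens[t] = struct{}{}
	}

	seen := make(map[string]struct{})
	hits := 0
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if _, dup := seen[t]; dup {
			continue // count each distinct query token once
		}
		seen[t] = struct{}{}
		if _, ok := docTokens[t]; ok {
			hits++
		}
	}
	return hits >= minMatch
}

func main() {
	doc := "Milvus is a vector database"
	query := "vector database search"
	for _, m := range []int{1, 2, 3, 10} {
		fmt.Printf("min=%d match=%v\n", m, matchesAtLeast(doc, query, m))
	}
}
```

In this model, a minimum larger than the number of query tokens can never
be satisfied, so such a query matches no documents.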



Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed
(internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked
    Search request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test:
    `tests/go_client/testcases/full_text_search_test.go` (exercises
    min=1, min=3, and a large min).
  - C++ unit tests for index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$`
    (client package; note that `-run ^$` matches no test names, so this
    only builds the test binary; drop the flag to run the tests)
  - Go testcases: `cd tests/go_client && go test ./testcases -run
    TestTextMatchMinimumShouldMatch` (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions
    (the change touches core index code).
  - Rust binding tests: `cd
    internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if
    developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
2025-11-07 16:07:11 +08:00


token literal names:
null
'('
')'
'['
','
']'
'{'
'}'
'<'
'<='
'>'
'>='
'=='
'!='
null
null
null
null
null
null
null
null
'='
'+'
'-'
'*'
'/'
'%'
'**'
'<<'
'>>'
'&'
'|'
'^'
null
null
null
null
'~'
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
null
'$meta'
null
null
null
null
token symbolic names:
null
null
null
null
null
null
LBRACE
RBRACE
LT
LE
GT
GE
EQ
NE
LIKE
EXISTS
TEXTMATCH
PHRASEMATCH
RANDOMSAMPLE
INTERVAL
ISO
MINIMUM_SHOULD_MATCH
ASSIGN
ADD
SUB
MUL
DIV
MOD
POW
SHL
SHR
BAND
BOR
BXOR
AND
OR
ISNULL
ISNOTNULL
BNOT
NOT
IN
EmptyArray
JSONContains
JSONContainsAll
JSONContainsAny
ArrayContains
ArrayContainsAll
ArrayContainsAny
ArrayLength
STEuqals
STTouches
STOverlaps
STCrosses
STContains
STIntersects
STWithin
STDWithin
BooleanConstant
IntegerConstant
FloatingConstant
Identifier
Meta
StringLiteral
JSONIdentifier
Whitespace
Newline
rule names:
expr
textMatchOption
atn:
[4, 1, 65, 240, 2, 0, 7, 0, 2, 1, 7, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 10, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 22, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 41, 8, 0, 10, 0, 12, 0, 44, 9, 0, 1, 0, 3, 0, 47, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 61, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 71, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 162, 8, 0, 10, 0, 12, 0, 165, 9, 0, 1, 0, 3, 0, 168, 8, 0, 3, 0, 170, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 177, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 3, 0, 193, 8, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 231, 8, 0, 10, 0, 12, 0, 234, 9, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 0, 2, 0, 14, 1, 0, 23, 24, 1, 0, 8, 13, 1, 0, 60, 61, 2, 0, 23, 24, 38, 39, 2, 0, 42, 42, 45, 45, 2, 0, 43, 43, 46, 46, 2, 0, 44, 44, 47, 47, 2, 0, 60, 60, 63, 63, 1, 0, 25, 27, 1, 0, 29, 30, 1, 0, 8, 9, 1, 0, 10, 11, 1, 0, 8, 11, 1, 0, 12, 13, 293, 0, 176, 1, 0, 0, 0, 2, 235, 1, 0, 0, 0, 4, 5, 6, 0, -1, 0, 5, 9, 5, 60, 0, 0, 6, 7, 7, 0, 0, 0, 7, 8, 5, 19, 0, 0, 8, 10, 5, 62, 0, 0, 9, 6, 1, 0, 0, 0, 9, 10, 1, 0, 0, 0, 10, 11, 1, 0, 0, 0, 11, 12, 7, 1, 
0, 0, 12, 13, 5, 20, 0, 0, 13, 177, 5, 62, 0, 0, 14, 15, 5, 20, 0, 0, 15, 16, 5, 62, 0, 0, 16, 17, 7, 1, 0, 0, 17, 21, 5, 60, 0, 0, 18, 19, 7, 0, 0, 0, 19, 20, 5, 19, 0, 0, 20, 22, 5, 62, 0, 0, 21, 18, 1, 0, 0, 0, 21, 22, 1, 0, 0, 0, 22, 177, 1, 0, 0, 0, 23, 177, 5, 58, 0, 0, 24, 177, 5, 59, 0, 0, 25, 177, 5, 57, 0, 0, 26, 177, 5, 62, 0, 0, 27, 177, 7, 2, 0, 0, 28, 177, 5, 63, 0, 0, 29, 30, 5, 6, 0, 0, 30, 31, 5, 60, 0, 0, 31, 177, 5, 7, 0, 0, 32, 33, 5, 1, 0, 0, 33, 34, 3, 0, 0, 0, 34, 35, 5, 2, 0, 0, 35, 177, 1, 0, 0, 0, 36, 37, 5, 3, 0, 0, 37, 42, 3, 0, 0, 0, 38, 39, 5, 4, 0, 0, 39, 41, 3, 0, 0, 0, 40, 38, 1, 0, 0, 0, 41, 44, 1, 0, 0, 0, 42, 40, 1, 0, 0, 0, 42, 43, 1, 0, 0, 0, 43, 46, 1, 0, 0, 0, 44, 42, 1, 0, 0, 0, 45, 47, 5, 4, 0, 0, 46, 45, 1, 0, 0, 0, 46, 47, 1, 0, 0, 0, 47, 48, 1, 0, 0, 0, 48, 49, 5, 5, 0, 0, 49, 177, 1, 0, 0, 0, 50, 177, 5, 41, 0, 0, 51, 52, 5, 15, 0, 0, 52, 177, 3, 0, 0, 35, 53, 54, 5, 16, 0, 0, 54, 55, 5, 1, 0, 0, 55, 56, 5, 60, 0, 0, 56, 57, 5, 4, 0, 0, 57, 60, 5, 62, 0, 0, 58, 59, 5, 4, 0, 0, 59, 61, 3, 2, 1, 0, 60, 58, 1, 0, 0, 0, 60, 61, 1, 0, 0, 0, 61, 62, 1, 0, 0, 0, 62, 177, 5, 2, 0, 0, 63, 64, 5, 17, 0, 0, 64, 65, 5, 1, 0, 0, 65, 66, 5, 60, 0, 0, 66, 67, 5, 4, 0, 0, 67, 70, 5, 62, 0, 0, 68, 69, 5, 4, 0, 0, 69, 71, 3, 0, 0, 0, 70, 68, 1, 0, 0, 0, 70, 71, 1, 0, 0, 0, 71, 72, 1, 0, 0, 0, 72, 177, 5, 2, 0, 0, 73, 74, 5, 18, 0, 0, 74, 75, 5, 1, 0, 0, 75, 76, 3, 0, 0, 0, 76, 77, 5, 2, 0, 0, 77, 177, 1, 0, 0, 0, 78, 79, 7, 3, 0, 0, 79, 177, 3, 0, 0, 29, 80, 81, 7, 4, 0, 0, 81, 82, 5, 1, 0, 0, 82, 83, 3, 0, 0, 0, 83, 84, 5, 4, 0, 0, 84, 85, 3, 0, 0, 0, 85, 86, 5, 2, 0, 0, 86, 177, 1, 0, 0, 0, 87, 88, 7, 5, 0, 0, 88, 89, 5, 1, 0, 0, 89, 90, 3, 0, 0, 0, 90, 91, 5, 4, 0, 0, 91, 92, 3, 0, 0, 0, 92, 93, 5, 2, 0, 0, 93, 177, 1, 0, 0, 0, 94, 95, 7, 6, 0, 0, 95, 96, 5, 1, 0, 0, 96, 97, 3, 0, 0, 0, 97, 98, 5, 4, 0, 0, 98, 99, 3, 0, 0, 0, 99, 100, 5, 2, 0, 0, 100, 177, 1, 0, 0, 0, 101, 102, 5, 49, 0, 0, 102, 103, 5, 1, 0, 0, 103, 104, 5, 60, 0, 0, 
104, 105, 5, 4, 0, 0, 105, 106, 5, 62, 0, 0, 106, 177, 5, 2, 0, 0, 107, 108, 5, 50, 0, 0, 108, 109, 5, 1, 0, 0, 109, 110, 5, 60, 0, 0, 110, 111, 5, 4, 0, 0, 111, 112, 5, 62, 0, 0, 112, 177, 5, 2, 0, 0, 113, 114, 5, 51, 0, 0, 114, 115, 5, 1, 0, 0, 115, 116, 5, 60, 0, 0, 116, 117, 5, 4, 0, 0, 117, 118, 5, 62, 0, 0, 118, 177, 5, 2, 0, 0, 119, 120, 5, 52, 0, 0, 120, 121, 5, 1, 0, 0, 121, 122, 5, 60, 0, 0, 122, 123, 5, 4, 0, 0, 123, 124, 5, 62, 0, 0, 124, 177, 5, 2, 0, 0, 125, 126, 5, 53, 0, 0, 126, 127, 5, 1, 0, 0, 127, 128, 5, 60, 0, 0, 128, 129, 5, 4, 0, 0, 129, 130, 5, 62, 0, 0, 130, 177, 5, 2, 0, 0, 131, 132, 5, 54, 0, 0, 132, 133, 5, 1, 0, 0, 133, 134, 5, 60, 0, 0, 134, 135, 5, 4, 0, 0, 135, 136, 5, 62, 0, 0, 136, 177, 5, 2, 0, 0, 137, 138, 5, 55, 0, 0, 138, 139, 5, 1, 0, 0, 139, 140, 5, 60, 0, 0, 140, 141, 5, 4, 0, 0, 141, 142, 5, 62, 0, 0, 142, 177, 5, 2, 0, 0, 143, 144, 5, 56, 0, 0, 144, 145, 5, 1, 0, 0, 145, 146, 5, 60, 0, 0, 146, 147, 5, 4, 0, 0, 147, 148, 5, 62, 0, 0, 148, 149, 5, 4, 0, 0, 149, 150, 3, 0, 0, 0, 150, 151, 5, 2, 0, 0, 151, 177, 1, 0, 0, 0, 152, 153, 5, 48, 0, 0, 153, 154, 5, 1, 0, 0, 154, 155, 7, 7, 0, 0, 155, 177, 5, 2, 0, 0, 156, 157, 5, 60, 0, 0, 157, 169, 5, 1, 0, 0, 158, 163, 3, 0, 0, 0, 159, 160, 5, 4, 0, 0, 160, 162, 3, 0, 0, 0, 161, 159, 1, 0, 0, 0, 162, 165, 1, 0, 0, 0, 163, 161, 1, 0, 0, 0, 163, 164, 1, 0, 0, 0, 164, 167, 1, 0, 0, 0, 165, 163, 1, 0, 0, 0, 166, 168, 5, 4, 0, 0, 167, 166, 1, 0, 0, 0, 167, 168, 1, 0, 0, 0, 168, 170, 1, 0, 0, 0, 169, 158, 1, 0, 0, 0, 169, 170, 1, 0, 0, 0, 170, 171, 1, 0, 0, 0, 171, 177, 5, 2, 0, 0, 172, 173, 7, 7, 0, 0, 173, 177, 5, 36, 0, 0, 174, 175, 7, 7, 0, 0, 175, 177, 5, 37, 0, 0, 176, 4, 1, 0, 0, 0, 176, 14, 1, 0, 0, 0, 176, 23, 1, 0, 0, 0, 176, 24, 1, 0, 0, 0, 176, 25, 1, 0, 0, 0, 176, 26, 1, 0, 0, 0, 176, 27, 1, 0, 0, 0, 176, 28, 1, 0, 0, 0, 176, 29, 1, 0, 0, 0, 176, 32, 1, 0, 0, 0, 176, 36, 1, 0, 0, 0, 176, 50, 1, 0, 0, 0, 176, 51, 1, 0, 0, 0, 176, 53, 1, 0, 0, 0, 176, 63, 1, 0, 0, 0, 176, 73, 
1, 0, 0, 0, 176, 78, 1, 0, 0, 0, 176, 80, 1, 0, 0, 0, 176, 87, 1, 0, 0, 0, 176, 94, 1, 0, 0, 0, 176, 101, 1, 0, 0, 0, 176, 107, 1, 0, 0, 0, 176, 113, 1, 0, 0, 0, 176, 119, 1, 0, 0, 0, 176, 125, 1, 0, 0, 0, 176, 131, 1, 0, 0, 0, 176, 137, 1, 0, 0, 0, 176, 143, 1, 0, 0, 0, 176, 152, 1, 0, 0, 0, 176, 156, 1, 0, 0, 0, 176, 172, 1, 0, 0, 0, 176, 174, 1, 0, 0, 0, 177, 232, 1, 0, 0, 0, 178, 179, 10, 30, 0, 0, 179, 180, 5, 28, 0, 0, 180, 231, 3, 0, 0, 31, 181, 182, 10, 28, 0, 0, 182, 183, 7, 8, 0, 0, 183, 231, 3, 0, 0, 29, 184, 185, 10, 27, 0, 0, 185, 186, 7, 0, 0, 0, 186, 231, 3, 0, 0, 28, 187, 188, 10, 26, 0, 0, 188, 189, 7, 9, 0, 0, 189, 231, 3, 0, 0, 27, 190, 192, 10, 25, 0, 0, 191, 193, 5, 39, 0, 0, 192, 191, 1, 0, 0, 0, 192, 193, 1, 0, 0, 0, 193, 194, 1, 0, 0, 0, 194, 195, 5, 40, 0, 0, 195, 231, 3, 0, 0, 26, 196, 197, 10, 11, 0, 0, 197, 198, 7, 10, 0, 0, 198, 199, 7, 7, 0, 0, 199, 200, 7, 10, 0, 0, 200, 231, 3, 0, 0, 12, 201, 202, 10, 10, 0, 0, 202, 203, 7, 11, 0, 0, 203, 204, 7, 7, 0, 0, 204, 205, 7, 11, 0, 0, 205, 231, 3, 0, 0, 11, 206, 207, 10, 9, 0, 0, 207, 208, 7, 12, 0, 0, 208, 231, 3, 0, 0, 10, 209, 210, 10, 8, 0, 0, 210, 211, 7, 13, 0, 0, 211, 231, 3, 0, 0, 9, 212, 213, 10, 7, 0, 0, 213, 214, 5, 31, 0, 0, 214, 231, 3, 0, 0, 8, 215, 216, 10, 6, 0, 0, 216, 217, 5, 33, 0, 0, 217, 231, 3, 0, 0, 7, 218, 219, 10, 5, 0, 0, 219, 220, 5, 32, 0, 0, 220, 231, 3, 0, 0, 6, 221, 222, 10, 4, 0, 0, 222, 223, 5, 34, 0, 0, 223, 231, 3, 0, 0, 5, 224, 225, 10, 3, 0, 0, 225, 226, 5, 35, 0, 0, 226, 231, 3, 0, 0, 4, 227, 228, 10, 34, 0, 0, 228, 229, 5, 14, 0, 0, 229, 231, 5, 62, 0, 0, 230, 178, 1, 0, 0, 0, 230, 181, 1, 0, 0, 0, 230, 184, 1, 0, 0, 0, 230, 187, 1, 0, 0, 0, 230, 190, 1, 0, 0, 0, 230, 196, 1, 0, 0, 0, 230, 201, 1, 0, 0, 0, 230, 206, 1, 0, 0, 0, 230, 209, 1, 0, 0, 0, 230, 212, 1, 0, 0, 0, 230, 215, 1, 0, 0, 0, 230, 218, 1, 0, 0, 0, 230, 221, 1, 0, 0, 0, 230, 224, 1, 0, 0, 0, 230, 227, 1, 0, 0, 0, 231, 234, 1, 0, 0, 0, 232, 230, 1, 0, 0, 0, 232, 233, 1, 0, 0, 0, 233, 1, 
1, 0, 0, 0, 234, 232, 1, 0, 0, 0, 235, 236, 5, 21, 0, 0, 236, 237, 5, 22, 0, 0, 237, 238, 5, 58, 0, 0, 238, 3, 1, 0, 0, 0, 13, 9, 21, 42, 46, 60, 70, 163, 167, 169, 176, 192, 230, 232]