Amit Kumar 388d56fdc7
enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)
### Is there an existing issue for this?

- [x] I have searched the existing issues

---

Please see: https://github.com/milvus-io/milvus/issues/44593 for the
background

This PR supersedes https://github.com/milvus-io/milvus/pull/44638,
which can now be closed. Review comments on that original implementation
suggested a better alternative approach, and this PR implements it.

---

This PR

- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.
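
As a rough illustration of the semantics described above, the sketch below checks whether a document contains at least a minimum number of the distinct query tokens. It is not the engine's implementation: `matchesAtLeast` is a hypothetical helper, and lowercase whitespace tokenization is an assumption standing in for the configured analyzer.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAtLeast reports whether doc contains at least minShouldMatch of the
// distinct tokens in query. Hypothetical helper for illustration only:
// tokens are produced by lowercasing and splitting on whitespace, which is
// an assumption and not the analyzer Milvus actually uses.
func matchesAtLeast(doc, query string, minShouldMatch int) bool {
	docTokens := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(doc)) {
		docTokens[tok] = struct{}{}
	}

	seen := make(map[string]struct{})
	matched := 0
	for _, tok := range strings.Fields(strings.ToLower(query)) {
		if _, dup := seen[tok]; dup {
			continue // count each distinct query token once
		}
		seen[tok] = struct{}{}
		if _, ok := docTokens[tok]; ok {
			matched++
		}
	}
	return matched >= minShouldMatch
}

func main() {
	doc := "milvus is a vector database"
	query := "vector database engine"
	for _, min := range []int{1, 2, 3} {
		fmt.Printf("min=%d matched=%v\n", min, matchesAtLeast(doc, query, min))
	}
}
```

With the sample document, two of the three query tokens are present, so minimums of 1 and 2 match and a minimum of 3 does not.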

Motivation
- Provide a way to require an expression to match a minimum number of
tokens in lexical search.

What changed
- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and
    `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*`
    (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match`
    integer; propagate as an extra value on the `TextMatch` expression so
    downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify
    `text_match(..., minimum_should_match=...)` appears in the generated
    DSL and is accepted by client code:
    `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client
    testcase suite: `tests/go_client/testcases/full_text_search_test.go`
    (exercise min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage:
    `client/milvusclient/read_example_test.go` (example name conforms to
    godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to
    `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
    that construct boolean queries with minimum required clauses.



Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed
(internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating the ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked Search
    request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test:
    `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3
    and a large min).
  - C++ unit tests for index behavior:
    `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$` (client
    package)
  - Go testcases:
    `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch`
    (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions (the
    change touches core index code).
  - Rust binding tests:
    `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if
    developing locally).

---------

Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
2025-11-07 16:07:11 +08:00

Milvus Go Client Test Framework

Overview

This is a comprehensive test framework for the Milvus Go Client, designed to validate various functionalities of the Milvus vector database client. The framework provides a structured approach to writing tests with reusable components and helper functions.

Framework Architecture

Directory Structure

/go_client/
├── testcases/           # Main test cases
│   ├── helper/          # Helper functions and utilities
│   │   ├── helper.go
│   │   ├── data_helper.go
│   │   └── collection_helper.go
│   ├── search_test.go   # Search functionality tests
│   ├── index_test.go    # Index management tests
│   └── ...
├── common/             # Common utilities and constants
└── base/               # Base infrastructure code

Key Components

  • Collection Preparation: Utilities for creating and managing collections
  • Data Generation: Tools for generating test data
  • Helper Functions: Common operations and validations
  • Test Cases: Organized by functionality

Writing Test Cases

Basic Test Structure

func TestYourFeature(t *testing.T) {
    // 1. Setup context and client
    ctx := hp.CreateContext(t, time.Second*common.DefaultTimeout)
    mc := createDefaultMilvusClient(ctx, t)

    // 2. Prepare collection
    prepare, schema := hp.CollPrepare.CreateCollection(
        ctx, t, mc,
        hp.NewCreateCollectionParams(hp.Int64Vec),
        hp.TNewFieldsOption(),
        hp.TNewSchemaOption(),
    )

    // 3. Insert test data
    prepare.InsertData(ctx, t, mc,
        hp.NewInsertParams(schema),
        hp.TNewDataOption(),
    )

    // 4. Execute test operations
    // ... your test logic here ...

    // 5. Validate results
    require.NoError(t, err)
    require.Equal(t, expected, actual)
}

Using Custom Parameters

  1. Collection Creation Parameters
fieldsOption := hp.TNewFieldsOption().
    TWithEnableAnalyzer(true).
    TWithAnalyzerParams(map[string]any{
        "tokenizer": "standard",
    })

schemaOption := hp.TNewSchemaOption().
    TWithEnableDynamicField(true).
    TWithDescription("Custom schema").
    TWithAutoID(false)
  2. Data Insertion Options
insertOption := hp.TNewDataOption().
    TWithNb(1000).           // Number of records
    TWithDim(128).           // Vector dimension
    TWithStart(100).         // Starting ID
    TWithMaxLen(256).        // Maximum length
    TWithTextLang("en")      // Text language
  3. Index Parameters
indexParams := hp.TNewIndexParams(schema).
    TWithFieldIndex(map[string]index.Index{
        common.DefaultVectorFieldName: index.NewIVFSQIndex(
            &index.IVFSQConfig{
                MetricType: entity.L2,
                NList:      128,
            },
        ),
    })
  4. Search Parameters
searchOpt := client.NewSearchOption(schema.CollectionName, 100, vectors).
    WithOffset(0).
    WithLimit(100).
    WithConsistencyLevel(entity.ClStrong).
    WithFilter("int64 >= 100").
    WithOutputFields([]string{"*"}).
    WithSearchParams(map[string]any{
        "nprobe": 16,
        "ef":     64,
    })

Adding New Parameters

  1. Define New Option Type
// In helper/data_helper.go
type YourNewOption struct {
    newParam1 string
    newParam2 int
}
  2. Add Constructor
func TNewYourOption() *YourNewOption {
    return &YourNewOption{
        newParam1: "default",
        newParam2: 0,
    }
}
  3. Add Parameter Methods
func (opt *YourNewOption) TWithNewParam1(value string) *YourNewOption {
    opt.newParam1 = value
    return opt
}

func (opt *YourNewOption) TWithNewParam2(value int) *YourNewOption {
    opt.newParam2 = value
    return opt
}
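
The three steps above can be combined into one self-contained sketch that also shows how a consumer might read the option. The names `SampleOption`, `TNewSampleOption`, and `Describe` are hypothetical and exist only for this illustration; they are not part of the helper package.

```go
package main

import "fmt"

// SampleOption is a hypothetical option type used only for illustration.
type SampleOption struct {
	textLang string
	maxLen   int
}

// TNewSampleOption returns an option populated with default values.
func TNewSampleOption() *SampleOption {
	return &SampleOption{textLang: "en", maxLen: 256}
}

// TWithTextLang overrides the text language and returns the option so that
// calls can be chained.
func (opt *SampleOption) TWithTextLang(lang string) *SampleOption {
	opt.textLang = lang
	return opt
}

// TWithMaxLen overrides the maximum length and returns the option so that
// calls can be chained.
func (opt *SampleOption) TWithMaxLen(n int) *SampleOption {
	opt.maxLen = n
	return opt
}

// Describe stands in for a helper that consumes the option, for example a
// data generator reading its settings.
func Describe(opt *SampleOption) string {
	return fmt.Sprintf("lang=%s maxLen=%d", opt.textLang, opt.maxLen)
}

func main() {
	// Defaults only.
	fmt.Println(Describe(TNewSampleOption()))
	// Chained overrides.
	fmt.Println(Describe(TNewSampleOption().TWithTextLang("zh").TWithMaxLen(64)))
}
```

Returning the receiver from every TWith method is what allows the chained call style used throughout this README, and keeping defaults in the constructor means callers only spell out the values they want to change.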

Best Practices

  1. Test Organization

    • Group related tests in the same file
    • Use clear and descriptive test names
    • Add comments explaining test purpose
  2. Data Generation

    • Use helper functions for generating test data
    • Ensure data is appropriate for the test case
    • Clean up test data after use
  3. Error Handling

    • Use common.CheckErr for consistent error checking
    • Test both success and failure scenarios
    • Validate error messages when appropriate
  4. Performance Considerations

    • Use appropriate timeouts
    • Clean up resources after tests
    • Consider test execution time

Running Tests

# Run all tests
go test ./testcases/...

# Run specific test
go test -run TestYourFeature ./testcases/

# Run with verbose output
go test -v ./testcases/...

# gotestsum
# Running the suite with gotestsum (https://github.com/gotestyourself/gotestsum) is recommended

# Run all default cases
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m

# Run a specified file
gotestsum --format testname --hide-summary=output ./testcases/collection_test.go ./testcases/main_test.go --addr=127.0.0.1:19530

# Run L3 rg cases
gotestsum --format testname --hide-summary=output -v ./testcases/advcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg

# Run advanced rg cases and default cases
# rg cases conflict with default cases, so -p 1 is required
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg -p 1
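
The commands above pass a custom --addr flag to the test binary. As a hedged sketch of how such a flag can be read with Go's standard flag package, consider the following. `parseAddr` and the default value "localhost:19530" are assumptions made for this example; the actual handling lives in the suite's main_test.go and may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// parseAddr is a hypothetical helper for illustration only. It parses an
// --addr flag from args and falls back to an assumed default when the flag
// is absent.
func parseAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("testcases", flag.ContinueOnError)
	addr := fs.String("addr", "localhost:19530", "Milvus server address (host:port)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	addr, err := parseAddr([]string{"--addr=127.0.0.1:19530"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("connecting to", addr)
}
```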

Contributing

  1. Follow the existing code structure
  2. Add comprehensive test cases
  3. Document new parameters and options
  4. Update this README for significant changes
  5. Ensure code quality standards:
    • Run golangci-lint run to check for style mistakes
    • Use gofmt -w your/code/path to format your code before submitting
    • CI verifies both golangci-lint and gofmt compliance