Milvus Go Client Test Framework
Overview
This is a comprehensive test framework for the Milvus Go Client, designed to validate various functionalities of the Milvus vector database client. The framework provides a structured approach to writing tests with reusable components and helper functions.
Framework Architecture
Directory Structure
/go_client/
├── testcases/ # Main test cases
│ ├── helper/ # Helper functions and utilities
│ │ ├── helper.go
│ │ ├── data_helper.go
│ │ └── collection_helper.go
│ ├── search_test.go # Search functionality tests
│ ├── index_test.go # Index management tests
│ └── ...
├── common/ # Common utilities and constants
└── base/ # Base infrastructure code
Key Components
- Collection Preparation: Utilities for creating and managing collections
- Data Generation: Tools for generating test data
- Helper Functions: Common operations and validations
- Test Cases: Organized by functionality
Writing Test Cases
Basic Test Structure
func TestYourFeature(t *testing.T) {
    // 1. Set up context and client
    ctx := hp.CreateContext(t, time.Second*common.DefaultTimeout)
    mc := createDefaultMilvusClient(ctx, t)
    // 2. Prepare collection
    prepare, schema := hp.CollPrepare.CreateCollection(
        ctx, t, mc,
        hp.NewCreateCollectionParams(hp.Int64Vec),
        hp.TNewFieldsOption(),
        hp.TNewSchemaOption(),
    )
    // 3. Insert test data
    prepare.InsertData(ctx, t, mc,
        hp.NewInsertParams(schema),
        hp.TNewDataOption(),
    )
    // 4. Execute test operations
    // ... your test logic here ...
    // 5. Validate results
    require.NoError(t, err)
    require.Equal(t, expected, actual)
}
Using Custom Parameters
- Collection Creation Parameters
fieldsOption := hp.TNewFieldsOption().
    TWithEnableAnalyzer(true).
    TWithAnalyzerParams(map[string]any{
        "tokenizer": "standard",
    })
schemaOption := hp.TNewSchemaOption().
    TWithEnableDynamicField(true).
    TWithDescription("Custom schema").
    TWithAutoID(false)
- Data Insertion Options
insertOption := hp.TNewDataOption().
    TWithNb(1000).      // Number of records
    TWithDim(128).      // Vector dimension
    TWithStart(100).    // Starting ID
    TWithMaxLen(256).   // Maximum length
    TWithTextLang("en") // Text language
- Index Parameters
indexParams := hp.TNewIndexParams(schema).
    TWithFieldIndex(map[string]index.Index{
        common.DefaultVectorFieldName: index.NewIVFSQIndex(
            &index.IVFSQConfig{
                MetricType: entity.L2,
                NList:      128,
            },
        ),
    })
- Search Parameters
searchOpt := client.NewSearchOption(schema.CollectionName, 100, vectors).
    WithOffset(0).
    WithLimit(100).
    WithConsistencyLevel(entity.ClStrong).
    WithFilter("int64 >= 100").
    WithOutputFields([]string{"*"}).
    WithSearchParams(map[string]any{
        "nprobe": 16,
        "ef":     64,
    })
Adding New Parameters
- Define New Option Type
// In helper/data_helper.go
type YourNewOption struct {
    newParam1 string
    newParam2 int
}
- Add Constructor
func TNewYourOption() *YourNewOption {
    return &YourNewOption{
        newParam1: "default",
        newParam2: 0,
    }
}
- Add Parameter Methods
func (opt *YourNewOption) TWithNewParam1(value string) *YourNewOption {
    opt.newParam1 = value
    return opt
}
func (opt *YourNewOption) TWithNewParam2(value int) *YourNewOption {
    opt.newParam2 = value
    return opt
}
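Put together, the three steps above form a small chainable builder. The sketch below assembles them into one runnable file and adds a `main` function (an addition for illustration only) that shows how defaults and chained overrides behave.

```go
package main

import "fmt"

// YourNewOption repeats the placeholder option type from the steps above.
type YourNewOption struct {
	newParam1 string
	newParam2 int
}

// TNewYourOption returns an option populated with default values.
func TNewYourOption() *YourNewOption {
	return &YourNewOption{
		newParam1: "default",
		newParam2: 0,
	}
}

// TWithNewParam1 overrides newParam1 and returns the option for chaining.
func (opt *YourNewOption) TWithNewParam1(value string) *YourNewOption {
	opt.newParam1 = value
	return opt
}

// TWithNewParam2 overrides newParam2 and returns the option for chaining.
func (opt *YourNewOption) TWithNewParam2(value int) *YourNewOption {
	opt.newParam2 = value
	return opt
}

func main() {
	defaults := TNewYourOption()
	custom := TNewYourOption().TWithNewParam1("custom").TWithNewParam2(42)

	fmt.Printf("%+v\n", *defaults) // prints: {newParam1:default newParam2:0}
	fmt.Printf("%+v\n", *custom)   // prints: {newParam1:custom newParam2:42}
}
```

Because each setter returns the same pointer, callers only name the parameters they want to change and every other field keeps its default.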
Best Practices
- Test Organization
  - Group related tests in the same file
  - Use clear and descriptive test names
  - Add comments explaining test purpose
- Data Generation
  - Use helper functions for generating test data
  - Ensure data is appropriate for the test case
  - Clean up test data after use
- Error Handling
  - Use common.CheckErr for consistent error checking
  - Test both success and failure scenarios
  - Validate error messages when appropriate
- Performance Considerations
  - Use appropriate timeouts
  - Clean up resources after tests
  - Consider test execution time
Running Tests
# Run all tests
go test ./testcases/...
# Run specific test
go test -run TestYourFeature ./testcases/
# Run with verbose output
go test -v ./testcases/...
# gotestsum
We recommend running the tests with gotestsum: https://github.com/gotestyourself/gotestsum
# Run all default cases
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m
# Run a specified file
gotestsum --format testname --hide-summary=output ./testcases/collection_test.go ./testcases/main_test.go --addr=127.0.0.1:19530
# Run L3 rg cases
gotestsum --format testname --hide-summary=output -v ./testcases/advcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg
# Run advanced rg cases and default cases
# rg cases conflict with default cases, so -p 1 is required
gotestsum --format testname --hide-summary=output -v ./testcases/... --addr=127.0.0.1:19530 -timeout=30m -tags=rg -p 1
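The `--addr` argument in the commands above is not a built-in `go test` flag, so the test package presumably declares it itself. The sketch below shows one way such a flag can be declared and parsed with the standard `flag` package. The `parseAddr` function, the FlagSet name, and the default value are assumptions made for this example and may differ from what the framework's `main_test.go` actually does.

```go
package main

import (
	"flag"
	"fmt"
)

// parseAddr registers a hypothetical "addr" flag on a private FlagSet and
// parses args, returning the resulting server address.
func parseAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("testcases", flag.ContinueOnError)
	// The default value here is an assumption for illustration.
	addr := fs.String("addr", "localhost:19530", "Milvus server host and port")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	// Explicit address, as passed on the command line above.
	addr, err := parseAddr([]string{"--addr=127.0.0.1:19530"})
	fmt.Println(addr, err)

	// No arguments: the default is used.
	def, _ := parseAddr(nil)
	fmt.Println(def)
}
```

In a real test package the flag would typically be registered on the default flag set before `TestMain` calls `flag.Parse`, so that it is parsed alongside the standard testing flags.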
Contributing
- Follow the existing code structure
- Add comprehensive test cases
- Document new parameters and options
- Update this README for significant changes
- Ensure code quality standards:
  - Run golangci-lint run to check for style mistakes
  - Use gofmt -w your/code/path to format your code before submitting
  - CI will verify both golint and go format compliance