mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

History

test: add complex json expression test (#44211 )

<test>: <add test case for complex json expression

 On branch feature/json-shredding
 Changes to be committed:
       modified:   milvus_client/expressions/README.md
modified:
milvus_client/expressions/test_milvus_client_scalar_filtering.py

---------

Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>

2025-09-11 19:57:58 +08:00

README.md

test: add complex json expression test (#44211 )

2025-09-11 19:57:58 +08:00

test_milvus_client_scalar_filtering.py

test: add complex json expression test (#44211 )

2025-09-11 19:57:58 +08:00

README.md

Expression Filtering Tests

This directory contains comprehensive test modules for Milvus client expression filtering capabilities.

Test Modules

1. `test_milvus_client_scalar_expression_filtering_optimized.py`

Primary test module for comprehensive scalar expression filtering

Features:

Tests all Milvus-supported scalar data types (INT8, INT16, INT32, INT64, BOOL, FLOAT, DOUBLE, VARCHAR, ARRAY, JSON)
Covers all operators: Comparison (==, !=, >, <, >=, <=), Range (IN, LIKE), Arithmetic (+, -, *, /, %, **), Logical (AND, OR, NOT), Null (IS NULL, IS NOT NULL)
Single collection design with multiple index types for efficiency
Index consistency verification (same results for indexed vs non-indexed fields)
Comprehensive error handling and failure debugging
Automatic reproduction script generation
Test complex Json expression (JSON[JSON], JSON[LIST[JSON]], JSON[JSON[LIST]], etc)

Key Design:

One collection containing all data types
Each data type has multiple fields representing different index types
10% of data is NULL to test IS NULL/IS NOT NULL operators
Specific VARCHAR patterns: str_xxx, xxx_str, xxx_str_xxx
Comprehensive LIKE pattern coverage with escape handling
Create examples of typed, dynamic, and shared keys in json
Generate expressions to valida query result

2. `test_milvus_client_scalar_expression_filtering.py`

Legacy comprehensive scalar expression filtering test

Features:

Original comprehensive test implementation
Multiple collection approach
Extensive test coverage for all data types and operators
Detailed validation logic

3. `test_milvus_client_random_expression_generator.py`

Random expression generation for edge case testing

Features:

Generates random complex expressions
Tests edge cases and unusual combinations
Stress testing for expression parsing
Random data generation with various patterns

Data Type Coverage

Supported Scalar Types

Numeric: INT8, INT16, INT32, INT64, FLOAT, DOUBLE
Boolean: BOOL
String: VARCHAR
Array: ARRAY (with all element types)
JSON: JSON (with complex nested structures)

Array Element Types

All scalar types: INT8, INT16, INT32, INT64, BOOL, FLOAT, DOUBLE, VARCHAR

Operator Coverage

Comparison Operators

==, !=, >, <, >=, <=

Range Operators

IN (with array indexing support)
LIKE (with comprehensive pattern coverage)

Arithmetic Operators

+, -, *, /, %, **

Logical Operators

AND, OR, NOT

Null Operators

IS NULL, IS NOT NULL

Array Functions

Array indexing: field[index]

JSON Functions

JSON key access: field['key']

Index Type Support

Scalar Index Types

Data Types	INVERTED	BITMAP	STL_SORT	Trie	NGRAM	AUTOINDEX
INT8, INT16, INT32, INT64	yes	yes	yes	no	no	yes
BOOL	yes	yes	no	no	no	yes
FLOAT, DOUBLE	yes	no	yes	no	no	yes
VARCHAR	yes	yes	no	yes	yes	yes
JSON	yes	no	no	no	yes*	yes
ARRAY (elements: BOOL, INT8, INT16, INT32, INT64, VARCHAR)	yes	yes	no	no	no	yes
ARRAY (elements: FLOAT, DOUBLE)	yes	no	no	no	no	yes

*JSON fields require json_path and json_cast_type: "varchar" parameters for NGRAM index

NGRAM Index Specific Features

The NGRAM index is specialized for efficient text partial matching and fuzzy search on VARCHAR and JSON fields.

Supported Fields:

VARCHAR: Direct text content indexing
JSON: Requires json_path parameter to specify the JSON field path (e.g., field_name['key'])

Index Parameters:

min_gram: Minimum n-gram length (required, positive integer)
max_gram: Maximum n-gram length (required, positive integer, ≥ min_gram)
json_path: JSON field path for JSON fields (e.g., "json_field['body']")
json_cast_type: Must be "varchar" for JSON fields

Performance Characteristics:

Optimized for LIKE queries with % and _ wildcards
Two-phase query execution: n-gram filtering + secondary validation
Query strings shorter than min_gram fall back to full table scan
Supports multilingual text including Chinese, Japanese, and Korean

Example Index Creation:

# VARCHAR field
index_params.add_index(
    field_name="content",
    index_type="NGRAM",
    params={"min_gram": 2, "max_gram": 3}
)

# JSON field
index_params.add_index(
    field_name="json_field",
    index_type="NGRAM",
    params={
        "min_gram": 2,
        "max_gram": 3,
        "json_path": "json_field['body']",
        "json_cast_type": "varchar"
    }
)

Test Features

Error Handling

Parsing error detection and skipping
Graceful handling of unsupported expressions
Detailed error reporting

Debugging Support

Automatic debug info saving on failure
Parquet file export for test data
Reproduction script generation
Schema and configuration preservation

Validation Logic

Ground truth calculation using Python lambdas
Result count and ID verification
Index consistency verification

LIKE Pattern Coverage

Prefix patterns: str%
Suffix patterns: %str
Contains patterns: %str%
Single character wildcard: str_, _str
Combination patterns: str_%, %_str
Escape patterns: str\%, str\_

NGRAM Index Optimization:

LIKE queries on VARCHAR and JSON fields with NGRAM index are automatically optimized
Query performance significantly improves for pattern matching operations
Supports all LIKE patterns with % and _ wildcards
Automatic fallback to full scan when query length < min_gram

Usage

Running Tests

# Run optimized test
pytest test_milvus_client_scalar_expression_filtering_optimized.py

# Run legacy comprehensive test
pytest test_milvus_client_scalar_expression_filtering.py

# Run random expression generator
pytest test_milvus_client_random_expression_generator.py

# Run NGRAM index specific tests
pytest ../../testcases/indexes/test_ngram.py

Debug Information

On test failure, debug information is automatically saved to /tmp/ci_logs/:

Test data as Parquet files
Collection schema and configuration
Failed expressions list
Reproduction script

Reproduction Script

The generated reproduction script can:

Rebuild the entire test environment
Recreate schema, data, and indexes
Re-run failed expressions
Validate results

Design Principles

Comprehensive Coverage: Test all supported data types, operators, and index types (including NGRAM)
Efficiency: Single collection design for optimal performance
Reliability: Robust error handling and debugging
Maintainability: Clear code structure and documentation
Reproducibility: Automatic failure reproduction capabilities
Index Optimization: Validate performance improvements with specialized indexes like NGRAM

README.md

Expression Filtering Tests

Test Modules

1. test_milvus_client_scalar_expression_filtering_optimized.py

2. test_milvus_client_scalar_expression_filtering.py

3. test_milvus_client_random_expression_generator.py

Data Type Coverage

Supported Scalar Types

Array Element Types

Operator Coverage

Comparison Operators

Range Operators

Arithmetic Operators

Logical Operators

Null Operators

Array Functions

JSON Functions

Index Type Support

Scalar Index Types

NGRAM Index Specific Features

Test Features

Error Handling

Debugging Support

Validation Logic

LIKE Pattern Coverage

Usage

Running Tests

Debug Information

Reproduction Script

Design Principles

1. `test_milvus_client_scalar_expression_filtering_optimized.py`

2. `test_milvus_client_scalar_expression_filtering.py`

3. `test_milvus_client_random_expression_generator.py`