milvus/internal/parser/planparserv2/rewriter/README.md

## Expression Rewriter (planparserv2/rewriter)

This module performs rule-based logical rewrites on parsed `planpb.Expr` trees right after template value filling and before planning/execution.

### Entry
- `RewriteExpr(*planpb.Expr) *planpb.Expr` (in `entry.go`)
  - Recursively visits the expression tree and applies a set of composable, side-effect-free rewrite rules.
  - Uses global configuration from `paramtable.Get().CommonCfg.EnabledOptimizeExpr`
- `RewriteExprWithConfig(*planpb.Expr, bool) *planpb.Expr` (in `entry.go`)
  - Same as `RewriteExpr` but allows custom configuration for testing or special cases.

### Configuration

The rewriter can be configured via the following parameter (refreshable at runtime):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `common.enabledOptimizeExpr` | `true` | Enable query expression optimization including range simplification, IN/NOT IN merge, TEXT_MATCH merge, and all other optimizations |

**IMPORTANT**: IN/NOT IN value list sorting and deduplication **always** runs regardless of this configuration setting, because the execution engine depends on sorted value lists.

### Implemented Rules

1) IN / NOT IN normalization and merges (`term_in.go`)
- OR-equals to IN (same column):
  - `a == v1 OR a == v2 ...` → `a IN (v1, v2, ...)`
  - Numeric columns only merge when count > threshold (default 150); others when count > 1.
- AND-not-equals to NOT IN (same column):
  - `a != v1 AND a != v2 ...` → `NOT (a IN (v1, v2, ...))`
  - Same thresholds as above.
- IN vs Equal redundancy elimination (same column):
  - AND: `(a ∈ S) AND (a = v)`:
    - if `v ∈ S` → `a = v`
    - if `v ∉ S` → contradiction → constant `false`
  - OR:  `(a ∈ S) OR (a = v)` → `a ∈ (S ∪ {v})` (always union)
- IN with IN union:
  - OR: `(a ∈ S1) OR (a ∈ S2)` → `a ∈ (S1 ∪ S2)` with sorting/dedup
  - AND: `(a ∈ S1) AND (a ∈ S2)` → `a ∈ (S1 ∩ S2)`; empty intersection → constant `false`
- Sort and deduplicate `IN` / `NOT IN` value lists (supported types: bool, int64, float64, string).

2) TEXT_MATCH OR merge (`text_match.go`)
- Merge ORs of `TEXT_MATCH(field, "literal")` on the same column (no options):
  - Concatenate literals with a single space in the order they appear; no tokenization, deduplication, or sorting is performed.
  - Example: `TEXT_MATCH(f, "A C") OR TEXT_MATCH(f, "B D")` → `TEXT_MATCH(f, "A C B D")`
- If any `TEXT_MATCH` in the group has options (e.g., `minimum_should_match`), this optimization is skipped for that group.

3) Range predicate simplification (`range.go`)
- AND tighten (same column):
  - Lower bounds: `a > 10 AND a > 20` → `a > 20` (pick strongest lower)
  - Upper bounds: `a < 50 AND a < 60` → `a < 50` (pick strongest upper)
  - Mixed lower and upper: `a > 10 AND a < 50` → `10 < a < 50` (BinaryRangeExpr)
  - Inclusion respected (>, >=, <, <=). On ties, exclusive is considered stronger than inclusive for tightening.
- OR weaken (same column, same direction):
  - Lower bounds: `a > 10 OR a > 20` → `a > 10` (pick weakest lower)
  - Upper bounds: `a < 10 OR a < 20` → `a < 20` (pick weakest upper)
  - Inclusion respected, preferring inclusive for weakening in ties.
- Mixed-direction OR (lower vs upper) is not merged.
- Equivalent-bound collapses (same column, same value):
  - AND: `a ≥ x AND a > x` → `a > x`; `a ≤ y AND a < y` → `a < y`
  - OR:  `a ≥ x OR a > x` → `a ≥ x`; `a ≤ y OR a < y` → `a ≤ y`
  - Symmetric dedup: `a > 10 AND a ≥ 10` → `a > 10`; `a < 5 OR a ≤ 5` → `a ≤ 5`
- IN ∩ range filtering:
  - AND: `(a ∈ {…}) AND (range)` → keep only values in the set that satisfy the range
    - e.g., `{1,3,5} AND a > 3` → `{5}`
- Supported columns for range optimization:
  - Scalar: Int8/Int16/Int32/Int64, Float/Double, VarChar
  - Array element access: when indexing an element (e.g., `ArrayInt[0]`), the element type above applies
  - JSON/dynamic fields with nested paths (e.g., `JSONField["price"]`, `$meta["age"]`) are range-optimized
    - Type determined from literal value (int, float, string)
    - Numeric types (int and float) are compatible and normalized to Double for merging
    - Different type categories are not merged (e.g., `json["a"] > 10` and `json["a"] > "hello"` remain separate)
    - Bool literals are not optimized (no meaningful ranges)
- Literal compatibility:
  - Integer columns require integer literals (e.g., `Int64Field > 10`)
  - Float/Double columns accept both integer and float literals (e.g., `FloatField > 10` or `> 10.5`)
- Column identity:
  - Merges only happen within the same `ColumnInfo` (including nested path and element index). For example, `ArrayInt[0]` and `ArrayInt[1]` are different columns and are not merged with each other.
- BinaryRangeExpr merging:
  - AND: Merge multiple `BinaryRangeExpr` nodes on the same column to compute intersection (max lower, min upper)
    - `(10 < x < 50) AND (20 < x < 40)` → `(20 < x < 40)`
    - Empty intersection → constant `false`
  - AND with UnaryRangeExpr: Update appropriate bound of `BinaryRangeExpr`
    - `(10 < x < 50) AND (x > 30)` → `(30 < x < 50)`
  - OR: Merge overlapping or adjacent `BinaryRangeExpr` nodes into wider interval
    - `(10 < x < 25) OR (20 < x < 40)` → `(10 < x < 40)` (overlapping)
    - `(10 < x <= 20) OR (20 <= x < 30)` → `(10 < x < 30)` (adjacent with inclusive)
    - Disjoint intervals remain separate: `(10 < x < 20) OR (30 < x < 40)` → remains as OR
  - Inclusivity handling: AND prefers exclusive on equal bounds (stronger), OR prefers inclusive (weaker)

### General Notes
- All merges require operands to target the same column (same `ColumnInfo`, including nested path/element type).
- Rewrite runs after template value filling; template placeholders do not appear here.
- Sorting/dedup for IN/NOT IN is deterministic; duplicates are removed post-sort.
- Numeric-threshold for OR→IN / AND≠→NOT IN is defined in `util.go` (`defaultConvertOrToInNumericLimit`, default 150).

### Pass Ordering (current)
- OR branch:
  1. Flatten
  2. OR `==` → IN
  3. TEXT_MATCH merge (no options)
  4. Range weaken (same-direction bounds)
  5. BinaryRangeExpr merge (overlapping/adjacent intervals)
  6. IN with `!=` short-circuiting
  7. IN ∪ IN union
  8. IN vs Equal redundancy elimination
  9. Fold back to BinaryExpr
- AND branch:
  1. Flatten
  2. Range tighten / interval construction
  3. BinaryRangeExpr merge (intersection, also with UnaryRangeExpr)
  4. IN ∪ IN intersection (if any)
  5. IN with `!=` filtering
  6. IN ∩ range filtering
  7. IN vs Equal redundancy elimination
  8. AND `!=` → NOT IN
  9. Fold back to BinaryExpr

Each construction of IN will be normalized (sorted and deduplicated). TEXT_MATCH OR merge concatenates literals with a single space; no tokenization, deduplication, or sorting is performed.

### File Structure
- `entry.go`      — rewrite entry and visitor orchestration
- `util.go`       — shared helpers (column keying, value classification, sorting/dedup, constructors)
- `term_in.go`    — IN/NOT IN normalization and conversions
- `text_match.go` — TEXT_MATCH OR merge (no options)
- `range.go`      — range tightening/weakening and interval construction

### Future Extensions
- More IN-range algebra (e.g., `IN` vs exact equality propagation across subtrees).
- Merging phrase_match or other string ops with clearly-defined token rules.
- More algebraic simplifications around equality and null checks:
  - Contradiction detection: `(a == 1) AND (a == 2)` → `false`; `(a > 10) AND (a == 5)` → `false`
  - Tautology detection: `(a > 10) OR (a <= 10)` → `true` (for non-NULL values)
  - Absorption laws: `(a > 10) OR ((a > 10) AND (b > 20))` → `a > 10`
- Advanced BinaryRangeExpr merging:
  - OR with 3+ intervals: Currently limited to 2 intervals. Full interval merging algorithm needed for `(10 < x < 20) OR (15 < x < 25) OR (22 < x < 30)` → `(10 < x < 30)`.
  - OR with unbounded + bounded: Currently skipped. Could optimize `(x > 10) OR (5 < x < 15)` → `x > 5`.