8 Commits

Author SHA1 Message Date
Buqian Zheng
724598d231
fix: handle mixed int64/float types in BinaryRangeExpr for JSON fields (#46681)
test: add unit tests for mixed int64/float types in BinaryRangeExpr

When processing binary range expressions (e.g., `x > 499 && x <= 512.0`)
on JSON/dynamic fields with expression templates, the lower and upper
bounds could have different numeric types (int64 vs float64). This
caused an assertion failure in GetValueFromProto when the template type
didn't match the actual proto value type.

Fixes:
1. Go side (fill_expression_value.go): Normalize numeric types for JSON
fields - if either bound is float and the other is int, convert the int
to float.

2. C++ side (BinaryRangeExpr.cpp):
   - Check both lower_val and upper_val types when dispatching
   - Use double template when either bound is float
- Use GetValueWithCastNumber instead of GetValueFromProto to safely
handle int64->double conversion

issue: #46588

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: JSON field binary-range expressions must present
numeric bounds to the evaluator with a consistent numeric type; if
either bound is floating-point, both bounds must be treated as double to
avoid proto-type mismatches during template instantiation.
- Bug fix (issue #46588 & concrete change): mixed int64/float bounds
could dispatch the wrong template (e.g.,
ExecRangeVisitorImplForJson<int64_t>) and trigger assertions in
GetValueFromProto. Fixes: (1) Go parser (FillBinaryRangeExpressionValue
in fill_expression_value.go) normalizes mixed JSON numeric bounds by
promoting the int bound to float; (2) C++ evaluator
(PhyBinaryRangeFilterExpr::Eval in BinaryRangeExpr.cpp) inspects both
lower_type and upper_type, sets use_double when either is float, selects
ExecRangeVisitorImplForJson<double> for mixed numeric cases, and
replaces GetValueFromProto with GetValueWithCastNumber so int64→double
conversions are handled safely.
- Removed / simplified logic: the previous evaluator branched on only
the lower bound's proto type and had separate index/non-index handling
for int64 vs float; that per-bound branching is replaced by unified
numeric handling (convert to double when needed) and a single numeric
path for index use — eliminating redundant, error-prone branches that
assumed homogeneous bound types.
- No data loss or regression: changes only promote int→double for
JSON-range comparisons when the other bound is float; integer-only and
float-only paths remain unchanged. Promotion uses IEEE double (C++
double and Go float64) and only affects template dispatch and
value-extraction paths; GetValueWithCastNumber safely converts int64 to
double and index/non-index code paths both normalize consistently,
preserving semantics for comparisons and avoiding assertion failures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-31 11:52:24 +08:00
Spade A
0114bd1dc9
feat: support match operator family (#46518)
issue: https://github.com/milvus-io/milvus/issues/46517
ref: https://github.com/milvus-io/milvus/issues/42148

This PR supports match operator family with struct array and brute force
search only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: match operators only target struct-array element-level
predicates and assume callers provide a correct row_start so element
indices form a contiguous range; IArrayOffsets implementations convert
row-level bitmaps/rows (starting at row_start) into element-level
bitmaps or a contiguous element-offset vector used by brute-force
evaluation.

- New capability added: end-to-end support for MATCH_* semantics
(match_any, match_all, match_least, match_most, match_exact) — parser
(grammar + proto), planner (ParseMatchExprs), expr model
(expr::MatchExpr), compilation (Expr→PhyMatchFilterExpr), execution
(PhyMatchFilterExpr::Eval uses element offsets/bitmaps), and unit tests
(MatchExprTest + parser tests). Implementation currently works for
struct-array inputs and uses brute-force element counting via
RowBitsetToElementOffsets/RowBitsetToElementBitset.

- Logic removed or simplified and why: removed the ad-hoc
DocBitsetToElementOffsets helper and consolidated offset/bitset
derivation into IArrayOffsets::RowBitsetToElementOffsets and a
row_start-aware RowBitsetToElementBitset, and removed EvalCtx overloads
that embedded ExprSet (now EvalCtx(exec_ctx, offset_input)). This
centralizes array-layout logic in ArrayOffsets and removes duplicated
offset conversion and EvalCtx variants that were redundant for
element-level evaluation.

- No data loss / no behavior regression: persistent formats are
unchanged (no proto storage or on-disk layout changed); callers were
updated to supply row_start and now route through the centralized
ArrayOffsets APIs which still use the authoritative
row_to_element_start_ mapping, preserving exact element index mappings.
Eval logic changes are limited to in-memory plumbing (how
offsets/bitmaps are produced and how EvalCtx is constructed); expression
evaluation still invokes exprs_->Eval where needed, so existing behavior
and stored data remain intact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-12-29 11:03:26 +08:00
Spade A
f6f716bcfd
feat: impl StructArray -- support embedding searches embeddings in embedding list with element level filter expression (#45830)
issue: https://github.com/milvus-io/milvus/issues/42148

For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.

This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.

Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:

element_filter(structFieldName, $[subFieldName] == 3)

where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.

It can be combined with existing filter expressions, for example:

"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"

A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)

schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...

filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)

```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-15 12:01:15 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Spade A
0dc21f0aeb
feat: support random sample (#39532)
issue: #39541

This PR implements random sample, the syntax is:
```
filter="random_sample(factor)"
or 
filter="boolean_expression && random_sample(factor)"

where 
factor is a float between (0, 1) and 
boolean_expression is like
 "1 <= number < 10", "color in ["read, "blue"]" or others
```

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-02-18 12:40:50 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
cai.zhang
2ef6cbbf59
feat: The expression supports filling elements through templates (#37033)
issue: #36672

The expression supports filling elements through templates, which helps
to reduce the overhead of parsing the elements.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-31 14:20:22 +08:00