related: #36380
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: aggregation is centralized and schema-aware — all
aggregate functions are created via the exec Aggregate registry
(milvus::exec::Aggregate) and validated by ValidateAggFieldType, use a
single in-memory accumulator layout (Accumulator/RowContainer) and
grouping primitives (GroupingSet, HashTable, VectorHasher), ensuring
consistent typing, null semantics and offsets across planner → exec →
reducer conversion paths (toAggregateInfo, Aggregate::create,
GroupingSet, AggResult converters).
- Removed / simplified logic: removed ad‑hoc count/group-by and reducer
code (CountNode/PhyCountNode, GroupByNode/PhyGroupByNode, cntReducer and
its tests) and consolidated into a unified AggregationNode →
PhyAggregationNode + GroupingSet + HashTable execution path and
centralized reducers (MilvusAggReducer, InternalAggReducer,
SegcoreAggReducer). AVG now implemented compositionally (SUM + COUNT)
rather than a bespoke operator, eliminating duplicate implementations.
- Why this does NOT cause data loss or regressions: existing data-access
and serialization paths are preserved and explicitly validated —
bulk_subscript / bulk_script_field_data and FieldData creation are used
for output materialization; converters (InternalResult2AggResult ↔
AggResult2internalResult, SegcoreResults2AggResult ↔
AggResult2segcoreResult) enforce shape/type/row-count validation; proxy
and plan-level checks (MatchAggregationExpression,
translateOutputFields, ValidateAggFieldType, translateGroupByFieldIds)
reject unsupported inputs (ARRAY/JSON, unsupported datatypes) early.
Empty-result generation and explicit error returns guard against silent
corruption.
- New capability and scope: end-to-end GROUP BY and aggregation support
added across the stack — proto (plan.proto, RetrieveRequest fields
group_by_field_ids/aggregates), planner nodes (AggregationNode,
ProjectNode, SearchGroupByNode), exec operators (PhyAggregationNode,
PhyProjectNode) and aggregation core (Aggregate implementations:
Sum/Count/Min/Max, SimpleNumericAggregate, RowContainer, GroupingSet,
HashTable) plus proxy/querynode reducers and tests — enabling grouped
and global aggregation (sum, count, min, max, avg via sum+count) with
schema-aware validation and reduction.
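As a concrete illustration of the compositional AVG mentioned above, here is a minimal Python sketch (not Milvus source; the type names are hypothetical) of how partial (sum, count) pairs merge across segments and finalize:

```python
from dataclasses import dataclass

@dataclass
class PartialAgg:
    sum: float   # partial SUM over one segment/group
    count: int   # partial COUNT over the same rows

def merge(partials: list[PartialAgg]) -> PartialAgg:
    # reducers combine partial aggregates field-wise
    return PartialAgg(sum=sum(p.sum for p in partials),
                      count=sum(p.count for p in partials))

def finalize_avg(agg: PartialAgg) -> float | None:
    # AVG is derived at the end, so no bespoke AVG operator is needed;
    # an empty group yields NULL rather than a division by zero
    return agg.sum / agg.count if agg.count else None

print(finalize_avg(merge([PartialAgg(10.0, 4), PartialAgg(5.0, 1)])))  # 3.0
```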
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: index parameter validation and test expectations for
the HNSW-family must be explicit, consistent, and deterministic — this
PR enforces that by adding exhaustive parameter matrices for HNSW_PRQ
(tests/python_client/testcases/indexes/{idx_hnsw_prq.py,
test_hnsw_prq.py}) and normalizing expectations in idx_hnsw_pq.py via a
shared success variable.
- Logic removed / simplified: brittle, ad-hoc string expectations were
consolidated — literal "success" occurrences were replaced with a single
success variable, and ambiguous short error messages were replaced by the
canonical descriptive error text (see the sketch after this list); this
reduces duplicated assertion logic in tests and removes dependence on
fragile, truncated messages.
- Bug fix (tests): corrected HNSW_PQ test expectations to assert the
full, authoritative error for invalid PQ m ("The dimension of the vector
(dim) should be a multiple of the number of subquantizers (m).") and
aligned HNSW_PRQ test matrices (idx_hnsw_prq.py) to the same explicit
expectations — the change targets test assertions only and fixes false
negatives caused by mismatched messages.
- No data loss or behavior regression: only test code is added/modified
(tests/python_client/testcases/indexes/*). Production code paths remain
unmodified — collection creation, insert/flush, client.create_index,
wait_for_index_ready, load_collection, search, and client.describe_index
are invoked by tests but not changed; therefore persisted data, index
artifacts, and runtime behavior are unaffected.
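A minimal sketch of the shared-expectation pattern referenced above, with illustrative matrix entries (the parameter values are placeholders; the error text is the canonical message quoted in this description):

```python
success = "success"  # single source of truth instead of scattered literals

# each entry pairs build params with the expected outcome: either `success`
# or the full, authoritative error text (never a truncated fragment)
hnsw_pq_params = [
    ({"M": 16, "efConstruction": 200, "m": 8, "nbits": 8}, success),
    ({"M": 16, "efConstruction": 200, "m": 3, "nbits": 8},
     "The dimension of the vector (dim) should be a multiple of the number "
     "of subquantizers (m)."),
]
```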
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zilliz <jiaming.li@zilliz.com>
related issue: #46616
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: these tests assume the v2 group-by search
implementation (TestMilvusClientV2Base and pymilvus v2 APIs such as
AnnSearchRequest/WeightedRanker) is functionally correct; the PR extends
coverage to validate group-by semantics when using JSON fields and
dynamic fields (see
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
— TestGroupSearch.setup_class and parametrized group_by_field cases).
- Logic removed/simplified: legacy v1 test scaffolding and duplicated
parametrized fixtures/test permutations were consolidated into
v2-focused suites (TestGroupSearch now inherits TestMilvusClientV2Base;
old TestGroupSearch/TestcaseBase patterns and large blocks in
test_mix_scenes were removed) to avoid redundant fixture permutations
and duplicate assertions while reusing shared helpers in common_func
(e.g., gen_scalar_field, gen_row_data_by_schema) and common_type
constants.
- Why this does NOT introduce data loss or behavior regression: only
test code, test helpers, and test imports were changed — no
production/server code altered. Test helper changes are
backward-compatible (gen_scalar_field forces primary key nullable=False
and only affects test data generation paths in
tests/python_client/common/common_func.py; get_field_dtype_by_field_name
now accepts schema dicts/ORM schemas and is used only by tests to choose
vector generation) and collection creation/insertion in tests use the
same CollectionSchema/FieldSchema paths, so production
storage/serialization logic is untouched.
- New capability (test addition): adds v2 test coverage for group-by
search over JSON and dynamic fields plus related scenarios — pagination,
strict/non-strict group_size, min/max group constraints, multi-field
group-bys and binary vector cases — implemented in
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
to address issue #46616.
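A hedged usage sketch of the group-by search path these suites exercise; the collection and field names are placeholders, and the parameter spellings should be checked against the pinned pymilvus version:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running Milvus
res = client.search(
    collection_name="group_by_demo",      # placeholder collection
    data=[[0.1] * 128],                   # one query vector
    limit=10,
    group_by_field="color",               # e.g. a dynamic-field key under test
    group_size=3,                         # up to 3 hits per group
    strict_group_size=False,              # non-strict: groups may return fewer
    output_fields=["color"],
)
```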
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Test Case Updates for API Behavior Changes
**Core Invariant**: These test case updates reflect backend API
improvements to error messaging and schema information returned by
collection operations. The changes maintain backward compatibility—no
public signatures change, and all modifications are test expectation
updates.
**Updated Error Messages for Better Diagnostics**:
- `test_add_field_feature.py`: Updated expected error when adding a
vector field without dimension specification from a generic "not support
to add vector field" to the more descriptive "vector field must have
dimension specified, field name = {field_name}: invalid parameter". This
change is non-breaking for clients that only check error codes; it
improves developer experience with clearer error context.
**Schema Information Extension**:
- `test_milvus_client_collection.py`: Added `enable_namespace: False` to
the expected `describe_collection()` output. This is a new boolean field
in the collection metadata that defaults to False, representing an
opt-in feature. Existing code querying describe_collection continues to
work; the new field is simply an additional property in the response
dictionary.
**Dynamic Error Message Construction**:
- `test_milvus_client_search_invalid.py`: Replaced a hardcoded error
message with conditional logic that generates the appropriate expected
error based on input state (None vectors vs. invalid vector data). This
prevents brittle test failures when multiple error conditions exist, and
correctly validates that the API handles both "missing data" and
"malformed data" cases distinctly.
**No Regression Risk**: All changes update test expectations to match
improved backend behavior. The changes are additive (new field in
schema) or clarifying (better error messages), with no modifications to
existing response structures or behavior for valid inputs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: nico <cheng.yuan@zilliz.com>
Issue: #46627
add one more test case to cover duplicate pk partial update
On branch feature/partial-update
Changes to be committed:
modified: milvus_client/test_milvus_client_partial_update.py
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: upserts with partial_update=True consolidate records
by primary key (PK) rather than creating duplicate rows; this test
verifies the partial-update upsert path preserves PK identity and merge
semantics.
- Change: adds test
test_milvus_client_partial_update_duplicate_pk_partial_update which
inserts duplicate-PK batches, then calls client.upsert(...,
partial_update=True) on a subset of fields and asserts final row count
equals default_nb, exercising the partial-update code path (upsert →
partial update handling → query) not previously covered.
- No production logic removed/simplified: this PR only adds test
coverage (no code paths removed or altered); nothing in production code
is changed or simplified by the PR.
- No data loss or regression introduced: the test validates concrete
code paths — upsert with partial_update=True followed by query
(output-field, vector, and PK checks) — and asserts deduplication
(2×default_nb → default_nb). Because the PR only adds assertions against
existing behavior and does not modify runtime logic, it cannot cause
data loss or behavioral regressions.
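A condensed sketch of the new case under the assumptions above (a running Milvus, an existing collection with this shape, the suite's default_nb); it is not the verbatim test:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
default_nb = 2000
rows = [{"id": i, "vector": [0.1] * 8, "tag": "v1"} for i in range(default_nb)]

client.insert("pu_demo", rows)
client.insert("pu_demo", rows)  # duplicate-PK batch: same ids inserted again

# partial update on a subset of fields, keyed by the same PKs
client.upsert("pu_demo", [{"id": i, "tag": "v2"} for i in range(default_nb)],
              partial_update=True)

res = client.query("pu_demo", filter="", output_fields=["count(*)"])
assert res[0]["count(*)"] == default_nb  # consolidated by PK, not duplicated
```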
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
Issue: #46424
test: add_collection_field (invalid_default_value)
hybrid_search (NOT supported)
simplify some test cases by using a single collection to save time.
query with different time shift and timezone settings
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: TIMESTAMPTZ values are treated as absolute instants
(timezone-preserving). Tests assume conversions between stored instants
and display timezones/time-shifts are deterministic and reversible; the
PR validates queries/reads across different timezone and time-shift
settings against that invariant.
- Removed/simplified logic: duplicated per-test create/insert/teardown
flows and several isolated timestamptz unit cases (edge_case, Feb_29,
partial_update, standalone query) were consolidated into a module-scoped
fixture that creates a single COLLECTION_NAME, inserts ROWS, and handles
teardown. This removes redundant setup/teardown code and repeated
scaffolding while preserving the same API exercise points
(create_collection, insert, query, alter_collection_properties,
alter_database_properties, describe_collection, describe_database).
- No data loss or behavior regression: only test code was reorganized
and new assertions exercise the same production APIs and code paths used
previously (create_collection → insert → query / alter_properties →
describe). The fixture inserts the same ROWS and tests still
convert/compare timestamptz values via cf.convert_timestamptz and query
check routines; the new invalid-default-value test only asserts error
handling when adding a TIMESTAMPTZ field with an invalid default and
does not mutate persisted data or change production logic.
- PR type (Enhancement/Test): expands and reorganizes E2E test coverage
for TIMESTAMPTZ—centralizes collection setup to reduce runtime and
flakiness, adds explicit coverage for invalid-default-value behavior,
and increases timezone/time-shift query scenarios without altering
product behavior.
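A minimal sketch of the module-scoped consolidation described above (COLLECTION_NAME/ROWS mirror the names cited; the schema and rows here are placeholders):

```python
import pytest
from pymilvus import MilvusClient

COLLECTION_NAME = "timestamptz_e2e"  # single collection shared by the module
ROWS = [{"id": i, "vector": [0.0] * 8, "ts": "2024-02-29T12:00:00+08:00"}
        for i in range(10)]          # illustrative rows, not the real dataset

@pytest.fixture(scope="module")
def tz_collection():
    client = MilvusClient(uri="http://localhost:19530")
    client.create_collection(COLLECTION_NAME, dimension=8)  # placeholder schema
    client.insert(COLLECTION_NAME, ROWS)
    yield client                     # every test in the module reuses this
    client.drop_collection(COLLECTION_NAME)  # one teardown per module
```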
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: tests' persistence of EventRecords and RequestRecords
must be append-safe under concurrent writers; this PR replaces Parquet
with JSONL and uses per-file locks and explicit buffer flushes to
guarantee atomic, append-safe writes (EventRecords uses event_lock +
append per line; RequestRecords buffers under request_lock and flushes
to file when threshold or on sink()).
- Logic removed/simplified and rationale: DataFrame-based parquet
append/read logic (pyarrow/fastparquet) and implicit parquet buffering
were removed in favor of simple line-oriented JSON writes and explicit
buffer management. The Parquet append/merge paths were dropped because
parquet appends under concurrent test-writer patterns caused corruption;
JSONL removes the append-mode complexity and the parquet-specific
buffering/serialization code.
- Why no data loss or behavior regression (concrete code paths):
EventRecords.insert writes a complete JSON object per event under
event_lock to /tmp/ci_logs/event_records_*.jsonl and get_records_df
reads every JSON line under the same lock (or returns an empty DataFrame
with the same schema on FileNotFound/Error), preserving all fields
event_name/event_status/event_ts. RequestRecords.insert appends to an
in-memory buffer under request_lock and triggers _flush_buffer() when
len(buffer) >= 100; _flush_buffer() writes each buffered JSON line to
/tmp/ci_logs/request_records_*.jsonl and clears the buffer; sink() calls
_flush_buffer() under request_lock before get_records_df() reads the
file — ensuring all buffered records are persisted before reads. Both
read paths handle FileNotFoundError and exceptions by returning empty
DataFrames with identical column schemas, so external callers see the
same API and no silent record loss.
- Enhancement summary (concrete): Replaces flaky Parquet append/read
with JSONL + explicit locking and deterministic flush semantics,
removing the root cause of parquet append corruption in tests while
keeping the original DataFrame-based analysis consumers unchanged
(get_records_df returns equivalent schemas).
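A condensed sketch of the scheme described above — the lock names, record fields, flush threshold of 100, and file locations come from this description; the rest is illustrative:

```python
import json
import os
import threading

os.makedirs("/tmp/ci_logs", exist_ok=True)
event_lock = threading.Lock()
request_lock = threading.Lock()

class EventRecords:
    FILE = "/tmp/ci_logs/event_records_demo.jsonl"  # suffix is illustrative

    def insert(self, event_name, event_status, event_ts):
        record = {"event_name": event_name, "event_status": event_status,
                  "event_ts": event_ts}
        with event_lock:  # one complete JSON object appended per line
            with open(self.FILE, "a") as f:
                f.write(json.dumps(record) + "\n")

class RequestRecords:
    FILE = "/tmp/ci_logs/request_records_demo.jsonl"

    def __init__(self):
        self.buffer = []

    def insert(self, record):
        with request_lock:
            self.buffer.append(record)
            if len(self.buffer) >= 100:  # threshold flush
                self._flush_buffer()

    def _flush_buffer(self):  # caller must hold request_lock
        with open(self.FILE, "a") as f:
            for r in self.buffer:
                f.write(json.dumps(r) + "\n")
        self.buffer.clear()

    def sink(self):
        with request_lock:  # persist stragglers before any read
            self._flush_buffer()
```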
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Updated test suite dependencies to pymilvus 2.7.0rc91.
* Enhanced text highlighting validation in test checkers.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Updates e2e test error code and message after pymilvus update
see #46677
Signed-off-by: silas.jiang <silas.jiang@zilliz.com>
Co-authored-by: silas.jiang <silas.jiang@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: test infrastructure treats insertion granularity as
orthogonal to data semantics—bulk generation
gen_row_data_by_schema(nb=2000, start=0, random_pk=False) yields the
same sequential PKs and vector payloads as prior multi-batch inserts, so
tests relying on collection lifecycle, flush, index build, load and
search behave identically.
- What changed / simplified: added a full HNSW_PQ parameterized test
suite (tests/python_client/testcases/indexes/idx_hnsw_pq.py and
test_hnsw_pq.py) and simplified HNSW_SQ test insertion by replacing
looped per-batch generation+insert with a single bulk
gen_row_data_by_schema(...) + insert. The per-batch PK sequencing and
repeated vector generation were redundant for correctness and were
removed to reduce complexity.
- Why this does NOT cause data loss or behavior regression: the
post-insert code paths remain unchanged—tests still call client.flush(),
create_index(...), util.wait_for_index_ready(), collection.load(), and
perform searches that assert describe_index and search outputs. Because
start=0 and random_pk=False reproduce identical sequential PKs (0..1999)
and the same vectors, index creation and search validation operate on
identical data and index parameters, preserving previous assertions and
outcomes.
- New capability: comprehensive HNSW_PQ coverage (build params: M,
efConstruction, m, nbits, refine, refine_type; search params: ef,
refine_k) across vector types (FLOAT_VECTOR, FLOAT16_VECTOR,
BFLOAT16_VECTOR, INT8_VECTOR) and metrics (L2, IP, COSINE), implemented
as data-driven tests to validate success and failure/error messages for
boundary, type-mismatch and inter-parameter constraints.
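A minimal before/after sketch of the insertion simplification (it assumes the suite's common_func helpers and an existing client/schema/collection):

```python
from common import common_func as cf  # test-suite helper module

# before: per-batch generation, advancing `start` to keep PKs sequential
# rows = []
# for batch in range(4):
#     rows += cf.gen_row_data_by_schema(nb=500, schema=schema,
#                                       start=batch * 500, random_pk=False)

# after: one bulk call yields the identical sequential PKs 0..1999
rows = cf.gen_row_data_by_schema(nb=2000, schema=schema, start=0,
                                 random_pk=False)
client.insert(collection_name, rows)
client.flush(collection_name)  # post-insert flow is unchanged
```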
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zilliz <jiaming.li@zilliz.com>
### **User description**
issue: #46367
___
### **PR Type**
Bug fix, Tests
___
### **Description**
- Fix unstable test case by adjusting float precision
- Change listMix float value from 1.1 to 1.111
- Improve test stability for the json_contains_any query
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Data Generation"] -- "Adjust float precision" --> B["listMix field value"]
B -- "1.1 to 1.111" --> C["Improved test stability"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_query.py</strong><dd><code>Adjust
float precision in test data</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_query.py
<ul><li>Modified test data generation in the
<code>test_milvus_client_query_expr_all_datatype_json_contains_all</code>
method</li><li>Changed <code>listMix</code> field float value from
<code>1.1</code> to <code>1.111</code> for improved
precision</li><li>Addresses test instability by adjusting floating-point
test data</li></ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46506/files#diff-d6fe357e4678415bc62596b802571043fa571c7d1b8e841aa43124437dd2f739">+1/-1</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test assumes stable float equality/containment
behavior for JSON-typed fields when generating test rows; small changes
in stored float precision can make json_contains_any assertions flaky.
- Exact fix for the bug (refs #46367): in
tests/python_client/milvus_client/test_milvus_client_query.py, the test
data value for the second element of the "listMix" JSON field was
adjusted from i * 1.1 to i * 1.111 in
test_milvus_client_query_expr_all_datatype_json_contains_all to increase
numeric precision and remove instability in json_contains_any
assertions.
- Logic removed/simplified: no production logic was changed or removed —
only a one-line test-data change. There is no control-flow or
algorithmic simplification because the test’s intent and checks remain
identical; the change removes the fragile dependence on a borderline
float value that caused flakiness.
- No data loss or behavior regression: this change only updates
test-generated input
(test_milvus_client_query_expr_all_datatype_json_contains_all) and does
not touch any library or runtime code paths. Production code paths
(query parsing/execution, JSON handling) are unchanged, so no persisted
data, API behavior, or client logic is affected.
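For context on why i * 1.1 was borderline — binary floats cannot represent 1.1 exactly, so some multiples drift from the literal a filter compares against:

```python
for i in range(4):
    print(i, repr(i * 1.1))
# 0 0.0
# 1 1.1
# 2 2.2
# 3 3.3000000000000003   <- not bit-equal to the literal 3.3
```

(1.111 is not exactly representable either; the change sidesteps the specific borderline comparisons that were flapping.)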
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: nico <cheng.yuan@zilliz.com>
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).
Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to timeout.
Changes:
- Add fallback to use fieldid as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling
Related to #46594
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
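A Python illustration of the normalization (the actual change is in the C++ load-diff path; the ids below are arbitrary):

```python
def normalize_child_groups(field_id: int, child_fields: list[int]) -> list[int]:
    if child_fields:
        return list(child_fields)  # v4+: use the declared child groups as-is
    return [field_id]              # v1/legacy: group_id == field_id fallback

assert normalize_child_groups(101, []) == [101]                   # legacy binlog
assert normalize_child_groups(101, [1001, 1002]) == [1001, 1002]  # v4 binlog
```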
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: https://github.com/milvus-io/milvus/issues/46571
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the LexicalHighlighter API now expects the match
queries under the parameter name highlight_query (not queries); all call
sites must pass highlight_query to supply match data. This PR assumes
the underlying highlighter behavior and processing of those query values
are unchanged.
- Logic simplified/removed: removed the legacy keyword queries in tests
and updated calls to use highlight_query
(tests/python_client/milvus_client/test_milvus_client_highlighter.py).
This eliminates a redundant/incorrect keyword alias and aligns tests
with the consolidated LexicalHighlighter constructor parameter name.
- Why this does NOT introduce data loss or behavior regression: the
change is a parameter-name rename only — no parsing, matching, or
storage logic was modified. Tests now construct LexicalHighlighter with
pre_tags/post_tags/highlight_search_text/fragment_* and pass the query
list under highlight_query; the highlighter execution path
(client.search → highlighter processing → result['highlight']) is
untouched, so existing highlight outputs and stored data remain
unchanged.
- Other changes: bumped pymilvus test dependency to 2.7.0rc93 in
tests/python_client/requirements.txt to match the updated constructor
signature; scope of change is limited to tests and dependency pinning
(no production code changes).
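A hedged construction sketch using the parameter names cited above; the import path and exact signature are assumptions to verify against pymilvus 2.7.0rc93:

```python
from pymilvus import LexicalHighlighter  # assumed export location

highlighter = LexicalHighlighter(
    pre_tags=["<em>"],
    post_tags=["</em>"],
    highlight_search_text=True,           # per the fields listed above
    fragment_size=100,                    # one of the fragment_* parameters
    highlight_query=["vector database"],  # formerly passed as `queries`
)
```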
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer and
querynode, and adds comprehensive validation, search reduce, and field
data handling for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
- Update funcutil.go with nullable vector support functions
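A Python illustration of the logical-to-physical translation the new FieldDataIdxComputer performs (the real implementation is the Go struct in typeutil/schema.go): with sparse storage, only valid rows are physically stored, so a logical row offset must be mapped through the validity map.

```python
def physical_index(valid: list[bool], logical_idx: int) -> int | None:
    """Index into the densely stored vectors, or None if the row is null."""
    if not valid[logical_idx]:
        return None
    return sum(valid[:logical_idx])  # count of valid rows before this one

valid = [True, False, True, True]
assert physical_index(valid, 0) == 0
assert physical_index(valid, 1) is None  # null row: nothing stored for it
assert physical_index(valid, 3) == 2
```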
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
### **User description**
Issue: #46504
test: create e2e test case for highlighter
On branch feature/highlighter
Changes to be committed:
new file: milvus_client/test_milvus_client_highlighter.py
___
### **PR Type**
Tests
___
### **Description**
- Add comprehensive e2e test suite for LexicalHighlighter functionality
- Test highlighter initialization with collection setup and data
insertion
- Validate highlighter with various parameters (tags, fragments,
offsets)
- Test edge cases including Chinese characters, long text, and invalid
inputs
- Verify error handling for invalid fragment sizes, offsets, and
configurations
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Suite Setup"] --> B["Highlighter Init Tests"]
B --> C["Valid Test Cases"]
C --> D["Fragment Parameters"]
C --> E["Search Variations"]
C --> F["Language Support"]
B --> G["Invalid Test Cases"]
G --> H["Parameter Validation"]
G --> I["Error Handling"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Tests</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_highlighter.py</strong><dd><code>Add
comprehensive LexicalHighlighter e2e test suite</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_highlighter.py
<ul><li>Create new test file with 1163 lines of comprehensive
highlighter test cases</li><li>Implement
<code>TestMilvusClientHighlighterInit</code> class to initialize a
collection with pre-defined test data including English, Chinese, and
long text samples</li><li>Implement
<code>TestMilvusClientHighlighterValid</code> class with 15+ test
methods covering basic usage, multiple tags, fragment parameters,
offsets, numbers, sentences, and language support</li><li>Implement
<code>TestMilvusClientHighlighterInvalid</code> class with 8+ test
methods validating error handling for invalid parameters and
configurations</li><li>Test highlighter with BM25 search, text matching,
and various analyzer configurations</li></ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46505/files#diff-443e3fefb65fbdb088d5920083306ecfe3605745b1e2714198c6566ca67b3736">+1163/-0</a></td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Added a comprehensive highlighter test suite covering:
- Core highlighting with single and multi-analyzer setups and multi-tag
variations
- Fragment parameter behaviors and edge cases (size, offset, count)
- Text-match and query-based highlighting, including BM25 and vector
interactions
- Sub-word, long-text/tag, case sensitivity, Chinese/multi-language
scenarios
- Error handling for invalid parameters, no-match cases, and other edge
conditions
- Module-scoped fixture preparing multilingual, long-form test data and
teardown
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45525
see the added README.md for details on the optimizations
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators (see the
illustrative sketch after this list).
* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.
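An illustrative toy sketch of the three simplifications named above (not the Milvus implementation, which rewrites plan expressions):

```python
def merge_in(a: set, b: set) -> set:
    # f IN {1,2} OR f IN {2,3}  ->  f IN {1,2,3}
    return a | b

def tighten_lower_bound(*bounds: float) -> float:
    # f > 1 AND f > 3  ->  f > 3
    return max(bounds)

def flatten_and(terms):
    # AND(AND(a, b), c) -> AND(a, b, c); nested AND nodes are spliced in place
    out = []
    for t in terms:
        is_and = isinstance(t, tuple) and t[0] == "AND"
        out.extend(flatten_and(t[1]) if is_and else [t])
    return out

assert merge_in({1, 2}, {2, 3}) == {1, 2, 3}
assert tighten_lower_bound(1, 3) == 3
assert flatten_and([("AND", ["a", "b"]), "c"]) == ["a", "b", "c"]
```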
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46349
When using brute-force search, the iterator results from multiple chunks
are merged; at that point, we need to pay attention to how the metric
affects result ranking.
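A sketch of the metric-aware merge (per-chunk results assumed already sorted best-first; with L2 smaller distances rank higher, with IP/COSINE larger similarities do):

```python
import heapq

def merge_chunk_results(chunks, metric: str, limit: int):
    """chunks: per-chunk lists of (score, id), each already sorted best-first."""
    larger_is_better = metric in ("IP", "COSINE")
    key = (lambda s: -s) if larger_is_better else (lambda s: s)
    merged = heapq.merge(*chunks, key=lambda pair: key(pair[0]))
    return list(merged)[:limit]

l2 = merge_chunk_results([[(0.1, 1), (0.9, 2)], [(0.4, 3)]], "L2", 2)
ip = merge_chunk_results([[(0.9, 2), (0.1, 1)], [(0.4, 3)]], "IP", 2)
assert l2 == [(0.1, 1), (0.4, 3)]   # smaller distance wins
assert ip == [(0.9, 2), (0.4, 3)]   # larger similarity wins
```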
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Issue: #46333
test: rewrite the timestamp conversion logic to cover daylight saving time
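For reference, a standard-library sketch of the DST behavior the rewritten conversion must handle — the same display timezone has different UTC offsets on either side of the spring-forward transition:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
before = datetime(2024, 3, 10, 6, 0, tzinfo=timezone.utc).astimezone(ny)
after = datetime(2024, 3, 10, 7, 0, tzinfo=timezone.utc).astimezone(ny)
print(before.isoformat())  # 2024-03-10T01:00:00-05:00  (EST)
print(after.isoformat())   # 2024-03-10T03:00:00-04:00  (EDT, 2:00 was skipped)
```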
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
- Add normalize_error_message function to extract and normalize error
text
- Collect unique error messages during chaos test execution
- Display error details in assertion messages for better debugging
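A hedged sketch of what such a normalizer can do (the regexes are illustrative, not the actual implementation): strip volatile details so distinct failures collapse into a small set of unique, readable messages.

```python
import re

def normalize_error_message(raw: str) -> str:
    msg = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}:\d+\b", "<addr>", raw)  # host:port
    msg = re.sub(r"\b\d+\b", "<n>", msg)                           # numeric ids
    return msg.strip()

errors = {normalize_error_message(e) for e in
          ["rpc error to 10.0.0.1:19530 code 65535",
           "rpc error to 10.0.0.2:19530 code 65535"]}
assert errors == {"rpc error to <addr> code <n>"}  # one unique message
```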
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
issue: #45511
our tantivy inverted index currently does not index individual array
elements, so we can't do `a[0] == 'b'`-style lookups in the inverted
index. For such cases, we need to skip the index and use brute-force
search instead.
we may improve our index in the future, so this is a temporary solution
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Issue: #46188
The bug was caused by an inconsistent tzdata version as well as a wrong
month assignment in the convert_timestamptz function.
Also fix the compare function so that it correctly returns True or False
when debug_mode=True.
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist,
which returns an error when duplicate primary keys are detected in the
same batch instead of silently deduplicating them.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
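A sketch of the new contract from the client side (assumes a running Milvus and an existing collection; names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
rows = [{"id": 1, "vector": [0.1] * 8},
        {"id": 1, "vector": [0.2] * 8}]  # duplicate PK within one batch
try:
    client.upsert(collection_name="demo", data=rows)
except Exception as e:
    print("rejected as expected:", e)    # error instead of silent dedup
```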
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue #46015
test: add timestamptz to more bulk writer tests
On branch feature/timestamps
Changes to be committed:
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
Replace direct self.schema access and describe_collection() calls with
get_schema() method to ensure consistent schema handling with complete
struct_fields information. Also fix FlushChecker error handling and
change schema log level from info to debug.
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Refactor connection logic to prioritize uri and token parameters over
host/port/user/password for a more modern connection approach
- Add explicit limit parameter (limit=5) to search and query operations
in chaos checkers to avoid returning unlimited results
- Migrate test_all_collections_after_chaos.py from Collection wrapper to
MilvusClient API style
- Update pytest fixtures in chaos test files to support uri/token params
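The connection style the refactor prefers, plus the explicit limit, in one short sketch:

```python
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",   # scheme + host + port in one place
    token="root:Milvus",            # user:password (or an API key) in one field
)
res = client.search(collection_name="demo", data=[[0.1] * 8],
                    limit=5)        # explicit limit, mirroring the checker change
```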
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Issue: #45756
1. add bulk insert scenario
2. fix small issue in e2e cases
3. add search group by test case
4. add timestamptz to gen_all_datatype_collection_schema
5. modify partial update testcase to ensure correct result from
timestamptz field
On branch feature/timestamps
Changes to be committed:
modified: common/bulk_insert_data.py
modified: common/common_func.py
modified: common/common_type.py
modified: milvus_client/test_milvus_client_partial_update.py
modified: milvus_client/test_milvus_client_timestamptz.py
modified: pytest.ini
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>