Issue: #46424
test: add_collection_field(invalid_default_value)
test: hybrid_search (NOT supported)
Simplify some test cases by using a single collection to save time.
Query with different time-shift and timezone settings.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: TIMESTAMPTZ values are treated as absolute instants
(timezone-preserving). Tests assume conversions between stored instants
and display timezones/time-shifts are deterministic and reversible; the
PR validates queries/reads across different timezone and time-shift
settings against that invariant (see the sketch after this note).
- Removed/simplified logic: duplicated per-test create/insert/teardown
flows and several isolated timestamptz unit cases (edge_case, Feb_29,
partial_update, standalone query) were consolidated into a module-scoped
fixture that creates a single COLLECTION_NAME, inserts ROWS, and handles
teardown. This removes redundant setup/teardown code and repeated
scaffolding while preserving the same API exercise points
(create_collection, insert, query, alter_collection_properties,
alter_database_properties, describe_collection, describe_database).
- No data loss or behavior regression: only test code was reorganized
and new assertions exercise the same production APIs and code paths used
previously (create_collection → insert → query / alter_properties →
describe). The fixture inserts the same ROWS and tests still
convert/compare timestamptz values via cf.convert_timestamptz and query
check routines; the new invalid-default-value test only asserts error
handling when adding a TIMESTAMPTZ field with an invalid default and
does not mutate persisted data or change production logic.
- PR type (Enhancement/Test): expands and reorganizes E2E test coverage
for TIMESTAMPTZ—centralizes collection setup to reduce runtime and
flakiness, adds explicit coverage for invalid-default-value behavior,
and increases timezone/time-shift query scenarios without altering
product behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
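A minimal sketch of the invariant referenced above, using only the Python
standard library (not the test helpers themselves): a stored TIMESTAMPTZ
instant rendered in any display timezone still compares equal to the original
instant, and the conversion round-trips.
```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# A TIMESTAMPTZ value is an absolute instant; the display timezone is presentation only.
stored = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

for tz_name in ("Asia/Shanghai", "America/New_York", "UTC"):
    shown = stored.astimezone(ZoneInfo(tz_name))      # convert for display
    assert shown == stored                            # same instant regardless of display tz
    assert shown.astimezone(timezone.utc) == stored   # deterministic and reversible
```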
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: tests' persistence of EventRecords and RequestRecords
must be append-safe under concurrent writers; this PR replaces Parquet
with JSONL and uses per-file locks and explicit buffer flushes to
guarantee atomic, append-safe writes (EventRecords uses event_lock and
appends one line per event; RequestRecords buffers under request_lock
and flushes to file when the buffer reaches its threshold or on sink()).
A sketch of the pattern follows this note.
- Logic removed/simplified and rationale: DataFrame-based parquet
append/read logic (pyarrow/fastparquet) and implicit parquet buffering
were removed in favor of simple line-oriented JSON writes and explicit
buffer management. The complex Parquet append/merge paths were redundant
because parquet append under concurrent test-writer patterns caused
corruption; JSONL removes the append-mode complexity and the
parquet-specific buffering/serialization code.
- Why no data loss or behavior regression (concrete code paths):
EventRecords.insert writes a complete JSON object per event under
event_lock to /tmp/ci_logs/event_records_*.jsonl and get_records_df
reads every JSON line under the same lock (or returns an empty DataFrame
with the same schema on FileNotFound/Error), preserving all fields
event_name/event_status/event_ts. RequestRecords.insert appends to an
in-memory buffer under request_lock and triggers _flush_buffer() when
len(buffer) >= 100; _flush_buffer() writes each buffered JSON line to
/tmp/ci_logs/request_records_*.jsonl and clears the buffer; sink() calls
_flush_buffer() under request_lock before get_records_df() reads the
file — ensuring all buffered records are persisted before reads. Both
read paths handle FileNotFoundError and exceptions by returning empty
DataFrames with identical column schemas, so external callers see the
same API and no silent record loss.
- Enhancement summary (concrete): Replaces flaky Parquet append/read
with JSONL + explicit locking and deterministic flush semantics,
removing the root cause of parquet append corruption in tests while
keeping the original DataFrame-based analysis consumers unchanged
(get_records_df returns equivalent schemas).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
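A minimal sketch of the buffered, lock-guarded JSONL pattern described above,
assuming the threshold and path conventions from the notes; the real
EventRecords/RequestRecords classes carry specific fields and return pandas
DataFrames from get_records_df.
```python
import json
import os
import threading

class RequestRecordsSketch:
    """Buffered JSONL writer: append-safe under concurrent test writers."""

    def __init__(self, path="/tmp/ci_logs/request_records_demo.jsonl", threshold=100):
        self.path = path
        self.threshold = threshold
        self.buffer = []
        self.lock = threading.Lock()  # stands in for request_lock
        os.makedirs(os.path.dirname(self.path), exist_ok=True)

    def insert(self, record: dict):
        with self.lock:
            self.buffer.append(record)
            if len(self.buffer) >= self.threshold:
                self._flush_buffer()  # caller already holds the lock

    def _flush_buffer(self):
        with open(self.path, "a") as f:
            for rec in self.buffer:
                f.write(json.dumps(rec) + "\n")  # one complete JSON object per line
        self.buffer.clear()

    def sink(self):
        with self.lock:
            self._flush_buffer()  # persist everything before any read
```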
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Updated test suite dependencies to pymilvus 2.7.0rc91.
* Enhanced text highlighting validation in test checkers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Update e2e test error codes and messages after the pymilvus update;
see #46677
Signed-off-by: silas.jiang <silas.jiang@zilliz.com>
Co-authored-by: silas.jiang <silas.jiang@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: test infrastructure treats insertion granularity as
orthogonal to data semantics—bulk generation
gen_row_data_by_schema(nb=2000, start=0, random_pk=False) yields the
same sequential PKs and vector payloads as prior multi-batch inserts, so
tests relying on collection lifecycle, flush, index build, load and
search behave identically (see the sketch after this note).
- What changed / simplified: added a full HNSW_PQ parameterized test
suite (tests/python_client/testcases/indexes/idx_hnsw_pq.py and
test_hnsw_pq.py) and simplified HNSW_SQ test insertion by replacing
looped per-batch generation+insert with a single bulk
gen_row_data_by_schema(...) + insert. The per-batch PK sequencing and
repeated vector generation were redundant for correctness and were
removed to reduce complexity.
- Why this does NOT cause data loss or behavior regression: the
post-insert code paths remain unchanged—tests still call client.flush(),
create_index(...), util.wait_for_index_ready(), collection.load(), and
perform searches that assert describe_index and search outputs. Because
start=0 and random_pk=False reproduce identical sequential PKs (0..1999)
and the same vectors, index creation and search validation operate on
identical data and index parameters, preserving previous assertions and
outcomes.
- New capability: comprehensive HNSW_PQ coverage (build params: M,
efConstruction, m, nbits, refine, refine_type; search params: ef,
refine_k) across vector types (FLOAT_VECTOR, FLOAT16_VECTOR,
BFLOAT16_VECTOR, INT8_VECTOR) and metrics (L2, IP, COSINE), implemented
as data-driven tests to validate success and failure/error messages for
boundary, type-mismatch and inter-parameter constraints.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
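A self-contained sketch of the equivalence claimed above; `gen_rows` is a
stand-in for gen_row_data_by_schema (the real helper generates full schema
rows), but the PK/payload reasoning is the same.
```python
def gen_rows(nb, start=0):
    # Stand-in for gen_row_data_by_schema(..., random_pk=False): sequential PKs.
    return [{"pk": start + i, "vec": [float(start + i)] * 4} for i in range(nb)]

# The prior multi-batch insertion pattern ...
batched = []
for batch_start in range(0, 2000, 500):
    batched.extend(gen_rows(500, start=batch_start))

# ... and the new single bulk call produce identical rows.
bulk = gen_rows(2000, start=0)
assert batched == bulk  # same sequential PKs 0..1999 and the same payloads
```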
Signed-off-by: zilliz <jiaming.li@zilliz.com>
### **User description**
issue: #46367
___
### **PR Type**
Bug fix, Tests
___
### **Description**
- Fix unstable test case by adjusting float precision
- Change listMix float value from 1.1 to 1.111
- Improve test stability for the json_contains_any query
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Data Generation"] -- "Adjust float precision" --> B["listMix field value"]
B -- "1.1 to 1.111" --> C["Improved test stability"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_query.py</strong><dd><code>Adjust
float precision in test data</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_query.py
<ul><li>Modified test data generation in
<br><code>test_milvus_client_query_expr_all_datatype_json_contains_all</code>
method<br> <li> Changed <code>listMix</code> field float value from
<code>1.1</code> to <code>1.111</code> for improved <br>precision<br>
<li> Addresses test instability issue by adjusting floating-point test
data</ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46506/files#diff-d6fe357e4678415bc62596b802571043fa571c7d1b8e841aa43124437dd2f739">+1/-1</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test assumes stable float equality/containment
behavior for JSON-typed fields when generating test rows; small changes
in stored float precision can make json_contains_any assertions flaky
(see the illustration after this note).
- Exact fix for the bug (refs #46367): in
tests/python_client/milvus_client/test_milvus_client_query.py, the test
data value for the second element of the "listMix" JSON field was
adjusted from i * 1.1 to i * 1.111 in
test_milvus_client_query_expr_all_datatype_json_contains_all to increase
numeric precision and remove instability in json_contains_any
assertions.
- Logic removed/simplified: no production logic was changed or removed —
only a one-line test-data change. There is no control-flow or
algorithmic simplification because the test’s intent and checks remain
identical; the change removes the redundant dependence on a borderline
float value that caused flakiness.
- No data loss or behavior regression: this change only updates
test-generated input
(test_milvus_client_query_expr_all_datatype_json_contains_all) and does
not touch any library or runtime code paths. Production code paths
(query parsing/execution, JSON handling) are unchanged, so no persisted
data, API behavior, or client logic is affected.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
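The flakiness mechanism is ordinary binary floating point: scaling by a
factor such as 1.1 lands near, but not exactly on, the decimal value a filter
expression might name. The change to 1.111 moves the test data away from a
borderline comparison rather than making floats exact:
```python
# i * 1.1 does not reproduce the decimal literal one might compare against.
print(3 * 1.1)         # 3.3000000000000003
print(3 * 1.1 == 3.3)  # False
```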
Signed-off-by: nico <cheng.yuan@zilliz.com>
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).
Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to timeout.
Changes:
- Add fallback to use field_id as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling
Related to #46594
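The fallback itself is small. A Python rendering of the normalization step
(the production code is C++ in SegmentLoadInfo):
```python
def child_group_ids(field_id: int, child_fields: list) -> list:
    """Normalize a binlog's child groups for diff computation.

    Legacy (v1) binlogs carry empty child_fields; for them
    group_id == field_id, so fall back to the field itself.
    """
    return list(child_fields) if child_fields else [field_id]

assert child_group_ids(101, []) == [101]                   # legacy v1 binlog
assert child_group_ids(101, [1001, 1002]) == [1001, 1002]  # v4 column groups unchanged
```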
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: https://github.com/milvus-io/milvus/issues/46571
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the LexicalHighlighter API now expects the match
queries under the parameter name highlight_query (not queries); all call
sites must pass highlight_query to supply match data. This PR assumes
the underlying highlighter behavior and processing of those query values
are unchanged (a call-shape sketch follows this note).
- Logic simplified/removed: removed the legacy keyword queries in tests
and updated calls to use highlight_query
(tests/python_client/milvus_client/test_milvus_client_highlighter.py).
This eliminates a redundant/incorrect keyword alias and aligns tests
with the consolidated LexicalHighlighter constructor parameter name.
- Why this does NOT introduce data loss or behavior regression: the
change is a parameter-name rename only — no parsing, matching, or
storage logic was modified. Tests now construct LexicalHighlighter with
pre_tags/post_tags/highlight_search_text/fragment_* and pass the query
list under highlight_query; the highlighter execution path
(client.search → highlighter processing → result['highlight']) is
untouched, so existing highlight outputs and stored data remain
unchanged.
- Other changes: bumped pymilvus test dependency to 2.7.0rc93 in
tests/python_client/requirements.txt to match the updated constructor
signature; scope of change is limited to tests and dependency pinning
(no production code changes).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
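A hypothetical call shape assembled purely from the parameter names in the
notes above; the authoritative constructor signature is the one shipped in
pymilvus 2.7.0rc93.
```python
from pymilvus import LexicalHighlighter  # import path assumed

# Parameter names taken from the notes above; values are illustrative.
highlighter = LexicalHighlighter(
    pre_tags=["<em>"],
    post_tags=["</em>"],
    highlight_query=["distributed vector database"],  # renamed from the legacy `queries`
    fragment_size=100,
)
```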
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer and
querynode, and adds comprehensive validation, search-reduce, and
field-data handling for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with a getExpectedVectorRows()
helper to validate nullable vector field alignment using the valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with a Compute() method for efficient
logical-to-physical index mapping across multiple field data (see the
sketch after this list)
- Update EstimateEntitySize() and AppendFieldData() with a fieldIdxs parameter
- Update funcutil.go with nullable vector support functions
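An illustrative Python rendering of the logical-to-physical mapping idea
behind FieldDataIdxComputer (the real struct is Go, in typeutil/schema.go,
and amortizes the scan across calls):
```python
def physical_index(valid: list, logical_idx: int):
    """Map a row offset (logical) to an offset in sparsely stored vector
    data (physical), where rows with a null vector store nothing."""
    if not valid[logical_idx]:
        return None                  # null vector: no physical slot
    return sum(valid[:logical_idx])  # count of valid rows preceding this one

valid = [True, False, True, True]
assert physical_index(valid, 0) == 0
assert physical_index(valid, 1) is None
assert physical_index(valid, 3) == 2
```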
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
### **User description**
Issue: #46504
test: create e2e test case for highlighter
On branch feature/highlighter
Changes to be committed:
new file: milvus_client/test_milvus_client_highlighter.py
___
### **PR Type**
Tests
___
### **Description**
- Add comprehensive e2e test suite for LexicalHighlighter functionality
- Test highlighter initialization with collection setup and data insertion
- Validate highlighter with various parameters (tags, fragments, offsets)
- Test edge cases including Chinese characters, long text, and invalid inputs
- Verify error handling for invalid fragment sizes, offsets, and configurations
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Suite Setup"] --> B["Highlighter Init Tests"]
B --> C["Valid Test Cases"]
C --> D["Fragment Parameters"]
C --> E["Search Variations"]
C --> F["Language Support"]
B --> G["Invalid Test Cases"]
G --> H["Parameter Validation"]
G --> I["Error Handling"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Tests</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_highlighter.py</strong><dd><code>Add
comprehensive LexicalHighlighter e2e test suite</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_highlighter.py
<ul><li>Create new test file with 1163 lines of comprehensive
highlighter test <br>cases<br> <li> Implement
<code>TestMilvusClientHighlighterInit</code> class to initialize
<br>collection with pre-defined test data including English, Chinese,
and <br>long text samples<br> <li> Implement
<code>TestMilvusClientHighlighterValid</code> class with 15+ test
methods <br>covering basic usage, multiple tags, fragment parameters,
offsets, <br>numbers, sentences, and language support<br> <li> Implement
<code>TestMilvusClientHighlighterInvalid</code> class with 8+ test
<br>methods validating error handling for invalid parameters and
<br>configurations<br> <li> Test highlighter with BM25 search, text
matching, and various analyzer <br>configurations</ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46505/files#diff-443e3fefb65fbdb088d5920083306ecfe3605745b1e2714198c6566ca67b3736">+1163/-0</a></td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Added a comprehensive highlighter test suite covering:
- Core highlighting with single and multi-analyzer setups and multi-tag
variations
- Fragment parameter behaviors and edge cases (size, offset, count)
- Text-match and query-based highlighting, including BM25 and vector
interactions
- Sub-word, long-text/tag, case sensitivity, Chinese/multi-language
scenarios
- Error handling for invalid parameters, no-match cases, and other edge
conditions
- Module-scoped fixture preparing multilingual, long-form test data and
teardown
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45525
see added README.md for added optimizations
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators (toy examples
follow this summary).
* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
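A toy rendering of two of the rewrites named above, not the engine's
implementation (see the added README.md for the actual optimizations):
```python
def tighten_gt(bounds):
    """Range predicate optimization: 'a > 1 and a > 3' keeps only 'a > 3'."""
    return max(bounds)

def merge_in(*value_sets):
    """IN merge: 'a in [1, 2] or a in [2, 3]' becomes 'a in [1, 2, 3]'."""
    merged = set()
    for values in value_sets:
        merged |= values
    return merged

assert tighten_gt([1, 3]) == 3
assert merge_in({1, 2}, {2, 3}) == {1, 2, 3}
```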
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46349
When using brute-force search, the iterator results from multiple chunks
are merged; at that point, we need to pay attention to how the metric
affects result ranking.
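A toy illustration of the ranking concern (not the segcore implementation):
L2 distances rank ascending while IP/COSINE similarities rank descending, so
the merge comparator must flip with the metric.
```python
import heapq

def merge_chunk_results(chunks, metric, k):
    """Merge per-chunk (score, id) results into a global top-k.

    L2: smaller distance is better (ascending).
    IP/COSINE: larger similarity is better (descending).
    """
    descending = metric in ("IP", "COSINE")
    key = (lambda pair: -pair[0]) if descending else (lambda pair: pair[0])
    merged = heapq.merge(*(sorted(c, key=key) for c in chunks), key=key)
    return [pair for _, pair in zip(range(k), merged)]

chunks = [[(0.9, 1), (0.2, 2)], [(0.7, 3)]]
assert merge_chunk_results(chunks, "IP", 2) == [(0.9, 1), (0.7, 3)]
assert merge_chunk_results(chunks, "L2", 2) == [(0.2, 2), (0.7, 3)]
```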
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Issue: #46333
test: rewrite the timestamp-conversion logic to cover daylight saving time
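Daylight saving transitions are exactly where naive offset arithmetic breaks;
a stdlib illustration of the kind of case the rewritten conversion must
handle (US spring-forward, 2024-03-10):
```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
before = datetime(2024, 3, 10, 1, 30, tzinfo=ny)  # EST, UTC-5
after = (before.astimezone(ZoneInfo("UTC")) + timedelta(hours=1)).astimezone(ny)

assert after.hour == 3                          # 02:30 local never exists on this date
assert after.utcoffset() != before.utcoffset()  # offset changed across the DST gap
```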
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
- Add normalize_error_message function to extract and normalize error text
- Collect unique error messages during chaos test execution
- Display error details in assertion messages for better debugging
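A sketch of what such a normalizer typically does; the actual function may
strip different volatile fields:
```python
import re

def normalize_error_message(err: str) -> str:
    """Collapse volatile parts (timestamps, numeric ids/addresses) so that
    semantically identical errors deduplicate to one entry."""
    err = re.sub(r"\b\d{4}-\d{2}-\d{2}[T ][\d:.]+\b", "<ts>", err)  # timestamps
    err = re.sub(r"\b\d+\b", "<n>", err)                            # numeric ids
    return err.strip()

seen = {normalize_error_message(e) for e in [
    "rpc error: code = 14 addr 10.0.0.1",
    "rpc error: code = 14 addr 10.0.0.2",
]}
assert len(seen) == 1  # unique after normalization
```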
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
issue: #45511
Our tantivy inverted index currently does not index individual items when
the value is an array, so lookups of the form `a[0] == 'b'` cannot be
answered from the inverted index. For such expressions, we need to skip
the index and fall back to brute-force search.
We may improve the index in the future, so this is a temporary solution.
Issue: #46188
The bug was caused by an inconsistent tzdata version as well as a wrong
month assignment in the convert_timestamptz function.
Also fix the compare function so that it correctly returns True or False
when debug_mode=True.
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist
that returns an error when duplicate primary keys are detected in the
same batch, instead of silently deduplicating.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
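The behavioral change in miniature (the real check is Go in util.go; the
helper name and message below are illustrative):
```python
def check_duplicate_pk_exist(pks: list) -> None:
    """New behavior: upsert batches with repeated primary keys are rejected
    instead of being silently deduplicated."""
    if len(pks) != len(set(pks)):  # hasDuplicates
        raise ValueError("duplicate primary keys found in the same upsert batch")

check_duplicate_pk_exist([1, 2, 3])      # ok
try:
    check_duplicate_pk_exist([1, 2, 2])
except ValueError as e:
    print(e)                             # error now surfaces to the client
```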
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue: #46015
test: add timestamptz to more bulk writer cases
On branch feature/timestamps
Changes to be committed:
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
Replace direct self.schema access and describe_collection() calls with the
get_schema() method to ensure consistent schema handling with complete
struct_fields information. Also fix FlushChecker error handling and change
the schema log level from info to debug.
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Refactor connection logic to prioritize uri and token parameters over
host/port/user/password for a more modern connection approach
- Add explicit limit parameter (limit=5) to search and query operations
in chaos checkers to avoid returning unlimited results
- Migrate test_all_collections_after_chaos.py from Collection wrapper to
MilvusClient API style
- Update pytest fixtures in chaos test files to support uri/token params
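The connection shape the refactor prefers, in standard MilvusClient usage
(the collection name and endpoint below are illustrative):
```python
from pymilvus import MilvusClient

# uri/token now take precedence over host/port/user/password in the fixtures.
client = MilvusClient(uri="http://localhost:19530", token="root:Milvus")

# Checkers now cap result sizes instead of returning unlimited rows.
res = client.query(collection_name="chaos_test", filter="pk >= 0", limit=5)
```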
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Issue: #45756
1. Add a bulk insert scenario
2. Fix small issues in e2e cases
3. Add a search group-by test case
4. Add timestamptz to gen_all_datatype_collection_schema
5. Modify the partial update test case to ensure correct results from the
timestamptz field
On branch feature/timestamps
Changes to be committed:
modified: common/bulk_insert_data.py
modified: common/common_func.py
modified: common/common_type.py
modified: milvus_client/test_milvus_client_partial_update.py
modified: milvus_client/test_milvus_client_timestamptz.py
modified: pytest.ini
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
To prevent this issue from blocking other PRs, we are temporarily
disabling this test. A proper fix will be implemented before the 2.6.6
release.
issue: https://github.com/milvus-io/milvus/issues/45511
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
issue: #44320
This change adds deduplication logic to handle duplicate primary keys
within a single upsert batch, keeping the last occurrence of each
primary key.
Key changes:
- Add DeduplicateFieldData function to remove duplicate PKs from field
data, supporting both Int64 and VarChar primary keys
- Refactor fillFieldPropertiesBySchema into two separate functions:
validateFieldDataColumns for validation and fillFieldPropertiesOnly for
property filling, improving code clarity and reusability
- Integrate deduplication logic in upsertTask.PreExecute to
automatically deduplicate data before processing
- Add comprehensive unit tests for deduplication with various PK types
(Int64, VarChar) and field types (scalar, vector)
- Add Python integration tests to verify end-to-end behavior
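The keep-last-occurrence rule in miniature (the Go implementation also
rewrites the per-field data arrays; ordering details here are illustrative):
```python
def keep_last_occurrence(rows: list) -> list:
    """Deduplicate an upsert batch by primary key, keeping the last row
    seen for each pk (later occurrences overwrite earlier ones)."""
    by_pk = {}
    for row in rows:
        by_pk[row["pk"]] = row
    return list(by_pk.values())

rows = [{"pk": 1, "v": "a"}, {"pk": 2, "v": "b"}, {"pk": 1, "v": "c"}]
assert keep_last_occurrence(rows) == [{"pk": 1, "v": "c"}, {"pk": 2, "v": "b"}]
```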
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45403, #45463
- Fix the nightly E2E failures.
- Fix the wrong update timetick when altering a collection, which caused
the related load failure.
Signed-off-by: chyezh <chyezh@outlook.com>