related: #36380
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: aggregation is centralized and schema-aware — all
aggregate functions are created via the exec Aggregate registry
(milvus::exec::Aggregate) and validated by ValidateAggFieldType, use a
single in-memory accumulator layout (Accumulator/RowContainer) and
grouping primitives (GroupingSet, HashTable, VectorHasher), ensuring
consistent typing, null semantics and offsets across planner → exec →
reducer conversion paths (toAggregateInfo, Aggregate::create,
GroupingSet, AggResult converters).
- Removed / simplified logic: removed ad‑hoc count/group-by and reducer
code (CountNode/PhyCountNode, GroupByNode/PhyGroupByNode, cntReducer and
its tests) and consolidated into a unified AggregationNode →
PhyAggregationNode + GroupingSet + HashTable execution path and
centralized reducers (MilvusAggReducer, InternalAggReducer,
SegcoreAggReducer). AVG now implemented compositionally (SUM + COUNT)
rather than a bespoke operator, eliminating duplicate implementations.
- Why this does NOT cause data loss or regressions: existing data-access
and serialization paths are preserved and explicitly validated —
bulk_subscript / bulk_script_field_data and FieldData creation are used
for output materialization; converters (InternalResult2AggResult ↔
AggResult2internalResult, SegcoreResults2AggResult ↔
AggResult2segcoreResult) enforce shape/type/row-count validation; proxy
and plan-level checks (MatchAggregationExpression,
translateOutputFields, ValidateAggFieldType, translateGroupByFieldIds)
reject unsupported inputs (ARRAY/JSON, unsupported datatypes) early.
Empty-result generation and explicit error returns guard against silent
corruption.
- New capability and scope: end-to-end GROUP BY and aggregation support
added across the stack — proto (plan.proto, RetrieveRequest fields
group_by_field_ids/aggregates), planner nodes (AggregationNode,
ProjectNode, SearchGroupByNode), exec operators (PhyAggregationNode,
PhyProjectNode) and aggregation core (Aggregate implementations:
Sum/Count/Min/Max, SimpleNumericAggregate, RowContainer, GroupingSet,
HashTable) plus proxy/querynode reducers and tests — enabling grouped
and global aggregation (sum, count, min, max, avg via sum+count) with
schema-aware validation and reduction.
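As a concrete illustration of the compositional AVG mentioned above, here is a minimal Python sketch (not Milvus source; the type names are hypothetical) of how partial (sum, count) pairs merge across segments and finalize:

```python
from dataclasses import dataclass

@dataclass
class PartialAgg:
    sum: float   # partial SUM over one segment/group
    count: int   # partial COUNT over the same rows

def merge(partials: list[PartialAgg]) -> PartialAgg:
    # reducers combine partial aggregates field-wise
    return PartialAgg(sum=sum(p.sum for p in partials),
                      count=sum(p.count for p in partials))

def finalize_avg(agg: PartialAgg) -> float | None:
    # AVG is derived at the end, so no bespoke AVG operator is needed;
    # an empty group yields NULL rather than a division by zero
    return agg.sum / agg.count if agg.count else None

print(finalize_avg(merge([PartialAgg(10.0, 4), PartialAgg(5.0, 1)])))  # 3.0
```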
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: index parameter validation and test expectations for
the HNSW-family must be explicit, consistent, and deterministic — this
PR enforces that by adding exhaustive parameter matrices for HNSW_PRQ
(tests/python_client/testcases/indexes/{idx_hnsw_prq.py,
test_hnsw_prq.py}) and normalizing expectations in idx_hnsw_pq.py via a
shared success variable.
- Logic removed / simplified: brittle, ad-hoc string expectations were
consolidated — literal "success" occurrences were replaced with a single
success variable, and ambiguous short error messages were replaced by the
canonical descriptive error text (see the sketch after this list); this
reduces duplicated assertion logic in tests and removes dependence on
fragile, truncated messages.
- Bug fix (tests): corrected HNSW_PQ test expectations to assert the
full, authoritative error for invalid PQ m ("The dimension of the vector
(dim) should be a multiple of the number of subquantizers (m).") and
aligned HNSW_PRQ test matrices (idx_hnsw_prq.py) to the same explicit
expectations — the change targets test assertions only and fixes false
negatives caused by mismatched messages.
- No data loss or behavior regression: only test code is added/modified
(tests/python_client/testcases/indexes/*). Production code paths remain
unmodified — collection creation, insert/flush, client.create_index,
wait_for_index_ready, load_collection, search, and client.describe_index
are invoked by tests but not changed; therefore persisted data, index
artifacts, and runtime behavior are unaffected.
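A minimal sketch of the shared-expectation pattern referenced above, with illustrative matrix entries (the parameter values are placeholders; the error text is the canonical message quoted in this description):

```python
success = "success"  # single source of truth instead of scattered literals

# each entry pairs build params with the expected outcome: either `success`
# or the full, authoritative error text (never a truncated fragment)
hnsw_pq_params = [
    ({"M": 16, "efConstruction": 200, "m": 8, "nbits": 8}, success),
    ({"M": 16, "efConstruction": 200, "m": 3, "nbits": 8},
     "The dimension of the vector (dim) should be a multiple of the number "
     "of subquantizers (m)."),
]
```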
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zilliz <jiaming.li@zilliz.com>
related issue: #46616
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: these tests assume the v2 group-by search
implementation (TestMilvusClientV2Base and pymilvus v2 APIs such as
AnnSearchRequest/WeightedRanker) is functionally correct; the PR extends
coverage to validate group-by semantics when using JSON fields and
dynamic fields (see
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
— TestGroupSearch.setup_class and parametrized group_by_field cases).
- Logic removed/simplified: legacy v1 test scaffolding and duplicated
parametrized fixtures/test permutations were consolidated into
v2-focused suites (TestGroupSearch now inherits TestMilvusClientV2Base;
old TestGroupSearch/TestcaseBase patterns and large blocks in
test_mix_scenes were removed) to avoid redundant fixture permutations
and duplicate assertions while reusing shared helpers in common_func
(e.g., gen_scalar_field, gen_row_data_by_schema) and common_type
constants.
- Why this does NOT introduce data loss or behavior regression: only
test code, test helpers, and test imports were changed — no
production/server code altered. Test helper changes are
backward-compatible (gen_scalar_field forces primary key nullable=False
and only affects test data generation paths in
tests/python_client/common/common_func.py; get_field_dtype_by_field_name
now accepts schema dicts/ORM schemas and is used only by tests to choose
vector generation) and collection creation/insertion in tests use the
same CollectionSchema/FieldSchema paths, so production
storage/serialization logic is untouched.
- New capability (test addition): adds v2 test coverage for group-by
search over JSON and dynamic fields plus related scenarios — pagination,
strict/non-strict group_size, min/max group constraints, multi-field
group-bys and binary vector cases — implemented in
tests/python_client/milvus_client_v2/test_milvus_client_search_group_by.py
to address issue #46616.
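A hedged usage sketch of the group-by search path these suites exercise; the collection and field names are placeholders, and the parameter spellings should be checked against the pinned pymilvus version:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running Milvus
res = client.search(
    collection_name="group_by_demo",      # placeholder collection
    data=[[0.1] * 128],                   # one query vector
    limit=10,
    group_by_field="color",               # e.g. a dynamic-field key under test
    group_size=3,                         # up to 3 hits per group
    strict_group_size=False,              # non-strict: groups may return fewer
    output_fields=["color"],
)
```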
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Test Case Updates for API Behavior Changes
**Core Invariant**: These test case updates reflect backend API
improvements to error messaging and schema information returned by
collection operations. The changes maintain backward compatibility—no
public signatures change, and all modifications are test expectation
updates.
**Updated Error Messages for Better Diagnostics**:
- `test_add_field_feature.py`: Updated expected error when adding a
vector field without dimension specification from a generic "not support
to add vector field" to the more descriptive "vector field must have
dimension specified, field name = {field_name}: invalid parameter". This
change is non-breaking for clients that only check error codes; it
improves developer experience with clearer error context.
**Schema Information Extension**:
- `test_milvus_client_collection.py`: Added `enable_namespace: False` to
the expected `describe_collection()` output. This is a new boolean field
in the collection metadata that defaults to False, representing an
opt-in feature. Existing code querying describe_collection continues to
work; the new field is simply an additional property in the response
dictionary.
**Dynamic Error Message Construction**:
- `test_milvus_client_search_invalid.py`: Replaced a hardcoded error
message with conditional logic that generates the appropriate expected
error based on input state (None vectors vs. invalid vector data). This
prevents brittle test failures when multiple error conditions exist, and
correctly validates that the API handles both "missing data" and
"malformed data" cases distinctly.
**No Regression Risk**: All changes update test expectations to match
improved backend behavior. The changes are additive (new field in
schema) or clarifying (better error messages), with no modifications to
existing response structures or behavior for valid inputs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: nico <cheng.yuan@zilliz.com>
Issue: #46627
add one more test case to cover duplicate pk partial update
On branch feature/partial-update
Changes to be committed:
modified: milvus_client/test_milvus_client_partial_update.py
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: upserts with partial_update=True consolidate records
by primary key (PK) rather than creating duplicate rows; this test
verifies the partial-update upsert path preserves PK identity and merge
semantics.
- Change: adds test
test_milvus_client_partial_update_duplicate_pk_partial_update which
inserts duplicate-PK batches, then calls client.upsert(...,
partial_update=True) on a subset of fields and asserts final row count
equals default_nb, exercising the partial-update code path (upsert →
partial update handling → query) not previously covered.
- No production logic removed/simplified: this PR only adds test
coverage (no code paths removed or altered); nothing in production code
is changed or simplified by the PR.
- No data loss or regression introduced: the test validates concrete
code paths — upsert with partial_update=True followed by query
(output-field, vector, and PK checks) — and asserts deduplication
(2×default_nb → default_nb). Because the PR only adds assertions against
existing behavior and does not modify runtime logic, it cannot cause
data loss or behavioral regressions.
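A condensed sketch of the new case under the assumptions above (a running Milvus, an existing collection with this shape, the suite's default_nb); it is not the verbatim test:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
default_nb = 2000
rows = [{"id": i, "vector": [0.1] * 8, "tag": "v1"} for i in range(default_nb)]

client.insert("pu_demo", rows)
client.insert("pu_demo", rows)  # duplicate-PK batch: same ids inserted again

# partial update on a subset of fields, keyed by the same PKs
client.upsert("pu_demo", [{"id": i, "tag": "v2"} for i in range(default_nb)],
              partial_update=True)

res = client.query("pu_demo", filter="", output_fields=["count(*)"])
assert res[0]["count(*)"] == default_nb  # consolidated by PK, not duplicated
```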
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
Issue: #46424
test: add_collection_field (invalid_default_value)
hybrid_search (NOT supported)
simplify some test cases by using a single collection to save time.
query with different time shift and timezone settings
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: TIMESTAMPTZ values are treated as absolute instants
(timezone-preserving). Tests assume conversions between stored instants
and display timezones/time-shifts are deterministic and reversible; the
PR validates queries/reads across different timezone and time-shift
settings against that invariant.
- Removed/simplified logic: duplicated per-test create/insert/teardown
flows and several isolated timestamptz unit cases (edge_case, Feb_29,
partial_update, standalone query) were consolidated into a module-scoped
fixture that creates a single COLLECTION_NAME, inserts ROWS, and handles
teardown. This removes redundant setup/teardown code and repeated
scaffolding while preserving the same API exercise points
(create_collection, insert, query, alter_collection_properties,
alter_database_properties, describe_collection, describe_database).
- No data loss or behavior regression: only test code was reorganized
and new assertions exercise the same production APIs and code paths used
previously (create_collection → insert → query / alter_properties →
describe). The fixture inserts the same ROWS and tests still
convert/compare timestamptz values via cf.convert_timestamptz and query
check routines; the new invalid-default-value test only asserts error
handling when adding a TIMESTAMPTZ field with an invalid default and
does not mutate persisted data or change production logic.
- PR type (Enhancement/Test): expands and reorganizes E2E test coverage
for TIMESTAMPTZ—centralizes collection setup to reduce runtime and
flakiness, adds explicit coverage for invalid-default-value behavior,
and increases timezone/time-shift query scenarios without altering
product behavior.
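A minimal sketch of the module-scoped consolidation described above (COLLECTION_NAME/ROWS mirror the names cited; the schema and rows here are placeholders):

```python
import pytest
from pymilvus import MilvusClient

COLLECTION_NAME = "timestamptz_e2e"  # single collection shared by the module
ROWS = [{"id": i, "vector": [0.0] * 8, "ts": "2024-02-29T12:00:00+08:00"}
        for i in range(10)]          # illustrative rows, not the real dataset

@pytest.fixture(scope="module")
def tz_collection():
    client = MilvusClient(uri="http://localhost:19530")
    client.create_collection(COLLECTION_NAME, dimension=8)  # placeholder schema
    client.insert(COLLECTION_NAME, ROWS)
    yield client                     # every test in the module reuses this
    client.drop_collection(COLLECTION_NAME)  # one teardown per module
```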
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: tests' persistence of EventRecords and RequestRecords
must be append-safe under concurrent writers; this PR replaces Parquet
with JSONL and uses per-file locks and explicit buffer flushes to
guarantee atomic, append-safe writes (EventRecords uses event_lock +
append per line; RequestRecords buffers under request_lock and flushes
to file when threshold or on sink()).
- Logic removed/simplified and rationale: DataFrame-based parquet
append/read logic (pyarrow/fastparquet) and implicit parquet buffering
were removed in favor of simple line-oriented JSON writes and explicit
buffer management. The Parquet append/merge paths were dropped because
parquet appends under concurrent test-writer patterns caused corruption;
JSONL removes the append-mode complexity and the parquet-specific
buffering/serialization code.
- Why no data loss or behavior regression (concrete code paths):
EventRecords.insert writes a complete JSON object per event under
event_lock to /tmp/ci_logs/event_records_*.jsonl and get_records_df
reads every JSON line under the same lock (or returns an empty DataFrame
with the same schema on FileNotFound/Error), preserving all fields
event_name/event_status/event_ts. RequestRecords.insert appends to an
in-memory buffer under request_lock and triggers _flush_buffer() when
len(buffer) >= 100; _flush_buffer() writes each buffered JSON line to
/tmp/ci_logs/request_records_*.jsonl and clears the buffer; sink() calls
_flush_buffer() under request_lock before get_records_df() reads the
file — ensuring all buffered records are persisted before reads. Both
read paths handle FileNotFoundError and exceptions by returning empty
DataFrames with identical column schemas, so external callers see the
same API and no silent record loss.
- Enhancement summary (concrete): Replaces flaky Parquet append/read
with JSONL + explicit locking and deterministic flush semantics,
removing the root cause of parquet append corruption in tests while
keeping the original DataFrame-based analysis consumers unchanged
(get_records_df returns equivalent schemas).
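A condensed sketch of the scheme described above — the lock names, record fields, flush threshold of 100, and file locations come from this description; the rest is illustrative:

```python
import json
import os
import threading

os.makedirs("/tmp/ci_logs", exist_ok=True)
event_lock = threading.Lock()
request_lock = threading.Lock()

class EventRecords:
    FILE = "/tmp/ci_logs/event_records_demo.jsonl"  # suffix is illustrative

    def insert(self, event_name, event_status, event_ts):
        record = {"event_name": event_name, "event_status": event_status,
                  "event_ts": event_ts}
        with event_lock:  # one complete JSON object appended per line
            with open(self.FILE, "a") as f:
                f.write(json.dumps(record) + "\n")

class RequestRecords:
    FILE = "/tmp/ci_logs/request_records_demo.jsonl"

    def __init__(self):
        self.buffer = []

    def insert(self, record):
        with request_lock:
            self.buffer.append(record)
            if len(self.buffer) >= 100:  # threshold flush
                self._flush_buffer()

    def _flush_buffer(self):  # caller must hold request_lock
        with open(self.FILE, "a") as f:
            for r in self.buffer:
                f.write(json.dumps(r) + "\n")
        self.buffer.clear()

    def sink(self):
        with request_lock:  # persist stragglers before any read
            self._flush_buffer()
```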
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Updated test suite dependencies to pymilvus 2.7.0rc91.
* Enhanced text highlighting validation in test checkers.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Updates e2e test error code and message after pymilvus update
see #46677
Signed-off-by: silas.jiang <silas.jiang@zilliz.com>
Co-authored-by: silas.jiang <silas.jiang@zilliz.com>
/kind improvement
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: test infrastructure treats insertion granularity as
orthogonal to data semantics—bulk generation
gen_row_data_by_schema(nb=2000, start=0, random_pk=False) yields the
same sequential PKs and vector payloads as prior multi-batch inserts, so
tests relying on collection lifecycle, flush, index build, load and
search behave identically.
- What changed / simplified: added a full HNSW_PQ parameterized test
suite (tests/python_client/testcases/indexes/idx_hnsw_pq.py and
test_hnsw_pq.py) and simplified HNSW_SQ test insertion by replacing
looped per-batch generation+insert with a single bulk
gen_row_data_by_schema(...) + insert. The per-batch PK sequencing and
repeated vector generation were redundant for correctness and were
removed to reduce complexity.
- Why this does NOT cause data loss or behavior regression: the
post-insert code paths remain unchanged—tests still call client.flush(),
create_index(...), util.wait_for_index_ready(), collection.load(), and
perform searches that assert describe_index and search outputs. Because
start=0 and random_pk=False reproduce identical sequential PKs (0..1999)
and the same vectors, index creation and search validation operate on
identical data and index parameters, preserving previous assertions and
outcomes.
- New capability: comprehensive HNSW_PQ coverage (build params: M,
efConstruction, m, nbits, refine, refine_type; search params: ef,
refine_k) across vector types (FLOAT_VECTOR, FLOAT16_VECTOR,
BFLOAT16_VECTOR, INT8_VECTOR) and metrics (L2, IP, COSINE), implemented
as data-driven tests to validate success and failure/error messages for
boundary, type-mismatch and inter-parameter constraints.
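A minimal before/after sketch of the insertion simplification (it assumes the suite's common_func helpers and an existing client/schema/collection):

```python
from common import common_func as cf  # test-suite helper module

# before: per-batch generation, advancing `start` to keep PKs sequential
# rows = []
# for batch in range(4):
#     rows += cf.gen_row_data_by_schema(nb=500, schema=schema,
#                                       start=batch * 500, random_pk=False)

# after: one bulk call yields the identical sequential PKs 0..1999
rows = cf.gen_row_data_by_schema(nb=2000, schema=schema, start=0,
                                 random_pk=False)
client.insert(collection_name, rows)
client.flush(collection_name)  # post-insert flow is unchanged
```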
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zilliz <jiaming.li@zilliz.com>
### **User description**
issue: #46367
___
### **PR Type**
Bug fix, Tests
___
### **Description**
- Fix unstable test case by adjusting float precision
- Change listMix float value from 1.1 to 1.111
- Improve test stability for the json_contains_any query
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Data Generation"] -- "Adjust float precision" --> B["listMix field value"]
B -- "1.1 to 1.111" --> C["Improved test stability"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_query.py</strong><dd><code>Adjust
float precision in test data</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_query.py
<ul><li>Modified test data generation in the
<code>test_milvus_client_query_expr_all_datatype_json_contains_all</code>
method</li><li>Changed <code>listMix</code> field float value from
<code>1.1</code> to <code>1.111</code> for improved
precision</li><li>Addresses test instability by adjusting floating-point
test data</li></ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46506/files#diff-d6fe357e4678415bc62596b802571043fa571c7d1b8e841aa43124437dd2f739">+1/-1</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the test assumes stable float equality/containment
behavior for JSON-typed fields when generating test rows; small changes
in stored float precision can make json_contains_any assertions flaky.
- Exact fix for the bug (refs #46367): in
tests/python_client/milvus_client/test_milvus_client_query.py, the test
data value for the second element of the "listMix" JSON field was
adjusted from i * 1.1 to i * 1.111 in
test_milvus_client_query_expr_all_datatype_json_contains_all to increase
numeric precision and remove instability in json_contains_any
assertions.
- Logic removed/simplified: no production logic was changed or removed —
only a one-line test-data change. There is no control-flow or
algorithmic simplification because the test’s intent and checks remain
identical; the change removes the fragile dependence on a borderline
float value that caused flakiness.
- No data loss or behavior regression: this change only updates
test-generated input
(test_milvus_client_query_expr_all_datatype_json_contains_all) and does
not touch any library or runtime code paths. Production code paths
(query parsing/execution, JSON handling) are unchanged, so no persisted
data, API behavior, or client logic is affected.
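For context on why i * 1.1 was borderline — binary floats cannot represent 1.1 exactly, so some multiples drift from the literal a filter compares against:

```python
for i in range(4):
    print(i, repr(i * 1.1))
# 0 0.0
# 1 1.1
# 2 2.2
# 3 3.3000000000000003   <- not bit-equal to the literal 3.3
```

(1.111 is not exactly representable either; the change sidesteps the specific borderline comparisons that were flapping.)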
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: nico <cheng.yuan@zilliz.com>
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).
Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to timeout.
Changes:
- Add fallback to use fieldid as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling
Related to #46594
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
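A Python illustration of the normalization (the actual change is in the C++ load-diff path; the ids below are arbitrary):

```python
def normalize_child_groups(field_id: int, child_fields: list[int]) -> list[int]:
    if child_fields:
        return list(child_fields)  # v4+: use the declared child groups as-is
    return [field_id]              # v1/legacy: group_id == field_id fallback

assert normalize_child_groups(101, []) == [101]                   # legacy binlog
assert normalize_child_groups(101, [1001, 1002]) == [1001, 1002]  # v4 binlog
```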
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: https://github.com/milvus-io/milvus/issues/46571
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the LexicalHighlighter API now expects the match
queries under the parameter name highlight_query (not queries); all call
sites must pass highlight_query to supply match data. This PR assumes
the underlying highlighter behavior and processing of those query values
are unchanged.
- Logic simplified/removed: removed the legacy keyword queries in tests
and updated calls to use highlight_query
(tests/python_client/milvus_client/test_milvus_client_highlighter.py).
This eliminates a redundant/incorrect keyword alias and aligns tests
with the consolidated LexicalHighlighter constructor parameter name.
- Why this does NOT introduce data loss or behavior regression: the
change is a parameter-name rename only — no parsing, matching, or
storage logic was modified. Tests now construct LexicalHighlighter with
pre_tags/post_tags/highlight_search_text/fragment_* and pass the query
list under highlight_query; the highlighter execution path
(client.search → highlighter processing → result['highlight']) is
untouched, so existing highlight outputs and stored data remain
unchanged.
- Other changes: bumped pymilvus test dependency to 2.7.0rc93 in
tests/python_client/requirements.txt to match the updated constructor
signature; scope of change is limited to tests and dependency pinning
(no production code changes).
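A hedged construction sketch using the parameter names cited above; the import path and exact signature are assumptions to verify against pymilvus 2.7.0rc93:

```python
from pymilvus import LexicalHighlighter  # assumed export location

highlighter = LexicalHighlighter(
    pre_tags=["<em>"],
    post_tags=["</em>"],
    highlight_search_text=True,           # per the fields listed above
    fragment_size=100,                    # one of the fragment_* parameters
    highlight_query=["vector database"],  # formerly passed as `queries`
)
```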
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer and
querynode, and adds comprehensive validation, search reduce, and field
data handling for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
- Update funcutil.go with nullable vector support functions
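A Python illustration of the logical-to-physical translation the new FieldDataIdxComputer performs (the real implementation is the Go struct in typeutil/schema.go): with sparse storage, only valid rows are physically stored, so a logical row offset must be mapped through the validity map.

```python
def physical_index(valid: list[bool], logical_idx: int) -> int | None:
    """Index into the densely stored vectors, or None if the row is null."""
    if not valid[logical_idx]:
        return None
    return sum(valid[:logical_idx])  # count of valid rows before this one

valid = [True, False, True, True]
assert physical_index(valid, 0) == 0
assert physical_index(valid, 1) is None  # null row: nothing stored for it
assert physical_index(valid, 3) == 2
```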
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
### **User description**
Issue: #46504
test: create e2e test case for highlighter
On branch feature/highlighter
Changes to be committed:
new file: milvus_client/test_milvus_client_highlighter.py
___
### **PR Type**
Tests
___
### **Description**
- Add comprehensive e2e test suite for LexicalHighlighter functionality
- Test highlighter initialization with collection setup and data
insertion
- Validate highlighter with various parameters (tags, fragments,
offsets)
- Test edge cases including Chinese characters, long text, and invalid
inputs
- Verify error handling for invalid fragment sizes, offsets, and
configurations
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Test Suite Setup"] --> B["Highlighter Init Tests"]
B --> C["Valid Test Cases"]
C --> D["Fragment Parameters"]
C --> E["Search Variations"]
C --> F["Language Support"]
B --> G["Invalid Test Cases"]
G --> H["Parameter Validation"]
G --> I["Error Handling"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Tests</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>test_milvus_client_highlighter.py</strong><dd><code>Add
comprehensive LexicalHighlighter e2e test suite</code>
</dd></summary>
<hr>
tests/python_client/milvus_client/test_milvus_client_highlighter.py
<ul><li>Create new test file with 1163 lines of comprehensive
highlighter test cases</li><li>Implement
<code>TestMilvusClientHighlighterInit</code> class to initialize a
collection with pre-defined test data including English, Chinese, and
long text samples</li><li>Implement
<code>TestMilvusClientHighlighterValid</code> class with 15+ test
methods covering basic usage, multiple tags, fragment parameters,
offsets, numbers, sentences, and language support</li><li>Implement
<code>TestMilvusClientHighlighterInvalid</code> class with 8+ test
methods validating error handling for invalid parameters and
configurations</li><li>Test highlighter with BM25 search, text matching,
and various analyzer configurations</li></ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46505/files#diff-443e3fefb65fbdb088d5920083306ecfe3605745b1e2714198c6566ca67b3736">+1163/-0</a></td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* Added a comprehensive highlighter test suite covering:
- Core highlighting with single and multi-analyzer setups and multi-tag
variations
- Fragment parameter behaviors and edge cases (size, offset, count)
- Text-match and query-based highlighting, including BM25 and vector
interactions
- Sub-word, long-text/tag, case sensitivity, Chinese/multi-language
scenarios
- Error handling for invalid parameters, no-match cases, and other edge
conditions
- Module-scoped fixture preparing multilingual, long-form test data and
teardown
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45525
see the added README.md for details on the optimizations
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators (see the
illustrative sketch after this list).
* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.
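An illustrative toy sketch of the three simplifications named above (not the Milvus implementation, which rewrites plan expressions):

```python
def merge_in(a: set, b: set) -> set:
    # f IN {1,2} OR f IN {2,3}  ->  f IN {1,2,3}
    return a | b

def tighten_lower_bound(*bounds: float) -> float:
    # f > 1 AND f > 3  ->  f > 3
    return max(bounds)

def flatten_and(terms):
    # AND(AND(a, b), c) -> AND(a, b, c); nested AND nodes are spliced in place
    out = []
    for t in terms:
        is_and = isinstance(t, tuple) and t[0] == "AND"
        out.extend(flatten_and(t[1]) if is_and else [t])
    return out

assert merge_in({1, 2}, {2, 3}) == {1, 2, 3}
assert tighten_lower_bound(1, 3) == 3
assert flatten_and([("AND", ["a", "b"]), "c"]) == ["a", "b", "c"]
```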
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46349
When using brute-force search, the iterator results from multiple chunks
are merged; at that point, we need to pay attention to how the metric
affects result ranking.
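A sketch of the metric-aware merge (per-chunk results assumed already sorted best-first; with L2 smaller distances rank higher, with IP/COSINE larger similarities do):

```python
import heapq

def merge_chunk_results(chunks, metric: str, limit: int):
    """chunks: per-chunk lists of (score, id), each already sorted best-first."""
    larger_is_better = metric in ("IP", "COSINE")
    key = (lambda s: -s) if larger_is_better else (lambda s: s)
    merged = heapq.merge(*chunks, key=lambda pair: key(pair[0]))
    return list(merged)[:limit]

l2 = merge_chunk_results([[(0.1, 1), (0.9, 2)], [(0.4, 3)]], "L2", 2)
ip = merge_chunk_results([[(0.9, 2), (0.1, 1)], [(0.4, 3)]], "IP", 2)
assert l2 == [(0.1, 1), (0.4, 3)]   # smaller distance wins
assert ip == [(0.9, 2), (0.4, 3)]   # larger similarity wins
```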
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Issue: #46333
test: rewrite the timestamp conversion logic to cover daylight saving time
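For reference, a standard-library sketch of the DST behavior the rewritten conversion must handle — the same display timezone has different UTC offsets on either side of the spring-forward transition:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
before = datetime(2024, 3, 10, 6, 0, tzinfo=timezone.utc).astimezone(ny)
after = datetime(2024, 3, 10, 7, 0, tzinfo=timezone.utc).astimezone(ny)
print(before.isoformat())  # 2024-03-10T01:00:00-05:00  (EST)
print(after.isoformat())   # 2024-03-10T03:00:00-04:00  (EDT, 2:00 was skipped)
```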
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
- Add normalize_error_message function to extract and normalize error
text
- Collect unique error messages during chaos test execution
- Display error details in assertion messages for better debugging
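A hedged sketch of what such a normalizer can do (the regexes are illustrative, not the actual implementation): strip volatile details so distinct failures collapse into a small set of unique, readable messages.

```python
import re

def normalize_error_message(raw: str) -> str:
    msg = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}:\d+\b", "<addr>", raw)  # host:port
    msg = re.sub(r"\b\d+\b", "<n>", msg)                           # numeric ids
    return msg.strip()

errors = {normalize_error_message(e) for e in
          ["rpc error to 10.0.0.1:19530 code 65535",
           "rpc error to 10.0.0.2:19530 code 65535"]}
assert errors == {"rpc error to <addr> code <n>"}  # one unique message
```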
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
issue: #45511
our tantivy inverted index currently does not index individual array
elements, so we can't do `a[0] == 'b'`-style lookups in the inverted
index. For such cases, we need to skip the index and use brute-force
search instead.
we may improve our index in the future, so this is a temporary solution
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Issue: #46188
The bug was caused by an inconsistent tzdata version as well as a wrong
month assignment in the convert_timestamptz function.
Also fix the compare function so that it correctly returns True or False
when debug_mode=True.
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist,
which returns an error when duplicate primary keys are detected in the
same batch instead of silently deduplicating them.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
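A sketch of the new contract from the client side (assumes a running Milvus and an existing collection; names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
rows = [{"id": 1, "vector": [0.1] * 8},
        {"id": 1, "vector": [0.2] * 8}]  # duplicate PK within one batch
try:
    client.upsert(collection_name="demo", data=rows)
except Exception as e:
    print("rejected as expected:", e)    # error instead of silent dedup
```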
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue #46015
test: add timestamptz to more bulk writer tests
On branch feature/timestamps
Changes to be committed:
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
Replace direct self.schema access and describe_collection() calls with
get_schema() method to ensure consistent schema handling with complete
struct_fields information. Also fix FlushChecker error handling and
change schema log level from info to debug.
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
- Refactor connection logic to prioritize uri and token parameters over
host/port/user/password for a more modern connection approach
- Add explicit limit parameter (limit=5) to search and query operations
in chaos checkers to avoid returning unlimited results
- Migrate test_all_collections_after_chaos.py from Collection wrapper to
MilvusClient API style
- Update pytest fixtures in chaos test files to support uri/token params
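The connection style the refactor prefers, plus the explicit limit, in one short sketch:

```python
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",   # scheme + host + port in one place
    token="root:Milvus",            # user:password (or an API key) in one field
)
res = client.search(collection_name="demo", data=[[0.1] * 8],
                    limit=5)        # explicit limit, mirroring the checker change
```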
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Issue: #45756
1. add bulk insert scenario
2. fix small issue in e2e cases
3. add search group by test case
4. add timestamptz to gen_all_datatype_collection_schema
5. modify partial update testcase to ensure correct result from
timestamptz field
On branch feature/timestamps
Changes to be committed:
modified: common/bulk_insert_data.py
modified: common/common_func.py
modified: common/common_type.py
modified: milvus_client/test_milvus_client_partial_update.py
modified: milvus_client/test_milvus_client_timestamptz.py
modified: pytest.ini
modified: testcases/test_bulk_insert.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>