Related to #31293
Implement QueryIterator for the Go SDK to enable efficient iteration
over large query result sets using PK-based pagination.
Key changes:
- Add QueryIterator interface and implementation with PK-based
pagination
- Support Int64 and VarChar primary key types for pagination filtering
- Add QueryIteratorOption with batchSize, limit, filter, outputFields
config
- Fix ResultSet.Slice to handle Query results without IDs/Scores
- Add comprehensive unit tests and integration tests
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: the iterator requires the collection primary key (PK)
to be present in outputFields so PK-based pagination and accurate row
counting work. The constructor enforces this by appending the PK to
outputFields when absent, and all pagination (lastPK tracking, PK-range
filters) and ResultCount calculations depend on that guaranteed PK
column.
- New capability: adds a public QueryIterator API (Client.QueryIterator,
QueryIterator interface, QueryIteratorOption) that issues server-side
Query RPCs in configurable batches and implements PK-based pagination
supporting Int64 and VarChar PKs, with options for batchSize, limit,
filter, outputFields and an upfront first-batch validation to fail fast
on invalid params.
- Removed/simplified logic: ResultSet.Slice no longer assumes IDs and
Scores are always present — it branches on presence of IDs (use IDs
length when non-nil; otherwise derive row count from Fields[0]) and
guards Scores slicing. This eliminates redundant/unsafe assumptions and
centralizes correct row-count logic based on actual returned fields.
- No data loss or behavior regression: pagination composes the user
filter with a PK-range filter and always requests the PK field, so
lastPK is extracted from a real column and fetchNextBatch only advances
when rows are returned; EOF is returned only when the server returns no
rows or iterator limit is reached. ResultSet.Slice guards prevent panics
for queries that lack IDs/Scores; Query RPC → ResultSet.Fields remains
the authoritative path for row data, so rows are not dropped and
existing query behavior is preserved.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
- Update funcutil.go with nullable vector support functions
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45525
see added README.md for added optimizations
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators.
* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46451
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Session versioning added to validate coordinator compatibility during
registration and active takeover.
* **Changes**
* Active–standby flow simplified: standby-to-active activation now
always enabled and initialized unconditionally.
* Registration uses version-aware transactions to ensure version
consistency during takeover.
* Startup/health startup path streamlined.
* **Tests**
* Added version-key integration test; removed test for disabling
active-standby.
* Updated flush test to assert rate-limiter errors occur.
* **Chores**
* Removed centralized connection manager and its test suite.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
### **User description**
issue: #46097
- the flush rate is modified into 4qps, so the testcase is fail.
___
### **PR Type**
Tests, Bug fix
___
### **Description**
- Replace sequential flush calls with concurrent requests to trigger
rate limiting
- Add sync.WaitGroup for concurrent goroutine execution
- Check for rate limit errors across multiple concurrent flush
operations
- Remove hardcoded error message expectation for flexibility
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Sequential Flush Calls"] -->|Replace with| B["Concurrent Flush Requests"]
B -->|Use| C["sync.WaitGroup"]
C -->|Validate| D["Rate Limit Errors"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Tests</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>insert_test.go</strong><dd><code>Refactor flush rate
test to use concurrent requests</code>
</dd></summary>
<hr>
tests/go_client/testcases/insert_test.go
<ul><li>Added <code>sync</code> package import for concurrent goroutine
synchronization<br> <li> Replaced sequential flush calls with 10
concurrent flush operations <br>using goroutines<br> <li> Implemented
WaitGroup to synchronize all concurrent flush requests<br> <li> Modified
error validation to check for rate limit errors across all
<br>concurrent attempts instead of expecting specific sequential
behavior<br> <li> Relaxed error message matching to only check for "rate
limit exceeded" <br>substring</ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46497/files#diff-89a4ddfa15d096e6a5f647da0e461715e5a692b375b04a3d01939f419b00f529">+19/-4</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Tests**
* Enhanced testing of concurrent flush operations to improve validation
of system reliability under concurrent load scenarios.
---
**Note:** This release contains internal testing improvements with no
direct user-facing feature changes.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist
that returns an error when duplicate primary keys are detected in the
same batch, instead of silently deduplicating.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
### Is there an existing issue for this?
- [x] I have searched the existing issues
---
Please see: https://github.com/milvus-io/milvus/issues/44593 for the
background
This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant,
which can be closed. The PR comments for the original implementation
suggested an alternative and a better approach, this new PR has that
implementation.
---
This PR
- Adds an optional `minimum_should_match` argument to `text_match(...)`
and wires it through the parser, planner/visitor, index bindings, and
client-level tests/examples so full-text queries can require a minimum
number of tokens to match.
Motivation
- Provide a way to require an expression to match a minimum number of
tokens in lexical search.
What changed
- Parser / grammar
- Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and
`textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
- Regenerated parser outputs: `internal/parser/planparserv2/generated/*`
(parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
- `parser_visitor.go`: parse and validate the `minimum_should_match`
integer; propagate as an extra value on the `TextMatch` expression so
downstream components receive it.
- Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
- Added a unit test to verify `text_match(...,
minimum_should_match=...)` appears in the generated DSL and is accepted
by client code: `client/milvusclient/read_test.go` (new test coverage).
- Added an integration-style test for the feature to the go-client
testcase suite: `tests/go_client/testcases/full_text_search_test.go`
(exercise min=1, min=3, large min).
- Added an example demonstrating `text_match` usage:
`client/milvusclient/read_example_test.go` (example name conforms to
godoc mapping).
- Engine / index
- Updated C++ index interface: `TextMatchIndex::MatchQuery`
- Added/updated unit tests for the index behavior:
`internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
- Added `match_query_with_minimum` implementation and unit tests to
`internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs`
that construct boolean queries with minimum required clauses.
Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior
(no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed
(internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating ANTLR outputs
Tests and verification
- New/updated tests:
- Go client unit test: `client/milvusclient/read_test.go` (mocked Search
request asserts DSL contains `minimum_should_match=2`).
- Go e2e-style test:
`tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3
and a large min).
- C++ unit tests for index behavior:
`internal/core/src/index/TextMatchIndexTest.cpp`.
- Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
- Go client tests: `cd client && go test ./milvusclient -run ^$` (client
package)
- Go testcases: `cd tests/go_client && go test ./testcases -run
TestTextMatchMinimumShouldMatch` (requires a running Milvus instance)
- C++ unit tests / build: run core build/test per repo instructions (the
change touches core index code).
- Rust binding tests: `cd
internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if
developing locally).
---------
Signed-off-by: Amit Kumar <amit.kumar@reddit.com>
Co-authored-by: Amit Kumar <amit.kumar@reddit.com>
issue: #43897
- Alter collection/database is implemented by WAL-based DDL framework
now.
- Support AlterCollection/AlterDatabase in wal now.
- Alter operation can be synced by new CDC now.
- Refactor some UT for alter DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #45105
This commit refactors the test MilvusClient wrapper to leverage Go's
embedding pattern and improves test organization with subtests.
**File**: `tests/go_client/base/milvus_client.go`
- **Use `typeutil.NewSet` for rate limiting**: Replace map-based
`rateLogMethods` with `typeutil.NewSet` for cleaner and more efficient
membership checking
- **Embed `*client.Client` directly**: Change `MilvusClient` structure
from wrapping the client as a field to embedding it directly
- **Remove ~380 lines of wrapper methods**: All wrapper methods
(database, collection, partition, index, read/write, RBAC, etc.) are now
unnecessary thanks to Go's embedding feature, which automatically
promotes embedded methods to the outer type
- **Simplify initialization**: Update `NewMilvusClient` and `Close` to
use embedded client directly
- **Fix typo**: Correct comment "Ike the actual method" → "Invoke the
actual method"
**File**: `tests/go_client/testcases/search_test.go`
- **Wrap assertions in subtests**: Each search expression test is now
wrapped in `t.Run()` with descriptive names
- **Dynamic subtest naming**: Format:
`expr={expression}_dynamic-{true/false}` for clear test identification
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options, avoid having to merge some milvus params
into user's analyzer params.
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/44399
This PR implements STL_SORT for VARCHAR data type for both RAM and MMAP
mode.
The general idea is that we deduplicate field values and maintains a
posting list for each unique value.
The serialization format of the index is:
```
[unique_count][string_offsets][string_data][post_list_offsets][post_list_data][magic_code]
string_offsets: array of offsets into string_data section
string_data: str_len1, str1, str_len2, str2, ...
post_list_offsets: array of offsets into post_list_data section
post_list_data: post_list_len1, row_id1, row_id2, ..., post_list_len2, row_id1, row_id2, ...
```
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #43427
This pr's main goal is merge #37417 to milvus 2.5 without conflicts.
# Main Goals
1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.
---------
Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
issue: https://github.com/milvus-io/milvus/issues/27467
>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz
---
The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.
## M8 Default Timezone
We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.
For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
## M5: Comparison Operators
We can now use the following expression format to filter on the
timestamptz field:
- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `
- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.
- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.
- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
## M6: Extract
We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.
## M7: Indexing Support
Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
---
After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.
> for more information, see https://en.wikipedia.org/wiki/ISO_8601
---------
Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
issue: #29735
Implement partial field update functionality for upsert operations,
supporting scalar, vector, and dynamic JSON fields without requiring all
collection fields.
Changes:
- Add queryPreExecute to retrieve existing records before upsert
- Implement UpdateFieldData function for merging data
- Add IDsChecker utility for efficient primary key lookups
- Fix JSON data creation in tests using proper map marshaling
- Add test cases for partial updates of different field types
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
There are some unstable cases in go sdk e2e cases, which used default
bounded consistency level. This patch make these cases use strong level
to avoid unstable test results
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38399
- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #41460
This PR looses insert data check based on schema. These check shall
actually happen at milvus server side.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39718
This PR:
- Use WAL broadcast timestamp as Collection update timestamp
- Remove request_fields size assertion
- Remove proxy schema cache loaded field check & skip related cases
- other minor issues
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39095
Previous PR #39990 update pkg module path using "/v2" package name, this
PR update milvusclient go sdk dependency for this update
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/39818
This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily.
Storage changes for Text data type will be in the following PRs.
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>