issue: https://github.com/milvus-io/milvus/issues/42053
The split literals in `match` execution should be handled in an `and`
manner rather than `or`.
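As a hedged illustration (a toy sketch, not the actual executor code), under `and` semantics a document matches only when every split literal is present:
```
package main

import "fmt"

// matchDoc illustrates the AND semantics: every literal produced by
// splitting the match query must be present in the document's tokens.
func matchDoc(docTokens map[string]bool, literals []string) bool {
	for _, lit := range literals {
		if !docTokens[lit] {
			return false // one missing literal fails the whole match
		}
	}
	return true
}

func main() {
	doc := map[string]bool{"hello": true, "world": true}
	fmt.Println(matchDoc(doc, []string{"hello", "world"}))  // true
	fmt.Println(matchDoc(doc, []string{"hello", "milvus"})) // false under AND, true under the old OR behavior
}
```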
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/45890
ComputePhraseMatchSlop accepts three params:
1. A string: the query text
2. Several strings: the data texts
3. Analyzer params
The slop is calculated between the query text and each data text in the
context of phrase match, where both are tokenized with the tokenizer
built from the analyzer params.
Two arrays are returned:
1. is_match: whether the phrase match can succeed
2. slop: the corresponding slop if the phrase match can succeed, or -1 if it cannot.
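Below is a minimal self-contained Go sketch of the slop idea, using plain whitespace tokenization instead of the real analyzer and considering only in-order matches; the actual utility's signature and slop semantics may differ:
```
package main

import (
	"fmt"
	"strings"
)

// minSlop returns the minimal slop over all in-order alignments of the
// query tokens inside the data tokens, or -1 if the phrase cannot match.
// Slop here is the number of extra positions the phrase is stretched by.
func minSlop(query, data []string) int {
	best := -1
	var walk func(qi, from, start int)
	walk = func(qi, from, start int) {
		if qi == len(query) {
			s := (from - start) - len(query)
			if best == -1 || s < best {
				best = s
			}
			return
		}
		for p := from; p < len(data); p++ {
			if data[p] == query[qi] {
				st := start
				if qi == 0 {
					st = p
				}
				walk(qi+1, p+1, st)
			}
		}
	}
	walk(0, 0, 0)
	return best
}

func main() {
	query := strings.Fields("machine learning")
	for _, text := range []string{"machine learning", "machine deep learning", "deep learning"} {
		s := minSlop(query, strings.Fields(text))
		fmt.Printf("%q -> is_match=%v, slop=%d\n", text, s >= 0, s)
	}
}
```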
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Related to #46453
The test was flaky because Submit() returns a Future and executes
asynchronously. The test was setting sig=true immediately after Submit()
returned, but the task's Run() might not have completed yet, causing
mock expectation failures.
Fix by calling future.Await() to wait for task execution to complete
before signaling. Also remove dead commented code.
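A minimal self-contained sketch of the fixed ordering, with a toy future standing in for the pool's Submit()/Await():
```
package main

import "fmt"

// future models the asynchronous Submit() described above: the task runs
// in its own goroutine and Await blocks until it has completed.
type future struct{ done chan struct{} }

func submit(run func()) *future {
	f := &future{done: make(chan struct{})}
	go func() {
		run()
		close(f.done)
	}()
	return f
}

func (f *future) Await() { <-f.done }

func main() {
	var ran bool
	f := submit(func() { ran = true }) // task.Run() analogue
	f.Await()                          // the fix: wait for Run() to finish
	fmt.Println("safe to signal and assert:", ran) // always true now
}
```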
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Set the auto-appended dynamic field to be nullable with a default value
of empty JSON object `{}`. This allows collections with dynamic schema
to handle rows that don't have any dynamic fields more gracefully,
avoiding potential null reference issues when the dynamic field is not
explicitly set during insert.
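A hedged sketch of the resulting field schema; the message and field names follow schemapb conventions, but the exact default-value encoding here is an assumption:
```
package example

import "github.com/milvus-io/milvus-proto/go-api/v2/schemapb"

// buildDynamicField is a hypothetical sketch of the auto-appended dynamic
// field after this change: nullable, defaulting to an empty JSON object.
func buildDynamicField() *schemapb.FieldSchema {
	return &schemapb.FieldSchema{
		Name:      "$meta", // conventional dynamic field name
		DataType:  schemapb.DataType_JSON,
		IsDynamic: true,
		Nullable:  true, // newly nullable
		DefaultValue: &schemapb.ValueField{
			// assumed encoding: the empty JSON object stored as bytes
			Data: &schemapb.ValueField_BytesData{BytesData: []byte("{}")},
		},
	}
}
```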
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #46358
Add segment reopen mechanism in QueryCoord to handle segment data
updates when the manifest path changes. This enables QueryNode to reload
segment data without full segment reload, supporting storage v2
incremental updates.
Changes:
- Add ActionTypeReopen action type and LoadScope_Reopen in protobuf
- Track ManifestPath in segment distribution metadata
- Add CheckSegmentDataReady utility to verify segment data matches
target
- Extend getSealedSegmentDiff to detect segments needing reopen (see the
sketch after this list)
- Create segment reopen tasks when manifest path differs from target
- Block target update until segment data is ready
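A hedged, self-contained sketch of the detection rule (the real types and helpers differ):
```
package main

import "fmt"

// segmentView is a toy stand-in for the distribution / target metadata
// that carries the manifest path introduced by this change.
type segmentView struct {
	ID           int64
	ManifestPath string
}

// segmentsToReopen mirrors the getSealedSegmentDiff extension: a segment
// that is loaded but whose manifest path differs from the target needs a
// reopen, not a full release-and-load cycle.
func segmentsToReopen(dist, target map[int64]segmentView) []int64 {
	var reopen []int64
	for id, d := range dist {
		if t, ok := target[id]; ok && d.ManifestPath != t.ManifestPath {
			reopen = append(reopen, id)
		}
	}
	return reopen
}

func main() {
	dist := map[int64]segmentView{1: {1, "manifests/v1"}, 2: {2, "manifests/v3"}}
	target := map[int64]segmentView{1: {1, "manifests/v2"}, 2: {2, "manifests/v3"}}
	fmt.Println(segmentsToReopen(dist, target)) // [1]
}
```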
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43897
also for issue: #46166
add an ack_sync_up flag to the broadcast message header, which indicates
whether the broadcast operation needs to be synced up between the
streaming node and the coordinator.
If ack_sync_up is false, the broadcast operation is acked once the
recovery storage sees the message on the current vchannel, so the
fast-ack path can be applied to speed up the broadcast operation.
If ack_sync_up is true, the broadcast operation is acked only after the
checkpoint of the current vchannel reaches the message. The fast-ack
path cannot be applied to speed up the broadcast operation, because the
ack needs to be synced up with the streaming node.
e.g. if the truncate-collection operation wants its ack-once callback to
fire only after all segments are flushed on the current vchannel, it
should set ack_sync_up to true.
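A hedged Go sketch of the two ack modes; the function names are illustrative stubs, not the actual streaming service API:
```
package main

import "fmt"

type message struct{ id int }

// Illustrative stubs; the real operations live in the streaming service.
func waitCheckpointReach(m message) { fmt.Println("checkpoint reached", m.id) }
func ack(m message)                 { fmt.Println("acked", m.id) }

// handleBroadcastAck sketches the two modes described above.
func handleBroadcastAck(ackSyncUp bool, m message) {
	if ackSyncUp {
		// ack only after the vchannel checkpoint reaches the message;
		// the fast-ack shortcut must not be applied
		waitCheckpointReach(m)
	}
	// otherwise fast ack: the message was merely observed on the vchannel
	ack(m)
}

func main() {
	handleBroadcastAck(false, message{1}) // fast ack
	handleBroadcastAck(true, message{2})  // ack after checkpoint
}
```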
TODO: the current implementation doesn't guarantee the ack-sync-up
semantic; it only guarantees that the FastAck operation will not be
applied. The full ack-sync-up semantic waits for 3.0. Used only by the
truncate API for now.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44956
This change propagates the useLoonFFI configuration through the import
pipeline to enable LOON FFI usage during data import operations.
Key changes:
- Add use_loon_ffi field to ImportRequest protobuf message
- Add manifest_path field to ImportSegmentInfo for tracking manifest
- Initialize manifest path when creating segments (both import and
growing)
- Pass useLoonFFI flag through NewSyncTask in import tasks
- Simplify pack_writer_v2 by removing GetManifestInfo method and relying
on pre-initialized manifest path from segment creation
- Update segment meta with manifest path after import completion
This allows the import workflow to use the LOON FFI based packed writer
when the common.useLoonFFI configuration is enabled.
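A toy sketch of the flag threading (the real types are the ImportRequest proto and the common.useLoonFFI param; the names here are stand-ins):
```
package main

import "fmt"

// importRequest is a toy stand-in for datapb.ImportRequest.
type importRequest struct{ UseLoonFFI bool }

func newSyncTask(useLoonFFI bool) { fmt.Println("sync task with loon ffi:", useLoonFFI) }

func main() {
	cfgUseLoonFFI := true // would come from the common.useLoonFFI config
	req := importRequest{UseLoonFFI: cfgUseLoonFFI}
	newSyncTask(req.UseLoonFFI) // the flag reaches packed writer selection
}
```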
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #46358
This PR implements segment reopening functionality on query nodes,
enabling the application of data or schema changes to already-loaded
segments without requiring a full reload.
### Core (C++)
**New SegmentLoadInfo class**
(`internal/core/src/segcore/SegmentLoadInfo.h/cpp`):
- Encapsulates segment load configuration with structured access
- Implements `ComputeDiff()` to calculate differences between old and
new load states (see the sketch after this list)
- Tracks indexes, binlogs, and column groups that need to be loaded or
dropped
- Provides `ConvertFieldIndexInfoToLoadIndexInfo()` for index loading
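A hedged Go rendering of the `ComputeDiff()` idea; the real implementation is C++ and tracks indexes, binlogs, and column groups rather than a flat map:
```
package main

import "fmt"

// computeDiff compares an old and a new load state (field ID -> data path)
// and reports what must be loaded and what must be dropped, mirroring the
// ComputeDiff() described above in spirit only.
func computeDiff(oldState, newState map[int64]string) (toLoad, toDrop []int64) {
	for id, path := range newState {
		if oldState[id] != path {
			toLoad = append(toLoad, id) // new or changed entry
		}
	}
	for id := range oldState {
		if _, ok := newState[id]; !ok {
			toDrop = append(toDrop, id) // removed entry
		}
	}
	return
}

func main() {
	oldState := map[int64]string{100: "bin/v1", 101: "bin/v1"}
	newState := map[int64]string{100: "bin/v2", 102: "bin/v1"}
	toLoad, toDrop := computeDiff(oldState, newState)
	fmt.Println(toLoad, toDrop) // [100 102] (order may vary) [101]
}
```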
**ChunkedSegmentSealedImpl modifications**:
- Added `Reopen(const SegmentLoadInfo&)` method to apply incremental
changes based on computed diff
- Refactored `LoadColumnGroups()` and `LoadColumnGroup()` to support
selective loading via field ID map
- Extracted `LoadBatchIndexes()` and `LoadBatchFieldData()` for reusable
batch loading logic
- Added `LoadManifest()` for manifest-based loading path
- Updated all methods to use `SegmentLoadInfo` wrapper instead of direct
proto access
**SegmentGrowingImpl modifications**:
- Added `Reopen()` stub method for interface compliance
**C API additions** (`segment_c.h/cpp`):
- Added `ReopenSegment()` function exposing reopen to Go layer
### Go Side
**QueryNode handlers** (`internal/querynodev2/`):
- Added `HandleReopen()` in handlers.go
- Added `ReopenSegments()` RPC in services.go
**Segment interface** (`internal/querynodev2/segments/`):
- Extended `Segment` interface with `Reopen()` method
- Implemented `Reopen()` in LocalSegment
- Added `Reopen()` to segment loader
**Segcore wrapper** (`internal/util/segcore/`):
- Added `Reopen()` method in segment.go
- Added `ReopenSegmentRequest` in requests.go
### Proto
- Added new fields to support reopen in `query_coord.proto`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This commit addresses an intermittent test failure in TestTargetObserver
caused by a mock panic.
Problem:
--------
The original test TestTriggerUpdateTarget was a monolithic test that
cleared and recreated mock expectations mid-test execution. This created
a race condition:
1. Background goroutine in TargetObserver runs every 3 seconds, calling
broker.ListIndexes() and broker.DescribeCollection()
2. Test cleared all mock expectations at line 200 to prepare for next
phase
3. Test only re-mocked GetRecoveryInfoV2, leaving ListIndexes unmocked
4. If background goroutine triggered during this ~0.01s window (lines
200-213), it would call the unmocked ListIndexes() method, causing panic
and timeout
Error observed:
```
panic: test timed out after 10m0s
mock: I don't know what to return because the method call was unexpected.
Either do Mock.On("ListIndexes").Return(...) first, or remove the call.
```
Solution:
---------
Split the monolithic test into two independent test cases:
1. TestInitialLoad_ShouldNotUpdateCurrentTarget
- Tests that CurrentTarget remains empty during initial load
- Verifies the two-phase update mechanism works correctly
2. TestIncrementalUpdate_WithNewSegment
- Tests incremental updates when new segments arrive
- Properly sets up ALL required mocks before Eventually() calls (see the
sketch after this list)
- Lines 241-242 now include ListIndexes and DescribeCollection mocks
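A hedged sketch of the safe setup, in the mockery style used across the Milvus test suite; return values are simplified to nil here:
```
// Inside the test setup, before any Eventually() assertion runs, every
// call the background observer can make is mocked:
broker.EXPECT().ListIndexes(mock.Anything, mock.Anything).Return(nil, nil).Maybe()
broker.EXPECT().DescribeCollection(mock.Anything, mock.Anything).Return(nil, nil).Maybe()
broker.EXPECT().GetRecoveryInfoV2(mock.Anything, mock.Anything).Return(channels, segments, nil)
```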
Benefits:
---------
- Eliminates race condition entirely (no mid-test mock clearing)
- Better test isolation and maintainability
- Clearer test intent with descriptive names
- Tests can run independently and in parallel
- Follows FIRST principles (Fast, Isolated, Repeatable, Self-validating,
Timely)
Signed-off-by: Li Liu <li.liu@zilliz.com>
Related to #44956
Pass the ManifestPath field to SegmentLoadInfo when loading growing
segments in the loadGrowingSegments function. This ensures storage v2 can
properly locate segment data via the manifest path, consistent with other
segment loading paths.
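A hedged fragment of the change (surrounding code elided; the GetManifestPath accessor on the segment info is assumed from this PR series):
```
// The growing segment's load info now carries the manifest path from the
// segment info, matching the sealed segment loading path.
loadInfo := &querypb.SegmentLoadInfo{
	SegmentID:     segmentInfo.GetID(),
	CollectionID:  segmentInfo.GetCollectionID(),
	PartitionID:   segmentInfo.GetPartitionID(),
	InsertChannel: segmentInfo.GetInsertChannel(),
	ManifestPath:  segmentInfo.GetManifestPath(), // newly propagated
}
```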
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #46087, #46327
The previous implementation only checked if there were any ready
delegators before updating the current target. This could lead to
partial target updates when only some channels had ready delegators.
This regression was introduced by #46088, which removed the check for
all channels being ready. This fix ensures that
shouldUpdateCurrentTarget returns true only when ALL channels have been
successfully synced, preventing incomplete target updates that could
cause query inconsistencies.
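A hedged sketch of the corrected predicate:
```
package main

import "fmt"

// shouldUpdateCurrentTarget sketches the fixed check: every channel must
// have been successfully synced before the current target may be updated.
func shouldUpdateCurrentTarget(channels []string, synced map[string]bool) bool {
	for _, ch := range channels {
		if !synced[ch] {
			return false // one unsynced channel blocks the whole update
		}
	}
	return len(channels) > 0
}

func main() {
	chs := []string{"dml_0", "dml_1"}
	fmt.Println(shouldUpdateCurrentTarget(chs, map[string]bool{"dml_0": true, "dml_1": true})) // true
	fmt.Println(shouldUpdateCurrentTarget(chs, map[string]bool{"dml_0": true}))                // false: partial sync
}
```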
Added unit tests to cover:
- All channels synced scenario (should return true)
- Partial channels synced scenario (should return false)
- No ready delegators scenario (should return false)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44956
Add manifest_path field to CreateStatsRequest and propagate it through
the stats task pipeline. This enables stats tasks and text index
building to access segment manifest for storage v2 format operations.
- Add manifest_path field to CreateStatsRequest proto
- Set ManifestPath from segment metadata in DataCoord
- Pass manifest to BuildIndexInfo in stats task builder
- Include manifest in compaction text index creation
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fixes: https://github.com/milvus-io/milvus/issues/45934
pinIndex is const and only does read operations, so an rlock is the right
choice for performance.
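A minimal sketch of the change (the type and field names are illustrative):
```
package meta

import "sync"

type indexMeta struct {
	mu      sync.RWMutex
	indexes map[int64]string
}

// pinIndex only reads shared state, so it takes the read lock and lets
// concurrent readers proceed in parallel instead of serializing behind a
// full Lock().
func (m *indexMeta) pinIndex(id int64) (string, bool) {
	m.mu.RLock() // was: m.mu.Lock()
	defer m.mu.RUnlock()
	idx, ok := m.indexes[id]
	return idx, ok
}
```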
Signed-off-by: Lanqing Yang <lanqingy93@gmail.com>
Related to #44647
Update milvus-storage from 91df193 to 839a8e5 to include
milvus-io/milvus-storage#342, which fixes a race condition in
S3GlobalContext initialization.
The fix moves the is_initialized_ flag update from before DoInitialize()
to after it completes. This ensures the initialization flag is only set
to true after the actual initialization is done, preventing potential
issues if DoInitialize() fails or if other code checks the flag during
initialization.
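A Go analogue of the fixed ordering (the actual fix is in milvus-storage's C++ code):
```
package storage

import (
	"errors"
	"sync"
)

// globalContext mirrors S3GlobalContext to illustrate the fix: the
// initialized flag is set only after initialization actually succeeds.
type globalContext struct {
	mu          sync.Mutex
	initialized bool
}

func (c *globalContext) doInitialize() error { return errors.New("stub") }

func (c *globalContext) Initialize() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.initialized {
		return nil
	}
	// Before the fix the flag was set here, so a failed or in-flight
	// doInitialize() could still be observed as "initialized".
	if err := c.doInitialize(); err != nil {
		return err // flag stays false; a later retry can still initialize
	}
	c.initialized = true // after the fix: set only on success
	return nil
}
```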
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
**Support specified version manifest write**
- Add `baseVersion` parameter to `NewPackedRecordManifestWriter` and
`NewFFIPackedWriter` to support writing manifest based on a specific
version instead of always overwriting the latest
- Add `manifestPath` tracking in `BulkPackWriterV2` to maintain manifest
state across writes
- Add `GetManifestInfo` method to parse existing manifest path and
extract base path and version (toy parsing sketched after this list)
- Add `UpdateManifestPath` metacache action to track manifest path in
segment info
- Update `transaction_begin` FFI call to use the specified base version
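A hedged toy of the `GetManifestInfo` parsing, assuming a `<base>/_manifests/<version>` path layout purely for illustration; the real manifest path format may differ:
```
package storage

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// getManifestInfo is a toy version of the GetManifestInfo described above.
// It assumes a "<base>/_manifests/<version>" layout for illustration only.
func getManifestInfo(manifestPath string) (base string, version int64, err error) {
	dir, file := path.Split(manifestPath)
	base = strings.TrimSuffix(dir, "_manifests/")
	version, err = strconv.ParseInt(strings.TrimSuffix(file, path.Ext(file)), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("parse manifest version: %w", err)
	}
	return base, version, nil
}
```
The parsed version is then handed to the `transaction_begin` FFI call so new writes build on that version instead of overwriting the latest.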
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.
This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.
Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:
`element_filter(structFieldName, $[subFieldName] == 3)`
where `$[subFieldName]` refers to the value of subFieldName in each
element of the STRUCT array structFieldName.
It can be combined with existing filter expressions, for example:
"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"
A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...
filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in `query`.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #46277
- Update db/collection/partition disk quota metrics when the cluster disk
quota changes, since they use the cluster quota as their default value
- Fix incorrect label "collection" to "partition" in disk quota per
partition watcher
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #46087
The previous implementation checked if the total number of ready
delegators >= replicaNum per channel. This could cause target updates to
block indefinitely when dynamically increasing replicas, because some
replicas might lack nodes while the total count still met the threshold.
This change switches to a replica-based check approach:
- Iterate through each replica individually
- For each replica, verify all channels have at least one ready
delegator
- Only sync delegators from fully ready replicas
- Skip replicas that are not ready (e.g., missing nodes for some
channels)
This ensures target updates can proceed with ready replicas while
replicas that lack nodes during dynamic scaling are gracefully skipped.
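A hedged, self-contained sketch of the replica-based check:
```
package main

import "fmt"

// readyReplicas sketches the replica-based check: a replica is synced only
// if every channel has at least one ready delegator in that replica;
// replicas missing delegators are skipped instead of blocking the update.
func readyReplicas(replicas, channels []string, ready map[string]map[string]bool) []string {
	var out []string
	for _, r := range replicas {
		ok := true
		for _, ch := range channels {
			if !ready[r][ch] {
				ok = false // this replica lacks a delegator on ch
				break
			}
		}
		if ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	channels := []string{"dml_0", "dml_1"}
	ready := map[string]map[string]bool{
		"replica-1": {"dml_0": true, "dml_1": true},
		"replica-2": {"dml_0": true}, // scaling up, dml_1 not ready yet
	}
	fmt.Println(readyReplicas([]string{"replica-1", "replica-2"}, channels, ready))
	// [replica-1]: replica-2 is skipped rather than blocking the update
}
```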
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Upgrade milvus-storage from 33bf815 to 91df193.
This includes the fix from milvus-io/milvus-storage#337, which resolves
a namespace collision where both Milvus and milvus-storage defined
identical credentials provider classes in the same namespace. Although
no compile-time redefinition errors occurred, the dynamic linker could
resolve to the wrong implementation at runtime, potentially causing
cloud authentication failures due to configuration mismatches.
The fix changes milvus-storage's credentials provider namespace to
`milvus_storage`, ensuring each project uses its own implementation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.
issue: https://github.com/milvus-io/milvus/issues/45919
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #46225
Replace the heterogeneous insert data handling logic that modified
schema_ while holding a shared lock with an assertion. The previous
implementation had a concurrency bug where schema modification
operations were performed under a shared_lock, which violates mutex
semantics and can lead to data races.
Issue: #46225 reported two problems:
1. Schema modification under shared_lock (not exclusive lock)
2. Access to schema_ not protected by mutex in growing segment
The removed code attempted to handle "added fields" by:
- Adding new field to schema (schema_->AddField)
- Appending field metadata to insert_record_
- Setting default data for existing rows
All these write operations were performed while holding only a
shared_lock, which is incorrect since shared_locks are meant for
read-only operations.
This fix replaces the unsafe modification with an assertion that fails
if an unexpected new field is encountered in a growing segment with
existing data. The proper handling of schema changes should go through
the Reopen() path which correctly acquires a unique_lock before
modifying schema_.
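A hedged Go analogue of the bug and of the sanctioned path (the actual code is C++ with shared_lock/unique_lock):
```
package segcore

import "sync"

type schema struct {
	mu     sync.RWMutex
	fields []string
}

// addFieldBuggy mirrors the removed C++ code: mutating the schema while
// holding only the read lock violates the lock's contract and races with
// concurrent readers, exactly like writing schema_ under a shared_lock.
func (s *schema) addFieldBuggy(f string) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	s.fields = append(s.fields, f) // WRONG: a write under a read lock
}

// addFieldViaReopen mirrors the sanctioned path: Reopen() takes the
// exclusive lock before modifying the schema.
func (s *schema) addFieldViaReopen(f string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.fields = append(s.fields, f)
}
```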
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45511
our tantivy inverted index currently does not index individual items when
the value is an array, so we can't serve `a[0] == 'b'` style lookups from
the inverted index. For such expressions we need to skip the index and
fall back to brute-force search.
we may improve our index in the future, so this is a temp solution
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #39157
Overview:
Support search by PK by resolving IDs to vectors on the Proxy side.
Upgrade go-api to adapt to new proto definitions.
Design:
- Upgrade milvus-proto/go-api to latest master.
- Implement handleIfSearchByPK in Proxy: resolve IDs to vectors via
internal Query, then rewrite SearchRequest (sketched after this list).
- Adapt to 'SearchInput' oneof field in SearchRequest across client and
handlers.
- Fix binary vector stride calculation bug in placeholder utils.
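A hedged sketch of the rewrite in handleIfSearchByPK; the helper names (extractSearchIDs, queryVectorsByPK, buildPlaceholderGroup, annsField) are illustrative, not the actual proxy code:
```
// handleIfSearchByPK resolves the incoming IDs to stored vectors via an
// internal query, then rewrites the request so the rest of the search
// path sees plain vectors.
func handleIfSearchByPK(ctx context.Context, req *milvuspb.SearchRequest) error {
	ids := extractSearchIDs(req) // from the SearchInput oneof
	if ids == nil {
		return nil // not a search-by-PK request
	}
	vectors, err := queryVectorsByPK(ctx, req.GetCollectionName(), annsField(req), ids)
	if err != nil {
		return err
	}
	req.PlaceholderGroup = buildPlaceholderGroup(vectors) // rewritten input
	return nil
}
```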
Compatibility:
- Old Pymilvus can still work w/o this feature
What is included:
- Dense and Sparse
- Multi vector fields
- Rejection on BM25
What is **not** included:
- Hybrid Search
- EmbeddingList
- Restful API
Signed-off-by: Li Liu <li.liu@zilliz.com>