Related #44956
Add manifest_path field to CreateStatsRequest and propagate it through
the stats task pipeline. This enables stats tasks and text index
building to access segment manifest for storage v2 format operations.
- Add manifest_path field to CreateStatsRequest proto
- Set ManifestPath from segment metadata in DataCoord
- Pass manifest to BuildIndexInfo in stats task builder
- Include manifest in compaction text index creation
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fixes: https://github.com/milvus-io/milvus/issues/45934
pinIndex is a const and only do read operations rlock would be the right
choice for performance
Signed-off-by: Lanqing Yang <lanqingy93@gmail.com>
Related to #44647
Update milvus-storage from 91df193 to 839a8e5 to include
milvus-io/milvus-storage#342, which fixes a race condition in
S3GlobalContext initialization.
The fix moves the is_initialized_ flag update from before DoInitialize()
to after it completes. This ensures the initialization flag is only set
to true after the actual initialization is done, preventing potential
issues if DoInitialize() fails or if other code checks the flag during
initialization.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
**Support specified version manifest write**
- Add `baseVersion` parameter to `NewPackedRecordManifestWriter` and
`NewFFIPackedWriter` to support writing manifest based on a specific
version instead of always overwriting the latest
- Add `manifestPath` tracking in `BulkPackWriterV2` to maintain manifest
state across writes
- Add `GetManifestInfo` method to parse existing manifest path and
extract base path and version
- Add `UpdateManifestPath` metacache action to track manifest path in
segment info
- Update `transaction_begin` FFI call to use the specified base version
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.
This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.
Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:
element_filter(structFieldName, $[subFieldName] == 3)
where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.
It can be combined with existing filter expressions, for example:
"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"
A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
"struct_field",
datatype=DataType.ARRAY,
element_type=DataType.STRUCT,
struct_schema=struct_schema,
max_capacity=1000,
)
...
filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
COLLECTION_NAME,
data=query_embeddings,
limit=10,
anns_field="struct_field[struct_float_vec]",
filter=filter,
output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #46277
- Update db/collection/partition disk quota metrics when cluster disk
quota changes, since they use cluster quota as default value
- Fix incorrect label "collection" to "partition" in disk quota per
partition watcher
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #46087
The previous implementation checked if the total number of ready
delegators >= replicaNum per channel. This could cause target updates to
block indefinitely when dynamically increasing replicas, because some
replicas might lack nodes while the total count still met the threshold.
This change switches to a replica-based check approach:
- Iterate through each replica individually
- For each replica, verify all channels have at least one ready
delegator
- Only sync delegators from fully ready replicas
- Skip replicas that are not ready (e.g., missing nodes for some
channels)
This ensures target updates can proceed with ready replicas while
replicas that lack nodes during dynamic scaling are gracefully skipped.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue: #46253
On branch feature/timestamps
Changes to be committed:
new file: testcases/test_timestamptz.py
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
/kind improvement
- Add normalize_error_message function to extract and normalize error
text
- Collect unique error messages during chaos test execution
- Display error details in assertion messages for better debugging
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Upgrade milvus-storage from 33bf815 to 91df193.
This includes the fix from milvus-io/milvus-storage#337, which resolves
a namespace collision where both Milvus and milvus-storage defined
identical credentials provider classes in the same namespace. Although
no compile-time redefinition errors occurred, the dynamic linker could
resolve to the wrong implementation at runtime, potentially causing
cloud authentication failures due to configuration mismatches.
The fix changes milvus-storage's credentials provider namespace to
`milvus_storage`, ensuring each project uses its own implementation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #46255
Change RootCoord default gRPC port from 53100 to 22125 to avoid
conflicts with ephemeral port range.
The previous port 53100 falls within the Linux ephemeral port range
(32768-60999), which could cause conflicts with other connections
including Milvus's own outbound connections. Port 22125 is outside the
ephemeral range and provides more reliable service binding.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.
issue: https://github.com/milvus-io/milvus/issues/45919
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #46225
Replace the heterogeneous insert data handling logic that modified
schema_ while holding a shared lock with an assertion. The previous
implementation had a concurrency bug where schema modification
operations were performed under a shared_lock, which violates mutex
semantics and can lead to data races.
Issue: #46225 reported two problems:
1. Schema modification under shared_lock (not exclusive lock)
2. Access to schema_ not protected by mutex in growing segment
The removed code attempted to handle "added fields" by:
- Adding new field to schema (schema_->AddField)
- Appending field metadata to insert_record_
- Setting default data for existing rows
All these write operations were performed while holding only a
shared_lock, which is incorrect since shared_locks are meant for
read-only operations.
This fix replaces the unsafe modification with an assertion that fails
if an unexpected new field is encountered in a growing segment with
existing data. The proper handling of schema changes should go through
the Reopen() path which correctly acquires a unique_lock before
modifying schema_.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45511
our tantivy inverted index currently does not include item index if the
value is an array, thus we can't do `a[0] == 'b'` type of look up in the
inverted index. for such, we need to skip the index and use brute force
search.
we may improve our index in the future, so this is a temp solution
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #39157
Overview:
Support search by PK by resolving IDs to vectors on Proxy side. Upgrade
go-api to adapt to new proto definitions.
Design:
- Upgrade milvus-proto/go-api to latest master.
- Implement handleIfSearchByPK in Proxy: resolve IDs to vectors via
internal Query, then rewrite SearchRequest.
- Adapt to 'SearchInput' oneof field in SearchRequest across client and
handlers.
- Fix binary vector stride calculation bug in placeholder utils.
Compatibility:
- Old Pymilvus can still work w/o this feature
What is included:
- Dense and Sparse
- Multi vector fields
- Rejection on BM25
What is **not** include:
- Hybrid Search
- EmbeddingList
- Restful API
Signed-off-by: Li Liu <li.liu@zilliz.com>
issue: #46217
The test was failing intermittently because it didn't wait for the
pipeline to finish processing messages before exiting. The test sent a
message to the pipeline and immediately returned, causing the deferred
Close() to execute before ProcessInsert, ProcessDelete, and UpdateTSafe
could be called.
Fix by:
- Moving message construction before mock expectations setup
- Adding a done channel to synchronize on UpdateTSafe completion
- Waiting for the signal before test exits
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue: #46188
Bug was caused by inconsistent version of tzdata as well as wrong month
assignment in convert_timestamptz function.
Also fix when debug_mode=True the compare function can correctly return
True or False.
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: #46176
- Add checkAligned validation before processing partial update field
data to prevent index out of range panic when field data arrays have
mismatched lengths
- Fix GetNumRowOfFieldDataWithSchema to handle Timestamptz string format
and Geometry WKT format properly
- Add unit tests for empty data array scenarios in partial update
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This PR fixes two issues related to segment loading and index
deserialization:
1. Fill partition_id in LoadIndexInfo when converting field index info,
which is required by cardinal (DiskANN) index deserialization.
2. Close RemoteOutputStream in destructor to ensure buffer flushed and
resources released properly.
issue: #46141
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>