issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:
- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support the required data ratio for
partial results
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic
These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.
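As a rough illustration of the gating this enables (names like `requiredDataRatio` and `canReturnPartialResult` are hypothetical, not the actual Milvus identifiers):

```go
// Minimal sketch: serve a partial result only when the searchable portion
// of the data meets a configured minimum ratio; otherwise fail the query.
package main

import (
	"errors"
	"fmt"
)

// canReturnPartialResult reports whether enough data is still reachable to
// justify returning a partial result instead of an error.
func canReturnPartialResult(searchableRows, totalRows int64, requiredDataRatio float64) bool {
	if totalRows == 0 {
		return true
	}
	return float64(searchableRows)/float64(totalRows) >= requiredDataRatio
}

func main() {
	const requiredDataRatio = 0.8 // hypothetical configured threshold
	if canReturnPartialResult(900, 1000, requiredDataRatio) {
		fmt.Println("serving partial result over 900/1000 rows")
	} else {
		fmt.Println(errors.New("too much data offline; failing the query"))
	}
}
```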
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine the Import, Compaction, Stats, and Index tasks.
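A hedged sketch of the interface shape point 2 describes; the actual datacoord definitions differ in detail, and every name below is illustrative:

```go
// Illustrative task/scheduler interfaces unifying import, compaction,
// stats, and index work behind one global scheduler.
package scheduler

type TaskState int

const (
	TaskStatePending TaskState = iota
	TaskStateRunning
	TaskStateDone
	TaskStateFailed
)

// Task is the unit of work the global scheduler drives.
type Task interface {
	ID() int64
	State() TaskState
	Execute() error
}

// Scheduler exposes the create/query/drop lifecycle from the commit.
type Scheduler interface {
	CreateTask(t Task) error         // enqueue a new task
	QueryTask(id int64) (Task, bool) // inspect a task and its state
	DropTask(id int64) error         // cancel or remove a task
}
```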
issue: https://github.com/milvus-io/milvus/issues/41123
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
See #36264
In this PR:
- Enhanced error handling when parsing the grouping field.
- Fixed null handling in reduce tasks in proxy nodes.
- Updated tests to reflect changes in error handling and data processing
logic.
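For illustration only, a sketch of what null-aware reduction can look like; the types and helper below are invented for exposition and are not proxy's actual reduce code:

```go
// Merge grouping values from multiple shard results while keeping validity
// bitmaps aligned, so null entries are not silently read as zero values.
package reduce

func mergeGroupValues(values [][]int64, valid [][]bool) (merged []int64, mergedValid []bool) {
	for i := range values {
		merged = append(merged, values[i]...)
		if valid[i] != nil {
			mergedValid = append(mergedValid, valid[i]...)
		} else {
			// a missing validity bitmap means every value is non-null
			for range values[i] {
				mergedValid = append(mergedValid, true)
			}
		}
	}
	return merged, mergedValid
}
```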
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes the real auto-generated IDs to
determine the target vchannel. This mismatch can lead to inaccurate row
distribution estimates in corner cases such as:
- Importing 1 row into 2 vchannels:
• Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
• Import: real autoID (e.g., 457975852966809057) hashes to v1
→ actual result: v0 = 0, v1 = 1
To resolve this corner case, we now allocate at least one segment for each
vchannel when autoID is enabled, ensuring all vchannels are prepared to
receive data even if no rows are estimated for them.
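A small self-contained sketch of both the integer-division pitfall and the fix (function names are illustrative):

```go
package main

import "fmt"

// estimatePerVChannel mirrors the old preimport estimate: integer division
// drops the remainder, so 1 row over 2 vchannels estimates 0 for both.
func estimatePerVChannel(numRows, vchannelNum int64) int64 {
	return numRows / vchannelNum
}

// segmentsPerVChannel applies the fix: with autoID enabled, reserve at
// least one segment per vchannel even when the estimate is 0 rows.
func segmentsPerVChannel(estimatedRows, rowsPerSegment int64, autoID bool) int64 {
	segments := (estimatedRows + rowsPerSegment - 1) / rowsPerSegment // ceil
	if autoID && segments == 0 {
		return 1
	}
	return segments
}

func main() {
	est := estimatePerVChannel(1, 2)                       // 0
	fmt.Println(est, segmentsPerVChannel(est, 1024, true)) // prints: 0 1
}
```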
issue: https://github.com/milvus-io/milvus/issues/41759
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Merge RootCoord, DataCoord, and QueryCoord into MixCoord, and merge their
sessions into one.
issue: https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
The field and partition information is stored and fetched under different
prefixes in the metadata. In the CreateCollectionTask, the RootCoord checks
the existing collection information against the metadata. This check fails
if the order of the fields or partition info differs, leading to an error
after restarting Milvus. To resolve this, we should use a map in the check
logic to ensure consistency.
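A minimal sketch of the order-insensitive comparison (the `Field` type is simplified for illustration):

```go
package main

import "fmt"

type Field struct {
	Name   string
	TypeID int64
}

// fieldsEqual compares two field lists by name via a map, so the order in
// which metadata was fetched no longer affects the result.
func fieldsEqual(a, b []Field) bool {
	if len(a) != len(b) {
		return false
	}
	byName := make(map[string]Field, len(a))
	for _, f := range a {
		byName[f.Name] = f
	}
	for _, f := range b {
		if got, ok := byName[f.Name]; !ok || got != f {
			return false
		}
	}
	return true
}

func main() {
	x := []Field{{"pk", 5}, {"vec", 101}}
	y := []Field{{"vec", 101}, {"pk", 5}}
	fmt.Println(fieldsEqual(x, y)) // true despite different ordering
}
```

The same map-based approach applies to the partitions check.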
related: https://github.com/milvus-io/milvus/issues/40955
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #40638
- Add `ChannelID` for streaming replicas, for future use.
- Remove the pchannel-count fair balance policy for streaming.
- Add a score-based vchannel fair balance policy for streaming (sketched
below).
- Add a pchannel stats manager to collect pchannel stats for the balancer.
- Add configuration and metrics for the new balance policy.
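To give a feel for the score-based policy, here is a deliberately simplified assignment loop; the real balancer scores nodes from pchannel stats, whereas this stand-in just counts assigned vchannels:

```go
package main

import "fmt"

// assignVChannels places each vchannel on the node with the lowest current
// score. Incrementing by one per assignment is a stand-in for the real
// stats-driven scoring.
func assignVChannels(vchannels, nodes []string) map[string]string {
	score := make(map[string]int, len(nodes))
	assignment := make(map[string]string, len(vchannels))
	for _, vch := range vchannels {
		best := nodes[0]
		for _, n := range nodes[1:] {
			if score[n] < score[best] {
				best = n
			}
		}
		assignment[vch] = best
		score[best]++
	}
	return assignment
}

func main() {
	fmt.Println(assignVChannels([]string{"v0", "v1", "v2"}, []string{"n1", "n2"}))
}
```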
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #39551
This PR removes querycoord's scheduling of L0 segments:
- L0 segments are only loaded when watching a channel
- L0 segments are only released when releasing a channel or syncing data
distribution
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39680
If compaction/GC happens, loading a collection may get stuck due to
SegmentNotFound; we should trigger UpdateNextTarget to obtain a fresh data
view for the loading operation.
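Sketched below under invented names, the recovery path amounts to refreshing the target view and retrying when the data view has gone stale:

```go
package loader

import "errors"

var errSegmentNotFound = errors.New("segment not found") // illustrative sentinel

// loadWithTargetRefresh retries a load that failed with SegmentNotFound
// (e.g. the segment was compacted or GC'ed away) after refreshing the
// target, instead of letting the load get stuck.
func loadWithTargetRefresh(load func() error, updateNextTarget func()) error {
	const maxAttempts = 3
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = load(); err == nil {
			return nil
		}
		if !errors.Is(err, errSegmentNotFound) {
			return err
		}
		// stale data view: fetch a fresh target and try the load again
		updateNextTarget()
	}
	return err
}
```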
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38399
- Add a pchannel-level checkpoint for flush processing
- Refactor the recovery of WAL flushers
- Make a shared WAL scanner first, then build multiple data sync services
on top of it (sketched below)
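A simplified picture of that shared-scanner fan-out, with stand-in types invented for exposition:

```go
package wal

// Message is a simplified stand-in for a WAL record.
type Message struct {
	VChannel string
	Payload  []byte
}

type syncService interface {
	Handle(Message)
}

// dispatch fans one shared scanner stream out to per-vchannel sync
// services, so a single WAL scan can feed multiple pipelines.
func dispatch(scanner <-chan Message, services map[string]syncService) {
	for msg := range scanner {
		if svc, ok := services[msg.VChannel]; ok {
			svc.Handle(msg)
		}
	}
}
```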
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- A broadcast message can now carry multiple resource keys.
- Implement event-based notification for broadcast messages.
- A broadcast message uses its broadcast ID as a unique identifier.
- Broadcasted messages on vchannels now keep the vchannel they were
broadcast on.
- Broadcasted messages and broadcast messages now share a common broadcast
header.
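The shape of the shared header might be pictured like this; the field names are guesses for exposition, not the actual proto definitions:

```go
package message

// BroadcastHeader is the common header shared by a broadcast message and
// its broadcasted per-vchannel copies.
type BroadcastHeader struct {
	BroadcastID  uint64   // unique identifier of the broadcast
	VChannels    []string // vchannels the message was broadcast to
	ResourceKeys []string // a broadcast can now carry multiple resource keys
}
```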
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- Make the WAL scanner interface the same as the streaming scanner's.
- Use the WAL directly if it is located on the current node.
- Otherwise, fall back to the old logic.
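The selection logic reduces to a local-or-fallback branch, sketched here with illustrative signatures:

```go
package wal

// Scanner is a simplified stand-in for the unified scanner interface.
type Scanner interface {
	Next() ([]byte, error)
}

// openScanner reads from the local WAL when the channel's WAL lives on
// this node, and otherwise falls back to the old consuming logic.
func openScanner(channel string, localWAL func(string) (Scanner, bool), fallback func(string) Scanner) Scanner {
	if s, ok := localWAL(channel); ok {
		return s // WAL is on the current node: use it directly
	}
	return fallback(channel)
}
```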
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- Make the broadcast service available to msgstream by reusing the
streaming service architecture.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
related: #37031
* Built-in privilege group privileges in listPrivilegeGroups() should be
the same as in milvus.yaml.
* Collections granted via a collection-level built-in privilege group
should be listed in showCollections().
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
When deleting by partition_key, Milvus generates L0 segments globally.
During L0 compaction, those L0 segments touch all partitions
collection-wide. Due to the false-positive rate of segment bloom filters,
L0 compactions append spurious deltalogs to completely irrelevant
partitions, which causes partition deletion amplification.
This PR uses the partition_key to set the targeted partitionID when
producing deleteMsgs into MsgStreams. This narrows the scope of L0 segments
to the partition level and removes the collection-wide false-positive
influence.
However, due to the DeleteMsg structure, we can only label one partition
per deleteMsg, so this enhancement does not apply when a user deletes
across two or more partition_keys in one deletion.
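A hedged sketch of the targeting rule (the sentinel and types are simplified; the real code works with the parsed delete expression):

```go
package proxy

const allPartitionsID int64 = -1 // illustrative collection-wide sentinel

// DeleteMsg is a simplified stand-in for the real message structure, which
// can only carry a single partition ID.
type DeleteMsg struct {
	CollectionID int64
	PartitionID  int64
	PrimaryKeys  []int64
}

// targetPartition resolves the partition for a delete: a single
// partition_key value maps to one concrete partition, while anything
// broader stays collection-wide, per the DeleteMsg limitation above.
func targetPartition(partitionIDs []int64) int64 {
	if len(partitionIDs) == 1 {
		return partitionIDs[0]
	}
	return allPartitionsID
}
```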
See also: #34665
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #38325
The old implementation only checked grants in the default database before
dropping a role, which could allow a role to be dropped while grants still
exist.
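In sketch form, the corrected check scans every database before allowing the drop (the data shape below is invented for illustration):

```go
package rbac

import "fmt"

// checkRoleDroppable rejects the drop if any database, not just "default",
// still holds grants for the role.
func checkRoleDroppable(role string, grantCountByDB map[string]map[string]int) error {
	for db, counts := range grantCountByDB {
		if counts[role] > 0 {
			return fmt.Errorf("role %q still has %d grant(s) in database %q", role, counts[role], db)
		}
	}
	return nil
}
```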
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related issue: https://github.com/milvus-io/milvus/issues/37031
Fixed issue #38042: the "grant_v2" interface does not support an empty
collectionName, even though the error message says it does.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
related issue: https://github.com/milvus-io/milvus/issues/37031
Fixed issues:
#37974: better error messages for grant v2 interface
#37903: fix meta built-in privilege group object name
#37843: better error messages for custom privilege group interface
#38002: fix built-in privilege group meta to pass proxy interceptor
check
#38008: fix revoke v2 to support revoking v1 granted privileges
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/36864
I have a few questions regarding my approach. I will consolidate them here
for feedback and review. Thanks.
---------
Signed-off-by: Nischay Yadav <nischay.yadav@ibm.com>
Signed-off-by: Nischay <Nischay.Yadav@ibm.com>
issue: #37679
PR #36549 introduced a logic error that updated the current target when
only part of the channels were ready.
This PR fixes the logic error and lets the dist handler keep pulling
distribution from the querynode until all delegators become serviceable.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36293, #36242
After a querynode recovers, the delegator may be loaded on a new node; once
all segments have been loaded, the delegator becomes serviceable. But its
target version hasn't been synced yet, so if a search/query arrives, the
delegator will use the wrong target version, filter down to an empty
segment list, and return empty search results.
This PR blocks the delegator's serviceable status until the target version
is synced.
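Sketched with simplified state (the real delegator tracks much more), the gate looks like this:

```go
package delegator

import "sync/atomic"

const initialTargetVersion int64 = -1

type Delegator struct {
	segmentsLoaded atomic.Bool
	targetVersion  atomic.Int64
}

func NewDelegator() *Delegator {
	d := &Delegator{}
	d.targetVersion.Store(initialTargetVersion)
	return d
}

func (d *Delegator) SetSegmentsLoaded()         { d.segmentsLoaded.Store(true) }
func (d *Delegator) SyncTargetVersion(v int64) { d.targetVersion.Store(v) }

// Serviceable only turns true once segments are loaded AND the target
// version has been synced, so searches cannot filter against a stale view.
func (d *Delegator) Serviceable() bool {
	return d.segmentsLoaded.Load() && d.targetVersion.Load() != initialTargetVersion
}
```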
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37289
Because PR #37116 introduced a retry on getting the shard leader, searches
no longer fail while a query node is down.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37289
Those test cases use search to verify a replica's status, but with a
1-second gap between searches, the effect of a node going down may already
be repaired by balancing.
This PR removes the 1-second gap between search operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/27576
# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing. Check the modifications
in pymilvus.
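As a taste of the WKT-in, WKB-down conversion from step 4, here is a standalone sketch using the go-geom library this PR adds (assuming its `wkt`/`wkb` encoding packages as found in recent versions of the library):

```go
package main

import (
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkb"
	"github.com/twpayne/go-geom/encoding/wkt"
)

// wktToWKB accepts the client-facing WKT form and converts it to the WKB
// bytes used downstream in binlogs and segments.
func wktToWKB(s string) ([]byte, error) {
	g, err := wkt.Unmarshal(s)
	if err != nil {
		return nil, fmt.Errorf("invalid WKT %q: %w", s, err)
	}
	return wkb.Marshal(g, wkb.NDR) // little-endian WKB
}

func main() {
	b, err := wktToWKB("POINT (30 10)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", b)
}
```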
---------
Signed-off-by: tasty-gumi <1021989072@qq.com>