issue: #44320
Replace the DeduplicateFieldData function with CheckDuplicatePkExist
that returns an error when duplicate primary keys are detected in the
same batch, instead of silently deduplicating.
Changes:
- Replace DeduplicateFieldData with CheckDuplicatePkExist in util.go
- Update upsertTask.PreExecute to return error on duplicate PKs
- Simplify helper function from findLastOccurrenceIndices to
hasDuplicates
- Update unit tests to verify the new error behavior
- Add Python integration tests for duplicate PK error cases
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Previously, search with highlight only supported using BM25 search text
as the highlight target.
This PR adds support for highlighting with user-defined queries.
relate: https://github.com/milvus-io/milvus/issues/42589
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45691
Add persistent task management for external collections with automatic
detection of external_source and external_spec changes. When source
changes, the system aborts running tasks and creates new ones, ensuring
only one active task per collection. Tasks validate their source on
completion to prevent superseded tasks from committing results.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
https://github.com/milvus-io/milvus/issues/45544
- Add batch_factor configuration parameter (default: 5) to control
embedding provider batch sizes
- Add disable_func_runtime_check property to bypass function validation
during collection creation
- Add database interceptor support for AddCollectionFunction,
AlterCollectionFunction, and DropCollectionFunction requests
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
issue: #44320
This change adds deduplication logic to handle duplicate primary keys
within a single upsert batch, keeping the last occurrence of each
primary key.
Key changes:
- Add DeduplicateFieldData function to remove duplicate PKs from field
data, supporting both Int64 and VarChar primary keys
- Refactor fillFieldPropertiesBySchema into two separate functions:
validateFieldDataColumns for validation and fillFieldPropertiesOnly for
property filling, improving code clarity and reusability
- Integrate deduplication logic in upsertTask.PreExecute to
automatically deduplicate data before processing
- Add comprehensive unit tests for deduplication with various PK types
(Int64, VarChar) and field types (scalar, vector)
- Add Python integration tests to verify end-to-end behavior
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43897
- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
move all analyzer calculations to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Related to #44761#38339
This commit consolidates the proxy package's mockery generation to use a
centralized `.mockery.yaml` configuration file, aligning with the
pattern used by other packages like querycoordv2.
Changes
- **Makefile**: Replace multiple individual mockery commands with a
single config-based invocation for `generate-mockery-proxy` target
- **internal/proxy/.mockery.yaml**: Add mockery configuration defining
all mock interfaces for proxy and proxy/shardclient packages
- **Mock files**: Regenerate mocks using the new configuration:
- `mock_cache.go`: Clean up by removing unused interface methods
(credential, shard cache, policy methods)
- `shardclient/mock_lb_balancer.go`: Update type comments (nodeInfo →
NodeInfo)
- `shardclient/mock_lb_policy.go`: Update formatting
- `shardclient/mock_shardclient_manager.go`: Fix parameter naming
consistency (nodeInfo1 → nodeInfo)
- **task_search_test.go**: Remove obsolete mock expectations for
deprecated cache methods
Benefits
- Centralized mockery configuration for easier maintenance
- Consistent with other packages (querycoordv2, etc.)
- Cleaner mock interfaces by removing unused methods
- Better type consistency in generated mocks
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44761
Refactor proxy shard client management by creating a new
internal/proxy/shardclient package. This improves code organization and
modularity by:
- Moving load balancing logic (LookAsideBalancer, RoundRobinBalancer) to
shardclient package
- Extracting shard client manager and related interfaces into separate
package
- Relocating shard leader management and client lifecycle code
- Adding package documentation (README.md, OWNERS)
- Updating proxy code to use the new shardclient package interfaces
This change makes the shard client functionality more maintainable and
better encapsulated, reducing coupling in the proxy layer.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44697, #44696
- The DDL executing order of secondary keep same with order of control
channel timetick now.
- filtering the control channel operation on shard manager of
streamingnode to avoid wrong vchannel of create segment.
- fix that the immutable txn message lost replicate header.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44909
When requery optimization is enabled, search results contain IDs
but empty FieldsData. During reduce/rerank operations, if the
first shard has empty FieldsData while others have data,
PrepareResultFieldData initializes an empty array, causing
AppendFieldData to panic when accessing array indices.
Changes:
- Find first non-empty FieldsData as template in 3 functions:
reduceAdvanceGroupBy, reduceSearchResultDataWithGroupBy,
reduceSearchResultDataNoGroupBy
- Add length check before 2 AppendFieldData calls in reduce
functions to prevent panic
- Improve newRerankOutputs to find first non-empty fieldData
using len(FieldsData) check instead of GetSizeOfIDs
- Add length check in appendResult before AppendFieldData
- Add comprehensive unit tests for empty and partial empty
FieldsData scenarios in both reduce and rerank functions
This fix handles both pure requery (all empty) and mixed
scenarios (some empty, some with data) without breaking normal
search flow. The key improvement is checking FieldsData length
directly rather than IDs, as requery may have IDs but empty
FieldsData.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44648
If the value is `null` during insertion, it will be omitted instead of
being filled with nil. Therefore, when performing checks, there’s no
need to retrieve data based on the valid offset.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #44800
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:
- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Comprehensive unit tests covering all new data type handling scenarios
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44761
This commit refactors the privilege management system in the proxy
component by:
1. **Separation of Concerns**: Extracts privilege-related functionality
from MetaCache into a dedicated `internal/proxy/privilege` package,
improving code organization and maintainability.
2. **New Package Structure**: Creates `internal/proxy/privilege/` with:
- `cache.go`: Core privilege cache implementation (PrivilegeCache)
- `result_cache.go`: Privilege enforcement result caching
- `model.go`: Casbin model and policy enforcement functions
- `meta_cache_adapter.go`: Casbin adapter for MetaCache integration
- Corresponding test files and mock implementations
3. **MetaCache Simplification**: Removes privilege and credential
management methods from MetaCache interface and implementation:
- Removed: GetCredentialInfo, RemoveCredential, UpdateCredential
- Removed: GetPrivilegeInfo, GetUserRole, RefreshPolicyInfo,
InitPolicyInfo
- Deleted: meta_cache_adapter.go, privilege_cache.go and their tests
4. **Updated References**: Updates all callsites to use the new
privilegeCache global:
- Authentication interceptor now uses privilegeCache for password
verification
- Credential cache operations (InvalidateCredentialCache,
UpdateCredentialCache, UpdateCredential) now use privilegeCache
- Policy refresh operations (RefreshPolicyInfoCache) now use
privilegeCache
- Privilege interceptor uses new privilege.GetEnforcer() and privilege
result cache
5. **Improved API**: Renames cache functions for clarity:
- GetPrivilegeCache → GetResultCache
- SetPrivilegeCache → SetResultCache
- CleanPrivilegeCache → CleanResultCache
This refactoring makes the codebase more modular, separates privilege
management concerns from general metadata caching, and provides a
clearer API for privilege enforcement operations.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #36672
Add accesslog field displaying value length for search/query request may
help developers debug related issues
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>