issue: #45831
This PR fixes a race condition in `TestZillizClient` test cases where
`t.Logf`
could be called after the test function had returned, leading to a panic
or data race failure.
Root cause:
1. Incorrect defer order: the deferred `lis.Close()` ran before the
deferred `s.Stop()`, so `lis.Accept()` returned an error before
`s.Stop()` had set the quit flag, triggering the error handling path in
the server goroutine.
2. Unsafe logging: The error handling path used `t.Logf` inside a
goroutine, which
is unsafe if the main test function exits before the log is printed.
Fixes:
- Changed defer order to ensure `s.Stop()` is called before
`lis.Close()`.
- Replaced `t.Logf` with `fmt.Printf` in the background goroutine to
avoid panic on test completion.
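A minimal sketch of the corrected pattern (illustrative test and helper names, not the exact test code):
```go
package zilliz_test

import (
	"fmt"
	"net"
	"testing"

	"google.golang.org/grpc"
)

func TestMockServerShutdownSketch(t *testing.T) {
	lis, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	defer lis.Close() // registered first, so it runs after s.Stop()
	s := grpc.NewServer()
	defer s.Stop() // registered last, so it runs first and stops the serve loop cleanly

	// Register mock services on s here, then serve in the background.
	go func() {
		if err := s.Serve(lis); err != nil {
			// fmt.Printf is safe even if the test has already returned;
			// t.Logf here can panic or race with test completion.
			fmt.Printf("mock server stopped: %v\n", err)
		}
	}()
	// ... client assertions against lis.Addr() ...
}
```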
Signed-off-by: Li Liu <li.liu@zilliz.com>
issue: #45728
When mixcoord is in standby mode and shutdown is triggered, the
ProcessActiveStandBy goroutine may panic if context cancellation occurs.
This happens because the error handling didn't check for
context.Canceled errors before panicking.
Changes:
- Add context cancellation check in mix_coord Register() before panic
- Check s.ctx.Err() == context.Canceled and gracefully exit
- Remove unused ForceActiveStandby() function from session_util
This ensures standby mixcoord can shutdown gracefully without panic when
context is cancelled during the standby process.
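A hedged sketch of the new check; the function and the ProcessActiveStandBy stand-in below are simplified approximations of the real mixcoord Register() flow:
```go
package mixcoord

import (
	"context"
	"errors"
	"log"
)

// registerStandby approximates the Register() standby path; the
// processActiveStandBy argument stands in for the session's
// ProcessActiveStandBy call.
func registerStandby(ctx context.Context, processActiveStandBy func() error) {
	go func() {
		if err := processActiveStandBy(); err != nil {
			// Shutdown while in standby cancels the server context;
			// exit the goroutine gracefully instead of panicking.
			if errors.Is(ctx.Err(), context.Canceled) {
				log.Println("standby mixcoord exits: context cancelled")
				return
			}
			panic(err)
		}
	}()
}
```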
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44956
**New Translator (C++)**
- Added `ManifestGroupTranslator`
(`internal/core/src/segcore/storagev2translator/`)
- Translates manifest-based column groups to Milvus internal format
- Implements `GroupCTMeta` interface for chunk-based column access
- Supports both memory and mmap storage modes
- Handles cache warmup policies for vector and scalar data
**ChunkedSegmentSealedImpl**
(`internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:333`)
- Added `LoadColumnGroups(const std::string& manifest_path)`: Main entry
point for manifest-based loading
- Creates milvus-storage Reader from manifest file
- Parallelizes column group loading using thread pool
- Aggregates loading exceptions and reports errors
- Added `LoadColumnGroup()`: Loads individual column group
- Extracts field IDs from column group metadata
- Creates ManifestGroupTranslator for each column group
- Builds ProxyChunkColumn for field access
- Special handling for timestamp field index construction
**SegmentGrowingImpl**
(`internal/core/src/segcore/SegmentGrowingImpl.cpp`)
- Added similar `LoadColumnGroups()` and `LoadColumnGroup()` methods for
growing segments
- Maintains consistency with sealed segment loading path
Storage FFI Utilities
**loon_ffi/util** (`internal/core/src/storage/loon_ffi/util.cpp`)
- Added `MakeInternalPropertiesFromStorageConfig()`: Converts C storage
config to internal Properties
- Maps all storage configuration fields (S3, GCS, Azure, local)
- Handles SSL, IAM, virtual host settings
- Configures connection timeouts and max connections
- Added `MakeInternalLocalProperies()`: Creates local filesystem
properties
- Added `ToCStorageConfig()`: Converts Go StorageConfig to C
representation
- Added `GetColumnGroups()`: Extracts column groups from manifest file
using Transaction API
Protocol Buffer Changes
**segcore.proto** (`pkg/proto/segcore.proto:121`)
- Added `manifest_path` field to `SegmentLoadInfo` message
- Enables passing manifest file path from Go layer to C++ core
Go Integration
**segment.go** (`internal/util/segcore/segment.go:372`)
- Updated `ConvertToSegcoreSegmentLoadInfo()` to propagate
`ManifestPath` field
- Bridges QueryNode segment load info to Segcore format
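A minimal sketch of the ManifestPath propagation, using hypothetical, simplified message types in place of the real querypb/segcorepb protos:
```go
package segcore

// Hypothetical, simplified message shapes; the real converter in
// internal/util/segcore/segment.go maps the full proto messages.
type queryLoadInfo struct {
	SegmentID    int64
	ManifestPath string
	// ... many other fields elided ...
}

type segcoreLoadInfo struct {
	SegmentID    int64
	ManifestPath string
}

// convertLoadInfo sketches how ManifestPath is carried across so the
// C++ core can locate the manifest-based column groups.
func convertLoadInfo(src *queryLoadInfo) *segcoreLoadInfo {
	if src == nil {
		return nil
	}
	return &segcoreLoadInfo{
		SegmentID:    src.SegmentID,
		ManifestPath: src.ManifestPath, // newly propagated field
	}
}
```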
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44620
Related to unstable unit test "internal/querycoordv2 TestServer/TestNodeUp"
Introduce SessionWatcher interface to fix race condition and goroutine
leak that caused unstable unit test TestServer/TestNodeUp.
Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw
channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing
Fixes race condition between session context cancellation and internal
loop exit. Eliminates goroutine leak by providing explicit lifecycle
management.
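A hedged sketch of the SessionWatcher shape and the fixed cleanup order; names approximate the real interface rather than quoting it:
```go
package sessionutil

// SessionEvent is a placeholder for the event type the watcher delivers;
// the real type carries the event kind and session info.
type SessionEvent struct{}

// SessionWatcher is a hedged sketch of the interface described above.
type SessionWatcher interface {
	// EventChannel returns the channel delivering session events.
	EventChannel() <-chan *SessionEvent
	// Stop terminates the internal watch loop, preventing a goroutine leak.
	Stop()
}

// shutdownOrder illustrates the fixed cleanup order in QueryCoordV2:
// stop the watcher first, then release the session, so the watch loop
// exits before its session context is cancelled.
func shutdownOrder(w SessionWatcher, revokeSession func()) {
	w.Stop()
	revokeSession()
}
```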
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
https://github.com/milvus-io/milvus/issues/45544
- Add batch_factor configuration parameter (default: 5) to control
embedding provider batch sizes
- Add disable_func_runtime_check property to bypass function validation
during collection creation
- Add database interceptor support for AddCollectionFunction,
AlterCollectionFunction, and DropCollectionFunction requests
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
The tests were failing with "grpc: Server.RegisterService after
Server.Serve" because setupMockServer() was starting the gRPC server
before tests could register their services. gRPC requires all services
to be registered before Server.Serve() is called.
Changes:
- Remove s.Serve() from setupMockServer() helper function
- Add s.Serve() to each test after service registration
- Apply fix consistently to all 6 affected tests:
* TestZillizClient_Embedding
* TestZillizClient_Embedding_Error
* TestZillizClient_Rerank
* TestZillizClient_Rerank_Error
* TestNewZilliClient_WithMockServer
* TestZillizClient_Embedding_EmptyResponse
This follows the correct gRPC server lifecycle:
1. Create server
2. Register services
3. Start serving
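A minimal sketch of the fixed lifecycle, with illustrative names standing in for the real helper and generated service registration calls:
```go
package zilliz_test

import (
	"net"
	"testing"

	"google.golang.org/grpc"
)

// setupMockServer creates the server and listener but no longer calls
// Serve, leaving service registration to each test.
func setupMockServer(t *testing.T) (*grpc.Server, net.Listener) {
	t.Helper()
	lis, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	s := grpc.NewServer()
	t.Cleanup(func() { lis.Close() }) // runs after s.Stop()
	t.Cleanup(s.Stop)
	return s, lis
}

func TestLifecycleSketch(t *testing.T) {
	s, lis := setupMockServer(t) // 1. create server
	// 2. register mock services on s here (generated Register* calls);
	//    registering after Serve panics with
	//    "grpc: Server.RegisterService after Server.Serve".
	go func() { _ = s.Serve(lis) }() // 3. start serving only after registration
	// ... run client assertions against lis.Addr() ...
}
```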
Related to #44620
Case: "internal/util/function/models/zilliz TestZillizClient_Rerank"
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #45060
Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).
**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
- Separates indexed fields from non-indexed fields
- Loads indexes concurrently using thread pools
- Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`
**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper
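A hedged sketch of the Go-side shape, with placeholder types instead of the real segment interface and cgo binding:
```go
package segments

import "context"

// loadInfo is a placeholder for the serialized load information; the
// real code passes a segcore proto message through the cgo binding.
type loadInfo struct{}

// loadableSegment approximates the extended Go segment interface: the
// wrapper forwards the call, while the C++ SegmentLoad() binding
// orchestrates index and field-data loading in parallel.
type loadableSegment interface {
	Load(ctx context.Context, info *loadInfo) error
}
```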
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #45338
When using bulk vector search in hybrid search with rerank functions,
the output field values for different queries were all equal to the
values returned by the first query, instead of the correct values
belonging to each document ID. The document IDs were correct, but the
entity field values were wrong.
In rerank functions (RRF, weighted, decay, model), when processing
multiple queries in a batch, the `idLocations` stored only the relative
offset within each result set (`idx`), not accounting for the absolute
position within the entire batch. This caused `FillFieldData` to
retrieve field data from the wrong positions, always using offsets
relative to the first query.
This fix ensures that when processing bulk searches with rerank
functions, each result correctly retrieves its corresponding field data
based on the absolute offset within the entire batch, resolving the
issue where all queries returned the first query's field values.
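A minimal sketch of the offset fix, using illustrative types rather than the real rerank code:
```go
package rerank

// searchResult is an illustrative stand-in for one query's result set.
type searchResult struct {
	ids []int64
}

// buildIDLocations records the absolute position of each ID within the
// whole batch, not the index within the current query's result set.
func buildIDLocations(results []searchResult) map[int64]int {
	idLocations := make(map[int64]int)
	base := 0 // absolute offset of the current result set within the batch
	for _, res := range results {
		for idx, id := range res.ids {
			// Before the fix this stored only idx; FillFieldData then read
			// field values at positions relative to the first query.
			idLocations[id] = base + idx
		}
		base += len(res.ids)
	}
	return idLocations
}
```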
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR changes the config layout according to the latest design, and
adds two external credential configs for aws kms
See also: #45169
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Currently, RAM usage estimation for the AiSAQ index type is not
calculated correctly.
The AiSAQ index type consumes less RAM while loading an index than
DISKANN does, and the query node module is missing a RAM usage
estimation implementation for AiSAQ.
We suggest that the AiSAQ RAM usage estimation calculation should be as
follows:
UsedDiskMemoryRatioAisaq = 1024 (contrary to the UsedDiskMemoryRatio,
which is 4)
neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq
neededDiskSize = indexInfo.IndexSize
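A small sketch of the suggested calculation (names mirror the description above, not necessarily the exact code):
```go
package estimator

// Hedged constants following the suggestion above.
const (
	UsedDiskMemoryRatio      = 4    // existing DISKANN ratio
	UsedDiskMemoryRatioAisaq = 1024 // AiSAQ keeps far less of the index in RAM
)

type indexInfo struct {
	IndexSize int64
}

// estimateAisaq returns the memory and disk needed to load an AiSAQ index.
func estimateAisaq(info indexInfo) (neededMemSize, neededDiskSize int64) {
	neededMemSize = info.IndexSize / UsedDiskMemoryRatioAisaq
	neededDiskSize = info.IndexSize
	return
}
```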
Reported issue is #45247
---------
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: friedl <lior.friedman@kioxia.com>
Co-authored-by: friedl <lior.friedman@kioxia.com>
issue: #43897
- Load/Release collection/partition is now implemented by the WAL-based
DDL framework.
- AlterLoadConfig/DropLoadConfig are now supported in the WAL.
- Load/Release operations can now be synced by the new CDC.
- Refactor some UTs for load/release DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options to avoid having to merge some Milvus params
into the user's analyzer params.
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.
Changes:
C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API
Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation
Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions
Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements
This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling
Related to #45060
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45006
ref: https://github.com/milvus-io/milvus/issues/42148
Previously, the parquet import was implemented on the assumption that a
STRUCT in the parquet files is handled by storing each field of the
struct in its own column.
However, from the user's perspective, an array of STRUCT (e.g. STRUCT_A)
contains data like:
for one row, [struct{field1_1, field2_1, field3_1}, struct{field1_2,
field2_2, field3_2}, ...], rather than {[field1_1, field1_2, ...],
[field2_1, field2_2, ...], [field3_1, field3_2, field3_3, ...]}.
This PR fixes this.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/44399
This PR implements STL_SORT for VARCHAR data type for both RAM and MMAP
mode.
The general idea is that we deduplicate field values and maintain a
posting list for each unique value.
The serialization format of the index is:
```
[unique_count][string_offsets][string_data][post_list_offsets][post_list_data][magic_code]
string_offsets: array of offsets into string_data section
string_data: str_len1, str1, str_len2, str2, ...
post_list_offsets: array of offsets into post_list_data section
post_list_data: post_list_len1, row_id1, row_id2, ..., post_list_len2, row_id1, row_id2, ...
```
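A minimal Go sketch of the dedup-plus-posting-list idea (the real implementation is in C++ and also serializes to the layout above):
```go
package stlsort

import "sort"

// build deduplicates the field values and keeps a posting list of row
// ids per unique value.
func build(values []string) (unique []string, postings map[string][]int32) {
	postings = make(map[string][]int32)
	for rowID, v := range values {
		postings[v] = append(postings[v], int32(rowID))
	}
	unique = make([]string, 0, len(postings))
	for v := range postings {
		unique = append(unique, v)
	}
	sort.Strings(unique) // sorted unique values enable binary-search lookups
	return unique, postings
}
```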
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and analyzer validation on the
proxy, but the proxy should not be a computation-heavy node. This PR
moves all analyzer calculations to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: #44909
When requery optimization is enabled, search results contain IDs
but empty FieldsData. During reduce/rerank operations, if the
first shard has empty FieldsData while others have data,
PrepareResultFieldData initializes an empty array, causing
AppendFieldData to panic when accessing array indices.
Changes:
- Find first non-empty FieldsData as template in 3 functions:
reduceAdvanceGroupBy, reduceSearchResultDataWithGroupBy,
reduceSearchResultDataNoGroupBy
- Add length check before 2 AppendFieldData calls in reduce
functions to prevent panic
- Improve newRerankOutputs to find first non-empty fieldData
using len(FieldsData) check instead of GetSizeOfIDs
- Add length check in appendResult before AppendFieldData
- Add comprehensive unit tests for empty and partial empty
FieldsData scenarios in both reduce and rerank functions
This fix handles both pure requery (all empty) and mixed
scenarios (some empty, some with data) without breaking normal
search flow. The key improvement is checking FieldsData length
directly rather than IDs, as requery may have IDs but empty
FieldsData.
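A minimal sketch of the template selection described above (a simplification of the real reduce code):
```go
package reduce

import "github.com/milvus-io/milvus-proto/go-api/v2/schemapb"

// pickFieldDataTemplate takes the first shard result whose FieldsData is
// non-empty instead of always using results[0], so a requery-emptied
// first shard no longer produces an empty output template.
func pickFieldDataTemplate(results []*schemapb.SearchResultData) []*schemapb.FieldData {
	for _, r := range results {
		if r != nil && len(r.GetFieldsData()) > 0 {
			return r.GetFieldsData()
		}
	}
	return nil // pure requery: every shard carries IDs but no field data
}
```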
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44014
- On standalone, the query node inside needs to load segments and watch
channels, so without `LabelStreamingNodeEmbeddedQueryNode` the querynode
is not an embedded querynode in the streamingnode. The channel dist
manager cannot confirm that a standalone node is an embedded streaming
node.
The bug was introduced by #44099.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44730
Fix the issue where logs were not being output as expected due to
incorrect log package imports across multiple components.
Changes include:
- Add golangci-lint rule to forbid github.com/pingcap/log usage
- Replace github.com/pingcap/log with
github.com/milvus-io/milvus/pkg/v2/log
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #44620
Make the unit test stable by waiting on the event channel instead of
asynchronously reading len(eventCh), which is unstable.
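A minimal sketch of the pattern, with an illustrative helper rather than the exact test code:
```go
package sessionutil_test

import (
	"testing"
	"time"
)

// waitEvent blocks on a receive with a timeout instead of polling
// len(eventCh), which races with the producing goroutine.
func waitEvent[T any](t *testing.T, eventCh <-chan T) T {
	t.Helper()
	select {
	case ev := <-eventCh:
		return ev
	case <-time.After(10 * time.Second):
		t.Fatal("timed out waiting for event")
		panic("unreachable")
	}
}
```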
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44596
- The querynode already initializes storage v2 and segcore, so the
streamingnode should not do this again.
- It also fixes the GCP object storage access-denied error.
Signed-off-by: chyezh <chyezh@outlook.com>