Related to #45910
When IndexNodeBinding mode is enabled, DataCoord skips session watching
for datanodes but the dnSessionWatcher field remains nil. This causes a
panic when other code attempts to access the watcher.
This fix introduces an EmptySessionWatcher as a placeholder for the
IndexNodeBinding mode scenario. The empty watcher implements the
SessionWatcher interface with no-op methods, preventing nil pointer
dereferences while maintaining the expected interface contract.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44956
- Update milvus-storage version to ba7df7b for chunk reader fix
- Pass manifest path to index build request in DataCoord/DataNode
- Add null chunk assertion with detailed debug info in
ManifestGroupTranslator
- Fix memory corruption by removing premature transaction handle
destruction
- Clean up log message in ChunkedSegmentSealedImpl
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45691
Add persistent task management for external collections with automatic
detection of external_source and external_spec changes. When source
changes, the system aborts running tasks and creates new ones, ensuring
only one active task per collection. Tasks validate their source on
completion to prevent superseded tasks from committing results.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
Skip redundant failure marking for completed import jobs when the
collection is dropped or import jobs are timeout.
issue: https://github.com/milvus-io/milvus/issues/45766
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.
1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
- Default: `false` (maintains backward compatibility)
2. Core FFI Infrastructure
Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
- Accepts manifest path with `base_path` and `ver` fields
- Calls `get_latest_column_groups()` FFI function
- Returns column group information as string
- Comprehensive error handling for JSON parsing and FFI errors
3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
- Updated milvus-storage version from `0883026` to `302143c`
- Ensures compatibility with latest FFI interfaces
4. Data Coordinator Changes
All compaction task builders now include manifest path in segment
binlogs:
- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
- `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
- `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
- `completeSortCompactionMutation()`: Include manifest path in sorted
segments
5. Data Node Compactor Enhancements
All compactors updated to support dual-mode reading (binlog vs
manifest):
6. Flush & Sync Manager Updates
Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
- Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
- Write packed data using FFI-based writer
- Return manifest path along with binlogs, deltas, and stats
Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest
7. Go Storage Layer Integration
New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing
Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage
Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization
8. Storage V2 Packed Format
FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format
Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes
Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation
9. Protocol Buffer Updates
data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline
worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator
10. Parameter Configuration
component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout
11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing
This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #44620
Related to unstable ut "internal/querycoordv2 TestServer/TestNodeUp"
Introduce SessionWatcher interface to fix race condition and goroutine
leak that caused unstable unit test TestServer/TestNodeUp.
Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw
channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing
Fixes race condition between session context cancellation and internal
loop exit. Eliminates goroutine leak by providing explicit lifecycle
management.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
https://github.com/milvus-io/milvus/issues/45544
- Add batch_factor configuration parameter (default: 5) to control
embedding provider batch sizes
- Add disable_func_runtime_check property to bypass function validation
during collection creation
- Add database interceptor support for AddCollectionFunction,
AlterCollectionFunction, and DropCollectionFunction requests
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
When there're no growing segments in the collection, L0 Compaction will
try to choose all L0 segments that hits all L1/L2 segments.
However, if there's Sealed Segment still under flushing in DataNode at
the same time L0 Compaction selects satisfied L1/L2 segments, L0
Compaction will ignore this Segment because it's not in "FlushState",
which is wrong, causing missing deletes on the Sealed Segment.
This quick solution here is to fail this L0 compaction task once
selected a Sealed segment.
See also: #45339
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #45080, #45274, #45285
- LoadCollection doesn't ignore the ignorable request, for false field
array.
- CreatIndex doesn't ignore the ignorable request, for wrong index.
- index meta is not thread safe.
- lost parameter check of DDL.
- DDL Ack scheduler may get stuck and DDL is block until next incoming
DDL.
- lost parameter checker of ddl
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44956
Add manifest_path field throughout the data path to support LOON Storage
V2 manifest tracking. The manifest stores metadata for segment data
files and enables the unified Storage V2 FFI interface.
Changes include:
- Add manifest_path field to SegmentInfo and SaveBinlogPathsRequest
proto messages
- Add UpdateManifest operator to datacoord meta operations
- Update metacache, sync manager, and meta writer to propagate manifest
paths
- Include manifest_path in segment load info for query coordinator
This is part of the Storage V2 FFI interface integration.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
move all analyzer calculations to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
1. Import in replicating cluster is not supported yet, so disable it for
now.
2. Remove GetReplicateConfiguration wal API
issue: https://github.com/milvus-io/milvus/issues/44123
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This commit addresses issue #44788 where the
`datacoord_stored_binlog_size` metric could become inaccurate when
multiple concurrent `GetMetrics` calls arrived at DataCoord.
### Problem
The original implementation called `Reset()` followed by `Add()`
operations on Prometheus metrics within the `GetQuotaInfo()` method.
When multiple goroutines invoked this method concurrently, race
conditions occurred:
- Thread 1: Reset() → Add(value1)
- Thread 2: Reset() → Add(value2)
- Result: Metrics could be reset multiple times and values added in an
interleaved fashion, leading to inaccurate and inflated metric values
### Solution
Changed the approach from `Reset() + Add()` to aggregating metric values
in local maps first, then using `Set()` to update metrics atomically:
1. Collect segment size data into local maps:
- `storedBinlogSize`: tracks size per collection per segment state
- `binlogFileSize`: tracks total file count per collection
- `coll2DbName`: maps collection IDs to database names
2. After aggregation is complete, use `Set()` (instead of `Add()`) to
update metrics in a single operation per label combination
This ensures that concurrent `GetMetrics` calls don't interfere with
each other, as each invocation works with its own local state and only
updates the final metric value atomically.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #41611
- After enabling streaming arch, channel manager of data coord is a
redundant component.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44156
Enhance FlushAll functionality to support targeting specific collections
within databases instead of only database-level flushing.
Changes include:
- Add FlushAllTarget message in data_coord.proto for granular targeting
- Support collection-specific flush operations within databases
- Maintain backward compatibility with deprecated db_name field
This enhancement allows users to flush specific collections without
affecting other collections in the same database, providing more precise
control over data persistence operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43897
- add ddl messages proto and add some message utilities.
- support shard/exclusive resource-key-lock.
- add all ddl callbacks future into broadcast registry.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/42148
Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray → Memory
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #42942
This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>