Add configurable MAP_POPULATE flag support for mmap operations to reduce
page faults and improve first read performance.
Key changes:
- Add `queryNode.mmap.populate` config (default: true) to control
MAP_POPULATE flag usage
- Add `mmap_populate` parameter to MmapChunkTarget, ChunkTranslator,
GroupChunkTranslator, and ManifestGroupTranslator
- Apply MAP_POPULATE to both MmapChunkTarget and MemChunkTarget
- Propagate mmap_populate setting through chunk creation pipeline
When enabled, MAP_POPULATE pre-faults the mapped pages into memory,
eliminating page faults during subsequent access and improving query
performance for first-read operations.
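As an illustration of the flag's effect, here is a minimal, Linux-only Go
sketch of an mmap call gated by a populate option (the actual change lives
in the C++ MmapChunkTarget/MemChunkTarget path; the helper and file path
below are illustrative only):
```go
// Minimal sketch (Linux-only): map a file with or without MAP_POPULATE.
// The flag asks the kernel to pre-fault all pages at mmap time, so the
// first read over the mapping does not pay a per-page fault cost.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func mmapFile(path string, populate bool) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}

	flags := unix.MAP_PRIVATE
	if populate {
		flags |= unix.MAP_POPULATE // pre-fault pages into memory
	}
	return unix.Mmap(int(f.Fd()), 0, int(info.Size()), unix.PROT_READ, flags)
}

func main() {
	data, err := mmapFile("/tmp/example.bin", true)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Munmap(data)
	log.Printf("mapped %d bytes", len(data))
}
```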
issue: #46760
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #46033
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Pull Request Summary: Entity-Level TTL Field Support
### Core Invariant and Design
This PR introduces **per-entity TTL (time-to-live) expiration** via a
dedicated TIMESTAMPTZ field as a fine-grained alternative to
collection-level TTL. The key invariant is **mutual exclusivity**:
collection-level TTL and entity-level TTL field cannot coexist on the
same collection. Validation is enforced at the proxy layer during
collection creation/alteration (`validateTTL()` prevents both being set
simultaneously).
### What Is Removed and Why
- **Global `EntityExpirationTTL` parameter** removed from config
(`configs/milvus.yaml`, `pkg/util/paramtable/component_param.go`). This
was the only mechanism for collection-level expiration. The removal is
safe because:
- The collection-level TTL path (`isEntityExpired(ts)` check) remains
intact in the codebase for backward compatibility
- TTL field check (`isEntityExpiredByTTLField()`) is a secondary path
invoked only when a TTL field is configured
- Existing deployments using collection TTL can continue without
modification
The global parameter was removed because entity-level TTL already provides
finer-grained control than a single collection-wide setting, and the PR
chooses one mechanism per collection rather than layering both.
### No Data Loss or Behavior Regression
**TTL filtering logic is additive and safe:**
1. **Collection-level TTL unaffected**: The `isEntityExpired(ts)` check
still applies when no TTL field is configured; callers of
`EntityFilter.Filtered()` pass `-1` as the TTL expiration timestamp when
no field exists, causing `isEntityExpiredByTTLField()` to return false
immediately
2. **Null/invalid TTL values treated safely**: Rows with null TTL or TTL
≤ 0 are marked as "never expire" (using sentinel value `int64(^uint64(0)
>> 1)`) and are preserved across compactions; percentile calculations
only include positive TTL values
3. **Query-time filtering automatic**: TTL filtering is transparently
added to expression compilation via `AddTTLFieldFilterExpressions()`,
which appends `(ttl_field IS NULL OR ttl_field > current_time)` to the
filter pipeline. Entities with null TTL always pass the filter
4. **Compaction triggering granular**: Percentile-based expiration (20%,
40%, 60%, 80%, 100%) allows configurable compaction thresholds via
`SingleCompactionRatioThreshold`, preventing premature data deletion
### Capability Added: Per-Entity Expiration with Data Distribution Awareness
Users can now specify a TIMESTAMPTZ collection property `ttl_field`
naming a schema field. During data writes, TTL values are collected per
segment and percentile quantiles (5-value array) are computed and stored
in segment metadata. At query time, the TTL field is automatically
filtered. At compaction time, segment-level percentiles drive
expiration-based compaction decisions, enabling intelligent compaction
of segments where a configurable fraction of data has expired (e.g.,
compact when 40% of rows are expired, controlled by threshold ratio).
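As a rough sketch of the percentile bookkeeping described above, assuming
only positive TTL values are included and the quantile points are
20/40/60/80/100% (the function name and details are illustrative, not the
actual write-path code):
```go
// Sketch: compute the 5-value TTL percentile array (20%..100%) for one
// segment. Null/non-positive TTLs are excluded, mirroring the "never
// expire" handling described above.
package main

import (
	"fmt"
	"sort"
)

func ttlPercentiles(ttls []int64) [5]int64 {
	var positive []int64
	for _, t := range ttls {
		if t > 0 { // null or non-positive TTL => never expires, excluded
			positive = append(positive, t)
		}
	}
	var out [5]int64
	if len(positive) == 0 {
		return out
	}
	sort.Slice(positive, func(i, j int) bool { return positive[i] < positive[j] })
	for i, p := range []float64{0.2, 0.4, 0.6, 0.8, 1.0} {
		idx := int(p*float64(len(positive))) - 1
		if idx < 0 {
			idx = 0
		}
		out[i] = positive[idx]
	}
	return out
}

func main() {
	fmt.Println(ttlPercentiles([]int64{0, 30, 10, -1, 50, 20, 40}))
}
```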
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #45640
- After switching to async logging, there is no ordering guarantee between
C logs and Go logs, and the C log format is not consistent with the Go log
format; so we disable glog output and forward all C log records to the Go
side, where they are handled by the async zap logger.
- Use CGO to capture all C-side logging and guarantee ordering between C
logs and Go logs (a sketch of such a bridge is shown below).
- Also fix the metric name and add a new metric to count log entries.
- TODO: once woodpecker uses the Milvus logger, we can add a bigger buffer
for logging.
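A minimal sketch of what such a cgo bridge can look like, assuming
glog-style severity values (INFO=0, WARNING=1, ERROR=2, FATAL=3) and the
standard zap API; the real exported function (goZapLogExt in
internal/util/cgo/logging) may differ in name and signature:
```go
// Sketch of a cgo-exported bridge that forwards C-side log records into
// zap so ordering and formatting stay consistent with Go logs.
package logging

// #include <stddef.h>
import "C"

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

//export goZapLogSketch
func goZapLogSketch(severity C.int, file *C.char, line C.int, msg *C.char, msgLen C.int) {
	// map glog-style severities to zap levels
	var lvl zapcore.Level
	switch int(severity) {
	case 0:
		lvl = zapcore.InfoLevel
	case 1:
		lvl = zapcore.WarnLevel
	case 2:
		lvl = zapcore.ErrorLevel
	default:
		lvl = zapcore.FatalLevel
	}
	// Route through the same zap logger used for Go logs.
	if ce := zap.L().Check(lvl, C.GoStringN(msg, msgLen)); ce != nil {
		ce.Write(zap.String("cfile", C.GoString(file)), zap.Int("cline", int(line)))
	}
}
```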
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: all C (glog) and Go logs must be routed through the
same zap async pipeline so ordering and formatting are preserved; this
PR ensures every glog emission is captured and forwarded to zap before
any async buffering can cause the two outputs to diverge.
- Logic removed/simplified: direct glog outputs and hard
stdout/stderr/log_dir settings are disabled (configs/glog.conf and flags
in internal/core/src/config/ConfigKnowhere.cpp) because they are
redundant once a single zap sink handles all logs; logging metrics were
simplified from per-length/volatile gauges to totalized counters
(pkg/metrics/logging_metrics.go & pkg/log/*), removing duplicate
length-tracking and making accounting consistent.
- No data loss or behavior regression (concrete code paths): Google
logging now adds a GoZapSink (internal/core/src/common/logging_c.h,
logging_c.cpp) that calls the exported CGO bridge goZapLogExt
(internal/util/cgo/logging/logging.go). Go side uses
C.GoStringN/C.GoString to capture full message and file, maps glog
severities to zapcore levels, preserves caller info, and writes via the
existing zap async core (same write path used by Go logs). The C++
send() trims glog's trailing newline and forwards exact buffers/lengths,
so message content, file, line, and severity are preserved and
serialized through the same async writer—no log entries are dropped or
reordered relative to Go logs.
- Capability added (where it takes effect): a CGO bridge that forwards
glog into zap—new Go-exported function goZapLogExt
(internal/util/cgo/logging/logging.go), a GoZapSink in C++ that forwards
glog sends (internal/core/src/common/logging_c.h/.cpp), and blank
imports of the cgo initializer across multiple packages (various
internal/* files) to ensure the bridge is registered early so all C logs
are captured.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45841
- The C++ logs packed multiple log lines into a single debug entry; remove
the "\n\t" so each entry stays on one line.
- Remove some logs that provide no useful information.
- Rate-limit some noisy logs, such as those from ChannelDistManager.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: logging is purely observational — this PR only
reduces, consolidates, or reformats diagnostic output (removing
per-item/noise logs, consolidating batched logs, and converting
multi-line log strings) while preserving all control flow, return
values, and state mutations across affected code paths.
- Removed / simplified logic: deleted low-value per-operation debug/info
logs (e.g., ListIndexes, GetRecoveryInfo, GcConfirm,
push-to-reorder-buffer, several streaming/wal/debug traces), replaced
per-item inline logs with single batched deferred logs in
querynodev2/delegator (logExcludeInfo) and CleanInvalid, changed C++
PlanNode ToString() multi-line output to compact single-line bracketed
format (removed "\n\t"), and added thresholded interceptor logging
(InterceptorMetrics.ShouldBeLogged) and message-type-driven log levels
to avoid verbose entries.
- Why this does NOT cause data loss or behavioral regression: no
function signatures, branching, state updates, persistence calls, or
return values were changed — examples: ListIndexes still returns the
same Status/IndexInfos; GcConfirm still constructs and returns
resp.GetGcFinished(); Insert and CleanInvalid still perform the same
insert/removal operations (only their per-item logging was aggregated);
PlanNode ToString changes only affect emitted debug strings. All error
handling and control flow paths remain intact.
- Enhancement intent: reduce log volume and improve signal-to-noise for
debugging by removing redundant, noisy logs and emitting concise,
rate-/threshold-limited summaries while preserving necessary diagnostics
and original program behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
test: add unit tests for mixed int64/float types in BinaryRangeExpr
When processing binary range expressions (e.g., `x > 499 && x <= 512.0`)
on JSON/dynamic fields with expression templates, the lower and upper
bounds could have different numeric types (int64 vs float64). This
caused an assertion failure in GetValueFromProto when the template type
didn't match the actual proto value type.
Fixes:
1. Go side (fill_expression_value.go): Normalize numeric types for JSON
fields - if either bound is float and the other is int, convert the int
to float (a minimal sketch follows this list).
2. C++ side (BinaryRangeExpr.cpp):
- Check both lower_val and upper_val types when dispatching
- Use double template when either bound is float
- Use GetValueWithCastNumber instead of GetValueFromProto to safely
handle int64->double conversion
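A minimal Go sketch of the promotion rule, using plain interface values
instead of the actual planpb generic values:
```go
// Sketch of the Go-side normalization rule: for JSON-field binary ranges,
// if either bound is a float, promote the int bound to float64 so both
// bounds reach the evaluator with one numeric type.
package main

import "fmt"

func normalizeBounds(lower, upper interface{}) (interface{}, interface{}) {
	_, lowerIsFloat := lower.(float64)
	_, upperIsFloat := upper.(float64)
	if lowerIsFloat && !upperIsFloat {
		if v, ok := upper.(int64); ok {
			upper = float64(v)
		}
	}
	if upperIsFloat && !lowerIsFloat {
		if v, ok := lower.(int64); ok {
			lower = float64(v)
		}
	}
	return lower, upper
}

func main() {
	lo, hi := normalizeBounds(int64(499), 512.0)
	fmt.Printf("%T %v, %T %v\n", lo, lo, hi, hi) // float64 499, float64 512
}
```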
issue: #46588
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: JSON field binary-range expressions must present
numeric bounds to the evaluator with a consistent numeric type; if
either bound is floating-point, both bounds must be treated as double to
avoid proto-type mismatches during template instantiation.
- Bug fix (issue #46588 & concrete change): mixed int64/float bounds
could dispatch the wrong template (e.g.,
ExecRangeVisitorImplForJson<int64_t>) and trigger assertions in
GetValueFromProto. Fixes: (1) Go parser (FillBinaryRangeExpressionValue
in fill_expression_value.go) normalizes mixed JSON numeric bounds by
promoting the int bound to float; (2) C++ evaluator
(PhyBinaryRangeFilterExpr::Eval in BinaryRangeExpr.cpp) inspects both
lower_type and upper_type, sets use_double when either is float, selects
ExecRangeVisitorImplForJson<double> for mixed numeric cases, and
replaces GetValueFromProto with GetValueWithCastNumber so int64→double
conversions are handled safely.
- Removed / simplified logic: the previous evaluator branched on only
the lower bound's proto type and had separate index/non-index handling
for int64 vs float; that per-bound branching is replaced by unified
numeric handling (convert to double when needed) and a single numeric
path for index use — eliminating redundant, error-prone branches that
assumed homogeneous bound types.
- No data loss or regression: changes only promote int→double for
JSON-range comparisons when the other bound is float; integer-only and
float-only paths remain unchanged. Promotion uses IEEE double (C++
double and Go float64) and only affects template dispatch and
value-extraction paths; GetValueWithCastNumber safely converts int64 to
double and index/non-index code paths both normalize consistently,
preserving semantics for comparisons and avoiding assertion failures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46687
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: raw-data cleanup must be scoped to (segment_id,
field_id) so deleting temporary raw files for one field never removes
raw files for other fields in the same segment (prevents cross-field
deletion during index builds).
- Root cause and fix (bug): VectorDiskIndex::Build() and
BuildWithDataset() called RemoveDir on the segment-level path; this
removed rawdata/{segment_id}/. The fix changes both calls to remove
storage::GenFieldRawDataPathPrefix(local_chunk_manager, segment_id,
field_id) instead, limiting cleanup to rawdata/{segment_id}_{field_id}/
(field-scoped).
- Logic removed/simplified: the old helper GetSegmentRawDataPathPrefix
was removed and callers were switched to GenFieldRawDataPathPrefix;
cleanup logic is simplified from segment-level to field-level path
generation and removal, eliminating redundant broad deletions.
- Why this does NOT cause data loss or regress behavior: the change
narrows RemoveDir() to the exact field path used when caching raw data
and offsets earlier in Build (offsets_path and CacheRawDataToDisk
produce field-scoped local paths). Build still writes/reads offsets and
raw data from GenFieldRawDataPathPrefix(...) and then removes that same
prefix after successful index.Build(); therefore only temporary files
for the built field are deleted and other fields’ raw files under the
same segment are preserved. This fixes issue #46687 by preventing
accidental deletion of other fields’ raw data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42053
Process n-grams in batches rather than all at once.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Batch Processing for N-gram Queries
**Core Invariant:** All data iteration is now driven by `batch_size_` as
the fundamental unit; for sealed chunked segments processing string/JSON
data, processing is strictly stateless to allow specialized batched
algorithms.
**Simplified Logic:**
- Removed the `process_all_chunks` boolean flag from
`ProcessMultipleChunksCommon` (renamed to
`ProcessDataChunksForMultipleChunk`) as it was redundant—all iteration
paths now converge on the same `batch_size_`-driven chunking strategy
with unified data size clamping (`std::min(chunk_size, batch_size_ -
processed_size)`).
- Eliminated wrapper delegation methods
(`ProcessDataChunksForMultipleChunk` and
`ProcessAllChunksForMultipleChunk` old wrappers) that pointed to a
single common implementation with a conditional flag.
**No Data Loss or Behavior Regression:**
- The new `ProcessAllDataChunkBatched<T>` is an additional stateless
public path (requires sealed + chunked segments, type constraints:
`std::string_view|Json|ArrayView`) that iterates all `num_data_chunk_`
chunks in `batch_size_` granularity without mutating cursor state
(`current_data_chunk_`, `current_data_chunk_pos_`), ensuring
deterministic re-entrant processing.
- Existing cursor-based APIs (`ProcessDataChunksForMultipleChunk`,
`ProcessChunkForSealedSeg`) remain unchanged for standard expression
evaluation—no segment state is corrupted.
- N-gram query execution now routes through
`ExecuteQueryWithPredicate<T, Predicate>(literal, segment, predicate,
need_post_filter)` which forwards generic predicates and delegates to
`segment->ProcessAllDataChunkBatched<T>(execute_batch, res)` for
post-filtering, avoiding per-chunk single-pass traversal.
**Enhancement:** Generic predicate template `template <typename T,
typename Predicate>` with perfect forwarding (`Predicate&& predicate`)
replaces the fixed `std::function<bool(const T&)>` signature,
eliminating function wrapper overhead for n-gram matcher closures and
enabling efficient batch processing callbacks.
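A condensed Go sketch of the batch-driven clamping and predicate callback
described above (illustrative only; the real implementation is the
templated C++ path in segcore, and the sketch relies on Go 1.21's built-in
`min`):
```go
// processAllChunksBatched walks every chunk in batch-sized steps without
// touching any cursor state; the clamp takes the rest of the current chunk
// but never exceeds the remaining batch budget.
package main

import "fmt"

func processAllChunksBatched(chunkSizes []int, batchSize int, pred func(globalOffset int) bool) []int {
	var hits []int
	globalBase := 0
	processedInBatch := 0
	for _, chunkSize := range chunkSizes {
		offsetInChunk := 0
		for offsetInChunk < chunkSize {
			// clamp: min(remaining in chunk, remaining batch budget)
			n := min(chunkSize-offsetInChunk, batchSize-processedInBatch)
			for i := 0; i < n; i++ {
				off := globalBase + offsetInChunk + i
				if pred(off) {
					hits = append(hits, off)
				}
			}
			offsetInChunk += n
			processedInBatch += n
			if processedInBatch == batchSize {
				processedInBatch = 0 // batch full, start a new one
			}
		}
		globalBase += chunkSize
	}
	return hits
}

func main() {
	even := func(off int) bool { return off%2 == 0 }
	fmt.Println(processAllChunksBatched([]int{5, 3, 7}, 4, even))
}
```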
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
related: #46649
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: STS IAM credential providers for Aliyun, Tencent
Cloud, and Huawei Cloud are global, stateless resources that must be
instantiated once and reused across all ChunkManager instances
(singleton), rather than created per-manager.
- Logic removed/simplified: Removed per-instance Aws::MakeShared
instantiation of STSAssumeRoleWebIdentityCredentialsProvider inside
Aliyun/Tencent/Huawei ChunkManager constructors and replaced them with
public static Get...CredentialsProvider() methods that return a
thread-safe, lazily-initialized shared_ptr singleton (static local
variable). This eliminates duplicate provider construction and
header/signal dependency usages tied to per-constructor instantiation (a
sketch of the lazy-singleton pattern follows this list).
- Why this does NOT introduce data loss or behavior regression:
Credential acquisition and usage paths are unchanged — callers still
call provider->GetAWSCredentials() and use the returned AWSCredentials
to construct Aws::S3::S3Client. The singleton returns the same provider
object but the provider is stateless with respect to per-manager data
(it only reads environment/platform credentials and produces
AWSCredentials). C++11+ static local initialization provides atomic,
thread-safe construction, so first-access semantics and validation
checks (AssertInfo on access key/secret/token) remain intact.
- PR type (Enhancement/Refactor): Improves credential management by
centralizing provider lifecycle, removing redundant allocations and
header dependencies, and enforcing a single shared provider per cloud
vendor where IAM is used.
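The lifecycle change can be pictured with a Go sync.Once analogue of the
C++ function-local static; the types and names below are placeholders, not
the actual AWS SDK provider:
```go
// Go analogue of the C++ function-local static singleton: the provider is
// built exactly once, lazily, and every caller shares the same instance.
package credentials

import "sync"

type stsCredentialsProvider struct {
	// reads role ARN / web identity token from the environment; stateless
	// with respect to any particular chunk manager instance
}

var (
	providerOnce sync.Once
	provider     *stsCredentialsProvider
)

// GetSTSCredentialsProvider returns the shared, lazily-initialized provider.
func GetSTSCredentialsProvider() *stsCredentialsProvider {
	providerOnce.Do(func() {
		provider = &stsCredentialsProvider{}
	})
	return provider
}
```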
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/44452
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
**Scalar Bench Decoupled from Milvus Build System**
- **Core assumption**: Scalar-bench is now managed as an independent
build artifact outside the milvus repository, eliminating the need for
conditional compilation integration within milvus's Makefile and
CMakeLists.txt.
- **Build infrastructure simplified**: Removed `scalar-bench` and
`scalar-bench-ui` targets from Makefile and deleted the entire
`ENABLE_SCALAR_BENCH` conditional block in
`internal/core/unittest/CMakeLists.txt` (which handled FetchContent,
cache variables, and subdirectory integration)—this eliminates optional,
redundant build-time coupling that is no longer necessary.
- **No regression introduced**: The removal only affects optional
build-time integration paths. Core C++ builds continue functioning as
before, and unit tests remain unaffected since `ENABLE_SCALAR_BENCH` was
always optional (not a required dependency); the newly added
`plan-parser-so` dependency on core build targets appears to be a
separate, required component.
- **Decoupling benefit**: Scalar-benchmark can now evolve and release on
its own schedule independent of milvus release cycles, while maintaining
clean separation of concerns between the two projects.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
related: #46296
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: expiration comparisons use
Aws::Utils::DateTime::Now().count() which returns milliseconds; any
expiration grace period must be expressed in milliseconds and compared
via (GetExpiration() - Now()).count() in ExpiresSoon() (Huawei and
Tencent providers).
- Root cause and fix: the grace period constant was authored as 7200
(seconds) but used against millisecond counts, causing premature
refreshes. The PR changes
STS_CREDENTIAL_PROVIDER_EXPIRATION_GRACE_PERIOD to 180 * 1000 (180000
ms) in HuaweiCloudCredentialsProvider.cpp and
TencentCloudCredentialsProvider.cpp to align units and stop unnecessary
refreshes.
- Removed/replaced redundant/incorrect behavior: the PR does not add new
control flow but corrects unit mismatch and simplifies logging/STS
request handling — HuaweiCloudSTSClient now explicitly requests a
7200-second token by adding "token": {"duration_seconds": 7200} to the
JSON body and uses JsonValue(...).View() for parsing; Huawei logging
level raised from TRACE to DEBUG and now logs expiration_count_diff_ms
for clarity. These changes remove ambiguity about requested token
lifetime and improve diagnostic output.
- No data loss or regression: credential contents and assignment are
unchanged — Reload()/RefreshIfExpired()/ExpiresSoon() still populate
m_credentials from STS responses and return them via
GetAWSCredentials(); only the grace-period unit and the Huawei STS
request body/parsing/logging were adjusted. Code paths affected are
ExpiresSoon()/RefreshIfExpired()/Reload() in both providers and
HuaweiCloudSTSCredentialsClient::callHuaweiCloudSTS; since credentials
are still read from the same response fields (access, secret,
securitytoken, expires_at) and assigned to result.creds, there is no
data loss or altered persistence/authorization semantics beyond aligning
requested token duration and correct refresh timing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/46618
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
• **Core Invariant**: TIMESTAMPTZ values are internally stored as int64
Unix microseconds. Simple comparisons without intervals can safely use
native int64 range evaluation (`ExecRangeVisitorImpl<int64_t>`) and
`UnaryRangeExpr` to leverage index-based scans, since the underlying
data type and comparison semantics remain unchanged.
• **Logic Optimization**: The parser now branches on interval presence.
When `ctx.GetOp1() == nil` (no interval), it returns a lightweight
`UnaryRangeExpr` for fast indexed range scans. When an interval exists,
it falls back to the heavier `TimestamptzArithCompareExpr` for
arithmetic evaluation. This eliminates redundant ISO interval parsing
and type conversions for the common case of interval-free comparisons
(sketched after this list).
• **No Regression**: The `UnaryRangeExpr` path preserves exact
comparison semantics by treating TIMESTAMPTZ as int64 directly, matching
the storage format. For reverse comparisons (e.g., `'2025-01-01' >
column`), operator reversal correctly normalizes to column-centric form
(`column < '2025-01-01'`), maintaining logical equivalence.
Interval-based comparisons continue through the unchanged
`TimestamptzArithCompareExpr` path.
• **Coverage**: Both forward (column left of operator) and reverse
(column right of operator) comparison syntaxes are handled with explicit
branching logic, ensuring the optimization applies uniformly across
comparison patterns.
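A condensed Go sketch of the interval-free branch and operator reversal
described above (names and the RFC3339 literal are placeholders; the real
parser emits planpb expressions):
```go
// Sketch: when no interval is attached, build a plain int64 range compare
// on the Unix-microsecond value; when the column is on the right-hand
// side, flip the operator so the expression stays column-centric.
package main

import (
	"fmt"
	"time"
)

type op string

func reverse(o op) op {
	switch o {
	case ">":
		return "<"
	case ">=":
		return "<="
	case "<":
		return ">"
	case "<=":
		return ">="
	default:
		return o // == and != are symmetric
	}
}

// buildCompare returns a column-centric description of the comparison.
func buildCompare(column string, o op, literal string, columnOnLeft, hasInterval bool) (string, error) {
	if hasInterval {
		return "", fmt.Errorf("interval present: fall back to the arithmetic compare path")
	}
	ts, err := time.Parse(time.RFC3339, literal)
	if err != nil {
		return "", err
	}
	if !columnOnLeft {
		o = reverse(o)
	}
	return fmt.Sprintf("UnaryRange(%s %s %d /* unix micros */)", column, o, ts.UnixMicro()), nil
}

func main() {
	// '2025-01-01T00:00:00Z' > column  =>  column < <micros>
	expr, _ := buildCompare("ts_col", ">", "2025-01-01T00:00:00Z", false, false)
	fmt.Println(expr)
}
```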
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/46517
ref: https://github.com/milvus-io/milvus/issues/42148
This PR supports the match operator family with struct arrays and
brute-force search only.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: match operators only target struct-array element-level
predicates and assume callers provide a correct row_start so element
indices form a contiguous range; IArrayOffsets implementations convert
row-level bitmaps/rows (starting at row_start) into element-level
bitmaps or a contiguous element-offset vector used by brute-force
evaluation.
- New capability added: end-to-end support for MATCH_* semantics
(match_any, match_all, match_least, match_most, match_exact) — parser
(grammar + proto), planner (ParseMatchExprs), expr model
(expr::MatchExpr), compilation (Expr→PhyMatchFilterExpr), execution
(PhyMatchFilterExpr::Eval uses element offsets/bitmaps), and unit tests
(MatchExprTest + parser tests). Implementation currently works for
struct-array inputs and uses brute-force element counting via
RowBitsetToElementOffsets/RowBitsetToElementBitset.
- Logic removed or simplified and why: removed the ad-hoc
DocBitsetToElementOffsets helper and consolidated offset/bitset
derivation into IArrayOffsets::RowBitsetToElementOffsets and a
row_start-aware RowBitsetToElementBitset, and removed EvalCtx overloads
that embedded ExprSet (now EvalCtx(exec_ctx, offset_input)). This
centralizes array-layout logic in ArrayOffsets and removes duplicated
offset conversion and EvalCtx variants that were redundant for
element-level evaluation.
- No data loss / no behavior regression: persistent formats are
unchanged (no proto storage or on-disk layout changed); callers were
updated to supply row_start and now route through the centralized
ArrayOffsets APIs which still use the authoritative
row_to_element_start_ mapping, preserving exact element index mappings.
Eval logic changes are limited to in-memory plumbing (how
offsets/bitmaps are produced and how EvalCtx is constructed); expression
evaluation still invokes exprs_->Eval where needed, so existing behavior
and stored data remain intact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Support creating an analyzer with file resource info, and return the used
file resource IDs when validating an analyzer.
Save the related resource IDs in the collection schema.
related: https://github.com/milvus-io/milvus/issues/43687
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: analyzer file-resource resolution is deterministic and
traceable by threading a FileResourcePathHelper (collecting used
resource IDs in a HashSet) through all tokenizer/analyzer construction
and validation paths; validate_analyzer(params, extra_info) returns the
collected Vec<i64> which is propagated through C/Rust/Go layers to
callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 →
querypb.ValidateAnalyzerResponse.ResourceIds →
CollectionSchema.FileResourceIds).
- Logic removed/simplified: ad‑hoc, scattered resource-path lookups and
per-filter file helpers (e.g., read_synonyms_file and other inline
file-reading logic) were consolidated into ResourceInfo +
FileResourcePathHelper and a centralized get_resource_path(helper, ...)
API; filter/tokenizer builder APIs now accept &mut
FileResourcePathHelper so all file path resolution and ID collection use
the same path and bookkeeping logic (redundant duplicated lookups
removed).
- Why no data loss or behavior regression: changes are additive and
default-preserving — existing call sites pass extra_info = "" so
analyzer creation/validation behavior and error paths remain unchanged;
new Collection.FileResourceIds is populated from resp.ResourceIds in
validateSchema and round‑tripped through marshal/unmarshal
(model.Collection ↔ schemapb.CollectionSchema) so schema persistence
uses the new list without overwriting other schema fields; proto change
adds a repeated field (resource_ids) which is wire‑compatible (older
clients ignore extra field). Concrete code paths: analyzer creation
still uses create_analyzer (now with extra_info ""), tokenizer
validation still returns errors as before but now also returns IDs via
CValidateResult/RustResult, and rootcoord.validateSchema assigns
resp.ResourceIds → schema.FileResourceIds.
- New capability added: end‑to‑end discovery, return, and persistence of
file resource IDs used by analyzers — validate flows now return resource
IDs and the system stores them in collection schema (affects tantivy
analyzer binding, canalyzer C bindings, internal/util analyzer APIs,
querynode ValidateAnalyzer response, and rootcoord/create_collection
flow).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
When computing load diff, binlogs in v1/legacy format have empty
child_fields. In this case, the field_id itself should be used as the
child_id (group_id == field_id for legacy format).
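A sketch of the fallback, assuming protobuf-style int64 slices for the
child fields (the real normalization happens in the C++ SegmentLoadInfo):
```go
package segments

// normalizeChildIDs returns the child group IDs for a field binlog; for
// legacy v1 binlogs with empty child_fields, the field ID itself is used
// as the single child group (group_id == field_id).
func normalizeChildIDs(fieldID int64, childFields []int64) []int64 {
	if len(childFields) == 0 {
		return []int64{fieldID}
	}
	return childFields
}
```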
Without this fix, legacy format binlogs are not recognized during diff
computation, causing segments to fail loading and TestProxy to timeout.
Changes:
- Add fallback to use field_id as child_id when child_fields is empty
- Add LoadDiff::ToString() for debugging
- Add logging for diff in Load/Reopen operations
- Add comprehensive unit tests for legacy format handling
Related to #46594
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: load-diff computation must enumerate every binlog
child group for a field so current vs new segment state comparisons
include all column-group/binlog groups; for legacy (v1) binlogs that
have empty child_fields, the code must treat group_id == field_id to
preserve that mapping.
- Bug fix (resolves #46594): SegmentLoadInfo now normalizes
field_binlog.child_fields() into a vector and falls back to using
field_id as the single child group when child_fields is empty; the same
normalization is applied for both current and new-info paths, ensuring
legacy v1 binlogs are discovered and included in Load/ComputeDiff
results so segments load correctly.
- Logic simplified: removed the implicit assumption that child_fields is
always present by centralizing a single normalization/fallback step used
symmetrically for both diff paths, avoiding ad-hoc special-casing and
unifying iteration over child groups.
- No data loss / no behavior regression: the fallback only activates
when child_fields is empty — non-legacy binlogs continue to use their
child_fields unchanged. Add/drop semantics are preserved because the
same normalization is applied to both sides of the diff. Unit tests
(v1-only, v4-only, mixed cases) were added to validate correctness;
LoadDiff::ToString() and extra logging are diagnostic only.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/44399
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Enhanced pattern matching for string indexes with support for prefix,
postfix, inner, and regex-based matching operations.
* Optimized pattern matching performance through prefix-based filtering
and range-based lookups (see the sketch after this list).
* **Tests**
* Added comprehensive test coverage for pattern matching functionality
across multiple index implementations.
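A common way to implement such range-based prefix lookups is to turn the
prefix into a half-open key range; a minimal Go sketch (illustrative only,
not the actual index code):
```go
// prefixToRange converts a prefix into a half-open key range usable for an
// ordered (sorted/BTree-like) string index scan: every key with the given
// prefix satisfies lower <= key < upper.
package main

import "fmt"

func prefixToRange(prefix string) (lower, upper string, unbounded bool) {
	lower = prefix
	b := []byte(prefix)
	// find the rightmost byte that can be incremented
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 0xFF {
			b[i]++
			return lower, string(b[:i+1]), false
		}
	}
	// all bytes are 0xFF: no finite upper bound, scan to the end
	return lower, "", true
}

func main() {
	lo, hi, _ := prefixToRange("abc")
	fmt.Printf("[%q, %q)\n", lo, hi) // ["abc", "abd")
}
```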
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #46358
Refactor segment loading to use a unified diff-based approach for both
initial Load and Reopen operations:
- Extract ApplyLoadDiff from Reopen to share loading logic
- Add GetLoadDiff to compute diff from empty state for initial load
- Change column_groups_to_load from map to vector<pair> to preserve
order
- Add validation for empty index file paths in diff computation
- Add comprehensive unit tests for GetLoadDiff
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Performance**
* Improved segment loading efficiency through incremental updates,
reducing memory overhead and enhancing performance during data updates.
* **Tests**
* Expanded test coverage for load operation scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #45993
This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
for nullable vectors with sparse storage.
Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
index translation during search reduce operations (see the sketch after this list)
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
- Update msg_pack.go with nullable vector field data processing
QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
- Update funcutil.go with nullable vector support functions
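The logical-to-physical translation for nullable vectors can be pictured
as counting valid rows before a logical offset; a minimal Go sketch
assuming a per-row validity slice (the real FieldDataIdxComputer is more
general and presumably avoids the per-call linear scan):
```go
// physicalIdx maps a logical row offset to the physical index inside the
// densely stored vector data of a nullable field: it is the number of
// valid rows before `logical`. Returns -1 when the row itself is null.
package main

import "fmt"

func physicalIdx(validData []bool, logical int) int {
	if !validData[logical] {
		return -1
	}
	phys := 0
	for i := 0; i < logical; i++ {
		if validData[i] {
			phys++
		}
	}
	return phys
}

func main() {
	valid := []bool{true, false, true, true, false, true}
	for logical := range valid {
		fmt.Printf("logical %d -> physical %d\n", logical, physicalIdx(valid, logical))
	}
}
```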
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
* Client: compaction control and compaction-state APIs.
* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.
* **Chores**
* Persisted validity maps with indexes and on-disk formats.
* **Tests**
* Extensive new and updated end-to-end nullable-vector tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/45525
see the added README.md for details on the added optimizations
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added query expression optimization feature with a new `optimizeExpr`
configuration flag to enable automatic simplification of filter
predicates, including range predicate optimization, merging of IN/NOT IN
conditions, and flattening of nested logical operators.
* **Bug Fixes**
* Adjusted delete operation behavior to correctly handle expression
evaluation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #45486
Introduce row group batching to reduce cache cell granularity and improve
memory and disk efficiency. Previously, each parquet row group mapped 1:1
to a cache cell. Now, up to `kRowGroupsPerCell` (4) row groups are merged
into one cell. This reduces the number of cache cells (and associated
overhead) by ~4x while maintaining the same data granularity for loading.
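The row-group-to-cell mapping is a simple fixed-ratio grouping; a minimal
Go sketch assuming `kRowGroupsPerCell` = 4:
```go
// With kRowGroupsPerCell = 4, row groups [0..3] share cell 0, [4..7] share
// cell 1, and so on; the number of cells shrinks by ~4x.
package main

import "fmt"

const kRowGroupsPerCell = 4

func cellOfRowGroup(rowGroup int) int { return rowGroup / kRowGroupsPerCell }

func numCells(numRowGroups int) int {
	return (numRowGroups + kRowGroupsPerCell - 1) / kRowGroupsPerCell
}

func main() {
	fmt.Println(cellOfRowGroup(5)) // 1
	fmt.Println(numCells(10))      // 3
}
```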
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Switched to cell-based grouping that merges multiple row groups for
more efficient multi-file aggregation and reads.
* Chunk loading now combines multiple source batches/tables per cell and
better supports mmap-backed storage.
* **New Features**
* Exposed helpers to query row-group ranges and global row-group offsets
for diagnostics and testing.
* Translators now accept chunk-type and mmap/load hints to control
on-disk vs in-memory behavior.
* **Bug Fixes**
* Improved bounds checks and clearer error messages for out-of-range
cell requests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
### **User description**
Related to #44956
Add manifest-based data loading path for optional fields in
`cache_opt_field_memory_v2`. When a manifest file is provided in the
config, the function now retrieves field data directly from the manifest
using `GetFieldDatasFromManifest` instead of reading from segment insert
files. This enables storage v2 compatibility for building indexes with
optional fields.
___
### **PR Type**
Enhancement
___
### **Description**
- Add manifest-based data loading for optional fields in index building
- Support storage v2 compatibility via `GetFieldDatasFromManifest`
function
- Enable PK isolation optional field handling without segment insert
files
___
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/44399
This PR also adds `ByteSize()` methods for scalar indexes. They are
currently not used in Milvus code, but are used in the scalar benchmark and
may be used by cachinglayer in the future.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Improved and standardized memory-size computation and caching across
index types so reported index footprints are more accurate and
consistent.
* **Chores**
* Ensured byte-size metrics are refreshed immediately after index
build/load operations to keep memory accounting in sync with runtime
state.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #46349
When using brute-force search, the iterator results from multiple chunks
are merged; at that point, we need to pay attention to how the metric
affects result ranking.
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Related to #44956
When loading column groups with mmap enabled, the
ManifestGroupTranslator needs the mmap directory path to properly handle
memory-mapped data loading. This change retrieves the root path from
LocalChunkManagerSingleton and passes it to the translator during
construction.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42053
The split literals in `match` execution should be combined with `and`
semantics rather than `or`.
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/45890
ComputePhraseMatchSlop accepts three parameters:
1. A string: the query text
2. A list of strings: the data texts
3. Analyzer params
Slop is calculated for the query text against each data text in the
context of phrase match, where both are tokenized with a tokenizer built
from the analyzer params.
Two arrays are returned:
1. is_match: whether the phrase match can succeed
2. slop: the corresponding slop if the phrase match succeeds, or -1 if it
cannot.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #46358
This PR implements segment reopening functionality on query nodes,
enabling the application of data or schema changes to already-loaded
segments without requiring a full reload.
### Core (C++)
**New SegmentLoadInfo class**
(`internal/core/src/segcore/SegmentLoadInfo.h/cpp`):
- Encapsulates segment load configuration with structured access
- Implements `ComputeDiff()` to calculate differences between old and
new load states
- Tracks indexes, binlogs, and column groups that need to be loaded or
dropped
- Provides `ConvertFieldIndexInfoToLoadIndexInfo()` for index loading
**ChunkedSegmentSealedImpl modifications**:
- Added `Reopen(const SegmentLoadInfo&)` method to apply incremental
changes based on computed diff
- Refactored `LoadColumnGroups()` and `LoadColumnGroup()` to support
selective loading via field ID map
- Extracted `LoadBatchIndexes()` and `LoadBatchFieldData()` for reusable
batch loading logic
- Added `LoadManifest()` for manifest-based loading path
- Updated all methods to use `SegmentLoadInfo` wrapper instead of direct
proto access
**SegmentGrowingImpl modifications**:
- Added `Reopen()` stub method for interface compliance
**C API additions** (`segment_c.h/cpp`):
- Added `ReopenSegment()` function exposing reopen to Go layer
### Go Side
**QueryNode handlers** (`internal/querynodev2/`):
- Added `HandleReopen()` in handlers.go
- Added `ReopenSegments()` RPC in services.go
**Segment interface** (`internal/querynodev2/segments/`):
- Extended `Segment` interface with `Reopen()` method
- Implemented `Reopen()` in LocalSegment
- Added `Reopen()` to segment loader
**Segcore wrapper** (`internal/util/segcore/`):
- Added `Reopen()` method in segment.go
- Added `ReopenSegmentRequest` in requests.go
### Proto
- Added new fields to support reopen in `query_coord.proto`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fixes: https://github.com/milvus-io/milvus/issues/45934
pinIndex is a const method that only does read operations, so an rlock is
the right choice for performance.
Signed-off-by: Lanqing Yang <lanqingy93@gmail.com>
Related to #44647
Update milvus-storage from 91df193 to 839a8e5 to include
milvus-io/milvus-storage#342, which fixes a race condition in
S3GlobalContext initialization.
The fix moves the is_initialized_ flag update from before DoInitialize()
to after it completes. This ensures the initialization flag is only set
to true after the actual initialization is done, preventing potential
issues if DoInitialize() fails or if other code checks the flag during
initialization.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.
This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.
Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:
element_filter(structFieldName, $[subFieldName] == 3)
where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.
It can be combined with existing filter expressions, for example:
"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"
A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
"struct_field",
datatype=DataType.ARRAY,
element_type=DataType.STRUCT,
struct_schema=struct_schema,
max_capacity=1000,
)
...
filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
COLLECTION_NAME,
data=query_embeddings,
limit=10,
anns_field="struct_field[struct_float_vec]",
filter=filter,
output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Upgrade milvus-storage from 33bf815 to 91df193.
This includes the fix from milvus-io/milvus-storage#337, which resolves
a namespace collision where both Milvus and milvus-storage defined
identical credentials provider classes in the same namespace. Although
no compile-time redefinition errors occurred, the dynamic linker could
resolve to the wrong implementation at runtime, potentially causing
cloud authentication failures due to configuration mismatches.
The fix changes milvus-storage's credentials provider namespace to
`milvus_storage`, ensuring each project uses its own implementation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #46225
Replace the heterogeneous insert data handling logic that modified
schema_ while holding a shared lock with an assertion. The previous
implementation had a concurrency bug where schema modification
operations were performed under a shared_lock, which violates mutex
semantics and can lead to data races.
Issue: #46225 reported two problems:
1. Schema modification under shared_lock (not exclusive lock)
2. Access to schema_ not protected by mutex in growing segment
The removed code attempted to handle "added fields" by:
- Adding new field to schema (schema_->AddField)
- Appending field metadata to insert_record_
- Setting default data for existing rows
All these write operations were performed while holding only a
shared_lock, which is incorrect since shared_locks are meant for
read-only operations.
This fix replaces the unsafe modification with an assertion that fails
if an unexpected new field is encountered in a growing segment with
existing data. The proper handling of schema changes should go through
the Reopen() path which correctly acquires a unique_lock before
modifying schema_.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45511
Our tantivy inverted index currently does not include the item index when
the value is an array, so we can't do `a[0] == 'b'` style lookups in the
inverted index. For such cases, we need to skip the index and use
brute-force search.
We may improve our index in the future, so this is a temporary solution.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
This PR fixes two issues related to segment loading and index
deserialization:
1. Fill partition_id in LoadIndexInfo when converting field index info,
which is required by cardinal (DiskANN) index deserialization.
2. Close RemoteOutputStream in destructor to ensure buffer flushed and
resources released properly.
issue: #46141
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR adds Customer Managed Encryption Keys (CMEK) support to the
StorageV2 FFI layer, enabling data encryption/decryption through the
cipher plugin system.
Changes:
- Add ffi_writer_c.cpp/h with GetEncParams() to retrieve encryption
parameters (key and metadata) from cipher plugin for data encryption
- Extend GetLoonReader() in ffi_reader_c.cpp to support CMEK decryption
by configuring KeyRetriever when plugin context is provided
- Add encryption property constants in ffi_common.go for writer config
- Integrate CMEK encryption in NewFFIPackedWriter() to pass encryption
parameters to the underlying storage writer
issue: #44956
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>