mirror of
https://gitee.com/milvus-io/milvus.git
synced 2026-01-07 19:31:51 +08:00
issue: https://github.com/milvus-io/milvus/issues/42053

Process ngram in batches rather than all at once.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Batch Processing for N-gram Queries

**Core Invariant:** All data iteration is now driven by `batch_size_` as the fundamental unit. For sealed chunked segments holding string/JSON data, processing is strictly stateless so that specialized batched algorithms can be used.

**Simplified Logic:**

- Removed the redundant `process_all_chunks` boolean flag from `ProcessMultipleChunksCommon` (renamed to `ProcessDataChunksForMultipleChunk`). All iteration paths now converge on the same `batch_size_`-driven chunking strategy with unified data size clamping (`std::min(chunk_size, batch_size_ - processed_size)`).
- Eliminated the old wrapper delegation methods (`ProcessDataChunksForMultipleChunk` and `ProcessAllChunksForMultipleChunk`), which both pointed to a single common implementation selected by a conditional flag.

**No Data Loss or Behavior Regression:**

- The new `ProcessAllDataChunkBatched<T>` is an additional stateless public path. It requires sealed, chunked segments and a `T` of `std::string_view`, `Json`, or `ArrayView`. It iterates all `num_data_chunk_` chunks at `batch_size_` granularity without mutating cursor state (`current_data_chunk_`, `current_data_chunk_pos_`), which keeps processing deterministic and re-entrant.
- Existing cursor-based APIs (`ProcessDataChunksForMultipleChunk`, `ProcessChunkForSealedSeg`) remain unchanged for standard expression evaluation, so no segment state is corrupted.
- N-gram query execution now routes through `ExecuteQueryWithPredicate<T, Predicate>(literal, segment, predicate, need_post_filter)`, which forwards generic predicates and delegates to `segment->ProcessAllDataChunkBatched<T>(execute_batch, res)` for post-filtering, avoiding per-chunk single-pass traversal.
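
The batching rule above can be illustrated with a minimal, self-contained sketch. It is not the Milvus implementation: `ForEachBatchStateless` and `FilterAllRows` are hypothetical names, chunks are modeled as plain vectors, and the only parts taken from this change are the `std::min(chunk_size, batch_size - processed)` clamp, the use of local cursors instead of member state, and the templated callable in place of `std::function`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: visits every row of every chunk in slices, where the
// slices of one batch add up to at most `batch_size` rows. All cursor state
// lives in locals, so repeated calls always start from the first chunk.
// Returns the number of non-empty batches that were produced.
template <typename T, typename Func>
std::size_t ForEachBatchStateless(const std::vector<std::vector<T>>& chunks,
                                  std::size_t batch_size,
                                  Func&& func) {
    if (batch_size == 0) {
        return 0;  // nothing can be processed with an empty batch
    }
    std::size_t chunk_idx = 0;   // local stand-in for a chunk cursor
    std::size_t chunk_pos = 0;   // local stand-in for a position cursor
    std::size_t row_offset = 0;  // global row index of the next slice
    std::size_t batches = 0;
    while (chunk_idx < chunks.size()) {
        std::size_t processed = 0;
        while (processed < batch_size && chunk_idx < chunks.size()) {
            const auto& chunk = chunks[chunk_idx];
            const std::size_t remaining = chunk.size() - chunk_pos;
            // Clamp the slice so the batch never exceeds batch_size rows.
            const std::size_t n = std::min(remaining, batch_size - processed);
            if (n > 0) {
                func(chunk.data() + chunk_pos, n, row_offset);
            }
            processed += n;
            chunk_pos += n;
            row_offset += n;
            if (chunk_pos == chunk.size()) {  // chunk exhausted, move on
                ++chunk_idx;
                chunk_pos = 0;
            }
        }
        if (processed > 0) {
            ++batches;
        }
    }
    return batches;
}

// Hypothetical post-filter: evaluates `predicate` on every row and returns
// one bit per row. The predicate is a template parameter, so a lambda is
// called directly without a std::function wrapper.
template <typename Predicate>
std::vector<bool> FilterAllRows(
    const std::vector<std::vector<std::string_view>>& chunks,
    std::size_t batch_size,
    Predicate&& predicate) {
    std::size_t total = 0;
    for (const auto& chunk : chunks) {
        total += chunk.size();
    }
    std::vector<bool> res(total, false);
    ForEachBatchStateless(
        chunks, batch_size,
        [&](const std::string_view* data, std::size_t n, std::size_t offset) {
            for (std::size_t i = 0; i < n; ++i) {
                res[offset + i] = predicate(data[i]);
            }
        });
    return res;
}
```

Because the cursors are locals, calling `FilterAllRows` twice on the same chunks yields the same bitmap, which is the re-entrancy property described for `ProcessAllDataChunkBatched<T>`.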

**Enhancement:** A generic predicate template (`template <typename T, typename Predicate>`) with perfect forwarding (`Predicate&& predicate`) replaces the fixed `std::function<bool(const T&)>` signature, eliminating function-wrapper overhead for n-gram matcher closures and enabling efficient batch processing callbacks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>