milvus/internal
Spade A 1a6f3c4305
enhance: batch processing for ngram (#46648)
issue: https://github.com/milvus-io/milvus/issues/42053
Process n-gram data in batches rather than all at once.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Batch Processing for N-gram Queries

**Core Invariant:** All data iteration is now driven by `batch_size_` as
the fundamental unit. For sealed chunked segments holding string/JSON
data, processing is strictly stateless so that specialized batched
algorithms can be applied.

**Simplified Logic:**
- Removed the redundant `process_all_chunks` boolean flag from
`ProcessMultipleChunksCommon`, which is renamed to
`ProcessDataChunksForMultipleChunk`. All iteration paths now converge on
the same `batch_size_`-driven chunking strategy with unified data size
clamping (`std::min(chunk_size, batch_size_ - processed_size)`).
- Eliminated the old wrapper methods `ProcessDataChunksForMultipleChunk`
and `ProcessAllChunksForMultipleChunk`, which both delegated to a single
common implementation and differed only in the conditional flag.
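
As a rough illustration of the clamping rule above, the sketch below plans how many rows one batch takes from each chunk. `PlanOneBatch` and its parameters are hypothetical names chosen for this example and are not part of the Milvus code; only the `std::min(chunk_size, batch_size - processed_size)` expression comes from the notes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the Milvus implementation): starting at row `pos`
// of chunk `chunk`, return the number of rows taken from each visited chunk
// until one batch of `batch_size` rows is filled or the chunks run out.
std::vector<int64_t>
PlanOneBatch(const std::vector<int64_t>& chunk_sizes,
             std::size_t chunk,
             int64_t pos,
             int64_t batch_size) {
    std::vector<int64_t> taken;
    int64_t processed_size = 0;
    for (std::size_t i = chunk;
         i < chunk_sizes.size() && processed_size < batch_size;
         ++i) {
        // Only the first visited chunk starts at a non-zero position.
        const int64_t start = (i == chunk) ? pos : 0;
        const int64_t chunk_size = chunk_sizes[i] - start;
        // Unified clamp: never take more rows than the batch has room for.
        const int64_t size = std::min(chunk_size, batch_size - processed_size);
        taken.push_back(size);
        processed_size += size;
    }
    return taken;
}
```

With chunk sizes `{3, 5, 4}`, a start position of row 1 in chunk 0, and a batch size of 6, the plan takes 2 rows from the first chunk and 4 from the second, so the batch boundary falls inside the second chunk.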

**No Data Loss or Behavior Regression:**
- The new `ProcessAllDataChunkBatched<T>` is an additional stateless
public path (requires sealed + chunked segments, type constraints:
`std::string_view|Json|ArrayView`) that iterates all `num_data_chunk_`
chunks in `batch_size_` granularity without mutating cursor state
(`current_data_chunk_`, `current_data_chunk_pos_`), ensuring
deterministic re-entrant processing.
- Existing cursor-based APIs (`ProcessDataChunksForMultipleChunk`,
`ProcessChunkForSealedSeg`) remain unchanged for standard expression
evaluation—no segment state is corrupted.
- N-gram query execution now routes through
`ExecuteQueryWithPredicate<T, Predicate>(literal, segment, predicate,
need_post_filter)`, which forwards generic predicates and delegates to
`segment->ProcessAllDataChunkBatched<T>(execute_batch, res)` for
post-filtering, so the data is processed batch by batch instead of in a
single pass over all chunks.
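
A minimal sketch of what a stateless batched traversal could look like, assuming each batch is a contiguous slice of a single chunk and that `batch_size` is positive. `ForEachBatchStateless` and the callback signature are hypothetical; the real `ProcessAllDataChunkBatched<T>` may differ in both shape and batching details. The point it shows is that all cursor state lives in local variables, so repeated calls see the same batches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical sketch: visit every row of every chunk in slices of at most
// `batch_size` rows (assumed > 0). The input is const and the position is
// tracked in locals only, so no shared cursor is mutated between calls.
template <typename T, typename Func>
int64_t
ForEachBatchStateless(const std::vector<std::vector<T>>& chunks,
                      int64_t batch_size,
                      Func&& func) {
    int64_t row_offset = 0;  // global row index of the current chunk's start
    for (const auto& chunk : chunks) {
        const auto chunk_size = static_cast<int64_t>(chunk.size());
        for (int64_t pos = 0; pos < chunk_size; pos += batch_size) {
            const int64_t size = std::min(batch_size, chunk_size - pos);
            // Callback receives the slice, its length, and its global offset.
            func(chunk.data() + pos, size, row_offset + pos);
        }
        row_offset += chunk_size;
    }
    return row_offset;  // total number of rows visited
}
```

Because nothing outside the function is modified, calling it twice on the same chunks yields the same sequence of batches, which is the re-entrancy property the notes describe.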

**Enhancement:** Generic predicate template `template <typename T,
typename Predicate>` with perfect forwarding (`Predicate&& predicate`)
replaces the fixed `std::function<bool(const T&)>` signature,
eliminating function wrapper overhead for n-gram matcher closures and
enabling efficient batch processing callbacks.
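
The following sketch illustrates the generic-predicate pattern under simplified assumptions. `ExecuteWithPredicate`, its parameters, and the in-memory batches are hypothetical stand-ins, not the signature of `ExecuteQueryWithPredicate`. The predicate type is deduced, so a lambda is forwarded once into the batch callback and called directly, without being wrapped in `std::function`.

```cpp
#include <cassert>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: `Predicate` is a deduced template parameter rather
// than a fixed std::function<bool(const T&)>, so the compiler sees the
// concrete closure type and can inline the call.
template <typename T, typename Predicate>
std::vector<bool>
ExecuteWithPredicate(const std::vector<std::vector<T>>& batches,
                     Predicate&& predicate) {
    // Forward the predicate exactly once into the per-batch callback.
    // An rvalue predicate is moved in; an lvalue predicate is copied.
    auto execute_batch = [pred = std::forward<Predicate>(predicate)](
                             const std::vector<T>& batch,
                             std::vector<bool>& res) {
        for (const auto& value : batch) {
            res.push_back(pred(value));
        }
    };

    std::vector<bool> res;
    for (const auto& batch : batches) {
        execute_batch(batch, res);
    }
    return res;
}
```

A caller might pass a substring matcher such as `[](const std::string_view& s) { return s.find("lv") != std::string_view::npos; }`. The predicate must be callable on a const object here, which holds for ordinary non-`mutable` lambdas.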

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-30 16:57:22 +08:00