mirror of
https://gitee.com/milvus-io/milvus.git
synced 2026-02-02 01:06:41 +08:00
issue: https://github.com/milvus-io/milvus/issues/41746

This PR adds MinHash "DIDO" (Data In, Data Out) support to Milvus, which allows computing MinHash signatures on-the-fly during search operations instead of requiring pre-stored vectors.

Key changes:
- Implemented SIMD-optimized C++ MinHash computation (AVX2/AVX512 for x86, NEON/SVE for ARM)
- Added runtime CPU detection and function hooks to automatically select the best SIMD implementation
- Integrated MinHash computation into the search pipeline (brute force search, growing segment search)
- Added support for LSH-based MinHash search with configurable band width and bit width parameters
- Enabled direct text-to-signature conversion during query execution, reducing storage overhead

This enables efficient text deduplication and similarity search without storing pre-computed MinHash vectors.

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
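
For readers unfamiliar with the technique, the text-to-signature step and the LSH banding described above can be sketched in plain Python. This is an illustrative sketch only, not the SIMD C++ code from this PR: the function names, the word-shingle tokenization, the SHA-1 base hash, and the parameters `num_hashes`, `band_width`, and `bit_width` are all assumptions chosen for the example and are not taken from the Milvus source.

```python
import hashlib
import random

# A Mersenne prime used as the modulus for the universal hash family
# h(x) = (a * x + b) mod P. This choice is an assumption of the sketch.
MERSENNE_PRIME = (1 << 61) - 1


def shingles(text, k=3):
    """Split text into a set of k-word shingles (hypothetical tokenization)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def base_hash(shingle):
    """Map a shingle to a 64-bit integer using the first 8 bytes of SHA-1."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def make_permutations(num_hashes, seed=1234):
    """Generate (a, b) coefficient pairs, one per signature element."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(num_hashes)
    ]


def minhash_signature(text, perms, bit_width=32, k=3):
    """Compute a MinHash signature: for each hash function keep the minimum."""
    mask = (1 << bit_width) - 1
    hashes = [base_hash(s) for s in shingles(text, k)]
    return [
        min(((a * h + b) % MERSENNE_PRIME) & mask for h in hashes)
        for a, b in perms
    ]


def lsh_bands(signature, band_width):
    """Group the signature into bands of band_width consecutive elements."""
    return [
        tuple(signature[i:i + band_width])
        for i in range(0, len(signature) - band_width + 1, band_width)
    ]


def share_band(sig_a, sig_b, band_width):
    """Two signatures are LSH candidates if any band matches exactly."""
    bands_a = lsh_bands(sig_a, band_width)
    bands_b = lsh_bands(sig_b, band_width)
    return any(x == y for x, y in zip(bands_a, bands_b))


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


perms = make_permutations(num_hashes=128)
doc = "milvus computes minhash signatures on the fly during search"
query = "milvus computes minhash signatures on the fly during search"
sig_doc = minhash_signature(doc, perms)
sig_query = minhash_signature(query, perms)
print(estimated_jaccard(sig_doc, sig_query), share_band(sig_doc, sig_query, 8))
```

In a DIDO-style flow, only the raw text would be supplied at query time and a signature like `sig_query` would be computed during the search itself, which is why no pre-computed vector needs to be stored for the query. A wider band makes a band match less likely (fewer false candidates), while a narrower band increases recall.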