Issue: #41210

After https://github.com/zilliztech/tantivy/pull/5, we can provide the milvus row id directly to tantivy rather than recording it in the fast field "doc_id". Previously, a search returned tantivy doc ids, and the milvus row ids then had to be read back from "doc_id". Now the tantivy doc id returned by a search is the milvus row id, which eliminates the expensive row id acquisition phase.

The following shows a simple benchmark that inserts **1M** docs where all rows are "hello". The latency is measured at the **segcore** level, and the CPU is a 9900K:

**The latency is 2.02 and 2.1 times respectively.**

Benchmark code:

```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);

    // Overwrite the generated string column so that every row is "hello".
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N; i++) {
        str_col->at(i) = "hello";
    }

    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    // Time only the text match query, not data loading or index building.
    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();

    // The duration is measured in microseconds, so label it "us".
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
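
To illustrate why dropping the "doc_id" lookup saves work, here is a minimal sketch of the two ways a result bitmap could be built from search hits. It is not the milvus or tantivy implementation: `FastField`, `BitmapViaFastField`, and `BitmapDirect` are hypothetical names, and the fast field is modeled as a plain vector indexed by doc id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the "doc_id" fast field: index = tantivy doc id,
// value = milvus row id stored for that doc.
using FastField = std::vector<int64_t>;

// Old path (sketch): every hit needs an extra fast-field read to translate
// the tantivy doc id into a milvus row id before the bit can be set.
std::vector<bool> BitmapViaFastField(const std::vector<uint32_t>& hits,
                                     const FastField& doc_id_field,
                                     std::size_t row_count) {
    std::vector<bool> bitmap(row_count, false);
    for (uint32_t doc : hits) {
        assert(doc < doc_id_field.size());
        int64_t row = doc_id_field[doc];  // per-hit translation step
        assert(row >= 0 && static_cast<std::size_t>(row) < row_count);
        bitmap[static_cast<std::size_t>(row)] = true;
    }
    return bitmap;
}

// New path (sketch): the doc id already is the row id, so the hit is used
// directly as the bitmap position and the translation step disappears.
std::vector<bool> BitmapDirect(const std::vector<uint32_t>& hits,
                               std::size_t row_count) {
    std::vector<bool> bitmap(row_count, false);
    for (uint32_t doc : hits) {
        assert(doc < row_count);
        bitmap[doc] = true;
    }
    return bitmap;
}
```

In the benchmark above every one of the 1M rows matches, so under this model the old path would perform one translation per row, which is the cost the change removes.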