milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Xianhui Lin	7e46fc6618	feat: implement batch commit for JSON Stats (#42494 ) implement batch commit for JSON Stats issue:https://github.com/milvus-io/milvus/issues/41616 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-08 19:58:33 +08:00
Xianhui Lin	1a6838b496	fix: json stats add map null check before insert into tantivity (#41505 ) json stats add map null check before insert into tantivity. Json stats index may fail if there is no data issue:https://github.com/milvus-io/milvus/issues/41494 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-24 21:06:37 +08:00
Xianhui Lin	3d4889586d	fix: JsonStats filter by conjunctExpr and improve the task slot calculation logic (#41459 ) Optimized JSON filter execution by introducing ProcessJsonStatsChunkPos() for unified position calculation and GetNextBatchSize() for better batch processing. Improved JSON key generation by replacing manual path joining with milvus::Json::pointer() and adjusted slot size calculation for JSON key index jobs. Updated the task slot calculation logic in calculateStatsTaskSlot() to handle the increased resource needs of JSON key index jobs. issue: https://github.com/milvus-io/milvus/issues/41378 https://github.com/milvus-io/milvus/issues/41218 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-23 16:30:37 +08:00
Spade A	5b1430f27e	enhance: tantivy collector set bitset directly (#39748 ) fix: #39755 The following shows a simple benchmark where insert 1M docs where all rows are "hello", the latency is segcore level, CPU is 9900K: master: 2.62ms this PR: 2.11ms bench mark code: ``` TEST(TextMatch, TestPerf) { auto schema = GenTestSchema({}, true); auto seg = CreateSealedSegment(schema, empty_index_meta); int64_t N = 1000000; uint64_t seed = 19190504; auto raw_data = DataGen(schema, N, seed); auto str_col = raw_data.raw_->mutable_fields_data() ->at(1) .mutable_scalars() ->mutable_string_data() ->mutable_data(); for (int64_t i = 0; i < N - 1; i++) { str_col->at(i) = "hello"; } SealedLoadFieldData(raw_data, *seg); seg->CreateTextIndex(FieldId(101)); auto now = std::chrono::high_resolution_clock::now(); auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch); auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP); auto end = std::chrono::high_resolution_clock::now(); auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - now); std::cout << "TextMatch query time: " << duration.count() << "ms" << std::endl; } ``` --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-20 23:02:41 +08:00
Spade A	70d13dcf61	enhance: update tantivy for removing "doc_id" fast field (#41198 ) Issue: #41210 After https://github.com/zilliztech/tantivy/pull/5, we can provide milvus row id directly to tantivy rather than record it in the fast field "doc_id". So rather than search tantivy doc id and then get milvus row id from "doc_id", now, the searched tantivy doc id is the milvus row id, eliminating the expensive acquiring row id phase. The following shows a simple benchmark where insert 1M docs where all rows are "hello", the latency is segcore level, CPU is 9900K: ![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad) The latency is 2.02 and 2.1 times respectively. bench mark code: ``` TEST(TextMatch, TestPerf) { auto schema = GenTestSchema({}, true); auto seg = CreateSealedSegment(schema, empty_index_meta); int64_t N = 1000000; uint64_t seed = 19190504; auto raw_data = DataGen(schema, N, seed); auto str_col = raw_data.raw_->mutable_fields_data() ->at(1) .mutable_scalars() ->mutable_string_data() ->mutable_data(); for (int64_t i = 0; i < N - 1; i++) { str_col->at(i) = "hello"; } SealedLoadFieldData(raw_data, *seg); seg->CreateTextIndex(FieldId(101)); auto now = std::chrono::high_resolution_clock::now(); auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch); auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP); auto end = std::chrono::high_resolution_clock::now(); auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - now); std::cout << "TextMatch query time: " << duration.count() << "ms" << std::endl; } ``` --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-15 20:20:32 +08:00
Spade A	9ce3e3cb44	enhance: add documents in batch for json key stats (#41228 ) issue: https://github.com/milvus-io/milvus/issues/40897 After this, the document add operations scheduling duration is decreased roughly from 6s to 0.9s for the case in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-11 14:08:26 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00

7 Commits