milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-03 09:22:30 +08:00

Author	SHA1	Message	Date
aoiasd	f166843c5e	enhance: support use lindera tag filter (#40416 ) relate: https://github.com/milvus-io/milvus/issues/39659 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-04-21 15:56:36 +08:00
sparknack	8ccb875e41	enhance: add simde package (#40943 ) issue: #40942 Add simde package, which can make porting SIMD code to other architectures much easier. Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-04-21 12:18:40 +08:00
Spade A	5b1430f27e	enhance: tantivy collector set bitset directly (#39748 ) fix: #39755 The following shows a simple benchmark where insert 1M docs where all rows are "hello", the latency is segcore level, CPU is 9900K: master: 2.62ms this PR: 2.11ms bench mark code: ``` TEST(TextMatch, TestPerf) { auto schema = GenTestSchema({}, true); auto seg = CreateSealedSegment(schema, empty_index_meta); int64_t N = 1000000; uint64_t seed = 19190504; auto raw_data = DataGen(schema, N, seed); auto str_col = raw_data.raw_->mutable_fields_data() ->at(1) .mutable_scalars() ->mutable_string_data() ->mutable_data(); for (int64_t i = 0; i < N - 1; i++) { str_col->at(i) = "hello"; } SealedLoadFieldData(raw_data, *seg); seg->CreateTextIndex(FieldId(101)); auto now = std::chrono::high_resolution_clock::now(); auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch); auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP); auto end = std::chrono::high_resolution_clock::now(); auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - now); std::cout << "TextMatch query time: " << duration.count() << "ms" << std::endl; } ``` --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-20 23:02:41 +08:00
Chun Han	016920b023	fix: solve incompatible problem for none-encoding index(#40838 ) (#41369 ) related: #40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-20 22:56:44 +08:00
Ted Xu	d50781c8cc	enhance: support nullable group by keys (#41313 ) See #36264 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-04-18 10:08:34 +08:00
Spade A	62293cb582	fix: revert batch add (#41374 ) issue: #41375 todo: to fix the problems fixed in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-17 22:32:38 +08:00
Bingyi Sun	4552dd4b23	fix: Fix json index does not work for string filter (#41382 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-17 20:10:39 +08:00
sthuang	1f1c836fb9	feat: Storage v2 growing segment load (#41001 ) support parallel loading sealed and growing segments with storage v2 format by async reading row groups. related: #39173 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-16 17:14:33 +08:00
Spade A	70d13dcf61	enhance: update tantivy for removing "doc_id" fast field (#41198 ) Issue: #41210 After https://github.com/zilliztech/tantivy/pull/5, we can provide milvus row id directly to tantivy rather than record it in the fast field "doc_id". So rather than search tantivy doc id and then get milvus row id from "doc_id", now, the searched tantivy doc id is the milvus row id, eliminating the expensive acquiring row id phase. The following shows a simple benchmark where insert 1M docs where all rows are "hello", the latency is segcore level, CPU is 9900K: ![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad) The latency is 2.02 and 2.1 times respectively. bench mark code: ``` TEST(TextMatch, TestPerf) { auto schema = GenTestSchema({}, true); auto seg = CreateSealedSegment(schema, empty_index_meta); int64_t N = 1000000; uint64_t seed = 19190504; auto raw_data = DataGen(schema, N, seed); auto str_col = raw_data.raw_->mutable_fields_data() ->at(1) .mutable_scalars() ->mutable_string_data() ->mutable_data(); for (int64_t i = 0; i < N - 1; i++) { str_col->at(i) = "hello"; } SealedLoadFieldData(raw_data, *seg); seg->CreateTextIndex(FieldId(101)); auto now = std::chrono::high_resolution_clock::now(); auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch); auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP); auto end = std::chrono::high_resolution_clock::now(); auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - now); std::cout << "TextMatch query time: " << duration.count() << "ms" << std::endl; } ``` --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-15 20:20:32 +08:00
Bingyi Sun	a953eaeaf0	enhance: support binary range expression for json path index (#41025 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-15 19:32:33 +08:00
Chun Han	59b14d38f5	enhance: Optimize index format for improved load performance(#40838 ) (#40839 ) related: https://github.com/milvus-io/milvus/issues/40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-15 03:10:30 +08:00
Bingyi Sun	bf617115ca	enhance: Remove single chunk segment related codes (#39249 ) https://github.com/milvus-io/milvus/issues/39112 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 18:56:29 +08:00
Spade A	9ce3e3cb44	enhance: add documents in batch for json key stats (#41228 ) issue: https://github.com/milvus-io/milvus/issues/40897 After this, the document add operations scheduling duration is decreased roughly from 6s to 0.9s for the case in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-11 14:08:26 +08:00
Bingyi Sun	b9b8419cbf	fix: Use int32 when creating array index for element type int8/int16 (#41185 ) issue: #41172 Elements with type int8 or int16 in Array is encoded using int32, so we should parse it as int32 when creating index. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 13:18:25 +08:00
foxspy	17e10beba0	fix: avoid segmentation faults caused by retrieving empty vector datasets (#40545 ) issue: #40544 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-04-10 20:16:29 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
Spade A	e9fa30f462	fix: remove single segment logic in V7 (#41159 ) Ref: https://github.com/milvus-io/milvus/issues/40823 It does not make any sense to create single segment tantivy index for old version such as 2.4 by using tantivy V7. So, clean the relevant code. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-09 19:54:27 +08:00
zhagnlu	3ed23a5f48	fix: fix remove index type failed when remote storage is local mode (#41164 ) #41142 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-09 16:42:26 +08:00
zhagnlu	ee1faf80dd	fix:add clear bitmap for batch skip mode (#41166 ) #41086 #41150 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-09 13:08:27 +08:00
sthuang	50e02e3598	enhance: update packed reader api (#41055 ) related: https://github.com/milvus-io/milvus/issues/39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-09 10:18:26 +08:00
congqixia	e2d8adb963	fix: Use element_type for Array is null operator (#41157 ) Related to #41156 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-09 10:16:24 +08:00
Spade A	c6a0c2ab64	enhance: process tantivy document add by batch (#40124 ) issue: https://github.com/milvus-io/milvus/issues/40006 This PR makes tantivy add documents in batches. Adding documents in batches can greatly reduce the latency of scheduling the document add operation (calling tantivy `add_document` only schedules the add operation and returns immediately once it is scheduled), because each call involves a tokio block_on, which is relatively heavy. Reducing the scheduling cost does not necessarily reduce the overall latency if the index writer threads do not process indexing quickly enough. But if scheduling itself is slow, then even when the index writer threads index very fast (by increasing the thread count), the overall performance can still be limited. The following code benchmarks the PR (note that the duration only covers scheduling, without commit) ``` fn test_performance() { let field_name = "text"; let dir = TempDir::new().unwrap(); let mut index_wrapper = IndexWriterWrapper::create_text_writer( field_name, dir.path().to_str().unwrap(), "default", "", 1, 50_000_000, false, TantivyIndexVersion::V7, ) .unwrap(); let mut batch = vec![]; for i in 0..1_000_000 { batch.push(format!("hello{:04}", i)); } let batch_ref = batch.iter().map(|s| s.as_str()).collect::<Vec<_>>(); let now = std::time::Instant::now(); index_wrapper .add_data_by_batch(&batch_ref, Some(0)) .unwrap(); let elapsed = now.elapsed(); println!("add_data_by_batch elapsed: {:?}", elapsed); } ``` Latency drops from roughly 1.4s to 558ms. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-08 19:50:24 +08:00
Bingyi Sun	da21640ac3	fix: Fix the bug that null data can not be filtered by null expr (#41124 ) issue: https://github.com/milvus-io/milvus/issues/41063 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-08 19:12:24 +08:00
aoiasd	6f17720e4e	enhance: support use jieba tokenizer with custom dictionary (#39854 ) relate: https://github.com/milvus-io/milvus/issues/40168 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-04-08 14:52:27 +08:00
Spade A	e4da2765ba	enhance: process batch of strings within one tantivy_index_add_string call (#40007 ) issue: #40006 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-08 01:20:25 +08:00
Bingyi Sun	355f62d6c9	fix: Align brute force search with json index for exists expr (#41116 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-07 15:42:23 +08:00
zhagnlu	ee8783cae9	fix:add operator type for some operator (#40895 ) #40894 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 11:58:27 +08:00
zhagnlu	10a63b3f2e	enhance: add formatter for several types to remove compile warning (#41094 ) #41091 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 11:54:24 +08:00
zhagnlu	0a378dc308	fix:fix format error for json (#41026 ) #40963 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 10:22:22 +08:00
Bingyi Sun	fcb03b5bd1	feat: add json null/exists expression (#41004 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-03 17:48:21 +08:00
Zhen Ye	9f27d9af61	fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069 ) issue: #41067 Signed-off-by: chyezh <chyezh@outlook.com>	2025-04-03 02:54:21 +08:00
Spade A	f552ec67dd	fix: support building tantivy index with low version(5) (#40822 ) fix: https://github.com/milvus-io/milvus/issues/40823 To solve the problem in the issue, we have to support building tantivy index with low version for those query nodes with low tantivy version. This PR does two things: 1. refactor codes for IndexWriterWrapper to make it concise 2. enable IndexWriterWrapper to build tantivy index by different tantivy crate --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 18:46:20 +08:00
Chun Han	afa519b4c7	fix: array is null failed(#40686 ) (#41027 ) related: #40686 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-02 18:20:22 +08:00
smellthemoon	cb1e86e17c	enhance: support add field (#39800 ) After this PR is merged, insert, upsert, build index, query, and search are supported on the added field. These operations are available on the added field only after the add field request completes, which is a synchronous operation. Compaction will be supported in the next PR. #39718 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-04-02 14:24:31 +08:00
Spade A	216be1494b	fix: add log for object storage operation fail (#40666 ) fix: #40665 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 01:26:21 +08:00
cqy123456	6dc0f42830	fix:growing mmap data type crashed by nullable input (#40994 ) issue: https://github.com/milvus-io/milvus/issues/40981 2.5 pr: https://github.com/milvus-io/milvus/pull/40980 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-03-31 20:32:19 +08:00
Bingyi Sun	27ff3a42e7	enhance: Record simdjson error (#41003 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-31 17:56:19 +08:00
Bingyi Sun	15ec7bae4d	fix: Fix using json index when iterative_filter is specified (#40945 ) issue: #40934 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-31 15:26:19 +08:00
Bingyi Sun	9676365af9	fix: Fix json index not equal filter (#40647 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-27 23:06:23 +08:00
aoiasd	384d39ef5a	enhance: not build lindera features by default and support make milvus with tantivy features (#40813 ) relate: https://github.com/milvus-io/milvus/issues/39659 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-03-27 14:08:22 +08:00
zhagnlu	87e7d6d79f	fix:fix exception when do arith expr with using index (#40794 ) #40783 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-27 11:10:21 +08:00
Xiaofan	8788e591cd	enhance: add detailed stack for error message (#40883 ) fix #40882 adding stacktrace when operator execution fails. Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-03-26 13:24:20 +08:00
zhagnlu	7fdb2e144f	enhance:change multi or expr to in expr (#40757 ) #40752 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-25 11:06:18 +08:00
cai.zhang	a41cb942f6	fix: Do not delete the centroids file when sampling fails instead wait GC (#40701 ) issue: #40700 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-03-21 10:32:12 +08:00
sthuang	d7df78a6c9	feat: Storage v2 compaction (#40667 ) - Feat: Support Mix compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact with duplicate primary key test. - Compact with bm25 segments. - Compact with merge sort segments. - Compact with no expiration segments. - Compact with lack binlog segments. - Compact with nullable field segments. - Feat: Support Clustering compaction. Covering tests include compatibility and rollback ability. - Read v1 segments and compact with v2 format. - Read both v1 and v2 segments and compact with v2 format. - Read v2 segments and compact with v2 format. - Compact bm25 segments with v2 format. - Compact with memory limit. - Enhance: Use serdeMap serialize in BuildRecord function to support all Milvus data types. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-03-21 10:16:12 +08:00
Bingyi Sun	5a6b4e56d5	fix: Fix tasks will panic if one of them throw an exception. (#40691 ) issue: https://github.com/milvus-io/milvus/issues/40690 the variable rcm will be dangling if a future throws an exception and return. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-19 16:52:09 +08:00
aoiasd	92bdf7a0c1	enhance: support run analyzer return detailed token (#40458 ) relate: https://github.com/milvus-io/milvus/issues/39705 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-03-19 15:48:15 +08:00
zhagnlu	6c55db44f1	enhance: reorder sub expr for conjunct expr (#39872 ) two points: (1) reorder the conjunct expr's subexprs to postpone heavy operations; sequence: int(column) -> index(column) -> string(column) -> light conjunct ...... -> json(column) -> heavy conjunct -> two_column_compare (2) support pre-filtering during expr execution, skipping the scan of raw data already excluded by the preceding expr results. #39869 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-19 14:50:14 +08:00
Zhen Ye	8db708f67d	enhance: enable memory prof based on jemalloc (#40731 ) issue: #40730 also see: https://github.com/milvus-io/cgosymbolizer/pull/2 After these PR, at linux: - the milvus will always enable jemalloc by default. - jemalloc will always compiled with --enable-prof options. - all image will always enable the jemalloc prof by default. - a pprof http service for jemalloc at `/debug/jemalloc/` will be registered into restful. - `jeprof` can remote profile the memory of milvus. Signed-off-by: chyezh <chyezh@outlook.com>	2025-03-19 14:46:18 +08:00
zhagnlu	7ebe3d7038	enhance: refine chunk access logic and add some comment on data (#40618 ) #40367 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-16 22:20:08 +08:00

1901 Commits