issue: https://github.com/milvus-io/milvus/issues/40006
This PR makes tantivy documents be added in batches. Adding documents by batch can
greatly reduce the latency of scheduling the document-add operation
(calling tantivy's `add_document` only schedules the add operation and
returns immediately once scheduled), because each call involves a tokio
`block_on`, which is relatively heavy.
Reducing the scheduling part does not necessarily reduce the overall latency if
the index writer threads do not process indexing quickly enough.
But if scheduling itself is slow, then even if the index writer threads
process indexing very fast (e.g. with a higher thread count), the overall
performance can still be limited by scheduling.
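As a rough illustration of why batching helps, here is a minimal sketch under stated assumptions: `schedule_add` and `add_batch` are hypothetical stand-ins for the real binding, not the actual Milvus code. The batched path pays the `block_on` overhead once per batch instead of once per document.
```
use tokio::runtime::Runtime;

// Hypothetical stand-in for the binding that schedules tantivy's
// add_document on the async runtime.
async fn schedule_add(doc: &str) {
    let _ = doc;
}

// One block_on schedules the whole batch, amortizing the (relatively
// heavy) block_on cost across all documents.
fn add_batch(rt: &Runtime, docs: &[&str]) {
    rt.block_on(async {
        for d in docs {
            schedule_add(d).await;
        }
    });
}

fn main() {
    let rt = Runtime::new().unwrap();
    add_batch(&rt, &["doc0", "doc1", "doc2"]);
}
```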
The following code benchmarks the PR (note: the measured duration covers
only scheduling, without commit):
```
fn test_performance() {
    let field_name = "text";
    let dir = TempDir::new().unwrap();
    let mut index_wrapper = IndexWriterWrapper::create_text_writer(
        field_name,
        dir.path().to_str().unwrap(),
        "default",
        "",
        1,
        50_000_000,
        false,
        TantivyIndexVersion::V7,
    )
    .unwrap();

    // Build a batch of 1,000,000 documents up front.
    let mut batch = vec![];
    for i in 0..1_000_000 {
        batch.push(format!("hello{:04}", i));
    }
    let batch_ref = batch.iter().map(|s| s.as_str()).collect::<Vec<_>>();

    // Measure scheduling only; commit is not included.
    let now = std::time::Instant::now();
    index_wrapper
        .add_data_by_batch(&batch_ref, Some(0))
        .unwrap();
    let elapsed = now.elapsed();
    println!("add_data_by_batch elapsed: {:?}", elapsed);
}
```
Latency drops from roughly 1.4s to 558ms.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building the tantivy
index with a lower version for those query nodes that run a lower tantivy
version.
This PR does two things:
1. Refactors the IndexWriterWrapper code to make it more concise.
2. Enables IndexWriterWrapper to build a tantivy index with different tantivy
crate versions (see the sketch below).
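As a rough illustration of point 2, here is a minimal sketch of dispatching index writing by version. `WriterV5`, `WriterV7`, `create_writer`, and the `V5` variant are hypothetical stand-ins, not the actual IndexWriterWrapper refactor; only `TantivyIndexVersion::V7` appears in the benchmark above.
```
// Hypothetical sketch: pick the writer implementation (and thus the
// tantivy crate version it wraps) at creation time.
#[derive(Clone, Copy, Debug)]
enum TantivyIndexVersion {
    V5, // assumed format readable by query nodes with an older tantivy
    V7, // current default format
}

trait IndexWriter {
    fn add_document(&mut self, text: &str);
}

struct WriterV5; // would wrap the older tantivy crate
struct WriterV7; // would wrap the current tantivy crate

impl IndexWriter for WriterV5 {
    fn add_document(&mut self, text: &str) {
        println!("v5 add: {text}");
    }
}

impl IndexWriter for WriterV7 {
    fn add_document(&mut self, text: &str) {
        println!("v7 add: {text}");
    }
}

fn create_writer(version: TantivyIndexVersion) -> Box<dyn IndexWriter> {
    match version {
        TantivyIndexVersion::V5 => Box::new(WriterV5),
        TantivyIndexVersion::V7 => Box::new(WriterV7),
    }
}

fn main() {
    let mut w = create_writer(TantivyIndexVersion::V5);
    w.add_document("hello");
}
```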
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
After this PR is merged, we support insert, upsert, index building,
query, and search on the added field.
These operations are only allowed on an added field after the add-field
request completes, which is a synchronous operation.
Compaction will be supported in the next PR.
issue: #39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
- Feat: Support Mix compaction. Tests cover compatibility and rollback
ability:
  - Read v1 segments and compact into v2 format.
  - Read both v1 and v2 segments and compact into v2 format.
  - Read v2 segments and compact into v2 format.
  - Compact with duplicate primary keys.
  - Compact bm25 segments.
  - Compact merge-sort segments.
  - Compact segments with no expiration.
  - Compact segments with missing binlogs.
  - Compact segments with nullable fields.
- Feat: Support Clustering compaction. Tests cover compatibility and
rollback ability:
  - Read v1 segments and compact into v2 format.
  - Read both v1 and v2 segments and compact into v2 format.
  - Read v2 segments and compact into v2 format.
  - Compact bm25 segments into v2 format.
  - Compact with a memory limit.
- Enhance: Use serdeMap serialization in the BuildRecord function to support
all Milvus data types.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Two points:
1. Reorder a conjunct expr's sub-expressions to postpone heavy operations.
The evaluation sequence becomes: int(column) -> index(column) ->
string(column) -> light conjunct ... -> json(column) -> heavy conjunct ->
two_column_compare.
2. Support a pre-filter for expr execution: skip scanning raw data that has
already been filtered out by a preceding expr's result (see the sketch
below).
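For illustration only, here is a minimal sketch of both ideas. `Expr`, its `cost`, and its predicate are hypothetical stand-ins for the real expression types, not Milvus internals.
```
// Hypothetical expression: a cost estimate plus a per-row predicate.
struct Expr {
    cost: u32,
    pred: fn(usize) -> bool,
}

fn eval_conjunction(mut exprs: Vec<Expr>, rows: usize) -> Vec<bool> {
    // (1) Run cheap sub-expressions first so heavy ones see fewer rows.
    exprs.sort_by_key(|e| e.cost);
    let mut alive = vec![true; rows];
    for e in &exprs {
        for (i, keep) in alive.iter_mut().enumerate() {
            // (2) Pre-filter: never re-scan a row a preceding expr
            // already rejected.
            if *keep {
                *keep = (e.pred)(i);
            }
        }
    }
    alive
}

fn main() {
    let exprs = vec![
        Expr { cost: 10, pred: |i| i % 3 == 0 }, // heavy, runs last
        Expr { cost: 1, pred: |i| i % 2 == 0 },  // light, runs first
    ];
    println!("{:?}", eval_conjunction(exprs, 12));
}
```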
#39869
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
issue: #40730
also see: https://github.com/milvus-io/cgosymbolizer/pull/2
After these PRs, on Linux:
- Milvus will always enable jemalloc by default.
- jemalloc will always be compiled with the --enable-prof option.
- all images will enable jemalloc profiling by default.
- a pprof HTTP service for jemalloc will be registered on the RESTful
server at `/debug/jemalloc/`.
- `jeprof` can remotely profile the memory of Milvus.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #40607
tantivy change: https://github.com/zilliztech/tantivy/pull/3
Benchmarks:
Test environment: CPU 9900K
The data is inserted by:
```
for i in 0..N {
    for j in 0..UNIQUE {
        let key = format!("hello{}", j);
        index_writer.add_string(&key, i * UNIQUE + j).unwrap();
    }
}
```
So UNIQUE influences the locality of the matched docs: docs matching a
given key appear once every UNIQUE insertions, so a larger UNIQUE spreads
the matches further apart.
The latency is the average over 1000 repeated queries.
The results show a 22.5%-34.8% latency reduction.

---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #40308
This PR fixes two concurrency issues:
1. Elements of null_offset are used to set bits in a bitset whose size is
initialized from the tantivy document count. However, some documents that
are recorded as null in null_offset may not yet be committed in tantivy,
so the offset can exceed the bitset size and an array-out-of-range access
occurs.
2. null_offset can be read and written concurrently, but there is no
synchronization protecting it (see the sketch below).
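For illustration, here is a minimal sketch of addressing both hazards: a lock for the concurrent access and a bounds check for uncommitted offsets. `TextIndex`, `record_null`, and `fill_null_bitset` are hypothetical names, not the actual binding code.
```
use std::sync::Mutex;

// Hypothetical container; in the real code null_offset lives in the
// tantivy index wrapper.
struct TextIndex {
    null_offset: Mutex<Vec<usize>>, // offsets of null documents
}

impl TextIndex {
    // Writers push null offsets; the lock addresses issue 2
    // (unsynchronized concurrent read/write).
    fn record_null(&self, offset: usize) {
        self.null_offset.lock().unwrap().push(offset);
    }

    // Readers size the bitset from the committed doc count and skip
    // offsets beyond it, addressing issue 1 (out-of-range access for
    // nulls not yet committed in tantivy).
    fn fill_null_bitset(&self, committed_docs: usize) -> Vec<bool> {
        let mut bitset = vec![false; committed_docs];
        for &off in self.null_offset.lock().unwrap().iter() {
            if off < committed_docs {
                bitset[off] = true;
            }
        }
        bitset
    }
}

fn main() {
    let idx = TextIndex { null_offset: Mutex::new(Vec::new()) };
    idx.record_null(3);
    idx.record_null(10); // null recorded but doc not committed yet
    println!("{:?}", idx.fill_null_bitset(5)); // offset 10 is skipped
}
```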
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>