milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Spade A	c6a0c2ab64	enhance: process tantivy document add by batch (#40124 ) issue: https://github.com/milvus-io/milvus/issues/40006 This PR make tantivy document add by batch. Add document by batch can greately reduce the latency of scheduling the document add operation (call tantivy `add_document` only schdules the add operation and it returns immediately after scheduled) , because each call involes a tokio block_on which is relatively heavy. Reduce scheduling part not necessarily reduces the overall latency if the index writer threads does not process indexing quickly enough. But if scheduling itself is pretty slow, even the index writer threads process indexing very fast (by increasing thread number), the overall performance can still be limited. The following codes bench the PR (Note, the duration only counts for scheduling without commit) ``` fn test_performance() { let field_name = "text"; let dir = TempDir::new().unwrap(); let mut index_wrapper = IndexWriterWrapper::create_text_writer( field_name, dir.path().to_str().unwrap(), "default", "", 1, 50_000_000, false, TantivyIndexVersion::V7, ) .unwrap(); let mut batch = vec![]; for i in 0..1_000_000 { batch.push(format!("hello{:04}", i)); } let batch_ref = batch.iter().map(\|s\| s.as_str()).collect::<Vec<_>>(); let now = std::time::Instant::now(); index_wrapper .add_data_by_batch(&batch_ref, Some(0)) .unwrap(); let elapsed = now.elapsed(); println!("add_data_by_batch elapsed: {:?}", elapsed); } ``` Latency roughly reduces from 1.4s to 558ms. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-08 19:50:24 +08:00
Bingyi Sun	da21640ac3	fix: Fix the bug that null data can not be filtered by null expr (#41124 ) issue: https://github.com/milvus-io/milvus/issues/41063 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-08 19:12:24 +08:00
Bingyi Sun	355f62d6c9	fix: Align brute force search with json index for exists expr (#41116 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-07 15:42:23 +08:00
zhagnlu	0a378dc308	fix:fix format error for json (#41026 ) #40963 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 10:22:22 +08:00
Bingyi Sun	fcb03b5bd1	feat: add json null/exists expression (#41004 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-03 17:48:21 +08:00
Spade A	f552ec67dd	fix: support building tantivy index with low version(5) (#40822 ) fix: https://github.com/milvus-io/milvus/issues/40823 To solve the problem in the issue, we have to support building tantivy index with low version for those query nodes with low tantivy version. This PR does two things: 1. refactor codes for IndexWriterWrapper to make it concise 2. enable IndexWriterWrapper to build tantivy index by different tantivy crate --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 18:46:20 +08:00
smellthemoon	cb1e86e17c	enhance: support add field (#39800 ) after the pr merged, we can support to insert, upsert, build index, query, search in the added field. can only do the above operates in added field after add field request complete, which is a sync operate. compact will be supported in the next pr. #39718 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-04-02 14:24:31 +08:00
Bingyi Sun	27ff3a42e7	enhance: Record simdjson error (#41003 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-31 17:56:19 +08:00
Bingyi Sun	9676365af9	fix: Fix json index not equal filter (#40647 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-27 23:06:23 +08:00
zhagnlu	7fdb2e144f	enhance:change multi or expr to in expr (#40757 ) #40752 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-25 11:06:18 +08:00
aoiasd	92bdf7a0c1	enhance: support run anayser return detaild token (#40458 ) relate: https://github.com/milvus-io/milvus/issues/39705 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-03-19 15:48:15 +08:00
zhagnlu	6c55db44f1	enhance: reorder sub expr for conjunct expr (#39872 ) two point: (1) reoder conjucts expr's subexpr, postpone heavy operations sequence: int(column) -> index(column) -> string(column) -> light conjuct ...... -> json(column) -> heavy conjuct -> two_column_compare (2) support pre filter for expr execute, skip scan raw data that had been skipped because of preceding expr result. #39869 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-19 14:50:14 +08:00
zhagnlu	7ebe3d7038	enhance: refine chunk access logic and add some comment on data (#40618 ) #40367 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-16 22:20:08 +08:00
Spade A	3db56560fb	fix: fix concurrent issues in null offset (#40363 ) issue: #40308 This issue fixes these two concurrent issues: 1. element in null_offset is used to set bitset where the size of bitset is initialized by tantivy document count. However, there may still be some documents that are not committed in tantivy but are null in null_offset. So array out of range occurs. 2. null_offset can be read and write concurrently but there's no synchronization protection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-03-05 17:48:00 +08:00
Bingyi Sun	7040ba1c12	enhance: make json path index support term filter (#40140 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-04 11:56:02 +08:00
zhagnlu	8c19e5c4a7	enhance: decrease delete record dump snapshot limit (#40101 ) #40100 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-02 17:55:59 +08:00
Chun Han	259f9106ad	enhance: refine variable-length-type memory usage(#38736 ) (#39578 ) related: #38736 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-02-27 21:13:58 +08:00
Spade A	476cf61d98	fix: random sample consider empty input (#40201 ) issue: #40198 Fix random sample does not consider empty input, that is no data is hit by filter expression. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-02-26 16:15:58 +08:00
Bingyi Sun	db4769281c	fix: Fall back to a brute-force search if json index type unmatched (#40076 ) issue: https://github.com/milvus-io/milvus/issues/35528 If the query data type does not match the index type, fall back to a brute-force search --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-24 16:25:57 +08:00
sthuang	3eb3af5f08	feat: explicitly specify column groups for storage v2 api (#39790 ) * use the new packed reader and writer api to be compatible with current etcd meta * For the new packed writer API: column groups and paths are explicitly defined by users and won't split column groups by memory in storage v2. Packed writer follows the user-defined column groups to split arrow record and write into the corresponding file path. * For the new packed reader API: read paths are explicitly defined by users. related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-21 22:03:54 +08:00
Spade A	52c7d7dd80	fix: offset combined with term should be based on Token positions in phrase match (#39931 ) fix: #39711 Unlike English sentence where each words are parsed exactly once and one after one with position length 1, one Chinese word may be parsed to multiple words with position length larger than 1. For example, "badminton and skiing" will be parsed to Token{ start: 0, length: 1, text: "badminton" }, Token{ start: 1, length: 1, text: "and" }, and Token{ start: 2, length: 1, text: "tennis" }. While for exmaple for Chinsese: "羽毛球和滑雪" may be parsed to Token{ start: 0, length: 2, text: "羽毛" }, Token{ start: 0, length: 3, text: "羽毛球" }, Token{ start: 3, length: 1, text: "和" }, and Token{ start: 4, length: 2, text: "滑雪" }. This PR fix that the code not recognizes this situation. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-02-18 20:38:51 +08:00
Spade A	0dc21f0aeb	feat: support random sample (#39532 ) issue: #39541 This PR implements random sample, the syntax is: ``` filter="random_sample(factor)" or filter="boolean_expression && random_sample(factor)" where factor is a float between (0, 1) and boolean_expression is like "1 <= number < 10", "color in ["read, "blue"]" or others ``` --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-02-18 12:40:50 +08:00
zhagnlu	316534e065	enhance: optimize delete init construct code (#39327 ) #39326 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-02-17 21:05:26 +08:00
Bingyi Sun	b59555057d	feat: support json index (#36750 ) https://github.com/milvus-io/milvus/issues/35528 This PR adds json index support for json and dynamic fields. Now you can only do unary query like 'a["b"] > 1' using this index. We will support more filter type later. basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` There are some limits to use this index: 1. If a record does not have the json path you specify, it will be ignored and there will not be an error. 2. If a value of the json path fails to be cast to the type you specify, it will be ignored and there will not be an error. 3. A specific json path can have only one json index. 4. If you try to create more than one json indexes for one json field, sdk(pymilvus<=2.4.7) may return immediately because of internal implementation. This will be fixed in a later version. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-15 14:06:15 +08:00
sthuang	c4ae9f4ece	feat: introduce third-party milvus-storage (#39418 ) related: https://github.com/milvus-io/milvus/issues/39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-01-24 17:21:13 +08:00
Cai Yudong	5730b69e56	feat: Enable more VECTOR_INT8 unittest (#39569 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-24 17:03:07 +08:00
zhagnlu	8117d59f85	fix:fix GetValueFromConfig for bool type (#39526 ) #39525 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-01-24 16:17:05 +08:00
Cai Yudong	341d6c1eb7	feat: Update segcore for VECTOR_INT8 (#39415 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-21 11:03:03 +08:00
Bingyi Sun	140c5a0a75	enhance: add unit test for string pk (#39329 ) https://github.com/milvus-io/milvus/issues/39107 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-01-20 19:03:04 +08:00
Cai Yudong	5b35fc700d	enhance: [skip-e2e] Use template to remove duplicate unittest (#39396 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-18 10:33:01 +08:00
Cai Yudong	64feeb0e2b	enhance: Rename API GenDataset to GenFieldData in unittest (#39386 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-17 15:55:03 +08:00
Spade A	8c4ba70a4c	fix: enable to build index with single segment (#39233 ) fix https://github.com/milvus-io/milvus/issues/39232 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-16 11:01:06 +08:00
Zhen Ye	3e788f0fbd	enhance: record memory size (uncompressed) item for index (#38770 ) issue: #38715 - Current milvus use a serialized index size(compressed) for estimate resource for loading. - Add a new field `MemSize` (before compressing) for index to estimate resource. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-14 10:33:06 +08:00
Buqian Zheng	5e38f01e5b	enhance: update knowhere version (#39212 ) Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-01-14 10:21:05 +08:00
Alexander Guzhva	3447ff7310	enhance: [bitset] extend op_find() to be able to search both 0 and 1 (#39176 ) issue: #39124 `bitset::find_first()` and `bitset::find_next()` now accept one more parameter, which allows to search for `0` bit instead of `1` bit Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>	2025-01-14 09:50:58 +08:00
Cai Yudong	2a02bbe3ee	enhance: Use template to remove unittest duplication (#39144 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-13 09:58:57 +08:00
Spade A	032292a432	feat: support phrase match query (#38869 ) The relevant issue: https://github.com/milvus-io/milvus/issues/38930 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-12 20:24:58 +08:00
Spade A	8abf6c9149	fix: build text index when loading field data (#39070 ) fix: https://github.com/milvus-io/milvus/issues/39053 may fix https://github.com/milvus-io/milvus/issues/38644 which could be caused by https://github.com/milvus-io/milvus/issues/39053 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-09 15:24:56 +08:00
Gao	f0dae81494	fix: set iterative filter hint to false when no expr specified (#39033 ) issue: https://github.com/milvus-io/milvus/issues/39013 Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-01-08 12:56:56 +08:00
Chun Han	3739446a33	enhance: refine array view to optimize memory usage(#38736 ) (#38808 ) related: #38736 700m data, array_length=10 non-mmap_offsets_uint64: 2.0G mmap_offsets_uint64: 1.1G mmap_offsets_uint32: 880MB Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-01-07 13:26:55 +08:00
smellthemoon	907fc24f85	enhance: support null expr (#38772 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-01-02 14:16:54 +08:00
Bingyi Sun	2557e3f2a9	enhance: Initialize field id to avoid negative number (#38789 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-12-27 18:00:50 +08:00
Patrick Weizhi Xu	85f462be1a	enhance: speed up search iterator stage 1 (#37947 ) issue: #37548 Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-12-26 10:32:49 +08:00
Ted Xu	acc8fb7af6	enhance: eliminate compile warnings (part2) (#38535 ) See #38435 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-25 15:30:50 +08:00
Zhen Ye	b537a72309	fix: interted index out of range (#38577 ) issue: #38546, #38486 Signed-off-by: chyezh <chyezh@outlook.com>	2024-12-19 15:20:47 +08:00
zhagnlu	9afcc5bc5c	fix:fix incorrect dir operations when create or load inverted index (#38359 ) #37944 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-12-17 20:06:45 +08:00
Bingyi Sun	dd4f33ae19	fix: Fix chunked segment can not warmup using mmap (#38492 ) issue: #38410 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-12-17 13:42:45 +08:00
Ted Xu	33aecb0655	fix: build break on target test_cpp under OSX (#38479 ) See: #38434 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-17 13:38:45 +08:00
Bingyi Sun	3e2a2f278b	enhance: Handle rust error in c++ (#38113 ) https://github.com/milvus-io/milvus/issues/37930 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-12-16 19:40:45 +08:00
zhagnlu	01de0afc4e	enhance: refactor delete mvcc function (#38066 ) #37413 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-12-15 18:02:43 +08:00

1 2 3 4 5 ...

640 Commits