milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Bingyi Sun	4552dd4b23	fix: Fix json index does not work for string filter (#41382 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-17 20:10:39 +08:00
sthuang	1f1c836fb9	feat: Storage v2 growing segment load (#41001 ) support parallel loading sealed and growing segments with storage v2 format by async reading row groups. related: #39173 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-16 17:14:33 +08:00
Spade A	70d13dcf61	enhance: update tantivy for removing "doc_id" fast field (#41198 ) Issue: #41210 After https://github.com/zilliztech/tantivy/pull/5, we can provide the milvus row id directly to tantivy rather than recording it in the fast field "doc_id". So instead of searching for the tantivy doc id and then getting the milvus row id from "doc_id", the searched tantivy doc id is now the milvus row id, eliminating the expensive row id acquisition phase. The following shows a simple benchmark that inserts 1M docs in which all rows are "hello"; the latency is measured at the segcore level, and the CPU is a 9900K: ![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad) The latency is 2.02 and 2.1 times respectively. Benchmark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    // The duration is measured in microseconds, so label it "us".
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```
--------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-15 20:20:32 +08:00
Bingyi Sun	a953eaeaf0	enhance: support binary range expression for json path index (#41025 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-15 19:32:33 +08:00
Chun Han	59b14d38f5	enhance: Optimize index format for improved load performance(#40838 ) (#40839 ) related: https://github.com/milvus-io/milvus/issues/40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-15 03:10:30 +08:00
Bingyi Sun	bf617115ca	enhance: Remove single chunk segment related codes (#39249 ) https://github.com/milvus-io/milvus/issues/39112 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 18:56:29 +08:00
Spade A	9ce3e3cb44	enhance: add documents in batch for json key stats (#41228 ) issue: https://github.com/milvus-io/milvus/issues/40897 After this, the document add operations scheduling duration is decreased roughly from 6s to 0.9s for the case in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-11 14:08:26 +08:00
Bingyi Sun	b9b8419cbf	fix: Use int32 when creating array index for element type int8/int16 (#41185 ) issue: #41172 Elements of type int8 or int16 in an Array are encoded as int32, so we should parse them as int32 when creating the index. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 13:18:25 +08:00
foxspy	17e10beba0	fix: avoid segmentation faults caused by retrieving empty vector datasets (#40545 ) issue: #40544 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-04-10 20:16:29 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
zhagnlu	3ed23a5f48	fix: fix remove index type failed when remote storage is local mode (#41164 ) #41142 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-09 16:42:26 +08:00
zhagnlu	ee1faf80dd	fix:add clear bitmap for batch skip mode (#41166 ) #41086 #41150 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-09 13:08:27 +08:00
sthuang	50e02e3598	enhance: update packed reader api (#41055 ) related: https://github.com/milvus-io/milvus/issues/39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-09 10:18:26 +08:00
congqixia	e2d8adb963	fix: Use element_type for Array is null operator (#41157 ) Related to #41156 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-09 10:16:24 +08:00
Bingyi Sun	da21640ac3	fix: Fix the bug that null data can not be filtered by null expr (#41124 ) issue: https://github.com/milvus-io/milvus/issues/41063 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-08 19:12:24 +08:00
Bingyi Sun	355f62d6c9	fix: Align brute force search with json index for exists expr (#41116 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-07 15:42:23 +08:00
zhagnlu	ee8783cae9	fix:add operator type for some operator (#40895 ) #40894 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 11:58:27 +08:00
zhagnlu	10a63b3f2e	enhance: add formatter for several types to remove compile warning (#41094 ) #41091 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 11:54:24 +08:00
zhagnlu	0a378dc308	fix:fix format error for json (#41026 ) #40963 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-07 10:22:22 +08:00
Bingyi Sun	fcb03b5bd1	feat: add json null/exists expression (#41004 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-03 17:48:21 +08:00
Zhen Ye	9f27d9af61	fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069 ) issue: #41067 Signed-off-by: chyezh <chyezh@outlook.com>	2025-04-03 02:54:21 +08:00
Spade A	f552ec67dd	fix: support building tantivy index with low version(5) (#40822 ) fix: https://github.com/milvus-io/milvus/issues/40823 To solve the problem in the issue, we have to support building tantivy index with low version for those query nodes with low tantivy version. This PR does two things: 1. refactor codes for IndexWriterWrapper to make it concise 2. enable IndexWriterWrapper to build tantivy index by different tantivy crate --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 18:46:20 +08:00
Chun Han	afa519b4c7	fix: array is null failed(#40686 ) (#41027 ) related: #40686 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-02 18:20:22 +08:00
smellthemoon	cb1e86e17c	enhance: support add field (#39800 ) After this PR is merged, insert, upsert, build index, query, and search are supported on the added field. These operations can only be performed on the added field after the add field request completes, which is a synchronous operation. Compaction will be supported in the next PR. #39718 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-04-02 14:24:31 +08:00
Spade A	216be1494b	fix: add log for object storage operation fail (#40666 ) fix: #40665 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 01:26:21 +08:00
cqy123456	6dc0f42830	fix:growing mmap data type crashed by nullable input (#40994 ) issue: https://github.com/milvus-io/milvus/issues/40981 2.5 pr: https://github.com/milvus-io/milvus/pull/40980 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-03-31 20:32:19 +08:00
Bingyi Sun	27ff3a42e7	enhance: Record simdjson error (#41003 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-31 17:56:19 +08:00
Bingyi Sun	15ec7bae4d	fix: Fix using json index when iterative_filter is specified (#40945 ) issue: #40934 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-31 15:26:19 +08:00
Bingyi Sun	9676365af9	fix: Fix json index not equal filter (#40647 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-27 23:06:23 +08:00
zhagnlu	87e7d6d79f	fix:fix exception when do arith expr with using index (#40794 ) #40783 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-27 11:10:21 +08:00
Xiaofan	8788e591cd	enhance: add detailed stack for error message (#40883 ) fix #40882 Add a stacktrace when operator execution fails. Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-03-26 13:24:20 +08:00
zhagnlu	7fdb2e144f	enhance:change multi or expr to in expr (#40757 ) #40752 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-25 11:06:18 +08:00
cai.zhang	a41cb942f6	fix: Do not delete the centroids file when sampling fails instead wait GC (#40701 ) issue: #40700 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-03-21 10:32:12 +08:00
Bingyi Sun	5a6b4e56d5	fix: Fix panic in tasks when one of them throws an exception. (#40691 ) issue: https://github.com/milvus-io/milvus/issues/40690 The variable rcm will be dangling if a future throws an exception and returns. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-19 16:52:09 +08:00
aoiasd	92bdf7a0c1	enhance: support run analyzer return detailed token (#40458 ) relate: https://github.com/milvus-io/milvus/issues/39705 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-03-19 15:48:15 +08:00
zhagnlu	6c55db44f1	enhance: reorder sub expr for conjunct expr (#39872 ) Two points: (1) reorder the conjunct expr's subexprs to postpone heavy operations; sequence: int(column) -> index(column) -> string(column) -> light conjunct ...... -> json(column) -> heavy conjunct -> two_column_compare (2) support pre-filter for expr execution, skipping the scan of raw data already excluded by a preceding expr's result. #39869 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-19 14:50:14 +08:00
zhagnlu	7ebe3d7038	enhance: refine chunk access logic and add some comment on data (#40618 ) #40367 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-16 22:20:08 +08:00
Bingyi Sun	6249335859	fix: Catch invalid json pointer error (#40625 ) issue: #35528 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-14 16:56:08 +08:00
Bingyi Sun	d3adab15ac	fix: Build double index for all json numeric field (#40619 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-14 16:52:11 +08:00
Bingyi Sun	8fbacf3583	fix: Null expr does not work for json field (#40456 ) issue: https://github.com/milvus-io/milvus/issues/40455 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-14 16:06:08 +08:00
Spade A	f36d1562bd	enhance: add metrics for random sample (#40634 ) issue: #39541 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-03-13 21:42:11 +08:00
Spade A	9f3bd55755	fix: avoid panic when field not exists in schema in query node (#40541 ) ref #40473 This PR is a workaround to avoid the panic described in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-03-12 22:44:08 +08:00
cai.zhang	e5f50076ec	enhance: Only check element type with not null array (#40446 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-03-11 14:58:07 +08:00
Bingyi Sun	0a7e692b6f	fix: Fix null offset loading in inverted index (#40523 ) issue: #40516 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-10 22:12:04 +08:00
Cai Yudong	2bd2cca04a	enhance: Truly support multi vector data types in SearchBruteForce (#40499 ) Issue: #38666 Signed-off-by: CaiYudong <yudong.cai@zilliz.com>	2025-03-10 18:36:03 +08:00
smellthemoon	faae8ee518	fix: store wrong offset when build tantivy in nullable field (#40452 ) #40454 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-03-09 09:34:04 +08:00
Bingyi Sun	37b118d55d	fix: Skip loading primary key if index has raw data (#39921 ) issue: https://github.com/milvus-io/milvus/issues/39907 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-06 17:46:02 +08:00
Spade A	3db56560fb	fix: fix concurrent issues in null offset (#40363 ) issue: #40308 This PR fixes two concurrency issues: 1. An element in null_offset is used to set a bitset whose size is initialized from the tantivy document count. However, some documents may not yet be committed in tantivy but are already recorded as null in null_offset, so an out-of-range access occurs. 2. null_offset can be read and written concurrently, but there is no synchronization protection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-03-05 17:48:00 +08:00
Bingyi Sun	be4d09561b	fix: Fix missing null or non-exist key in json index (#40336 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-05 11:48:02 +08:00
Bingyi Sun	7040ba1c12	enhance: make json path index support term filter (#40140 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-04 11:56:02 +08:00

1843 Commits