issue: https://github.com/milvus-io/milvus/issues/41435
this PR is based on https://github.com/milvus-io/milvus/pull/41436.
Improvements include:
- Lazy Load support for Storage v1
- Use low/high watermarks to control eviction (sketched after this list)
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, and PrimitiveRawAt to bulk versions to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- Made CachedSearchIterator store a PinWrapper when searching on
ChunkedColumn, and removed an unused constructor.
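A minimal sketch of the low/high watermark idea (names are illustrative, not the actual caching-layer API): eviction kicks in only once usage crosses the high watermark and then reclaims space down to the low watermark, instead of evicting on every insertion.
```
#include <cstdint>

// Illustrative only: usage above the high watermark triggers eviction, and
// eviction continues until usage drops back below the low watermark.
struct WatermarkPolicy {
    int64_t low_watermark_bytes;
    int64_t high_watermark_bytes;
    int64_t used_bytes = 0;

    bool NeedEviction() const {
        return used_bytes > high_watermark_bytes;
    }

    // How many bytes eviction should reclaim to get back under the low watermark.
    int64_t BytesToEvict() const {
        return NeedEviction() ? used_bytes - low_watermark_bytes : 0;
    }
};
```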
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Storage v2 chunked sealed segment loading is based on the caching layer. A
cell unit in storage v2 is a parquet row group in remote object storage,
containing all fields. Therefore, each field needs a proxy to perform its
field-specific operations.
<img width="965" alt="Screenshot 2025-04-28 at 10 59 30"
src="https://github.com/user-attachments/assets/83e93a10-3b1d-4066-ac17-b996d5650416"
/>
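A minimal sketch of that per-field proxy idea, with hypothetical type names (RowGroupCell, CellCache, and FieldProxy are illustrative, not the actual classes):
```
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One parquet row group holding every field's column data.
struct RowGroupCell {
    std::map<int64_t, std::vector<std::string>> columns;  // field id -> column values
};

// Loads and caches row-group cells (a stand-in for the caching layer).
class CellCache {
 public:
    std::shared_ptr<RowGroupCell> PinCell(int64_t row_group_id) {
        auto& cell = cells_[row_group_id];
        if (cell == nullptr) {
            cell = std::make_shared<RowGroupCell>();  // would fetch from object storage
        }
        return cell;
    }

 private:
    std::map<int64_t, std::shared_ptr<RowGroupCell>> cells_;
};

// Per-field proxy: turns "chunk i of field F" into "pin cell i, slice out column F".
class FieldProxy {
 public:
    FieldProxy(int64_t field_id, std::shared_ptr<CellCache> cache)
        : field_id_(field_id), cache_(std::move(cache)) {}

    std::vector<std::string> GetChunk(int64_t chunk_id) {
        auto cell = cache_->PinCell(chunk_id);
        return cell->columns[field_id_];
    }

 private:
    int64_t field_id_;
    std::shared_ptr<CellCache> cache_;
};
```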
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #39718. Fixes milvus-io/pymilvus#2771
This PR:
- Make the AsyncRetrieve task trigger the "schema check" logic as well
- Rename `AddField`-related methods to align with the code standard
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39718
This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when the schema version increases (see the sketch after this list)
- Add FinishLoad API for loading progress
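A minimal sketch of the lazy-reopen idea under assumed names (LazySegment, Access, and Reopen are illustrative, not the actual API):
```
#include <cstdint>
#include <mutex>

// Illustrative only: the segment remembers the schema version it was opened
// with and rebuilds its view only when a request carries a newer version.
class LazySegment {
 public:
    explicit LazySegment(int64_t schema_version) : opened_version_(schema_version) {}

    void Access(int64_t request_schema_version) {
        std::lock_guard<std::mutex> lock(mu_);
        if (request_schema_version > opened_version_) {
            Reopen(request_schema_version);  // pay the reopen cost lazily
        }
        // ... serve the request with the (possibly reopened) segment ...
    }

 private:
    void Reopen(int64_t new_version) { opened_version_ = new_version; }

    std::mutex mu_;
    int64_t opened_version_;
};
```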
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
JSON stats: add a map null check before inserting into tantivy. The JSON
stats index may fail if there is no data.
issue: https://github.com/milvus-io/milvus/issues/41494
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Related to #39718
This PR:
- Use WAL broadcast timestamp as Collection update timestamp
- Remove request_fields size assertion
- Remove proxy schema cache loaded field check & skip related cases
- Other minor fixes
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Optimized JSON filter execution by introducing
ProcessJsonStatsChunkPos() for unified position calculation and
GetNextBatchSize() for better batch processing.
Improved JSON key generation by replacing manual path joining with
milvus::Json::pointer() and adjusted slot size calculation for JSON key
index jobs.
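A rough sketch of what the path-joining change amounts to; the real milvus::Json::pointer() may differ in details such as token escaping:
```
#include <string>
#include <vector>

// Builds an RFC 6901-style JSON pointer from a nested path, instead of
// joining path segments by hand at each call site.
std::string
JsonPointer(const std::vector<std::string>& nested_path) {
    std::string pointer;
    for (const auto& token : nested_path) {
        pointer += "/" + token;
    }
    return pointer;
}

// Example: JsonPointer({"meta", "tags", "0"}) -> "/meta/tags/0"
```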
Updated the task slot calculation logic in calculateStatsTaskSlot() to
handle the increased resource needs of JSON key index jobs.
issue: https://github.com/milvus-io/milvus/issues/41378
https://github.com/milvus-io/milvus/issues/41218
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: #40942
Add simde package, which can make porting SIMD code to other
architectures much easier.
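A minimal example of the portability simde provides: the SSE2-style intrinsics below also compile on non-x86 targets because simde supplies portable fallbacks (the add4 helper itself is just for illustration):
```
#include <cstdint>
#include <simde/x86/sse2.h>

// Adds four int32 lanes at a time using SSE2-style intrinsics; simde lets the
// same code build on other architectures (e.g. ARM) without rewriting it.
void
add4(const int32_t* a, const int32_t* b, int32_t* out) {
    simde__m128i va = simde_mm_loadu_si128(reinterpret_cast<const simde__m128i*>(a));
    simde__m128i vb = simde_mm_loadu_si128(reinterpret_cast<const simde__m128i*>(b));
    simde_mm_storeu_si128(reinterpret_cast<simde__m128i*>(out),
                          simde_mm_add_epi32(va, vb));
}
```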
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
fix: #39755
The following shows a simple benchmark that inserts 1M docs where all
rows are "hello"; the latency is measured at the segcore level on a 9900K CPU:
master: 2.62ms
this PR: 2.11ms
benchmark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));
    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << " us"
              << std::endl;
}
```
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Support parallel loading of sealed and growing segments in the storage v2
format by asynchronously reading row groups.
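A conceptual sketch of the async row-group read pattern, with illustrative names (ReadRowGroup and LoadAllRowGroups are not the actual functions):
```
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Stub standing in for an async read of one parquet row group from object storage.
static std::vector<std::string>
ReadRowGroup(int64_t row_group_id) {
    return {"row-group-" + std::to_string(row_group_id)};
}

// Kick off one async read per row group, then collect results in order so the
// segment still sees a deterministic chunk layout.
std::vector<std::vector<std::string>>
LoadAllRowGroups(int64_t num_row_groups) {
    std::vector<std::future<std::vector<std::string>>> pending;
    pending.reserve(num_row_groups);
    for (int64_t i = 0; i < num_row_groups; ++i) {
        pending.push_back(std::async(std::launch::async, ReadRowGroup, i));
    }
    std::vector<std::vector<std::string>> row_groups;
    row_groups.reserve(num_row_groups);
    for (auto& f : pending) {
        row_groups.push_back(f.get());  // reads complete in parallel
    }
    return row_groups;
}
```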
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Issue: #41210
After https://github.com/zilliztech/tantivy/pull/5, we can provide the
milvus row id directly to tantivy rather than recording it in the fast
field "doc_id".
So rather than searching for the tantivy doc id and then fetching the
milvus row id from "doc_id", the searched tantivy doc id now is the milvus
row id itself, eliminating the expensive row-id acquisition phase.
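A conceptual before/after sketch with hypothetical helper names (not the actual code paths):
```
#include <cstdint>
#include <vector>

// Before: each tantivy hit required a fast-field lookup to map doc id -> row id.
std::vector<int64_t>
CollectRowIdsBefore(const std::vector<uint32_t>& tantivy_doc_ids,
                    const std::vector<int64_t>& doc_id_fast_field) {
    std::vector<int64_t> row_ids;
    row_ids.reserve(tantivy_doc_ids.size());
    for (auto doc : tantivy_doc_ids) {
        row_ids.push_back(doc_id_fast_field[doc]);  // extra indirection per hit
    }
    return row_ids;
}

// After: the doc id returned by tantivy already is the milvus row id.
std::vector<int64_t>
CollectRowIdsAfter(const std::vector<uint32_t>& tantivy_doc_ids) {
    return {tantivy_doc_ids.begin(), tantivy_doc_ids.end()};
}
```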
The following shows a simple benchmark that inserts **1M** docs where all
rows are "hello"; the latency is measured at the **segcore** level on a 9900K CPU:

**The latency improves by 2.02x and 2.1x respectively.**
benchmark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));
    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << " us"
              << std::endl;
}
```
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/40897
After this change, the scheduling duration of document add operations
decreases from roughly 6s to 0.9s for the case in the issue.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #41172
Elements of type int8 or int16 in an Array are encoded as int32, so we
should parse them as int32 when creating the index.
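A small sketch of the decoding rule described above (DecodeNarrowArray is an illustrative helper, not the actual code):
```
#include <cstdint>
#include <vector>

// Illustrative only: int8/int16 array elements arrive encoded as int32, so the
// index build path reads them as int32 and then narrows to the declared type.
template <typename T>
std::vector<T>
DecodeNarrowArray(const std::vector<int32_t>& encoded) {
    std::vector<T> out;
    out.reserve(encoded.size());
    for (int32_t v : encoded) {
        out.push_back(static_cast<T>(v));  // narrow int32 -> int8/int16
    }
    return out;
}

// e.g. auto values = DecodeNarrowArray<int8_t>(encoded_int32_values);
```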
Signed-off-by: sunby <sunbingyi1992@gmail.com>