1570 Commits

Author SHA1 Message Date
congqixia
bcf94a0754
fix: Remove noexcept from CacheIndexToDiskInternal (#41725)
Related to #41219

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-09 14:16:53 +08:00
zhagnlu
f674e232b9
fix: GetValueFromConfig return nullopt instead of exception for null value (#41709)
#41707

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-09 11:18:53 +08:00
Xianhui Lin
26cbc74478
fix: support infix and suffix match types in JsonStats (#41720)
fix: support infix and suffix match types in JsonStats
issue:https://github.com/milvus-io/milvus/issues/41386

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-05-09 10:42:53 +08:00
zhagnlu
e3c81ba1cc
enhance: use scan mode for like although inverted index exists (#41325)
#41065

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-09 10:36:54 +08:00
zhagnlu
39e7ad33d7
enhance: add optimize for like expr (#41066)
#41065

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-08 14:28:52 +08:00
foxspy
e2ddbe4962
feat: add cachinglayer to index (#41653)
issue: #41435

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-08 10:12:54 +08:00
congqixia
b1f3fe1f07
fix: Use sum of num_rows instead of last one (#41685)
Related to #41656

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-07 19:40:53 +08:00
Bingyi Sun
4c08090687
feat: Add json index support for json contains expr (#41478)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-05-06 11:44:52 +08:00
Buqian Zheng
73bbf4c674
fix: error when lack_binlog_rows = 0 (#41644)
issue: https://github.com/milvus-io/milvus/issues/41643

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-04 00:24:56 +08:00
sthuang
e9442f575d
feat: storage v2 seal segment load (#41567)
storage v2 chunked seal segment loading is based on caching layer. A
cell unit in storage v2 is a parquet row group in remote object storage,
containing all fields. Therefore, each field needs a proxy to do related
one field operations.

<img width="965" alt="Screenshot 2025-04-28 at 10 59 30"
src="https://github.com/user-attachments/assets/83e93a10-3b1d-4066-ac17-b996d5650416"
/>

related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-30 14:22:58 +08:00
sthuang
6c377b6e86
feat: Storage v2 index and stats raw data (#41534)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-30 08:48:54 +08:00
zhagnlu
cd60b965c8
enhance: add expr filter ratio monitor params (#41402)
#41401

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-29 17:02:54 +08:00
foxspy
1d99f8bd67
enhance: add force rebuild index configuration (#41473)
issue: #41431

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-04-29 16:20:56 +08:00
congqixia
f3f8227cd0
enhance: [AddField] Trigger check schema in retrieve as well (#41598)
Related to #39718
Fixes milvus-io/pymilvus#2771

This PR:
- Make AsyncRetrieve task triggers "schema check" logic as well
- Rename `AddField` related methods to align with code standard

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-29 14:10:49 +08:00
Buqian Zheng
3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
cai.zhang
640f526301
fix: Update current scalar index version to compatible tantivy different versions (#41141)
issue: #40823

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-27 20:44:39 +08:00
Chun Han
12cde913b5
fix: fail to get string views due to chunk bound empty loop(#41300) (#41452)
related: #41300

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-27 10:40:38 +08:00
congqixia
b5443ddbd0
enhance: [AddField] Reopen loaded segments after AddField (#41529)
Related to #39718

This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when schema version increases
- Add FinishLoad api for loading progress

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-26 08:48:39 +08:00
Xianhui Lin
1a6838b496
fix: json stats add map null check before insert into tantivity (#41505)
json stats add map null check before insert into tantivity. Json stats
index may fail if there is no data
issue:https://github.com/milvus-io/milvus/issues/41494

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-24 21:06:37 +08:00
congqixia
dbe54c2df8
enhance: [AddField] Resolve conflicts & make WAL ts collection updatets (#41476)
Related to #39718

This PR:
- Use WAL broadcast timestamp as Collection update timestamp
- Remove request_fields size assertion
- Remove proxy schema cache loaded field check & skip related cases
- other minor issues

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-24 12:06:39 +08:00
aoiasd
f52c2909c4
feat: support multi analyzer for bm25 function (#41351)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 18:22:38 +08:00
Xianhui Lin
3d4889586d
fix: JsonStats filter by conjunctExpr and improve the task slot calculation logic (#41459)
Optimized JSON filter execution by introducing
ProcessJsonStatsChunkPos() for unified position calculation and
GetNextBatchSize() for better batch processing.
Improved JSON key generation by replacing manual path joining with
milvus::Json::pointer() and adjusted slot size calculation for JSON key
index jobs.
Updated the task slot calculation logic in calculateStatsTaskSlot() to
handle the increased resource needs of JSON key index jobs.
issue: https://github.com/milvus-io/milvus/issues/41378
https://github.com/milvus-io/milvus/issues/41218

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-23 16:30:37 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
cqy123456
5219d9a723
fix: Inserting null and non-null array at the same time will cause milvus crash when growing mmap open (#41051)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/41052

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-04-22 12:26:37 +08:00
Spade A
5b1430f27e
enhance: tantivy collector set bitset directly (#39748)
fix: #39755

The following shows a simple benchmark where insert 1M docs where all
rows are "hello", the latency is segcore level, CPU is 9900K:
master: 2.62ms
this PR: 2.11ms

bench mark code:

```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << "ms"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-20 23:02:41 +08:00
Chun Han
016920b023
fix: solve incompitable problem for none-encoding index(#40838) (#41369)
related: #40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-20 22:56:44 +08:00
Ted Xu
d50781c8cc
enhance: support nullable group by keys (#41313)
See #36264

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-18 10:08:34 +08:00
Bingyi Sun
4552dd4b23
fix: Fix json index does not work for string filter (#41382)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-17 20:10:39 +08:00
sthuang
1f1c836fb9
feat: Storage v2 growing segment load (#41001)
support parallel loading sealed and growing segments with storage v2
format by async reading row groups.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:14:33 +08:00
Spade A
70d13dcf61
enhance: update tantivy for removing "doc_id" fast field (#41198)
Issue: #41210

After https://github.com/zilliztech/tantivy/pull/5, we can provide
milvus row id directly to tantivy rather than record it in the fast
field "doc_id".
So rather than search tantivy doc id and then get milvus row id from
"doc_id", now, the searched tantivy doc id is the milvus row id,
eliminating the expensive acquiring row id phase.

The following shows a simple benchmark where insert **1M** docs where
all rows are "hello", the latency is **segcore** level, CPU is 9900K:

![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad)
**The latency is 2.02 and 2.1 times respectively.**

bench mark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << "ms"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-15 20:20:32 +08:00
Bingyi Sun
a953eaeaf0
enhance: support binary range expression for json path index (#41025)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-15 19:32:33 +08:00
Chun Han
59b14d38f5
enhance: Optimize index format for improved load performance(#40838) (#40839)
related: https://github.com/milvus-io/milvus/issues/40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-15 03:10:30 +08:00
Bingyi Sun
bf617115ca
enhance: Remove single chunk segment related codes (#39249)
https://github.com/milvus-io/milvus/issues/39112

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 18:56:29 +08:00
Spade A
9ce3e3cb44
enhance: add documents in batch for json key stats (#41228)
issue: https://github.com/milvus-io/milvus/issues/40897

After this, the document add operations scheduling duration is decreased
roughly from 6s to 0.9s for the case in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-11 14:08:26 +08:00
Bingyi Sun
b9b8419cbf
fix: Use int32 when creating array index for element type int8/int16 (#41185)
issue: #41172
Elements with type int8 or int16 in Array is encoded using int32, so we
should parse it as int32 when creating index.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 13:18:25 +08:00
foxspy
17e10beba0
fix: avoid segmentation faults caused by retrieving empty vector datasets (#40545)
issue: #40544

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-04-10 20:16:29 +08:00
Xianhui Lin
3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
zhagnlu
3ed23a5f48
fix: fix remove index type failed when remote storage is local mode (#41164)
#41142

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-09 16:42:26 +08:00
zhagnlu
ee1faf80dd
fix:add clear bitmap for batch skip mode (#41166)
#41086 #41150

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-09 13:08:27 +08:00
sthuang
50e02e3598
enhance: update packed reader api (#41055)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-09 10:18:26 +08:00
congqixia
e2d8adb963
fix: Use element_type for Array is null operator (#41157)
Related to #41156

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-09 10:16:24 +08:00
Bingyi Sun
da21640ac3
fix: Fix the bug that null data can not be filtered by null expr (#41124)
issue: https://github.com/milvus-io/milvus/issues/41063

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-08 19:12:24 +08:00
Bingyi Sun
355f62d6c9
fix: Align brute force search with json index for exists expr (#41116)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-07 15:42:23 +08:00
zhagnlu
ee8783cae9
fix:add operator type for some operator (#40895)
#40894

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:58:27 +08:00
zhagnlu
10a63b3f2e
enhance: add formatter for serveral types to remove compile warning (#41094)
#41091

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:54:24 +08:00
zhagnlu
0a378dc308
fix:fix format error for json (#41026)
#40963

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 10:22:22 +08:00
Bingyi Sun
fcb03b5bd1
feat: add json null/exists expression (#41004)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-03 17:48:21 +08:00
Zhen Ye
9f27d9af61
fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069)
issue: #41067

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-03 02:54:21 +08:00
Spade A
f552ec67dd
fix: support building tantivy index with low version(5) (#40822)
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building tantivy
index with low version
for those query nodes with low tantivy version.

This PR does two things:
1. refactor codes for IndexWriterWrapper to make it concise
2. enable IndexWriterWrapper to build tantivy index by different tantivy
crate

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 18:46:20 +08:00
Chun Han
afa519b4c7
fix: array is null failed(#40686) (#41027)
related: #40686

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-02 18:20:22 +08:00