1843 Commits

Author SHA1 Message Date
Bingyi Sun
4552dd4b23
fix: Fix json index does not work for string filter (#41382)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-17 20:10:39 +08:00
sthuang
1f1c836fb9
feat: Storage v2 growing segment load (#41001)
support parallel loading sealed and growing segments with storage v2
format by async reading row groups.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:14:33 +08:00
Spade A
70d13dcf61
enhance: update tantivy for removing "doc_id" fast field (#41198)
Issue: #41210

After https://github.com/zilliztech/tantivy/pull/5, we can provide
milvus row id directly to tantivy rather than record it in the fast
field "doc_id".
So rather than search tantivy doc id and then get milvus row id from
"doc_id", now, the searched tantivy doc id is the milvus row id,
eliminating the expensive acquiring row id phase.

The following shows a simple benchmark where insert **1M** docs where
all rows are "hello", the latency is **segcore** level, CPU is 9900K:

![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad)
**The latency is 2.02 and 2.1 times respectively.**

bench mark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    std::cout << "TextMatch query time: " << duration.count() << "ms"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-15 20:20:32 +08:00
Bingyi Sun
a953eaeaf0
enhance: support binary range expression for json path index (#41025)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-15 19:32:33 +08:00
Chun Han
59b14d38f5
enhance: Optimize index format for improved load performance(#40838) (#40839)
related: https://github.com/milvus-io/milvus/issues/40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-15 03:10:30 +08:00
Bingyi Sun
bf617115ca
enhance: Remove single chunk segment related codes (#39249)
https://github.com/milvus-io/milvus/issues/39112

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 18:56:29 +08:00
Spade A
9ce3e3cb44
enhance: add documents in batch for json key stats (#41228)
issue: https://github.com/milvus-io/milvus/issues/40897

After this, the document add operations scheduling duration is decreased
roughly from 6s to 0.9s for the case in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-11 14:08:26 +08:00
Bingyi Sun
b9b8419cbf
fix: Use int32 when creating array index for element type int8/int16 (#41185)
issue: #41172
Elements with type int8 or int16 in Array is encoded using int32, so we
should parse it as int32 when creating index.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 13:18:25 +08:00
foxspy
17e10beba0
fix: avoid segmentation faults caused by retrieving empty vector datasets (#40545)
issue: #40544

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-04-10 20:16:29 +08:00
Xianhui Lin
3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
zhagnlu
3ed23a5f48
fix: fix remove index type failed when remote storage is local mode (#41164)
#41142

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-09 16:42:26 +08:00
zhagnlu
ee1faf80dd
fix:add clear bitmap for batch skip mode (#41166)
#41086 #41150

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-09 13:08:27 +08:00
sthuang
50e02e3598
enhance: update packed reader api (#41055)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-09 10:18:26 +08:00
congqixia
e2d8adb963
fix: Use element_type for Array is null operator (#41157)
Related to #41156

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-09 10:16:24 +08:00
Bingyi Sun
da21640ac3
fix: Fix the bug that null data can not be filtered by null expr (#41124)
issue: https://github.com/milvus-io/milvus/issues/41063

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-08 19:12:24 +08:00
Bingyi Sun
355f62d6c9
fix: Align brute force search with json index for exists expr (#41116)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-07 15:42:23 +08:00
zhagnlu
ee8783cae9
fix:add operator type for some operator (#40895)
#40894

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:58:27 +08:00
zhagnlu
10a63b3f2e
enhance: add formatter for serveral types to remove compile warning (#41094)
#41091

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:54:24 +08:00
zhagnlu
0a378dc308
fix:fix format error for json (#41026)
#40963

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 10:22:22 +08:00
Bingyi Sun
fcb03b5bd1
feat: add json null/exists expression (#41004)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-03 17:48:21 +08:00
Zhen Ye
9f27d9af61
fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069)
issue: #41067

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-03 02:54:21 +08:00
Spade A
f552ec67dd
fix: support building tantivy index with low version(5) (#40822)
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building tantivy
index with low version
for those query nodes with low tantivy version.

This PR does two things:
1. refactor codes for IndexWriterWrapper to make it concise
2. enable IndexWriterWrapper to build tantivy index by different tantivy
crate

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 18:46:20 +08:00
Chun Han
afa519b4c7
fix: array is null failed(#40686) (#41027)
related: #40686

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-02 18:20:22 +08:00
smellthemoon
cb1e86e17c
enhance: support add field (#39800)
after the pr merged, we can support to insert, upsert, build index,
query, search in the added field.
can only do the above operates in added field after add field request
complete, which is a sync operate.

compact will be supported in the next pr.
#39718

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-04-02 14:24:31 +08:00
Spade A
216be1494b
fix: add log for object storage operation fail (#40666)
fix: #40665

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 01:26:21 +08:00
cqy123456
6dc0f42830
fix:growing mmap data type crashed by nullable input (#40994)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/40980

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-03-31 20:32:19 +08:00
Bingyi Sun
27ff3a42e7
enhance: Record simdjson error (#41003)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 17:56:19 +08:00
Bingyi Sun
15ec7bae4d
fix: Fix using json index when iterative_filter is specified (#40945)
issue: #40934

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 15:26:19 +08:00
Bingyi Sun
9676365af9
fix: Fix json index not equal filter (#40647)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-27 23:06:23 +08:00
zhagnlu
87e7d6d79f
fix:fix exception when do arith expr with using index (#40794)
#40783

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-27 11:10:21 +08:00
Xiaofan
8788e591cd
enhance: add detailed stack for error message (#40883)
fix #40882
adding stacktrace will operator execute failed.

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-03-26 13:24:20 +08:00
zhagnlu
7fdb2e144f
enhance:change multi or expr to in expr (#40757)
#40752

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-25 11:06:18 +08:00
cai.zhang
a41cb942f6
fix: Do not delete the centroids file when sampling fails instead wait GC (#40701)
issue: #40700

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-03-21 10:32:12 +08:00
Bingyi Sun
5a6b4e56d5
fix: Fix tasks will panic if one of them throw an exception. (#40691)
issue: https://github.com/milvus-io/milvus/issues/40690

the variable rcm will be dangling if a future throws an exception and
return.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-19 16:52:09 +08:00
aoiasd
92bdf7a0c1
enhance: support run anayser return detaild token (#40458)
relate: https://github.com/milvus-io/milvus/issues/39705

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-03-19 15:48:15 +08:00
zhagnlu
6c55db44f1
enhance: reorder sub expr for conjunct expr (#39872)
two point:
 (1) reoder conjucts expr's subexpr, postpone heavy operations
sequence: int(column) -> index(column) -> string(column) -> light
conjuct
...... -> json(column) -> heavy conjuct -> two_column_compare
(2) support pre filter for expr execute, skip scan raw data that had
been skipped
     because of preceding expr result.

#39869

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-19 14:50:14 +08:00
zhagnlu
7ebe3d7038
enhance: refine chunk access logic and add some comment on data (#40618)
#40367

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-16 22:20:08 +08:00
Bingyi Sun
6249335859
fix: Catch invalid json pointer error (#40625)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-14 16:56:08 +08:00
Bingyi Sun
d3adab15ac
fix: Build double index for all json numeric field (#40619)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-14 16:52:11 +08:00
Bingyi Sun
8fbacf3583
fix: Null expr does not work for json field (#40456)
issue: https://github.com/milvus-io/milvus/issues/40455

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-14 16:06:08 +08:00
Spade A
f36d1562bd
enhance: add metrics for random sample (#40634)
issue: #39541

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-03-13 21:42:11 +08:00
Spade A
9f3bd55755
fix: avoid panic when field not exists in schema in query node (#40541)
ref #40473

This PR is a workaround to avoid the panic described in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-03-12 22:44:08 +08:00
cai.zhang
e5f50076ec
enhance: Only check element type with not null array (#40446)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-03-11 14:58:07 +08:00
Bingyi Sun
0a7e692b6f
fix: Fix null offset loading in inverted index (#40523)
issue: #40516

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-10 22:12:04 +08:00
Cai Yudong
2bd2cca04a
enhance: Truly support multi vector data types in SearchBruteForce (#40499)
Issue: #38666

Signed-off-by: CaiYudong <yudong.cai@zilliz.com>
2025-03-10 18:36:03 +08:00
smellthemoon
faae8ee518
fix: store wrong offset when build tantivy in nullable field (#40452)
#40454

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-03-09 09:34:04 +08:00
Bingyi Sun
37b118d55d
fix: Skip loading primary key if index has raw data (#39921)
issue: https://github.com/milvus-io/milvus/issues/39907

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-06 17:46:02 +08:00
Spade A
3db56560fb
fix: fix concurrent issues in null offset (#40363)
issue: #40308
This issue fixes these two concurrent issues:
1. element in null_offset is used to set bitset where the size of bitset
is initialized by tantivy document count. However, there may still be
some documents that are not committed in tantivy but are null in
null_offset. So array out of range occurs.
2. null_offset can be read and write concurrently but there's no
synchronization protection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-03-05 17:48:00 +08:00
Bingyi Sun
be4d09561b
fix: Fix missing null or non-exist key in json index (#40336)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-05 11:48:02 +08:00
Bingyi Sun
7040ba1c12
enhance: make json path index support term filter (#40140)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-04 11:56:02 +08:00