10443 Commits

Author SHA1 Message Date
congqixia
85ed200529
fix: Save update timestamp in catalog.AlterCollection API (#41468)
Related to #41467

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-23 16:48:37 +08:00
Xianhui Lin
3d4889586d
fix: JsonStats filter by conjunctExpr and improve the task slot calculation logic (#41459)
Optimized JSON filter execution by introducing
ProcessJsonStatsChunkPos() for unified position calculation and
GetNextBatchSize() for better batch processing.
Improved JSON key generation by replacing manual path joining with
milvus::Json::pointer() and adjusted slot size calculation for JSON key
index jobs.
Updated the task slot calculation logic in calculateStatsTaskSlot() to
handle the increased resource needs of JSON key index jobs.
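To illustrate why a JSON Pointer helper is safer than manual path joining, here is a minimal sketch of RFC 6901 escaping. `ToJsonPointer` is a hypothetical helper written for this note; it is not the `milvus::Json::pointer()` implementation, whose exact behavior is not shown in this commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not Milvus code): builds an RFC 6901 JSON Pointer
// from path segments. Inside a segment, '~' is written as "~0" and '/' as
// "~1", so a key containing '/' cannot be confused with a path separator.
std::string ToJsonPointer(const std::vector<std::string>& segments) {
    std::string out;
    for (const auto& seg : segments) {
        out.push_back('/');
        for (char c : seg) {
            if (c == '~') {
                out += "~0";
            } else if (c == '/') {
                out += "~1";
            } else {
                out.push_back(c);
            }
        }
    }
    return out;
}
```

Naive joining with `/` would turn the segments `a` and `b/c` into `/a/b/c`, which is indistinguishable from a three-level path; the escaped form keeps the two cases apart.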
issue: https://github.com/milvus-io/milvus/issues/41378
https://github.com/milvus-io/milvus/issues/41218

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-23 16:30:37 +08:00
aoiasd
655cc7fe06
fix: bm25 stats idf oracle leak (#41425)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 14:28:37 +08:00
aoiasd
a16bd6263b
feat: support more languages for built-in stop words and add remove-punct and regex filters (#41412)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 11:44:37 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
aoiasd
11f2fae42e
feat: support extending the default dict for jieba tokenizer (#41360)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-22 20:34:37 +08:00
congqixia
481938297c
enhance: [AddField] Use next field id instead of global allocation (#41440)
Related to #39718

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 17:14:37 +08:00
congqixia
6f4e0d8e38
enhance: [AddField] Use schema update ts as guarantee ts (#41430)
Related to #39718

Use the schema update ts when it is greater than the calculated guarantee
timestamp, so that every read request using the updated schema waits
until all schema change events have been processed.
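The rule described above amounts to taking the larger of two timestamps. A minimal sketch, assuming timestamps are plain 64-bit integers; `EffectiveGuaranteeTs` and its parameter names are hypothetical and do not come from the Milvus code base:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using Timestamp = std::uint64_t;

// Hypothetical helper (not Milvus code): a read must wait at least until
// the schema update timestamp, so the effective guarantee timestamp is the
// maximum of the calculated value and the schema update timestamp.
Timestamp EffectiveGuaranteeTs(Timestamp calculated_guarantee_ts,
                               Timestamp schema_update_ts) {
    return std::max(calculated_guarantee_ts, schema_update_ts);
}
```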

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 17:12:45 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add broadcast logic for collection schema changes, which notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
aoiasd
110c5aaaf4
feat: support icu and language identifier tokenizer (#41214)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-22 15:56:37 +08:00
cqy123456
5219d9a723
fix: Inserting null and non-null arrays at the same time crashes milvus when growing mmap is enabled (#41051)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/41052

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-04-22 12:26:37 +08:00
Zhen Ye
7f5a9a6046
fix: unstable timeticksync unittest (#41437)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-22 10:53:29 +08:00
Zhen Ye
9339bccccc
enhance: move the sending of the first timeticksync to make recovery easier (#41405)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-21 17:18:37 +08:00
aoiasd
f166843c5e
enhance: support using the lindera tag filter (#40416)
relate: https://github.com/milvus-io/milvus/issues/39659

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-21 15:56:36 +08:00
Xianhui Lin
c5428c12eb
feat: Add support for modifying max capacity of array fields (#41404)

This commit adds support for modifying the max capacity of array fields
in the `alterCollectionFieldTask` function. It checks if the field is an
array type and then validates and updates the max capacity value. This
change improves the flexibility of array fields in the collection.
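The validation step described above could look roughly like the following sketch. `ParseMaxCapacity`, the `[1, upper_limit]` range check, and the caller-supplied limit are all assumptions made for illustration; the actual checks in `alterCollectionFieldTask` are Go code that is not reproduced in this commit message.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical validation (not Milvus code): parse a max-capacity property
// value and accept it only if the whole string is a number that lies in
// [1, upper_limit]. Returns std::nullopt when the value is rejected.
std::optional<std::int64_t> ParseMaxCapacity(const std::string& value,
                                             std::int64_t upper_limit) {
    std::int64_t parsed = 0;
    const char* first = value.data();
    const char* last = first + value.size();
    auto [ptr, ec] = std::from_chars(first, last, parsed);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // not a number, or trailing characters
    }
    if (parsed < 1 || parsed > upper_limit) {
        return std::nullopt;  // out of the accepted range
    }
    return parsed;
}
```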

Issue: https://github.com/milvus-io/milvus/issues/41363

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-21 15:52:37 +08:00
sparknack
8ccb875e41
enhance: add simde package (#40943)
issue: #40942

Add simde package, which can make porting SIMD code to other
architectures much easier.

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-04-21 12:18:40 +08:00
Spade A
5b1430f27e
enhance: tantivy collector set bitset directly (#39748)
fix: #39755

The following simple benchmark inserts 1M docs whose rows are all
"hello"; latency is measured at the segcore level on a 9900K CPU:
master: 2.62ms
this PR: 2.11ms

Benchmark code:

```
// Benchmark: time a TextMatch query over a sealed segment of N rows.
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    // String data of the field at index 1, which is the column under test.
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    // Overwrites rows [0, N - 1); the last row keeps its generated value.
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    // Time only expression construction and query execution.
    auto start = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    // duration.count() is in microseconds, so the label is "us", not "ms".
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-20 23:02:41 +08:00
Chun Han
016920b023
fix: solve incompatibility problem for non-encoding index (#40838) (#41369)
related: #40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-20 22:56:44 +08:00
Zhen Ye
c4a41cc32b
fix: add node id check to avoid double flush in most cases (#41236)
issue: #41028

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:44:38 +08:00
Zhen Ye
ef4923e66b
fix: catchup scan never finishes if wal is truncated (#41345)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:40:37 +08:00
Zhen Ye
78fca7e88d
fix: transaction should retry if transaction is expired (#41379)
issue: #41248

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:38:36 +08:00
tinswzy
6fa68c1f16
enhance: Support Woodpecker as a WAL storage option for Milvus (#41095)
#40916 Support Woodpecker as a WAL storage option for Milvus

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-04-20 22:22:42 +08:00
Zhen Ye
c893344289
fix: closing wal blocks during recovery (#41326)
issue: #41307

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-18 16:14:35 +08:00
Xianhui Lin
c43f8f7944
feat: Ignore reporting index metrics for non-existent indexes (#41294)

Remove the reporting of index metrics for non-existent indexes in the
`getCollectionMetrics` function. This change improves the code by
skipping unnecessary operations and reduces log noise.
issue: https://github.com/milvus-io/milvus/issues/41280

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-18 10:36:36 +08:00
Ted Xu
d50781c8cc
enhance: support nullable group by keys (#41313)
See #36264

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-18 10:08:34 +08:00
Spade A
62293cb582
fix: revert batch add (#41374)
issue: #41375

todo: fix the problems described in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-17 22:32:38 +08:00
Bingyi Sun
4552dd4b23
fix: Fix json index does not work for string filter (#41382)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-17 20:10:39 +08:00
Xiaowei Shi
1a35374672
fix: correct wrong querynode metric labels (#41344)
issue: https://github.com/milvus-io/milvus/issues/41343

Signed-off-by: Xiaowei Shi <shallwe.shih@gmail.com>
2025-04-16 21:32:33 +08:00
cai.zhang
5fd8a196f6
fix: Fix panic with nil pointer dereference when get indexed segment (#41297)
issue: #41288

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-16 20:54:00 +08:00
Xianhui Lin
deb610e5d3
fix: update MixCoord registration in MilvusRoles (#41337)
enhance: update MixCoord registration in MilvusRoles

The `runMixCoord` function in `MilvusRoles` was updated to use the
`RegisterMixCoord` function from the `rootcoord_metrics` package instead
of `RegisterRootCoord`. This change aligns with the recent modifications
made to the `rootcoord_metrics` package.
issue: https://github.com/milvus-io/milvus/issues/41338

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-16 19:49:54 +08:00
박상호
4be6d0e967
fix: skip dim check for non-vector fields in PreCheck (#41287) (#41289)
## What this PR does

This PR fixes an issue where the `PreCheck` function in DataCoord logs
unnecessary warnings when attempting to retrieve 'dim' from non-vector
fields.

The change adds a check to only call `GetDimFromParams` when the field
type is a vector type.
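The guard described here can be sketched as follows. The `FieldType` enum, the `Field` struct, and `DimIfVector` are hypothetical stand-ins for illustration only; the real `PreCheck` and `GetDimFromParams` are Go functions in DataCoord and may differ in detail.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified field model (not Milvus code).
enum class FieldType { Int64, VarChar, FloatVector, BinaryVector };

struct Field {
    FieldType type;
    std::map<std::string, std::string> params;
};

bool IsVectorType(FieldType t) {
    return t == FieldType::FloatVector || t == FieldType::BinaryVector;
}

// Looks up "dim" only for vector fields. Non-vector fields return early,
// so no lookup is attempted and no warning would be logged for them.
// Note: std::stoi throws if the stored value is not a number.
std::optional<int> DimIfVector(const Field& field) {
    if (!IsVectorType(field.type)) {
        return std::nullopt;
    }
    auto it = field.params.find("dim");
    if (it == field.params.end()) {
        return std::nullopt;
    }
    return std::stoi(it->second);
}
```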

## Related issue

Fixes #41287

---------

Signed-off-by: 박상호 <sangho@rapportlabs.kr>
Signed-off-by: Sangho Park <hoyaspark@gmail.com>
2025-04-16 17:52:32 +08:00
Xiaowei Shi
a6606ce9c6
fix: check PreCreatedTopic first in shard number validation (#41274)
issue: https://github.com/milvus-io/milvus/issues/41271

Signed-off-by: Xiaowei Shi <shallwe.shih@gmail.com>
2025-04-16 17:38:34 +08:00
sthuang
e46e3a1708
enhance: optimize error log message for list policy (#41251)
related: #41250

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:16:32 +08:00
sthuang
1f1c836fb9
feat: Storage v2 growing segment load (#41001)
Support parallel loading of sealed and growing segments in the storage v2
format by asynchronously reading row groups.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:14:33 +08:00
Spade A
70d13dcf61
enhance: update tantivy for removing "doc_id" fast field (#41198)
Issue: #41210

After https://github.com/zilliztech/tantivy/pull/5, we can provide
milvus row id directly to tantivy rather than record it in the fast
field "doc_id".
So instead of searching for the tantivy doc id and then reading the milvus
row id from "doc_id", the tantivy doc id returned by a search is now the
milvus row id itself, which eliminates the expensive row id lookup phase.
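The difference between the two paths can be sketched like this. Both functions are hypothetical illustrations: `std::vector<bool>` stands in for the result bitset, and `doc_to_row` stands in for the "doc_id" fast field; neither reflects the actual tantivy or Milvus API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old path (sketch): each hit is a tantivy doc id that has to be translated
// to a row id through a stored column before the result bit can be set.
std::vector<bool> MarkViaLookup(const std::vector<std::uint32_t>& hit_doc_ids,
                                const std::vector<std::int64_t>& doc_to_row,
                                std::size_t row_count) {
    std::vector<bool> bitset(row_count, false);
    for (std::uint32_t doc : hit_doc_ids) {
        std::int64_t row = doc_to_row.at(doc);  // extra lookup per hit
        bitset.at(static_cast<std::size_t>(row)) = true;
    }
    return bitset;
}

// New path (sketch): the doc id already is the row id, so the bit is set
// directly without any translation step.
std::vector<bool> MarkDirect(const std::vector<std::uint32_t>& hit_doc_ids,
                             std::size_t row_count) {
    std::vector<bool> bitset(row_count, false);
    for (std::uint32_t doc : hit_doc_ids) {
        bitset.at(doc) = true;
    }
    return bitset;
}
```

When the mapping is the identity, both functions produce the same bitset; the direct version simply skips one memory access per hit.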

The following simple benchmark inserts **1M** docs whose rows are all
"hello"; latency is measured at the **segcore** level on a 9900K CPU:

![image](https://github.com/user-attachments/assets/d8e72134-56b5-430b-8628-36c3bed8eaad)
**The latency is 2.02 and 2.1 times respectively.**

Benchmark code:
```
// Benchmark: time a TextMatch query over a sealed segment of N rows.
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    // String data of the field at index 1, which is the column under test.
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    // Overwrites rows [0, N - 1); the last row keeps its generated value.
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    // Time only expression construction and query execution.
    auto start = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    // duration.count() is in microseconds, so the label is "us", not "ms".
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-15 20:20:32 +08:00
Bingyi Sun
a953eaeaf0
enhance: support binary range expression for json path index (#41025)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-15 19:32:33 +08:00
congqixia
1d564a2d95
fix: Make TestScannerAdaptorReadError stable (#41303)
Related to #41302

Previously, waiting for 200 milliseconds could make this unit test
unstable. This PR makes the unit test wait for a specific function call
instead of waiting for a fixed amount of time.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-15 14:54:34 +08:00
congqixia
a53f3024cf
fix: Add save field schema log for kv_catalog.AlterCollection (#41242)
Related to #41241

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-15 12:14:32 +08:00
yihao.dai
dccfc69660
enhance: Get compaction params from request (#41125)
Make DataNode use compaction parameters from request instead of
configuration.

issue: https://github.com/milvus-io/milvus/issues/41123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-15 10:28:53 +08:00
cai.zhang
bc11feae74
fix: Close client before remove worker client (#41253)
issue: #41252

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-15 10:26:31 +08:00
Xianhui Lin
3963fc818f
fix: Add debug memory freeing in sortStats (#41284)
issue: https://github.com/milvus-io/milvus/issues/41218

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-15 09:56:29 +08:00
Xianhui Lin
23f9226250
fix: Initialize streaming coordinator during mixCoord initialization (#41283)
related-pr: https://github.com/milvus-io/milvus/pull/41006
issue: https://github.com/milvus-io/milvus/issues/41282

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-15 09:44:30 +08:00
Chun Han
59b14d38f5
enhance: Optimize index format for improved load performance (#40838) (#40839)
related: https://github.com/milvus-io/milvus/issues/40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-15 03:10:30 +08:00
Spade A
736512a59e
fix: change log info to debug for collection ref (#41267)
issue: #41268

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-14 15:16:30 +08:00
Bingyi Sun
bf617115ca
enhance: Remove single chunk segment related code (#39249)
https://github.com/milvus-io/milvus/issues/39112

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 18:56:29 +08:00
congqixia
154a2a68e0
enhance: Fill dbname for AddCollectionFieldRequest (#41237)
Related to #39718

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-11 18:54:29 +08:00
Xianhui Lin
f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge the coordinators' sessions into a single session.
issue: https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Spade A
9ce3e3cb44
enhance: add documents in batch for json key stats (#41228)
issue: https://github.com/milvus-io/milvus/issues/40897

After this, the scheduling duration of document add operations decreases
roughly from 6s to 0.9s for the case in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-11 14:08:26 +08:00
Bingyi Sun
b9b8419cbf
fix: Use int32 when creating array index for element type int8/int16 (#41185)
issue: #41172
Elements of type int8 or int16 in an Array are encoded as int32, so we
should parse them as int32 when creating the index.
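A minimal sketch of that widening rule, assuming a simplified element-type enum; `ElemType` and `IndexParseType` are hypothetical names used for illustration, not identifiers from the Milvus code base:

```cpp
#include <cassert>

// Hypothetical, simplified element types (not Milvus code).
enum class ElemType { Int8, Int16, Int32, Int64 };

// Returns the type to parse array elements as when building the index.
// Int8 and Int16 elements are stored as int32, so they are widened;
// all other types are parsed as declared.
ElemType IndexParseType(ElemType declared) {
    switch (declared) {
        case ElemType::Int8:
        case ElemType::Int16:
            return ElemType::Int32;
        default:
            return declared;
    }
}
```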

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 13:18:25 +08:00
Xianhui Lin
144911aec6
fix: Change CreateStatsRequest storage_version to 25 for consistency with 2.5 (#41217)
relate-pr: https://github.com/milvus-io/milvus/pull/38039
issue: https://github.com/milvus-io/milvus/issues/36995

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 11:16:43 +08:00