22120 Commits

Author SHA1 Message Date
cai.zhang
8a77fb9cdc
enhance: Support slot for index task and stats task (#39084)
issue: #39101

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-08 20:46:25 +08:00
Spade A
c6a0c2ab64
enhance: process tantivy document add by batch (#40124)
issue: https://github.com/milvus-io/milvus/issues/40006

This PR makes tantivy add documents in batches. Adding documents in batches can
greatly reduce the latency of scheduling the add operation (calling tantivy's
`add_document` only schedules the add operation and returns immediately once
it is scheduled), because each call involves a tokio block_on, which is
relatively heavy.

Reducing the scheduling cost does not necessarily reduce the overall latency if
the index writer threads do not process indexing quickly enough.
But if scheduling itself is slow, then even when the index writer threads
process indexing very fast (by increasing the thread number), the overall
performance can still be limited.

The following code benchmarks the PR (note that the duration only counts
scheduling, without commit):
```
fn test_performance() {
    let field_name = "text";
    let dir = TempDir::new().unwrap();
    let mut index_wrapper = IndexWriterWrapper::create_text_writer(
        field_name,
        dir.path().to_str().unwrap(),
        "default",
        "",
        1,
        50_000_000,
        false,
        TantivyIndexVersion::V7,
    )
    .unwrap();

    let mut batch = vec![];
    for i in 0..1_000_000 {
        batch.push(format!("hello{:04}", i));
    }
    let batch_ref = batch.iter().map(|s| s.as_str()).collect::<Vec<_>>();

    let now = std::time::Instant::now();
    index_wrapper
        .add_data_by_batch(&batch_ref, Some(0))
        .unwrap();
    let elapsed = now.elapsed();
    println!("add_data_by_batch elapsed: {:?}", elapsed);
}
```
Latency drops from roughly 1.4s to 558ms.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-08 19:50:24 +08:00
Bingyi Sun
da21640ac3
fix: Fix the bug that null data can not be filtered by null expr (#41124)
issue: https://github.com/milvus-io/milvus/issues/41063

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-08 19:12:24 +08:00
cai.zhang
a7713df18d
fix: Correctly parse the minimum value of int64 (#41009)
issue: #40729 

The current approach to parsing negative numbers is to first parse the numeric
part and then multiply the result by -1 (mainly to distinguish the
precedence of the negative sign from that of the subtraction operator). However,
for the minimum value of int64 (`-9223372036854775808`), the value
`9223372036854775808` already exceeds the representable range of int64.
As a result, a parsing error occurs.
Therefore, use a specific rule to match `-9223372036854775808`.
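
The overflow described above can be reproduced in a few lines. This is a minimal Rust sketch, not the Milvus expression parser: `parse_by_negating` and `parse_int64_literal` are hypothetical names used only for illustration.

```rust
use std::num::ParseIntError;

/// Parse the digits first, then negate. This mirrors the approach that fails
/// for the int64 minimum, because 9223372036854775808 does not fit in i64.
fn parse_by_negating(digits: &str) -> Result<i64, ParseIntError> {
    digits.parse::<i64>().map(|v| -v)
}

/// Hypothetical fix: special-case the one literal that negation cannot produce.
fn parse_int64_literal(negative: bool, digits: &str) -> Result<i64, ParseIntError> {
    if negative && digits == "9223372036854775808" {
        return Ok(i64::MIN);
    }
    let value = digits.parse::<i64>()?;
    Ok(if negative { -value } else { value })
}
```

With these definitions, `parse_by_negating("9223372036854775808")` returns an error, while `parse_int64_literal(true, "9223372036854775808")` returns `i64::MIN`.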

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-08 16:36:26 +08:00
zhenshan.cao
758cf29e77
fix: create multiple identical indexes by accident (#40179)
issue: https://github.com/milvus-io/milvus/issues/40163

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-04-08 15:06:25 +08:00
aoiasd
6f17720e4e
enhance: support using the jieba tokenizer with a custom dictionary (#39854)
relate: https://github.com/milvus-io/milvus/issues/40168

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-08 14:52:27 +08:00
congqixia
96eca2531f
fix: Avoid update original search/query request (#41126)
Related to #41034

A recent PR, #40842, introduced logic to avoid requerying the pk column. That
logic updates the original request, which makes the request no longer
equivalent to the original one.

When a retry happens due to an incomplete request error, this change makes
the final result set lack the pk column even when the user specifies it
explicitly.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-08 14:26:28 +08:00
wei liu
57212e5376
enhance: Optimize log output for L0 segment deletions (#40975)
related to: #40884 #39552
Reduce log frequency by aggregating deletion logs for L0 segments:
- Add segment count statistics in rangeHitL0Deletions function
- Change individual segment logs to a single consolidated log entry
- Include total number of processed L0 segments in log output

This change significantly reduces log volume while maintaining essential
visibility into deletion operations.
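
The aggregation idea can be sketched as follows. This is only an illustration in Rust, not the `rangeHitL0Deletions` code: the function name, the `(segment_id, deleted_rows)` input shape, and the log text are all assumptions.

```rust
/// Build one consolidated log line instead of one line per L0 segment.
/// Each input entry is a hypothetical (segment_id, deleted_rows) pair.
fn summarize_l0_deletions(per_segment: &[(i64, usize)]) -> String {
    let segment_count = per_segment.len();
    let total_rows: usize = per_segment.iter().map(|(_, rows)| rows).sum();
    format!(
        "processed L0 deletions: segments={}, deleted_rows={}",
        segment_count, total_rows
    )
}
```

The caller emits the returned string once per pass, so log volume no longer grows with the number of L0 segments.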

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-08 12:04:26 +08:00
wei liu
f79391dea9
fix: remove metrics reset calls to ensure accurate reporting (#41049)
issue: #41048
Fixes issue introduced in PR #33522 where metric resets caused
incomplete data collection by monitoring systems.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-08 11:38:36 +08:00
congqixia
1afa7433f0
enhance: Rectify client_request_id logic (#41089)
The traceID is not initialized from the client_request_id in the context. If the
client sends a valid traceID, the milvus log prints two different traceIDs,
which is weird.

This PR adds logic to try parsing the incoming `client_request_id` into a
traceID. If parsing works, it is used as the request traceID; otherwise it is
set in a different field named `client_request_id`.
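
The fallback described above might look roughly like this. The sketch is in Rust and makes its own assumptions: a valid trace ID is taken to be 32 hex characters that are not all zeros, and the `RequestId` enum and `classify_client_request_id` function are hypothetical names, not Milvus APIs.

```rust
/// Where the incoming `client_request_id` ends up.
#[derive(Debug, PartialEq)]
enum RequestId {
    /// The value parsed as a trace ID and is used as the request traceID.
    TraceId(u128),
    /// The value is kept in a separate `client_request_id` field.
    ClientRequestId(String),
}

/// Assumption: a valid trace ID is 32 hex characters and not all zeros.
fn classify_client_request_id(raw: &str) -> RequestId {
    let is_hex = raw.len() == 32 && raw.bytes().all(|b| b.is_ascii_hexdigit());
    if is_hex {
        if let Ok(id) = u128::from_str_radix(raw, 16) {
            if id != 0 {
                return RequestId::TraceId(id);
            }
        }
    }
    RequestId::ClientRequestId(raw.to_string())
}
```

Either way the log carries a single traceID, and a value that cannot be parsed is still preserved under its own field.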

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-08 10:32:26 +08:00
Ted Xu
1bcea2a775
fix: assigning the correct storage version in sync and index tasks (#41093)
See #39663 #40667

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-08 10:14:25 +08:00
Zhen Ye
7830dd3713
fix: keep the last persisted message in writeaheadbuffer to fix the never catch up (#41109)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-08 09:10:28 +08:00
Spade A
e4da2765ba
enhance: process batch of strings within one tantivy_index_add_string call (#40007)
issue: #40006

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-08 01:20:25 +08:00
congqixia
484cd8c4a9
fix: Ignore growing segment without start pos for seal policy (#41130)
Related to #41129

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-07 22:16:23 +08:00
Bingyi Sun
355f62d6c9
fix: Align brute force search with json index for exists expr (#41116)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-07 15:42:23 +08:00
wei liu
99270103cf
fix: Offline segment block delegator recovery (#40827)
issue: #39937
Before PR #39552, whenever a segment was missing in either the `current
target` or the `next target`, we would trigger `load segment` to recover
the delegator. However, restoring only the missing segments in the `next
target` is sufficient to advance the target and complete the recovery
process.

In PR #39552, we removed the scheduling of L0 segments along with this
unnecessary `load segment` logic. However, this exposed a new issue: if
the `current target` still has missing segments and there is a flaw in
the `checkDelegatorDataReady` logic, it could block the recovery of a
delegator that contains `offline segments`.

Since `offline segments` are cleaned up asynchronously in this scenario,
this PR removes their blocking effect on delegator recovery, ensuring a
smoother failure recovery process.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-07 14:56:22 +08:00
zhagnlu
ee8783cae9
fix:add operator type for some operator (#40895)
#40894

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:58:27 +08:00
zhagnlu
10a63b3f2e
enhance: add formatter for several types to remove compile warning (#41094)
#41091

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:54:24 +08:00
cai.zhang
a5be7cbce9
fix: Add the field index lock for getSegmentsIndexStates (#40968)
issue: #40966

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:38:24 +08:00
cai.zhang
05e25431d9
enhance: Deprecate disk params about indexing (#41045)
issue: #40863

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:36:34 +08:00
zhagnlu
0a378dc308
fix:fix format error for json (#41026)
#40963

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 10:22:22 +08:00
sthuang
a85e36bad2
fix: create collection task check failed after restart (#40982)
The fields and partitions information are stored and fetched with
different prefixes in the metadata. In the CreateCollectionTask, the
RootCoord checks the existing collection information against the
metadata. This check fails if the order of the fields or partitions info
differs, leading to an error after restarting Milvus. To resolve this,
we should use a map in the check logic to ensure consistency.
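
An order-insensitive comparison of this kind can be sketched as below. This is a Rust illustration only: the `Field` struct is a hypothetical, heavily simplified stand-in for the real schema type, and `same_fields` is not a Milvus function.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified field schema; the real type has more attributes.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Index fields by name so that their order in the slice no longer matters.
fn index_by_name(fields: &[Field]) -> HashMap<&str, &Field> {
    fields.iter().map(|f| (f.name.as_str(), f)).collect()
}

/// Two field lists match when they contain the same fields, in any order.
fn same_fields(a: &[Field], b: &[Field]) -> bool {
    let (map_a, map_b) = (index_by_name(a), index_by_name(b));
    // The length checks reject lists in which a duplicate name collapsed.
    map_a.len() == a.len() && map_b.len() == b.len() && map_a == map_b
}
```

A positional comparison would report a mismatch whenever the metadata store returns the same fields in a different order; keying by name removes that dependency.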

related: https://github.com/milvus-io/milvus/issues/40955

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-05 06:58:22 +08:00
Zhen Ye
f18aa85083
enhance: vchannel fair balance policy for streaming (#40959)
issue: #40638 

- Add `ChannelID` for streaming replica in future.
- Remove the pchannel count fair balance policy for streaming.
- Add Score based vchannel fair balance policy for streaming.
- Add pchannel stats manager to collect the stats of pchannel for
balancer.
- Add configuration and metrics for new balance policy

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-04 10:12:22 +08:00
Bingyi Sun
fcb03b5bd1
feat: add json null/exists expression (#41004)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-03 17:48:21 +08:00
zhuwenxing
276a8d36f8
test: add negative case when field name is reserved keywords (#41082)
/kind improvement

related: https://github.com/milvus-io/milvus/issues/40290

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2025-04-03 15:54:28 +08:00
zhuwenxing
4306ed4329
test: add import corner testcase (#41079)
/kind improvement

issue: https://github.com/milvus-io/milvus/issues/40291

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2025-04-03 15:52:29 +08:00
wei liu
bf8547578f
fix: Address manual balance and balance check issues (#41037)
issue: #37651
- Fix context propagation for manual balance segment task creation from
PR #38080.
- Optimize stopping balance by preventing redundant checks per round,
addressing performance regression from PR #40297.
- Decrease default `checkBalanceInterval` from 3000ms to 300ms.
- Correct minor log messages in `BalanceChecker`.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-03 15:48:27 +08:00
binbin
afb4621012
test: Add test cases for part of json path index (#41016)
Signed-off-by: binbin lv <binbin.lv@zilliz.com>
2025-04-03 13:02:27 +08:00
Zhen Ye
9f27d9af61
fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069)
issue: #41067

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-03 02:54:21 +08:00
zhuwenxing
eb4884b5e7
test: add coo format for sparse vector import and some negative case (#41040)
/kind improvement

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2025-04-02 19:54:26 +08:00
Spade A
f552ec67dd
fix: support building tantivy index with low version(5) (#40822)
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building the tantivy
index with a lower version for query nodes that run a lower tantivy version.

This PR does two things:
1. refactor the code of IndexWriterWrapper to make it concise
2. enable IndexWriterWrapper to build the tantivy index with different tantivy
crates

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 18:46:20 +08:00
Chun Han
afa519b4c7
fix: array is null failed(#40686) (#41027)
related: #40686

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-02 18:20:22 +08:00
cai.zhang
902f6506ca
fix: Get all children deltalogs for segment to load (#40956)
issue: #40207

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-02 16:32:22 +08:00
Buqian Zheng
e1216829f7
enhance: weighted reranker to allow skip score normalization (#40903)
issue: https://github.com/milvus-io/milvus/issues/40836

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-02 16:24:23 +08:00
smellthemoon
cb1e86e17c
enhance: support add field (#39800)
After this PR is merged, insert, upsert, build index, query, and search are
supported on the added field.
These operations can only be performed on the added field after the add field
request completes, which is a synchronous operation.

Compaction will be supported in the next PR.
#39718

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-04-02 14:24:31 +08:00
congqixia
37cf9a0dc1
enhance: Use %v for missing id log (#41036)
The error message `incomplete query result, missing id %!s(int64=348), len(searchIDs) =
10, len(queryIDs) = 9` is formatted incorrectly when the missing id is an
int64.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-02 11:54:23 +08:00
Spade A
216be1494b
fix: add log for object storage operation fail (#40666)
fix: #40665

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 01:26:21 +08:00
Zhen Ye
847b8c8fdc
fix: node version checker should use -dev before releasing (#41039)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 20:04:20 +08:00
yihao.dai
b4cb8a4b13
enhance: Add UTF-8 string validation for import (#40694)
issue: https://github.com/milvus-io/milvus/issues/40684

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 19:04:21 +08:00
sre-ci-robot
599be2b88f
[automated] Bump milvus version to v2.5.8 (#41029)
Bump milvus version to v2.5.8
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-01 17:24:22 +08:00
sre-ci-robot
0e5ba195d5
[automated] Bump milvus version to v2.5.8 (#41023)
Bump milvus version to v2.5.8
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-01 15:08:28 +08:00
Zhen Ye
b03e60558a
enhance: add proxy and datanode checker when wal balance startup (#40877)
issue: #40532

- balance should be enabled only when there is no proxy or datanode whose
version is lower than 2.6.0
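
The version gate can be sketched as follows. This Rust example is an assumption-laden illustration, not the Milvus checker: `parse_version` and `balance_enabled` are hypothetical names, pre-release suffixes such as `-dev` are simply ignored, and unparseable versions are conservatively treated as too old.

```rust
/// Parse "2.6.0", "v2.5.8" or "2.6.0-dev" into (major, minor, patch).
/// Pre-release suffixes are ignored in this sketch.
fn parse_version(raw: &str) -> Option<(u32, u32, u32)> {
    let core = raw.trim_start_matches('v').split('-').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u32>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// Balance is allowed only if every proxy and datanode is at least 2.6.0.
fn balance_enabled(node_versions: &[&str]) -> bool {
    node_versions
        .iter()
        .all(|v| matches!(parse_version(v), Some(ver) if ver >= (2, 6, 0)))
}
```

Tuple comparison in Rust is lexicographic, so `(2, 5, 8) >= (2, 6, 0)` is false and a single older node disables balancing.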

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 11:24:22 +08:00
Zhen Ye
cef1d16454
fix: timetick interceptor panics when closing write ahead buffer (#40970)
issue: #40967

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 10:44:22 +08:00
yihao.dai
5b78ef0a49
fix: Fix delete data loss due to duplicate binlogID (#40960)
With concurrent L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.
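
The range sizing described above can be sketched as below. This is a Rust illustration under stated assumptions: `reserve_log_id_range` is a hypothetical helper, and the `next_free_id` counter stands in for whatever allocator actually hands out log IDs.

```rust
use std::ops::Range;

/// Reserve a half-open range of log IDs for one compaction task.
/// The range holds `binlog_count * expansion_factor` IDs.
fn reserve_log_id_range(
    next_free_id: &mut i64,
    binlog_count: i64,
    expansion_factor: i64,
) -> Range<i64> {
    let count = binlog_count * expansion_factor;
    let start = *next_free_id;
    *next_free_id += count; // later tasks start after this range
    start..start + count
}
```

Because each task receives its own disjoint range, two concurrent compactions writing to the same L1 segment cannot produce the same log ID.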

issue: https://github.com/milvus-io/milvus/issues/40207

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 10:36:22 +08:00
Bingyi Sun
ba5834adc9
fix: Fix passing wrong json_cast_type in e2e (#41015)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-01 10:28:21 +08:00
groot
5146b41aa4
fix: fix a 404 bug of WebUI when http.enablePprof is false (#40951)
issue: https://github.com/milvus-io/milvus/issues/40952

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-04-01 10:22:22 +08:00
cqy123456
6dc0f42830
fix:growing mmap data type crashed by nullable input (#40994)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/40980

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-03-31 20:32:19 +08:00
jaime
651afe5058
enhance: Aligning Etcd and MinIO versions in Docker Compose (#40857)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-03-31 18:08:21 +08:00
Bingyi Sun
27ff3a42e7
enhance: Record simdjson error (#41003)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 17:56:19 +08:00
wei liu
c02892e9fb
enhance: Balance the collection with the largest row count first (#40297)
issue: #37651
This PR enables balancing the collection with the largest row count first,
to avoid temporarily migrating small-table data to new nodes during
their onboarding, only for it to be moved out again after the large table
is balanced, which would cause unnecessary load.
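
The ordering can be sketched as a simple sort. This Rust example is illustrative only: `CollectionStats` and `balance_order` are hypothetical names, and breaking ties by collection ID is an assumption added to keep the order deterministic.

```rust
/// Hypothetical per-collection statistics used to pick the balance order.
#[derive(Debug, Clone, PartialEq)]
struct CollectionStats {
    id: i64,
    row_count: u64,
}

/// Return collection IDs ordered by row count, largest first.
/// Ties are broken by ascending ID (an assumption of this sketch).
fn balance_order(mut collections: Vec<CollectionStats>) -> Vec<i64> {
    collections.sort_by(|a, b| b.row_count.cmp(&a.row_count).then(a.id.cmp(&b.id)));
    collections.into_iter().map(|c| c.id).collect()
}
```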

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-31 16:00:19 +08:00