10379 Commits

Author SHA1 Message Date
zhagnlu
ee1faf80dd
fix:add clear bitmap for batch skip mode (#41166)
#41086 #41150

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-09 13:08:27 +08:00
zhenshan.cao
ecc2d80915
enhance: Add primary field name in SearchResult and QueryResults (#39220)
issue: https://github.com/milvus-io/milvus/issues/39219

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-04-09 10:48:25 +08:00
sthuang
50e02e3598
enhance: update packed reader api (#41055)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-09 10:18:26 +08:00
congqixia
e2d8adb963
fix: Use element_type for Array is null operator (#41157)
Related to #41156

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-09 10:16:24 +08:00
cai.zhang
8a77fb9cdc
enhance: Support slot for index task and stats task (#39084)
issue: #39101

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-08 20:46:25 +08:00
Spade A
c6a0c2ab64
enhance: process tantivy document add by batch (#40124)
issue: https://github.com/milvus-io/milvus/issues/40006

This PR make tantivy document add by batch. Add document by batch can
greately reduce the latency of scheduling the document add operation
(call tantivy `add_document` only schdules the add operation and it
returns immediately after scheduled) , because each call involes a tokio
block_on which is relatively heavy.

Reduce scheduling part not necessarily reduces the overall latency if
the index writer threads does not process indexing quickly enough.
But if scheduling itself is pretty slow, even the index writer threads
process indexing very fast (by increasing thread number), the overall
performance can still be limited.

The following codes bench the PR (Note, the duration only counts for
scheduling without commit)
```
fn test_performance() {
    let field_name = "text";
    let dir = TempDir::new().unwrap();
    let mut index_wrapper = IndexWriterWrapper::create_text_writer(
        field_name,
        dir.path().to_str().unwrap(),
        "default",
        "",
        1,
        50_000_000,
        false,
        TantivyIndexVersion::V7,
    )
    .unwrap();

    let mut batch = vec![];
    for i in 0..1_000_000 {
        batch.push(format!("hello{:04}", i));
    }
    let batch_ref = batch.iter().map(|s| s.as_str()).collect::<Vec<_>>();

    let now = std::time::Instant::now();
    index_wrapper
        .add_data_by_batch(&batch_ref, Some(0))
        .unwrap();
    let elapsed = now.elapsed();
    println!("add_data_by_batch elapsed: {:?}", elapsed);
}
```
Latency roughly reduces from 1.4s to 558ms.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-08 19:50:24 +08:00
Bingyi Sun
da21640ac3
fix: Fix the bug that null data can not be filtered by null expr (#41124)
issue: https://github.com/milvus-io/milvus/issues/41063

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-08 19:12:24 +08:00
cai.zhang
a7713df18d
fix: Correctly parse the minimum value of int64 (#41009)
issue: #40729 

Current approach to parse negative numbers is first parse the numeric
part and then multiply the result by -1(mainly to distinguish the
precedence of the negative sign and the subtraction operator). However,
for the minimum value of int64(`-9223372036854775808`), the value
`9223372036854775808` already exceeds the representable range of int64.
As a result, parsing error occurs.
Therefore, use a specific rule to match `-9223372036854775808`.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-08 16:36:26 +08:00
zhenshan.cao
758cf29e77
fix: create multiple idential indexes by accident (#40179)
issue: https://github.com/milvus-io/milvus/issues/40163

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-04-08 15:06:25 +08:00
aoiasd
6f17720e4e
enhance: support use jieba tokenizer with costum dictionary (#39854)
relate: https://github.com/milvus-io/milvus/issues/40168

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-08 14:52:27 +08:00
congqixia
96eca2531f
fix: Avoid update original search/query request (#41126)
Related to #41034

Recent pr #40842 introduced logic to avoid requery pk column, which
updates the original request which makes the request not equavilant to
the original one.

When retry happens due to incomplete request error, this change makes
the final result set lacks the pk column even when user specifies it
explicitly.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-08 14:26:28 +08:00
wei liu
57212e5376
enhance: Optimize log output for L0 segment deletions (#40975)
related to: #40884 #39552
Reduce log frequency by aggregating deletion logs for L0 segments:
- Add segment count statistics in rangeHitL0Deletions function
- Change individual segment logs to a single consolidated log entry
- Include total number of processed L0 segments in log output

This change significantly reduces log volume while maintaining essential
visibility into deletion operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-08 12:04:26 +08:00
wei liu
f79391dea9
fix: remove metrics reset calls to ensure accurate reporting (#41049)
issue: #41048
Fixes issue introduced in PR #33522 where metric resets caused
incomplete data collection by monitoring systems.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-08 11:38:36 +08:00
Ted Xu
1bcea2a775
fix: assigning the correct storage version in sync and index tasks (#41093)
See #39663 #40667

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-08 10:14:25 +08:00
Zhen Ye
7830dd3713
fix: keep the last persisted message in writeaheadbuffer to fix the never catch up (#41109)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-08 09:10:28 +08:00
Spade A
e4da2765ba
enhance: process batch of strings within one tantivy_index_add_string call (#40007)
issue: #40006

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-08 01:20:25 +08:00
congqixia
484cd8c4a9
fix: Ignore growing segment without start pos for seal policy (#41130)
Related to #41129

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-07 22:16:23 +08:00
Bingyi Sun
355f62d6c9
fix: Align brute force search with json index for exists expr (#41116)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-07 15:42:23 +08:00
wei liu
99270103cf
fix: Offline segment block delegator recovery (#40827)
issue: #39937
Before PR #39552, whenever a segment was missing in either the `current
target` or the `next target`, we would trigger `load segment` to recover
the delegator. However, restoring only the missing segments in the `next
target` is sufficient to advance the target and complete the recovery
process.

In PR #39552, we removed the scheduling of L0 segments along with this
unnecessary `load segment` logic. However, this exposed a new issue: if
the `current target` still has missing segments and there is a flaw in
the `checkDelegatorDataReady` logic, it could block the recovery of a
delegator that contains `offline segments`.

Since `offline segments` are cleaned up asynchronously in this scenario,
this PR removes their blocking effect on delegator recovery, ensuring a
smoother failure recovery process.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-07 14:56:22 +08:00
zhagnlu
ee8783cae9
fix:add operator type for some operator (#40895)
#40894

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:58:27 +08:00
zhagnlu
10a63b3f2e
enhance: add formatter for serveral types to remove compile warning (#41094)
#41091

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 11:54:24 +08:00
cai.zhang
a5be7cbce9
fix: Add the field index lock for getSegmentsIndexStates (#40968)
issue: #40966

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:38:24 +08:00
cai.zhang
05e25431d9
enhance: Deprecate disk params about indexing (#41045)
issue: #40863

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-07 11:36:34 +08:00
zhagnlu
0a378dc308
fix:fix format error for json (#41026)
#40963

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 10:22:22 +08:00
sthuang
a85e36bad2
fix: create collection task check failed after restart (#40982)
The fields and partitions information are stored and fetched with
different prefixes in the metadata. In the CreateCollectionTask, the
RootCoord checks the existing collection information against the
metadata. This check fails if the order of the fields or partitions info
differs, leading to an error after restarting Milvus. To resolve this,
we should use a map in the check logic to ensure consistency.

related: https://github.com/milvus-io/milvus/issues/40955

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-05 06:58:22 +08:00
Zhen Ye
f18aa85083
enhance: vchannel fair balance policy for streaming (#40959)
issue: #40638 

- Add `ChannelID` for streaming replica in future.
- Remove the pchannel count fair balance policy for streaming.
- Add Score based vchannel fair balance policy for streaming.
- Add pchannel stats manager to collect the stats of pchannel for
balancer.
- Add configuration and metrics for new balance policy

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-04 10:12:22 +08:00
Bingyi Sun
fcb03b5bd1
feat: add json null/exists expression (#41004)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-03 17:48:21 +08:00
wei liu
bf8547578f
fix: Address manual balance and balance check issues (#41037)
issue: #37651
- Fix context propagation for manual balance segment task creation from
PR #38080.
- Optimize stopping balance by preventing redundant checks per round,
addressing performance regression from PR #40297.
- Decrease default `checkBalanceInterval` from 3000ms to 300ms.
- Correct minor log messages in `BalanceChecker`.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-03 15:48:27 +08:00
Zhen Ye
9f27d9af61
fix: segv if the LoadArrowReaderFromRemote run at the exception path (#41069)
issue: #41067

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-03 02:54:21 +08:00
Spade A
f552ec67dd
fix: support building tantivy index with low version(5) (#40822)
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building tantivy
index with low version
for those query nodes with low tantivy version.

This PR does two things:
1. refactor codes for IndexWriterWrapper to make it concise
2. enable IndexWriterWrapper to build tantivy index by different tantivy
crate

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 18:46:20 +08:00
Chun Han
afa519b4c7
fix: array is null failed(#40686) (#41027)
related: #40686

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-02 18:20:22 +08:00
cai.zhang
902f6506ca
fix: Get all children deltalogs for segment to load (#40956)
issue: #40207

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-02 16:32:22 +08:00
Buqian Zheng
e1216829f7
enhance: weighted reranker to allow skip score normalization (#40903)
issue: https://github.com/milvus-io/milvus/issues/40836

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-02 16:24:23 +08:00
smellthemoon
cb1e86e17c
enhance: support add field (#39800)
after the pr merged, we can support to insert, upsert, build index,
query, search in the added field.
can only do the above operates in added field after add field request
complete, which is a sync operate.

compact will be supported in the next pr.
#39718

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-04-02 14:24:31 +08:00
congqixia
37cf9a0dc1
enhance: Use %v for missing id log (#41036)
`incomplete query result, missing id %!s(int64=348), len(searchIDs) =
10, len(queryIDs) = 9` error message format with error when missing id
is int64

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-02 11:54:23 +08:00
Spade A
216be1494b
fix: add log for object storage operation fail (#40666)
fix: #40665

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 01:26:21 +08:00
Zhen Ye
847b8c8fdc
fix: node version checker should use -dev before releasing (#41039)
issue: #40532

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 20:04:20 +08:00
yihao.dai
b4cb8a4b13
enhance: Add UTF-8 string validation for import (#40694)
issue: https://github.com/milvus-io/milvus/issues/40684

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 19:04:21 +08:00
Zhen Ye
b03e60558a
enhance: add proxy and datanode checker when wal balance startup (#40877)
issue: #40532

- balance should enable only when there's no proxy and datanode which
version is lower than 2.6.0

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 11:24:22 +08:00
Zhen Ye
cef1d16454
fix: timetick interceptor panics when closing write ahead buffer (#40970)
issue: #40967

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-01 10:44:22 +08:00
yihao.dai
5b78ef0a49
fix: Fix delete data loss due to duplicate binlogID (#40960)
With concurrenct L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.

issue: https://github.com/milvus-io/milvus/issues/40207

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 10:36:22 +08:00
groot
5146b41aa4
fix: fix a 404 bug of WebUI when http.enablePprof is false (#40951)
issue: https://github.com/milvus-io/milvus/issues/40952

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-04-01 10:22:22 +08:00
cqy123456
6dc0f42830
fix:growing mmap data type crashed by nullable input (#40994)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/40980

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-03-31 20:32:19 +08:00
Bingyi Sun
27ff3a42e7
enhance: Record simdjson error (#41003)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 17:56:19 +08:00
wei liu
c02892e9fb
enhance: Balance the collection with the largest row count first (#40297)
issue: #37651
this PR enable to balance the collection with largest row count first,
to avoid temporary migration of small table data to new nodes during
their onboarding, only to be moved out again after the large table
balance, which would cause unnecessary load.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-31 16:00:19 +08:00
Bingyi Sun
15ec7bae4d
fix: Fix using json index when iterative_filter is specified (#40945)
issue: #40934

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 15:26:19 +08:00
groot
aae3a3598e
enhance: bulkinsert supports parsing sparse vector form parquet struct (#40927)
issue: https://github.com/milvus-io/milvus/issues/40777

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-03-31 14:20:30 +08:00
aoiasd
1cc88d7755
enhance: add utf8 check for all varchar field (#40670)
https://github.com/milvus-io/milvus/issues/40684

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-03-28 19:38:17 +08:00
Buqian Zheng
7a056aff9d
enhance: avoid re-query if hybrid search requested only pk as output field (#40842)
proxy to always remove pk field from output field when forwarding
request to QN, and if user requested pk, fill it from IDs.

issue: https://github.com/milvus-io/milvus/issues/40833

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-03-28 14:32:18 +08:00
Bingyi Sun
9676365af9
fix: Fix json index not equal filter (#40647)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-27 23:06:23 +08:00