10471 Commits

Author SHA1 Message Date
congqixia
f3f8227cd0
enhance: [AddField] Trigger check schema in retrieve as well (#41598)
Related to #39718
Fixes milvus-io/pymilvus#2771

This PR:
- Make the AsyncRetrieve task trigger the "schema check" logic as well
- Rename `AddField`-related methods to align with the code standard

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-29 14:10:49 +08:00
Spade A
910f68c986
fix: update tantivy to fix tantivy doc out of order when merge (#41596)
issue: #41597

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-29 13:46:49 +08:00
Spade A
f35e8f7420
fix: fix arm64 compile issue (#41593)
issue: https://github.com/milvus-io/milvus/issues/41059,
https://github.com/milvus-io/milvus/issues/41510

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-29 13:19:25 +08:00
yihao.dai
71b14fc32b
enhance: Skip disk quota check for l0 import (#41571)
issue: https://github.com/milvus-io/milvus/issues/41569

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-29 10:46:54 +08:00
Zhen Ye
9cb5271027
enhance: remove support for embedded NATS MQ (#41565)
issue: #41564

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 23:06:49 +08:00
junjiejiangjjj
bb7df40fc1
feat: HTTP interface supports rerank (#41486)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-04-28 23:02:50 +08:00
Zhen Ye
dfbb02a5f7
enhance: make streaming messages loggable as a log field for easier coding (#41545)
issue: #41544

- implement logging a message as a zap field.
- fix excessive slow logs for woodpecker.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 14:38:42 +08:00
Buqian Zheng
3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
cai.zhang
640f526301
fix: Update current scalar index version to be compatible with different tantivy versions (#41141)
issue: #40823

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-04-27 20:44:39 +08:00
yihao.dai
16eb5eb921
enhance: Accelerate delete filtering during binlog import (#41551)
Use map for deleteData instead of slice to accelerate delete filtering
during binlog import.
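A minimal sketch of why a map beats a slice here, assuming int64 primary keys, uint64 timestamps, and that a delete removes rows inserted strictly before it. `Row`, `BuildDeleteIndex`, and `FilterDeleted` are hypothetical names, not the Milvus import code (which is Go).

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types: int64 primary keys and uint64 timestamps.
using PK = int64_t;
using Timestamp = uint64_t;

struct Row {
    PK pk;
    Timestamp ts;  // insert timestamp
};

// Index the deletes once as pk -> latest delete timestamp: O(D).
std::unordered_map<PK, Timestamp> BuildDeleteIndex(
    const std::vector<std::pair<PK, Timestamp>>& deletes) {
    std::unordered_map<PK, Timestamp> index;
    for (const auto& [pk, ts] : deletes) {
        auto [it, inserted] = index.try_emplace(pk, ts);
        if (!inserted && it->second < ts) {
            it->second = ts;  // keep the latest delete for this pk
        }
    }
    return index;
}

// Assumption: a delete removes rows inserted strictly before it.
// Each lookup is O(1) on average, so filtering N rows costs O(N + D)
// instead of O(N * D) when scanning a delete slice per row.
std::vector<Row> FilterDeleted(
    const std::vector<Row>& rows,
    const std::unordered_map<PK, Timestamp>& index) {
    std::vector<Row> kept;
    kept.reserve(rows.size());
    for (const Row& row : rows) {
        auto it = index.find(row.pk);
        const bool deleted = it != index.end() && row.ts < it->second;
        if (!deleted) {
            kept.push_back(row);
        }
    }
    return kept;
}
```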

issue: https://github.com/milvus-io/milvus/issues/41550

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-27 18:56:38 +08:00
aoiasd
3892451880
fix: bm25 search failed when avgdl is NaN (#41502)
relate: https://github.com/milvus-io/milvus/issues/41490
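The title does not describe the fix itself; the following is only one plausible guard, assuming avgdl turns NaN when computed as 0.0 / 0.0 (e.g. empty stats). `Bm25TermScore` and its default k1/b values are illustrative, not the Milvus implementation.

```cpp
#include <cmath>

// Hypothetical BM25 term score with common default parameters.
double Bm25TermScore(double idf, double tf, double doc_len, double avgdl,
                     double k1 = 1.2, double b = 0.75) {
    // Any arithmetic with NaN yields NaN, and NaN compares false against
    // everything, which can break ranking. Fall back to a neutral average
    // length of 1.0 when avgdl is NaN or non-positive.
    if (std::isnan(avgdl) || avgdl <= 0.0) {
        avgdl = 1.0;
    }
    const double norm = k1 * (1.0 - b + b * doc_len / avgdl);
    return idf * tf * (k1 + 1.0) / (tf + norm);
}
```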

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-27 17:34:38 +08:00
Chun Han
12cde913b5
fix: fail to get string views due to chunk bound empty loop (#41300) (#41452)
related: #41300

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-27 10:40:38 +08:00
Zhen Ye
6a15790799
enhance: add interface for message and fix write ahead buffer (#41470)
issue: #41439

- add IsPersisted and VChannel interfaces for message
- add WithNotPersisted() for message builder
- fix the persisted time tick lost at write ahead buffer

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-27 10:24:38 +08:00
congqixia
c0661f4e9d
enhance: Set resolve_type_alias to False to generate MockMsgHandler (#41531)
Previously, mockery would resolve message.ImmutableXXXMessage to the private
message.specializedImmutableMessage[H, B], which does not compile.

This PR sets `resolve_type_alias` to False for generate-mockery-internal,
as recommended for v3 compatibility, to avoid this problem.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-27 10:02:38 +08:00
congqixia
b5443ddbd0
enhance: [AddField] Reopen loaded segments after AddField (#41529)
Related to #39718

This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when schema version increases
- Add FinishLoad api for loading progress
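The lazy-reopen idea can be sketched as a version comparison performed on access. `LazySegment` and its methods are hypothetical; a real reopen would also load the newly added fields.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical segment wrapper that remembers the schema version it was
// opened with and reopens lazily, only when a newer version is observed.
class LazySegment {
public:
    explicit LazySegment(uint64_t schema_version) : version_(schema_version) {}

    // Called on access (e.g. before serving a read). Returns true if a
    // reopen happened.
    bool MaybeReopen(uint64_t latest_schema_version) {
        std::lock_guard<std::mutex> guard(mu_);
        if (latest_schema_version <= version_) {
            return false;  // already up to date; nothing to do
        }
        // A real reopen would load the newly added fields here.
        version_ = latest_schema_version;
        ++reopen_count_;
        return true;
    }

    uint64_t version() const {
        std::lock_guard<std::mutex> guard(mu_);
        return version_;
    }

    int reopen_count() const {
        std::lock_guard<std::mutex> guard(mu_);
        return reopen_count_;
    }

private:
    mutable std::mutex mu_;
    uint64_t version_;
    int reopen_count_ = 0;
};
```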

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-26 08:48:39 +08:00
Buqian Zheng
1c8b9c127d
fix: Make sure segment in ut is destroyed before static MmapManager singleton (#41508)
issue: #41507
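A hypothetical illustration of the destruction-order hazard, assuming a segment whose destructor touches a function-local static singleton. Objects with static storage duration are destroyed in reverse order of initialization. A constant-initialized global `std::unique_ptr` is therefore destroyed after a singleton that is first used later. `MmapRegistry`, `TestSegment`, and `RunTest` are illustrative names only.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a process-wide singleton such as MmapManager.
class MmapRegistry {
public:
    static MmapRegistry& Instance() {
        static MmapRegistry instance;  // destroyed during static destruction
        return instance;
    }
    void Register(const std::string& name) { live_.push_back(name); }
    void Unregister(const std::string& name) {
        live_.erase(std::remove(live_.begin(), live_.end(), name), live_.end());
    }
    std::size_t live() const { return live_.size(); }

private:
    std::vector<std::string> live_;
};

// Hypothetical segment that touches the singleton in its destructor.
class TestSegment {
public:
    explicit TestSegment(std::string name) : name_(std::move(name)) {
        MmapRegistry::Instance().Register(name_);
    }
    ~TestSegment() { MmapRegistry::Instance().Unregister(name_); }

private:
    std::string name_;
};

// Risky if left non-null: g_segment is initialized before the registry is
// first used, so at exit it is destroyed *after* the registry and
// ~TestSegment() would touch a destroyed object.
static std::unique_ptr<TestSegment> g_segment;

// Safer: release the segment before the test returns, while the singleton
// is still alive.
void RunTest() {
    g_segment = std::make_unique<TestSegment>("seg_1");
    // ... exercise the segment ...
    g_segment.reset();
}
```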

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-25 18:50:38 +08:00
XuanYang-cn
dab39c610b
enhance: remove unused DDLCodec (#41485)
See also: #39242

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-04-25 17:26:37 +08:00
Zhen Ye
01c0356ed3
fix: make add segment operation in meta idempotent (#41515)
issue: #41514
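A minimal sketch of an idempotent add, assuming segments are keyed by ID. Re-adding an existing segment (e.g. on a retried request) leaves the state unchanged and is not an error. `SegmentInfo` and `SegmentMeta` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical segment record.
struct SegmentInfo {
    int64_t id;
    std::string channel;
};

// Hypothetical meta store with an idempotent AddSegment.
class SegmentMeta {
public:
    // Returns true if newly added, false if it was already present;
    // either way the call succeeds and existing state is not overwritten.
    bool AddSegment(const SegmentInfo& info) {
        return segments_.try_emplace(info.id, info).second;
    }
    std::size_t size() const { return segments_.size(); }

private:
    std::unordered_map<int64_t, SegmentInfo> segments_;
};
```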

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-25 16:50:38 +08:00
Xianhui Lin
1a6838b496
fix: json stats add map null check before inserting into tantivy (#41505)
Add a null check on the map before inserting JSON stats into tantivy.
Building the JSON stats index may fail if there is no data.
issue: https://github.com/milvus-io/milvus/issues/41494
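A hypothetical sketch of the null check, assuming each row yields a possibly-null map of JSON path to value. `KeyMap`, `IndexWriter`, and `AddRows` are illustrative stand-ins for the tantivy-backed writer, not the real code.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-row key statistics: JSON path -> value.
using KeyMap = std::map<std::string, std::string>;

// Hypothetical writer standing in for the tantivy-backed index writer.
struct IndexWriter {
    std::vector<std::string> docs;
    void Add(const std::string& path, const std::string& value) {
        docs.push_back(path + "=" + value);
    }
};

// Rows without JSON data may produce a null map; skip them instead of
// dereferencing a null pointer.
void AddRows(const std::vector<std::shared_ptr<KeyMap>>& rows,
             IndexWriter& writer) {
    for (const auto& row : rows) {
        if (row == nullptr) {
            continue;  // null check before insert
        }
        for (const auto& [path, value] : *row) {
            writer.Add(path, value);
        }
    }
}
```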

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-24 21:06:37 +08:00
Zhen Ye
a3d621cb5e
fix: remove the concurrent limits for streaming service (#41484)
issue: #41479

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 20:36:38 +08:00
Zhen Ye
ecfc868dcb
fix: write buffer not unregistered when datasyncservice is gone (#41496)
issue: #41495

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 19:38:38 +08:00
junjiejiangjjj
e56adc121b
enhance: refactor embedding credentials manager (#41442)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-04-24 14:34:38 +08:00
congqixia
dbe54c2df8
enhance: [AddField] Resolve conflicts & use WAL ts as collection update ts (#41476)
Related to #39718

This PR:
- Use WAL broadcast timestamp as Collection update timestamp
- Remove request_fields size assertion
- Remove proxy schema cache loaded field check & skip related cases
- Fix other minor issues

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-24 12:06:39 +08:00
XuanYang-cn
540456041f
enhance: Remove unused binlog iterator (#41359)
See also: #41466

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-04-24 12:04:38 +08:00
Spade A
f3d878ab3f
fix: update tantivy to fix phrase match (#41450)
issue: #41454
https://github.com/zilliztech/tantivy/pull/8 fixes the problem; this PR
updates tantivy to include it.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-24 10:52:37 +08:00
Zhen Ye
5fd47c3c89
fix: mockery tool unavailable after upgrading golang version (#41481)
issue: #41291
pr: #41318

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 10:46:43 +08:00
junjiejiangjjj
f23df95a77
feat: Support decay rerank (#41223)
https://github.com/milvus-io/milvus/issues/35856
#41312

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-04-23 20:48:39 +08:00
aoiasd
f52c2909c4
feat: support multi analyzer for bm25 function (#41351)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 18:22:38 +08:00
congqixia
85ed200529
fix: Save update timestamp in catalog.AlterCollection API (#41468)
Related to #41467

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-23 16:48:37 +08:00
Xianhui Lin
3d4889586d
fix: JsonStats filter by conjunctExpr and improve the task slot calculation logic (#41459)
Optimized JSON filter execution by introducing
ProcessJsonStatsChunkPos() for unified position calculation and
GetNextBatchSize() for better batch processing.
Improved JSON key generation by replacing manual path joining with
milvus::Json::pointer() and adjusted slot size calculation for JSON key
index jobs.
Updated the task slot calculation logic in calculateStatsTaskSlot() to
handle the increased resource needs of JSON key index jobs.
issue: https://github.com/milvus-io/milvus/issues/41378
https://github.com/milvus-io/milvus/issues/41218

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-23 16:30:37 +08:00
aoiasd
655cc7fe06
fix: bm25 stats idf oracle leak (#41425)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 14:28:37 +08:00
aoiasd
a16bd6263b
feat: support more languages for built-in stop words and add remove-punct and regex filters (#41412)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 11:44:37 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
aoiasd
11f2fae42e
feat: support extend default dict for jieba tokenizer (#41360)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-22 20:34:37 +08:00
congqixia
481938297c
enhance: [AddField] Use next field id instead of global allocation (#41440)
Related to #39718
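A hypothetical sketch of deriving the next field ID from the collection's own fields instead of a global allocator. It assumes user field IDs start at 100 (consistent with `FieldId(101)` in a later benchmark), but that constant and `NextFieldId` are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumption: user field ids start at 100; lower ids are reserved.
constexpr int64_t kStartOfUserFieldId = 100;

// Hypothetical: next id = max(existing ids) + 1, scoped to the collection,
// so ids stay dense and independent of any cluster-wide allocator.
int64_t NextFieldId(const std::vector<int64_t>& existing_ids) {
    int64_t max_id = kStartOfUserFieldId - 1;
    for (int64_t id : existing_ids) {
        max_id = std::max(max_id, id);
    }
    return max_id + 1;
}
```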

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 17:14:37 +08:00
congqixia
6f4e0d8e38
enhance: [AddField] Use schema update ts as guarantee ts (#41430)
Related to #39718

Use the schema update ts when it is greater than the calculated guarantee
timestamp, so that every read request using the updated schema waits until
all schema change events have been processed.
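The rule above amounts to taking the larger of the two timestamps. `EffectiveGuaranteeTs` is a hypothetical name for illustration.

```cpp
#include <algorithm>
#include <cstdint>

using Timestamp = uint64_t;

// Raise the read's guarantee ts to the schema update ts when the latter is
// larger, so the read waits until the schema change has been applied.
Timestamp EffectiveGuaranteeTs(Timestamp calculated, Timestamp schema_update_ts) {
    return std::max(calculated, schema_update_ts);
}
```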

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 17:12:45 +08:00
congqixia
b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add broadcast logic for collection schema changes, which notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc
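An in-process sketch of the fan-out shape only. The actual change goes through the WAL and gRPC, which this does not model. `SchemaChange` and `SchemaBroadcaster` are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical schema-change event.
struct SchemaChange {
    int64_t collection_id;
    uint64_t schema_version;
};

// Each interested component (e.g. delegator, flush component, query node
// client) registers a handler; one broadcast notifies all of them in
// registration order.
class SchemaBroadcaster {
public:
    using Handler = std::function<void(const SchemaChange&)>;

    void Subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

    void Broadcast(const SchemaChange& change) const {
        for (const auto& handler : handlers_) {
            handler(change);
        }
    }

private:
    std::vector<Handler> handlers_;
};
```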

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
aoiasd
110c5aaaf4
feat: support icu and language identifier tokenizer (#41214)
relate: https://github.com/milvus-io/milvus/issues/41213

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-22 15:56:37 +08:00
cqy123456
5219d9a723
fix: Inserting null and non-null arrays at the same time crashes Milvus when growing mmap is enabled (#41051)
issue: https://github.com/milvus-io/milvus/issues/40981
2.5 pr: https://github.com/milvus-io/milvus/pull/41052

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-04-22 12:26:37 +08:00
Zhen Ye
7f5a9a6046
fix: unstable timeticksync unittest (#41437)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-22 10:53:29 +08:00
Zhen Ye
9339bccccc
enhance: move sent first timeticksync, make recovery easier (#41405)
issue: #38399

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-21 17:18:37 +08:00
aoiasd
f166843c5e
enhance: support using lindera tag filter (#40416)
relate: https://github.com/milvus-io/milvus/issues/39659

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-21 15:56:36 +08:00
Xianhui Lin
c5428c12eb
feat: Add support for modifying max capacity of array fields (#41404)
This commit adds support for modifying the max capacity of array fields
in the `alterCollectionFieldTask` function. It checks if the field is an
array type and then validates and updates the max capacity value. This
change improves the flexibility of array fields in the collection.
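The check-validate-update steps described above can be sketched as follows. The 1..4096 bounds are an assumption (check the server configuration for the actual limit), and `FieldSchema` and `AlterMaxCapacity` are hypothetical, not `alterCollectionFieldTask` itself.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class DataType { Int64, VarChar, Array };

struct FieldSchema {
    std::string name;
    DataType type;
    int64_t max_capacity;  // only meaningful for Array fields
};

// Assumed bounds for illustration.
constexpr int64_t kMinCapacity = 1;
constexpr int64_t kMaxCapacity = 4096;

// Reject non-array fields, validate the new value, then update it.
// Returns an error message on failure, std::nullopt on success.
std::optional<std::string> AlterMaxCapacity(FieldSchema& field,
                                            int64_t new_capacity) {
    if (field.type != DataType::Array) {
        return "max_capacity can only be altered on array fields";
    }
    if (new_capacity < kMinCapacity || new_capacity > kMaxCapacity) {
        return "max_capacity out of range";
    }
    field.max_capacity = new_capacity;
    return std::nullopt;
}
```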

Issue: https://github.com/milvus-io/milvus/issues/41363

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-21 15:52:37 +08:00
sparknack
8ccb875e41
enhance: add simde package (#40943)
issue: #40942

Add simde package, which can make porting SIMD code to other
architectures much easier.

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-04-21 12:18:40 +08:00
Spade A
5b1430f27e
enhance: tantivy collector set bitset directly (#39748)
fix: #39755

The following simple benchmark inserts 1M docs in which every row except
the last is "hello". Latency is measured at the segcore level on a 9900K CPU:
master: 2.62ms
this PR: 2.11ms

Benchmark code:

```cpp
#include <chrono>
#include <iostream>

TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    // Set every row except the last to "hello".
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    // Time expression construction plus TextMatch execution.
    auto start = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    // duration is in microseconds, so the unit label is "us", not "ms".
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```
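To illustrate the title's idea in isolation, here is a hypothetical sketch rather than the actual tantivy collector (which lives on the Rust side). Before, hits are gathered into an intermediate ID buffer and converted to a bitset in a second pass. After, each hit is written straight into the result bitset. `hits` stands in for the stream of matching doc IDs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Before: collect matching doc ids, then convert to a bitset in a second pass.
std::vector<bool> CollectThenConvert(const std::vector<uint32_t>& hits,
                                     std::size_t num_docs) {
    std::vector<uint32_t> ids;
    for (uint32_t doc : hits) {
        ids.push_back(doc);  // intermediate buffer
    }
    std::vector<bool> bitset(num_docs, false);
    for (uint32_t doc : ids) {
        bitset[doc] = true;  // second pass
    }
    return bitset;
}

// After: set the bit as each hit is collected, avoiding the extra
// allocation and pass.
void CollectIntoBitset(const std::vector<uint32_t>& hits,
                       std::vector<bool>& bitset) {
    for (uint32_t doc : hits) {
        bitset[doc] = true;
    }
}
```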

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-20 23:02:41 +08:00
Chun Han
016920b023
fix: solve incompatibility problem for none-encoding index (#40838) (#41369)
related: #40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-20 22:56:44 +08:00
Zhen Ye
c4a41cc32b
fix: add node id check to avoid double flush in most cases (#41236)
issue: #41028
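A hypothetical sketch of a node ID check, assuming the coordinator tracks which node currently owns each channel and accepts a flush only from that owner. `ChannelAssignments` is illustrative and not the actual fix.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical coordinator-side record of channel ownership.
class ChannelAssignments {
public:
    void Assign(const std::string& channel, int64_t node_id) {
        owner_[channel] = node_id;
    }

    // Only the current owner may flush a channel. A node that lost the
    // channel (e.g. after a rebalance) is rejected, which avoids a
    // duplicate flush in most cases.
    bool ShouldAcceptFlush(const std::string& channel, int64_t node_id) const {
        auto it = owner_.find(channel);
        return it != owner_.end() && it->second == node_id;
    }

private:
    std::unordered_map<std::string, int64_t> owner_;
};
```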

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:44:38 +08:00
Zhen Ye
ef4923e66b
fix: catch-up scan never finishes if WAL is truncated (#41345)
issue: #41062

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:40:37 +08:00
Zhen Ye
78fca7e88d
fix: transaction should retry if transaction is expired (#41379)
issue: #41248
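A minimal retry sketch, assuming an expired transaction is transient and each attempt begins a fresh transaction, while other failures return immediately. `TxnStatus` and `RunWithRetry` are hypothetical.

```cpp
#include <functional>

enum class TxnStatus { Ok, Expired, Failed };

// Retry only on Expired, up to max_attempts; each call to run_txn is
// assumed to begin a new transaction.
TxnStatus RunWithRetry(const std::function<TxnStatus()>& run_txn,
                       int max_attempts) {
    TxnStatus status = TxnStatus::Failed;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        status = run_txn();
        if (status != TxnStatus::Expired) {
            break;  // success or a non-retryable failure
        }
    }
    return status;
}
```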

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-20 22:38:36 +08:00
tinswzy
6fa68c1f16
enhance: Support Woodpecker as a WAL storage option for Milvus (#41095)
#40916 Support Woodpecker as a WAL storage option for Milvus

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-04-20 22:22:42 +08:00