milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-03 09:22:30 +08:00


fix: offset combined with term should be based on Token positions in phrase match (#39931)

fix: #39711

Unlike an English sentence, where each word is parsed exactly once, one
after another, with position length 1, a single Chinese word may be parsed
into multiple tokens whose position length can be larger than 1.

For example, "badminton and skiing" will be parsed to Token{ start: 0,
length: 1, text: "badminton" }, Token{ start: 1, length: 1, text: "and"
}, and Token{ start: 2, length: 1, text: "skiing" }.

In Chinese, by contrast, "羽毛球和滑雪" may be parsed to Token{ start:
0, length: 2, text: "羽毛" }, Token{ start: 0, length: 3, text: "羽毛球" },
Token{ start: 3, length: 1, text: "和" }, and Token{ start: 4, length: 2,
text: "滑雪" }.

This PR fixes the code, which previously did not handle this situation.
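
The difference between the two ways of deriving a term's offset can be sketched as follows. This is only an illustration, not the Milvus implementation: the `Token` dataclass and the helper names `index_offsets` and `position_offsets` are hypothetical, and the token values are copied from the examples in this message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    start: int   # position at which the token begins
    length: int  # position length of the token
    text: str


def index_offsets(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Pair each term with its index in the token list.

    This is only correct when every token has position length 1
    and no two tokens share a start position.
    """
    return [(i, t.text) for i, t in enumerate(tokens)]


def position_offsets(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Pair each term with the start position reported by the tokenizer."""
    return [(t.start, t.text) for t in tokens]


english = [
    Token(0, 1, "badminton"),
    Token(1, 1, "and"),
    Token(2, 1, "skiing"),
]

chinese = [
    Token(0, 2, "羽毛"),
    Token(0, 3, "羽毛球"),
    Token(3, 1, "和"),
    Token(4, 2, "滑雪"),
]

# For the English tokens both approaches give the same offsets.
print(index_offsets(english) == position_offsets(english))

# For the Chinese tokens they diverge: list indices run 0, 1, 2, 3,
# while the token start positions are 0, 0, 3, 4.
print(index_offsets(chinese))
print(position_offsets(chinese))
```

With the Chinese tokens, the index-based offsets no longer line up with the positions the tokenizer reported, which is the mismatch the commit title refers to.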

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>

2025-02-18 20:38:51 +08:00

boost_ext

Remove ConcurrentBitsetPtr in segcore

2021-03-09 16:16:43 +08:00

jemalloc

enhance: jemalloc on aarch64 platform uses 64k pagesize. (#29522)

2024-03-07 21:01:01 +08:00

knowhere

[automated] Update Knowhere Commit (#39898)

2025-02-16 01:32:13 +08:00

milvus-storage

feat: introduce third-party milvus-storage (#39418)