mirror of
https://gitee.com/milvus-io/milvus.git
synced 2026-01-03 09:22:30 +08:00
fix: #39711 Unlike English sentence where each words are parsed exactly once and one after one with position length 1, one Chinese word may be parsed to multiple words with position length larger than 1. For example, "badminton and skiing" will be parsed to Token{ start: 0, length: 1, text: "badminton" }, Token{ start: 1, length: 1, text: "and" }, and Token{ start: 2, length: 1, text: "tennis" }. While for exmaple for Chinsese: "羽毛球和滑雪" may be parsed to Token{ start: 0, length: 2, text: "羽毛" }, Token{ start: 0, length: 3, text: "羽毛球" }, Token{ start: 3, length: 1, text: "和" }, and Token{ start: 4, length: 2, text: "滑雪" }. This PR fix that the code not recognizes this situation. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>