milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Amit Kumar	388d56fdc7	enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988 ) ### Is there an existing issue for this? - [x] I have searched the existing issues --- Please see: https://github.com/milvus-io/milvus/issues/44593 for the background This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant, which can be closed. The PR comments for the original implementation suggested an alternative and a better approach, this new PR has that implementation. --- This PR - Adds an optional `minimum_should_match` argument to `text_match(...)` and wires it through the parser, planner/visitor, index bindings, and client-level tests/examples so full-text queries can require a minimum number of tokens to match. Motivation - Provide a way to require an expression to match a minimum number of tokens in lexical search. What changed - Parser / grammar - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and `textMatchOption` in `internal/parser/planparserv2/Plan.g4`. - Regenerated parser outputs: `internal/parser/planparserv2/generated/*` (parser, lexer, visitor, etc.) to support the new rule. - Planner / visitor - `parser_visitor.go`: parse and validate the `minimum_should_match` integer; propagate as an extra value on the `TextMatch` expression so downstream components receive it. - Added `VisitTextMatchOption` visitor method handling. - Client (Golang) - Added a unit test to verify `text_match(..., minimum_should_match=...)` appears in the generated DSL and is accepted by client code: `client/milvusclient/read_test.go` (new test coverage). - Added an integration-style test for the feature to the go-client testcase suite: `tests/go_client/testcases/full_text_search_test.go` (exercise min=1, min=3, large min). - Added an example demonstrating `text_match` usage: `client/milvusclient/read_example_test.go` (example name conforms to godoc mapping). - Engine / index - Updated C++ index interface: `TextMatchIndex::MatchQuery` - Added/updated unit tests for the index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`. - Tantivy binding - Added `match_query_with_minimum` implementation and unit tests to `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs` that construct boolean queries with minimum required clauses. Behavioral / compatibility notes - This adds an optional argument to `text_match` only; default behavior (no `minimum_should_match`) is unchanged. - Internal API change: `TextMatchIndex::MatchQuery` signature changed (internal component). Callers in the repo were updated accordingly. - Parser changes required regenerating ANTLR outputs Tests and verification - New/updated tests: - Go client unit test: `client/milvusclient/read_test.go` (mocked Search request asserts DSL contains `minimum_should_match=2`). - Go e2e-style test: `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3 and a large min). - C++ unit tests for index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`. - Rust binding unit tests for `match_query_with_minimum`. - Local verification commands to run: - Go client tests: `cd client && go test ./milvusclient -run ^$` (client package) - Go testcases: `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch` (requires a running Milvus instance) - C++ unit tests / build: run core build/test per repo instructions (the change touches core index code). - Rust binding tests: `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if developing locally). --------- Signed-off-by: Amit Kumar <amit.kumar@reddit.com> Co-authored-by: Amit Kumar <amit.kumar@reddit.com>	2025-11-07 16:07:11 +08:00
congqixia	489288b5e3	enhance: Update builder image tag upgrading go1.24.9 (#45394 ) Related to #45359 Fixing CVE-2025-58187 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-07 15:27:34 +08:00
aoiasd	6102f001a9	enhance: skip check source id (#45377 ) relate:https://github.com/milvus-io/milvus/issues/45381 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-11-07 15:19:34 +08:00
yihao.dai	2fad5b34f7	fix: Fix data race in replicate stream client (#45346 ) issue: https://github.com/milvus-io/milvus/issues/44123 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-11-07 10:17:33 +08:00
congqixia	4a6e8d822c	enhance: Bump go version to 1.24.9 (#45359 ) Fixing CVE-2025-58187 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-07 10:13:35 +08:00
cai.zhang	7527ddf50f	enhance: [test] Move R-Tree index tests into the implementation package (#45355 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-07 10:03:33 +08:00
cai.zhang	b8f9384a85	fix: Skip building text index for newly added columns (#45316 ) issue: #45315 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-06 19:47:35 +08:00
XuanYang-cn	2dd2c96eb1	fix: Accidentally ignored sealed segments in L0 Compaction (#45340 ) When there're no growing segments in the collection, L0 Compaction will try to choose all L0 segments that hits all L1/L2 segments. However, if there's Sealed Segment still under flushing in DataNode at the same time L0 Compaction selects satisfied L1/L2 segments, L0 Compaction will ignore this Segment because it's not in "FlushState", which is wrong, causing missing deletes on the Sealed Segment. This quick solution here is to fail this L0 compaction task once selected a Sealed segment. See also: #45339 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-11-06 16:53:38 +08:00
yanliang567	a2282d61cb	test: Add more async tests (#45327 ) related issue: #45326 Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>	2025-11-06 15:43:33 +08:00
XuanYang-cn	623a9e5156	fix: Accurate size estimation for sliced arrow arrays in compaction (#45294 ) Sliced arrow arrays "incorrectly" returned the original array's size via SizeInBytes(), causing inaccurate memory estimates during compaction. This resulted in segments closing prematurely in mergeSplit mode - expected 500MB compactions produced 4x100+MB segments instead. Fixed by calculating actual byte size of sliced arrays, ensuring proper segment sizing and more accurate memory usage tracking. See also: #45293 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-11-06 14:57:34 +08:00
congqixia	e284733399	fix: Move FinishLoad before text index creation to ensure raw data availability (#45334 ) Related to #45333 Fix segment loading failure when adding fields with text match enabled. The issue occurred because text indexes were being loaded before FinishLoad() was called, meaning raw data was not properly available when text index creation attempted to access it, resulting in "failed to create text index, neither raw data nor index are found" errors. Solution is to move the FinishLoad() call to execute after raw data loading but before text index loading. This ensures that: 1. Raw data is properly loaded and available in memory 2. Text indexes can access the raw data they need during creation 3. The segment is in the correct state before any index operations Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-06 14:49:34 +08:00
zhagnlu	59c64bee07	fix: not use json_shredding for json path is null (#45310 ) #45284 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-06 11:43:33 +08:00
sparknack	9032bb7668	enhance: unify the aligned buffer for both buffered and direct I/O (#45323 ) issue: #43040 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-06 10:53:33 +08:00
congqixia	1cf00c6d32	fix: Support JSON default value in compaction (#45330 ) Related to #45329 Fix compaction failure when handling newly added dynamic fields with storage v1 binlogs. The issue occurred because the `GenerateEmptyArrayFromSchema` function did not support JSON data type default values, causing "Unexpected default value" errors during compaction. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-06 10:15:34 +08:00
Lior Friedman	a4d69031f1	fix: Add AiSAQ index type RAM estimation implementation on the query node. (#45246 ) Currently, the index type AiSAQ RAM usage estimation is not being calculated correctly. AiSAQ index type consumes less RAM usage while loading the index than DISKANN does, and the query node module is missing the implementation of the RAM usage estimation for that AiSAQ index type. We suggest that the AiSAQ RAM usage estimation calculation should be as follows: UsedDiskMemoryRatioAisaq = 1024 (contrary to the UsedDiskMemoryRatio, which is 4) neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq neededDiskSize = indexInfo.IndexSize Reported issue is #45247 --------- Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com> Signed-off-by: friedl <lior.friedman@kioxia.com> Co-authored-by: friedl <lior.friedman@kioxia.com>	2025-11-06 08:53:34 +08:00
yihao.dai	121eb912ba	fix: Fix load segment failed due to get disk usage error (#45255 ) When getting disk usage, files or directories may be removed concurrently due to segment release. This PR ignores “file or directory does not exist” errors in such cases. issue: https://github.com/milvus-io/milvus/issues/45239 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-11-06 08:51:33 +08:00
congqixia	55bfd610b6	enhance: [StorageV2] Integrate FFI interface for packed reader (#45132 ) Related to #44956 Integrate the StorageV2 FFI interface as the unified storage layer for reading packed columnar data, replacing the custom iterative reader with a manifest-based approach using the milvus-storage library. Changes: - Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C Stream interface - Implement utility functions to convert CStorageConfig to milvus-storage Properties - Create ManifestReader in Go that generates manifests from binlogs - Add FFI packed reader CGO bindings (packed_reader_ffi.go) - Refactor NewBinlogRecordReader to use ManifestReader for V2 storage - Support both manifest file paths and direct manifest content - Enable configurable buffer sizes and column projection Technical improvements: - Zero-copy data exchange using Arrow C Data Interface - Optimized I/O operations through milvus-storage library - Simplified code path with manifest-based reading - Better performance with batched streaming reads --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-05 19:57:34 +08:00
XuanYang-cn	d036fd5422	test: Increase PyMilvus version to 2.7.0rc54 for master branch (#45273 ) Automated daily bump from pymilvus master branch. Updates tests/python_client/requirements.txt. Signed-off-by: XuanYang-cn <xuan.yang@zilliz.com>	2025-11-05 19:35:33 +08:00
zhenshan.cao	4a936fae8e	fix: timezone parameter ingored in Search/Query (#45320 ) issue: https://github.com/milvus-io/milvus/issues/44598 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-05 19:19:33 +08:00
congqixia	1e48911825	enhance: [GoSDK] Support struct array field type (#45291 ) Related to #42148 Add comprehensive support for struct array field type in the Go SDK, including data structure definitions, column operations, schema construction, and full test coverage. Struct Array Column Implementation (`client/column/struct.go`) - Add `columnStructArray` type to handle struct array fields - Implement `Column` interface methods: - `NewColumnStructArray()`: Create new struct array column from sub-fields - `Name()`, `Type()`: Basic metadata accessors - `Slice()`: Support slicing across all sub-fields - `FieldData()`: Convert to protobuf `StructArrayField` format - `Get()`: Retrieve struct values as `map[string]any` - `ValidateNullable()`, `CompactNullableValues()`: Nullable support - Placeholder implementations for unsupported operations (AppendValue, GetAsX, IsNull, AppendNull) Struct Array Parsing (`client/column/columns.go`) - Add `parseStructArrayData()` function to parse `StructArrayField` from protobuf - Update `FieldDataColumn()` to detect and parse struct array fields - Support range-based slicing for struct array data --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-05 15:43:33 +08:00
zhenshan.cao	490a618c30	fix: Handle timestamptz import errors (#45287 ) issue: https://github.com/milvus-io/milvus/issues/44585 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-05 15:05:33 +08:00
foxspy	95d7302cf4	enhance: update knowhere version (#45270 ) issue: #42937 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-11-05 11:09:32 +08:00
Zhen Ye	a2ce70d252	fix: ddl framework bug patch (#45290 ) issue: #45080, #45274, #45285 - LoadCollection doesn't ignore the ignorable request, for false field array. - CreatIndex doesn't ignore the ignorable request, for wrong index. - index meta is not thread safe. - lost parameter check of DDL. - DDL Ack scheduler may get stuck and DDL is block until next incoming DDL. - lost parameter checker of ddl --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-04 22:25:33 +08:00
cai.zhang	fa3d4ebfbe	fix: Compute the correct batch size for the geometry index of the growing segment (#45253 ) issue: #44648 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-04 20:25:37 +08:00
zhagnlu	792e931fcb	enhance: rename jsonstats related user config params (#45254 ) #44132 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-04 20:21:36 +08:00
Spade A	c0029b788d	fix: alter collection failed with MMAP setting for STRUCT (#45173 ) issue: https://github.com/milvus-io/milvus/issues/45001 ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: aoiasd <zhicheng.yue@zilliz.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>	2025-11-04 20:19:33 +08:00
sijie-ni-0214	19af1903f6	fix: etcd data persistence by adding --data-dir parameter (#45200 ) issue: https://github.com/milvus-io/milvus/issues/45174 Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>	2025-11-04 19:37:34 +08:00
Gao	8f645760af	enhance: make knowhere thread pool config refreshable (#45190 ) Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-11-04 18:33:33 +08:00
Zhen Ye	966ebfbcab	fix: support upgrading from 2.6.x to 2.6.5 (#45264 ) issue: #43897 Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-04 18:31:32 +08:00
zhuwenxing	06933c25b8	test: add geometry datatype in import testcases (#45014 ) /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-11-04 16:55:33 +08:00
zhenshan.cao	6327c9a514	fix: Fix bugs related to TimestampTz (#45111 ) issue: https://github.com/milvus-io/milvus/issues/44527 https://github.com/milvus-io/milvus/issues/44537 https://github.com/milvus-io/milvus/issues/44538 https://github.com/milvus-io/milvus/issues/44585 https://github.com/milvus-io/milvus/issues/44622 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-04 16:51:33 +08:00
Feilong Hou	9e4975bdfa	test: added test case for partial update on duplicate pk (#45130 ) Issue: #45129 <test>: <add new test case> <also delete duplicate test case> On branch feature/partial-update Changes to be committed: modified: milvus_client/test_milvus_client_partial_update.py modified: milvus_client/test_milvus_client_upsert.py --------- Signed-off-by: Eric Hou <eric.hou@zilliz.com> Co-authored-by: Eric Hou <eric.hou@zilliz.com>	2025-11-04 15:47:32 +08:00
zhikunyao	7193d01808	test: support e2e-amd helm in gcp milvus cluster (#45175 ) Signed-off-by: Zhikun Yao <zhikun.yao@zilliz.com>	2025-11-04 15:07:32 +08:00
sparknack	40b5e6b134	fix: avoid potential race conditions when updating the executor (#45230 ) issue: #43040 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-04 14:25:33 +08:00
yihao.dai	ab11fddc66	enhance: Wait for replicate stream client to finish (#45259 ) Make channel replicator stop more gracefully. issue: https://github.com/milvus-io/milvus/issues/44123 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-11-04 14:19:33 +08:00
cai.zhang	617891b436	fix: Skip create tmp dir for growing R-Tree index (#45256 ) issue: #45181 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-04 13:01:32 +08:00
Spade A	2b5241fe5a	fix: allow "[" and "]" in index name (#45193 ) issue: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-11-04 11:59:34 +08:00
Spade A	cd0b36c39e	feat: impl StructArray -- support diskann index (#45223 ) issue: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-04 11:57:33 +08:00
zhagnlu	653e95aaad	fix: fix bug for shredding json when empty json but not null (#45221 ) #45157 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-04 11:11:33 +08:00
Zhen Ye	d320ccab99	fix: milvus role cannot stop at initializing state (#45244 ) issue: #45243 Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-04 10:47:32 +08:00
congqixia	e6be590b97	enhance: set schema version when creating new collection (#45263 ) Related to #43028 Initialize the schema version field when creating a new collection instance in QueryNode. The schema version is extracted from loadMetaInfo and assigned to the collection, ensuring proper schema version tracking and consistency across the distributed system. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-04 10:15:32 +08:00
Zhen Ye	576084fe86	enhance: support alter collection/database with WAL-based DDL framework (#45266 ) issue: #43897 - Alter collection/database is implemented by WAL-based DDL framework now. - Support AlterCollection/AlterDatabase in wal now. - Alter operation can be synced by new CDC now. - Refactor some UT for alter DDL. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-04 09:59:33 +08:00
Zhen Ye	31a609c21d	fix: kafka should auto reset the offset from earliest to read (#45237 ) issue: #44172, #45210, #44851 kafka will auto reset the offset to "latest" if the offset is Out-of-range. the recovery of milvus wal cannot read any message from that. So once the offset is out-of-range, kafka should read from eariest to read the latest uncleared data. https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-03 21:07:33 +08:00
cai.zhang	01cf5c9341	enhance: Add log to debug index task (#45198 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-03 20:01:34 +08:00
cai.zhang	ed8ba4a28c	enhance: Make GeometryCache an optional configuration (#45192 ) issue: #45187 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-03 19:59:32 +08:00
Spade A	ae03dee116	feat: implement ngram tokenizer with token_chars and custom_token_chars (#45040 ) issue: https://github.com/milvus-io/milvus/issues/45039 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-03 18:09:33 +08:00
zhuwenxing	434e0847fd	test: remove xfail after fix (#45114 ) /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-11-03 17:21:37 +08:00
zhuwenxing	a03c398986	test: add import case for struct array (#45146 ) /kind improvement Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>	2025-11-03 17:19:39 +08:00
Zhen Ye	25e0485a56	fix: unrecoverable when replicate from old (#45224 ) issue: #44962 Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-03 15:07:36 +08:00
yihao.dai	27734982fa	enhance: Don't start cdc by default (#45216 ) issue: https://github.com/milvus-io/milvus/issues/44123 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-11-03 13:15:32 +08:00

1 2 3 4 5 ...

23423 Commits