milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-08 01:58:34 +08:00

Author	SHA1	Message	Date
congqixia	0a208d7224	enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488 ) Related to #45060 Refactor segment loading architecture to make segments autonomously manage their own loading process, moving the orchestration logic from Go (segment_loader.go) to C++ (segcore). C++ Layer (segcore): - Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and implementations - Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading strategy: - Separates indexed fields from non-indexed fields - Loads indexes concurrently using thread pools - Loads field data for non-indexed fields in parallel - Implemented `SegmentGrowingImpl::Load()` to convert and load field data - Extracted `LoadIndexData()` as a reusable utility function in `Utils.cpp` - Added `SegmentLoad()` C binding in `segment_c.cpp` Go Layer: - Added `Load()` method to segment interfaces - Updated mock implementations and test interfaces - Integrated new C++ `SegmentLoad()` binding in Go segment wrapper --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-14 11:21:37 +08:00
Gao	09a3195867	enhance: support max_connections config for remote storage (#45225 ) related: https://github.com/milvus-io/milvus/issues/45344 Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-11-13 15:37:38 +08:00
Spade A	929dc65882	fix: fix index compatibility after upgrade (#45373 ) issue: https://github.com/milvus-io/milvus/issues/45380 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-13 12:59:38 +08:00
Chun Han	406fa7b694	fix: failed to get raw data for hybrid index(#45318 ) (#45411 ) related: #45318 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-11-13 10:17:37 +08:00
sparknack	9d75d0393e	enhance: some optimization of scalar field fetching in tiered storage scenarios (#45360 ) issue: #43611 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-11 17:17:41 +08:00
cai.zhang	e3c1673191	fix: Fix filter geometry for growing with mmap (#45464 ) issue: #45450 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-11 15:39:36 +08:00
Chun Han	69f3aab229	feat: milvus support huawei cloud iam verification(#45298 ) (#45457 ) related: #45298 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-11-11 14:41:41 +08:00
congqixia	8d1ea751a6	fix: Support JSON default values in FillFieldData (#45455 ) Related to #45445 Previously, FillFieldData for JSON fields would assert and fail when a default_value was provided, blocking index creation for JSON fields with default values (including dynamic fields like $meta). This change enables JSON default value support by: - Removing the assertion that blocked default values - Parsing bytes_data into Json objects when default_value is present - Properly filling data_ array and setting valid_data_ bitset to true - Maintaining null behavior when no default_value is provided Impact: - Fixes index creation failure for JSON fields with default values - Resolves upgrade issues from 2.5 to 2.6.5 where dynamic fields with default values couldn't be indexed - Index builds that were stuck in InProgress state can now complete Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-11 10:35:36 +08:00
Gao	e9a875f7ac	enhance: override index_type while creating segment index (#45416 ) issue: #44752 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-11-11 07:27:36 +08:00
congqixia	0e1de0073a	enhance: Update tantivy-binding with cargo build result (#45458) Related to #44988 This PR commits the newly updated tantivy-binding.h from the cargo build result, which should pass the format check. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-10 18:09:36 +08:00
aoiasd	e82bf0e54f	enhance: fix typo of analyzer params (#45299 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-11-10 14:35:35 +08:00
aoiasd	a38a0deb43	enhance: prevent panic by adding null pointer check when clearing InsertRecord _pk2offset_ (#45281 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-11-10 11:37:35 +08:00
Xiaofan	7aa0ca5d4e	enhance: Clean unused conan dependency (#45366 ) fix #45365 Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-11-07 17:07:34 +08:00
Buqian Zheng	515a939edf	enhance: remove obsolete code (#45307 ) issue: #44452 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-07 16:07:35 +08:00
Amit Kumar	388d56fdc7	enhance: Add support for minimum_should_match in text_match (parser, engine, client, and tests) (#44988)

See https://github.com/milvus-io/milvus/issues/44593 for the background. This PR makes https://github.com/milvus-io/milvus/pull/44638 redundant, so that PR can be closed. The review comments on the original implementation suggested a better alternative approach, and this PR implements it.

This PR
- Adds an optional `minimum_should_match` argument to `text_match(...)` and wires it through the parser, planner/visitor, index bindings, and client-level tests/examples so full-text queries can require a minimum number of tokens to match.

Motivation
- Provide a way to require an expression to match a minimum number of tokens in lexical search.

What changed
- Parser / grammar
  - Added grammar rule and token: `MINIMUM_SHOULD_MATCH` and `textMatchOption` in `internal/parser/planparserv2/Plan.g4`.
  - Regenerated parser outputs: `internal/parser/planparserv2/generated/*` (parser, lexer, visitor, etc.) to support the new rule.
- Planner / visitor
  - `parser_visitor.go`: parse and validate the `minimum_should_match` integer; propagate as an extra value on the `TextMatch` expression so downstream components receive it.
  - Added `VisitTextMatchOption` visitor method handling.
- Client (Golang)
  - Added a unit test to verify `text_match(..., minimum_should_match=...)` appears in the generated DSL and is accepted by client code: `client/milvusclient/read_test.go` (new test coverage).
  - Added an integration-style test for the feature to the go-client testcase suite: `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, min=3, large min).
  - Added an example demonstrating `text_match` usage: `client/milvusclient/read_example_test.go` (example name conforms to godoc mapping).
- Engine / index
  - Updated C++ index interface: `TextMatchIndex::MatchQuery`
  - Added/updated unit tests for the index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`.
- Tantivy binding
  - Added `match_query_with_minimum` implementation and unit tests to `internal/core/thirdparty/tantivy/tantivy-binding/src/index_reader_text.rs` that construct boolean queries with minimum required clauses.

Behavioral / compatibility notes
- This adds an optional argument to `text_match` only; default behavior (no `minimum_should_match`) is unchanged.
- Internal API change: `TextMatchIndex::MatchQuery` signature changed (internal component). Callers in the repo were updated accordingly.
- Parser changes required regenerating ANTLR outputs.

Tests and verification
- New/updated tests:
  - Go client unit test: `client/milvusclient/read_test.go` (mocked Search request asserts DSL contains `minimum_should_match=2`).
  - Go e2e-style test: `tests/go_client/testcases/full_text_search_test.go` (exercises min=1, 3 and a large min).
  - C++ unit tests for index behavior: `internal/core/src/index/TextMatchIndexTest.cpp`.
  - Rust binding unit tests for `match_query_with_minimum`.
- Local verification commands to run:
  - Go client tests: `cd client && go test ./milvusclient -run ^$` (client package)
  - Go testcases: `cd tests/go_client && go test ./testcases -run TestTextMatchMinimumShouldMatch` (requires a running Milvus instance)
  - C++ unit tests / build: run core build/test per repo instructions (the change touches core index code).
  - Rust binding tests: `cd internal/core/thirdparty/tantivy/tantivy-binding && cargo test` (if developing locally).

--------- Signed-off-by: Amit Kumar <amit.kumar@reddit.com> Co-authored-by: Amit Kumar <amit.kumar@reddit.com>	2025-11-07 16:07:11 +08:00
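The `minimum_should_match` semantics described in the commit above (a document matches only if at least N query tokens occur in it) can be illustrated with a small standalone sketch. This is not the Milvus or Tantivy implementation: `tokenize` and `matchesAtLeast` are hypothetical helpers, the whitespace tokenizer stands in for a real analyzer, and counting each distinct query token once is an assumption made here for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a hypothetical stand-in for an analyzer: it lowercases the
// text and splits it on whitespace. Real analyzers are configurable.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// matchesAtLeast reports whether at least minShouldMatch distinct query
// tokens occur in the document. Counting distinct tokens is an assumption.
func matchesAtLeast(doc, query string, minShouldMatch int) bool {
	docTokens := make(map[string]struct{})
	for _, t := range tokenize(doc) {
		docTokens[t] = struct{}{}
	}
	seen := make(map[string]struct{})
	matched := 0
	for _, t := range tokenize(query) {
		if _, dup := seen[t]; dup {
			continue // count each distinct query token only once
		}
		seen[t] = struct{}{}
		if _, ok := docTokens[t]; ok {
			matched++
		}
	}
	return matched >= minShouldMatch
}

func main() {
	doc := "vector database for similarity search"
	query := "vector search engine"
	for _, m := range []int{1, 2, 3} {
		fmt.Printf("minimum_should_match=%d -> %v\n", m, matchesAtLeast(doc, query, m))
	}
}
```

Two of the three query tokens appear in the sample document, so the thresholds 1 and 2 match while 3 does not.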
cai.zhang	7527ddf50f	enhance: [test] Move R-Tree index tests into the implementation package (#45355 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-07 10:03:33 +08:00
zhagnlu	59c64bee07	fix: not use json_shredding for json path is null (#45310 ) #45284 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-06 11:43:33 +08:00
sparknack	9032bb7668	enhance: unify the aligned buffer for both buffered and direct I/O (#45323 ) issue: #43040 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-06 10:53:33 +08:00
yihao.dai	121eb912ba	fix: Fix load segment failed due to get disk usage error (#45255 ) When getting disk usage, files or directories may be removed concurrently due to segment release. This PR ignores “file or directory does not exist” errors in such cases. issue: https://github.com/milvus-io/milvus/issues/45239 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-11-06 08:51:33 +08:00
congqixia	55bfd610b6	enhance: [StorageV2] Integrate FFI interface for packed reader (#45132 ) Related to #44956 Integrate the StorageV2 FFI interface as the unified storage layer for reading packed columnar data, replacing the custom iterative reader with a manifest-based approach using the milvus-storage library. Changes: - Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C Stream interface - Implement utility functions to convert CStorageConfig to milvus-storage Properties - Create ManifestReader in Go that generates manifests from binlogs - Add FFI packed reader CGO bindings (packed_reader_ffi.go) - Refactor NewBinlogRecordReader to use ManifestReader for V2 storage - Support both manifest file paths and direct manifest content - Enable configurable buffer sizes and column projection Technical improvements: - Zero-copy data exchange using Arrow C Data Interface - Optimized I/O operations through milvus-storage library - Simplified code path with manifest-based reading - Better performance with batched streaming reads --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-05 19:57:34 +08:00
foxspy	95d7302cf4	enhance: update knowhere version (#45270 ) issue: #42937 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-11-05 11:09:32 +08:00
cai.zhang	fa3d4ebfbe	fix: Compute the correct batch size for the geometry index of the growing segment (#45253 ) issue: #44648 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-04 20:25:37 +08:00
zhenshan.cao	6327c9a514	fix: Fix bugs related to TimestampTz (#45111 ) issue: https://github.com/milvus-io/milvus/issues/44527 https://github.com/milvus-io/milvus/issues/44537 https://github.com/milvus-io/milvus/issues/44538 https://github.com/milvus-io/milvus/issues/44585 https://github.com/milvus-io/milvus/issues/44622 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-04 16:51:33 +08:00
sparknack	40b5e6b134	fix: avoid potential race conditions when updating the executor (#45230 ) issue: #43040 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-04 14:25:33 +08:00
cai.zhang	617891b436	fix: Skip create tmp dir for growing R-Tree index (#45256 ) issue: #45181 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-04 13:01:32 +08:00
Spade A	cd0b36c39e	feat: impl StructArray -- support diskann index (#45223 ) issue: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-04 11:57:33 +08:00
zhagnlu	653e95aaad	fix: fix bug for shredding json when empty json but not null (#45221 ) #45157 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-04 11:11:33 +08:00
cai.zhang	01cf5c9341	enhance: Add log to debug index task (#45198 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-03 20:01:34 +08:00
cai.zhang	ed8ba4a28c	enhance: Make GeometryCache an optional configuration (#45192 ) issue: #45187 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-03 19:59:32 +08:00
Spade A	ae03dee116	feat: implement ngram tokenizer with token_chars and custom_token_chars (#45040 ) issue: https://github.com/milvus-io/milvus/issues/45039 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-03 18:09:33 +08:00
Jingsong Yin	e25ee08566	fix: fix LoadMetrics bool type error (#45209 ) #44584 Signed-off-by: thekingking <1677273255@qq.com>	2025-11-01 01:19:32 +08:00
Jingsong Yin	0cc79772e7	enhance: Extend SkipIndex with IN/Match support and BloomFilter (#44581 ) issue: #44584 --------- Signed-off-by: thekingking <1677273255@qq.com>	2025-10-31 22:39:32 +08:00
congqixia	22098c1785	fix: add null check for packed_writer_ in JsonStatsParquetWriter::Close() (#45158 ) Related to #45157 Fix a bug where DataNode panics when building json stats index throws an exception before the writer is initialized. The destructor would call Close() on an uninitialized packed_writer_ pointer, causing a null pointer dereference. Changes: - Add null check for packed_writer_ before calling Flush() and Close() - Prevents null pointer dereference in edge cases - Ignore close status as this is a cleanup operation This ensures safe cleanup even when initialization fails due to exceptions. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-30 17:40:09 +08:00
cqy123456	35d8213a00	fix: fail to mmap emb_list_meta in embedding list (#45127 ) issue: https://github.com/milvus-io/milvus/issues/44965 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-10-30 11:00:09 +08:00
aoiasd	ad9a0cae48	enhance: add global analyzer options (#44684 ) relate: https://github.com/milvus-io/milvus/issues/43687 Add global analyzer options, avoid having to merge some milvus params into user's analyzer params. Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-28 14:52:10 +08:00
congqixia	fd0ef09e97	fix: Handle all-null data in StringIndexSort to prevent load timeout (#45100 ) Related to #45081 StringIndexSort now properly handles collections with all-null string fields by: - Removing the error thrown when unique_count is 0 in ParseBinaryData - Adding alignment and padding support in mmap serialization (similar to ScalarIndexSort) - Separating data_size_ from mmap_size_ to correctly parse data without reading padding This fixes load collection timeout failures when all string field data is null, particularly affecting STL_SORT and TRIE index types. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 18:04:09 +08:00
congqixia	36a887b38b	enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061 ) This commit introduces the foundation for enabling segments to manage their own loading process by passing load information during segment creation. Changes: C++ Layer: - Add NewSegmentWithLoadInfo() C API to create segments with serialized load info - Add SetLoadInfo() method to SegmentInterface for storing load information - Refactor segment creation logic into shared CreateSegment() helper function - Add comprehensive documentation for the new API Go Layer: - Extend CreateCSegmentRequest to support optional LoadInfo field - Update segment creation in querynode to pass SegmentLoadInfo when available - Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto translation Proto Definitions: - Add segcorepb.SegmentLoadInfo message with essential loading metadata - Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo, TextIndexStats, JsonKeyStats - Remove dependency on data_coord.proto by creating segcore-specific definitions Testing: - Add comprehensive unit tests for proto conversion functions - Test edge cases including nil inputs, empty data, and nil array/map elements This is the first step toward issue #45060 - enabling segments to autonomously manage their loading process, which will: - Clarify responsibilities between Go and C++ layers - Reduce cross-language call overhead - Enable precise resource management at the C++ level - Support better integration with caching layer - Enable proactive schema evolution handling Related to #45060 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 15:28:12 +08:00
congqixia	7c627260f3	enhance: Optimize ScalarIndexSort bitmap initialization for range queries (#45085 ) Optimize bitmap initialization in ScalarIndexSort range queries by using adaptive strategy based on result density. When more than 50% of elements match the range condition, initialize bitmap with all true values and clear non-matching elements. Otherwise, use the original approach of initializing with false and setting matching elements. Also defer bitmap allocation until after early return checks to avoid unnecessary memory allocation. This optimization reduces bit operations for high-selectivity queries while maintaining the same performance for low-selectivity queries. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 10:08:06 +08:00
Buqian Zheng	c284e8c4a8	enhance: some minor code cleanup, prepare for scalar benchmark (#45008 ) issue: https://github.com/milvus-io/milvus/issues/44452 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-10-24 14:22:05 +08:00
congqixia	199f6d936e	fix: Update milvus-storage to fix duplicate AWS SDK initialization (#45062 ) Update milvus-storage version from aa189ad to e5f5b4c to include the fix for duplicate AWS SDK initialization that was causing init conflicts. This update removes the redundant arrow::fs::InitializeS3() call that was resulting in duplicate Aws::InitAPI() initialization. The duplicate initialization was causing AWS SDK to ignore custom configurations, particularly affecting GCP Workload Identity authentication. Changes in milvus-storage e5f5b4c: - Remove redundant arrow::fs::InitializeS3() call - Keep only the extended S3 initialization with custom AWS SDK options - Ensure GCP IAM authentication via custom HTTP client factory works correctly Relates to #44745 Reference: milvus-io/milvus-storage#285 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-24 11:32:05 +08:00
Buqian Zheng	22995cea3f	fix: Remove debug logging from JsonFlatIndex (#44807 ) issue: https://github.com/milvus-io/milvus/issues/44452 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com> Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>	2025-10-23 16:08:06 +08:00
Bingyi Sun	52270701ce	feat: use namespace skip index when search (#44888 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-23 12:04:04 +08:00
Spade A	6077178553	enhance: enable STL_SORT to support VARCHAR (#44401) issue: https://github.com/milvus-io/milvus/issues/44399 This PR implements STL_SORT for the VARCHAR data type in both RAM and MMAP modes. The general idea is to deduplicate field values and maintain a posting list for each unique value. The serialization format of the index is:
```
[unique_count][string_offsets][string_data][post_list_offsets][post_list_data][magic_code]

string_offsets: array of offsets into string_data section
string_data: str_len1, str1, str_len2, str2, ...
post_list_offsets: array of offsets into post_list_data section
post_list_data: post_list_len1, row_id1, row_id2, ..., post_list_len2, row_id1, row_id2, ...
```
--------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-10-23 11:00:05 +08:00
cai.zhang	3d11ba06ef	fix: Double check to avoid using an iter that has been erased by another thread (#45013) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-21 23:36:04 +08:00
zhagnlu	730308b1eb	fix: fix not equal not include None (#44959 ) #44816 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-21 17:08:03 +08:00
cai.zhang	b23d75a032	fix: Fix bug for gis function to filter geometry (#44966) issue: #44961 This PR fixes three geometry-related bugs: 1. Implement `ToString` interface for GisFunctionFilter. 2. Ignore GisFunctionFilter `MoveCursor` for growing segment. 3. Don't skip null geometry when building the R-Tree index; it should be recorded in null_offsets. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-21 09:52:04 +08:00
cai.zhang	a35a3b7c69	fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975 ) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-20 23:32:03 +08:00
zhagnlu	05df48fbe4	fix: remove duplicated '/' in jsonstats path (#44939) #44950 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-20 14:06:03 +08:00
Zhen Ye	f98d02b3e1	fix: use short debug string to avoid newline in debug logs (#44925 ) issue: #44924 Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-20 10:16:03 +08:00
Bingyi Sun	3ddf9154ab	fix: Fix exists expr for json flat index (#44910 ) issue: https://github.com/milvus-io/milvus/issues/44915 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-19 19:46:07 +08:00

2268 Commits