milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-06 17:18:35 +08:00

Author	SHA1	Message	Date
congqixia	8e82631282	fix: correct index_has_raw_data logic for fielddata loading (#46117 ) Related to #46098 This fix addresses a bug where the segment loader incorrectly determined whether scalar fields have raw data in their indexes, leading to unnecessary field data loading or skipping indexed raw data retrieval. - Build `field_ids` vector that handles both single field and column group cases (when `child_fields_size() > 0`) - Move the mmap setting and index_has_raw_data checks before the skip decision, iterating over the correctly built `field_ids` - Fix the boolean AND logic in both `Load()` and `LoadColumnGroup()` to properly check if ALL fields in the group have raw data in their indexes This bug was hiding the root cause of issue #46098, where QueryNode panics when outputting timestamptz data from scalar index with raw data. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-05 17:47:12 +08:00
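The "ALL fields in the group" check described in the commit above can be sketched as follows. This is a minimal illustration, not the Milvus implementation: `RawDataMap` and `AllFieldsHaveIndexedRawData` are hypothetical names, and treating an empty group as "no raw data" is an assumption made here for safety.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical lookup: field id -> whether that field's index stores raw data.
using RawDataMap = std::unordered_map<int64_t, bool>;

// Returns true only if every field in the group has an index that holds raw
// data. A field with no entry in the map counts as "no raw data".
inline bool AllFieldsHaveIndexedRawData(const std::vector<int64_t>& field_ids,
                                        const RawDataMap& index_has_raw_data) {
    if (field_ids.empty()) {
        return false;  // assumption: an empty group never skips field data
    }
    return std::all_of(field_ids.begin(), field_ids.end(), [&](int64_t id) {
        auto it = index_has_raw_data.find(id);
        return it != index_has_raw_data.end() && it->second;
    });
}
```

With this shape, a column group skips field-data loading only when the result is true, so one field without indexed raw data forces the group to be loaded.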
congqixia	5d0c8b1b40	fix: apply mmap settings correctly during segment load (#46017 ) Previously, mmap settings configured at the collection or field level were not being applied during segment loading in segcore. This was caused by: 1. A typo in the key name: "mmap.enable" instead of "mmap.enabled" 2. Missing logic to parse and apply mmap settings from schema This commit fixes the issue by: - Correcting the key name to "mmap.enabled" to match the standard - Adding Schema::MmapEnabled() method to retrieve field/collection level mmap settings with proper fallback logic - Parsing mmap settings from field type_params and collection properties during schema parsing - Applying computed mmap settings in LoadColumnGroup() and Load() methods instead of hardcoded false values - Using global MmapConfig as fallback when no explicit setting exists The mmap setting priority is now: 1. Field-level mmap setting (from type_params) 2. Collection-level mmap setting (from properties) 3. Global mmap config (from MmapManager) For column groups, if any field has mmap enabled, the entire group uses mmap (since they are loaded together). Related issue: #45060 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-03 10:31:10 +08:00
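The mmap priority order listed in the commit above (field, then collection, then global) can be sketched as a simple fallback chain. The struct and function names below are hypothetical and are not taken from the Milvus source. The commit's `Schema::MmapEnabled()` may be shaped quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical container for the three levels named in the commit message.
struct MmapSettings {
    std::optional<bool> field_level;       // from field type_params, if set
    std::optional<bool> collection_level;  // from collection properties, if set
    bool global_default = false;           // global mmap config fallback
};

// Field-level wins, then collection-level, then the global default.
inline bool ResolveMmapEnabled(const MmapSettings& s) {
    if (s.field_level.has_value()) {
        return *s.field_level;
    }
    if (s.collection_level.has_value()) {
        return *s.collection_level;
    }
    return s.global_default;
}

// Column groups are loaded together, so the group uses mmap if any field does.
inline bool ResolveGroupMmapEnabled(const std::vector<MmapSettings>& fields) {
    return std::any_of(fields.begin(), fields.end(), ResolveMmapEnabled);
}
```

An explicit field-level `false` overrides a collection-level `true`, which is why the levels are modelled as `std::optional<bool>` rather than plain booleans.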
sparknack	8ef35de7ca	enhance: always use buffered io for high load priority (#45900 ) issue: #43040 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-29 00:03:08 +08:00
congqixia	ae256c52ae	enhance: Resolve issues integrating loon FFI (#45918 ) Related to #44956 - Update milvus-storage version to ba7df7b for chunk reader fix - Pass manifest path to index build request in DataCoord/DataNode - Add null chunk assertion with detailed debug info in ManifestGroupTranslator - Fix memory corruption by removing premature transaction handle destruction - Clean up log message in ChunkedSegmentSealedImpl --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-28 18:41:08 +08:00
Buqian Zheng	6c0a80d8c3	enhance: pk binary range in sealed segment to use binary search (#45829 ) issue: https://github.com/milvus-io/milvus/discussions/44935 pr: https://github.com/milvus-io/milvus/pull/45328 this pr is to improve pk range op --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-26 17:17:08 +08:00
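As a rough illustration of the idea in the commit above, a range predicate on primary keys can be answered with two binary searches when the keys are stored in ascending order. That ordering is an assumption of this sketch, and `PkRangeOffsets` is a hypothetical helper, not a segcore function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the half-open offset range [first, last) of keys k with
// lo <= k <= hi, assuming sorted_pks is sorted in ascending order.
inline std::pair<std::size_t, std::size_t> PkRangeOffsets(
    const std::vector<int64_t>& sorted_pks, int64_t lo, int64_t hi) {
    auto begin = sorted_pks.begin();
    auto first = std::lower_bound(begin, sorted_pks.end(), lo);
    // Start the second search at `first`; nothing before it can match.
    auto last = std::upper_bound(first, sorted_pks.end(), hi);
    return {static_cast<std::size_t>(first - begin),
            static_cast<std::size_t>(last - begin)};
}
```

Both searches are O(log n), compared with a linear scan that evaluates the predicate on every row.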
sparknack	4b14ab14e3	enhance: mmap once for each group chunk (#45487 ) issue: #45486 This commit refactors the chunk writing system by introducing a two-phase approach: size calculation followed by writing to a target. This enables efficient group chunk creation where multiple fields share a single mmap region, significantly reducing the number of mmap system calls and VMAs. - Optimize `mmap` usage: single `mmap` per group chunk instead of per field - Split ChunkWriter into two phases: - `calculate_size()`: Pre-compute required memory without allocation - `write_to_target()`: Write data to a provided ChunkTarget - Implement `ChunkMmapGuard` for unified mmap region lifecycle management - Handles `munmap` and file cleanup via RAII - Shared via `std::shared_ptr` across multiple chunks in a group Signed-off-by: Shawn Wang <shawn.wang@zilliz.com> --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-26 10:37:08 +08:00
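The two-phase approach in the commit above (compute sizes first, then write every field into one shared region) can be sketched as below. To stay runnable anywhere, a heap buffer stands in for the mmap region, and `FieldWriter`, `GroupChunk`, and `BuildGroupChunk` are hypothetical names rather than the commit's `ChunkWriter` or `ChunkTarget` types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical per-field writer with the two phases named in the commit.
struct FieldWriter {
    std::string payload;

    // Phase 1: report the bytes needed without allocating anything.
    std::size_t CalculateSize() const { return payload.size(); }

    // Phase 2: write into memory owned by the caller.
    void WriteToTarget(char* dst) const {
        if (!payload.empty()) {
            std::memcpy(dst, payload.data(), payload.size());
        }
    }
};

struct GroupChunk {
    std::vector<char> region;          // single region shared by all fields
    std::vector<std::size_t> offsets;  // start of each field inside region
};

inline GroupChunk BuildGroupChunk(const std::vector<FieldWriter>& writers) {
    GroupChunk chunk;
    chunk.offsets.reserve(writers.size());
    std::size_t total = 0;
    for (const auto& w : writers) {  // phase 1: sizes and offsets
        chunk.offsets.push_back(total);
        total += w.CalculateSize();
    }
    chunk.region.resize(total);  // one allocation for the whole group
    for (std::size_t i = 0; i < writers.size(); ++i) {  // phase 2: write
        writers[i].WriteToTarget(chunk.region.data() + chunk.offsets[i]);
    }
    return chunk;
}
```

In the real change the single allocation would be one `mmap` call per group chunk, with its lifetime tied to a shared guard object. This sketch shows only the size-then-write ordering that makes that possible.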
sparknack	0392db6976	enhance: add cancellation checking in each operator and expr (#45354 ) issue: #45353 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-26 10:15:07 +08:00
congqixia	03f5d7c0a5	enhance: integrate StorageV2 FFI interface for manifest-based segment loading (#45798 ) Related to #44956 New Translator (C++) - Added `ManifestGroupTranslator` (`internal/core/src/segcore/storagev2translator/`) - Translates manifest-based column groups to Milvus internal format - Implements `GroupCTMeta` interface for chunk-based column access - Supports both memory and mmap storage modes - Handles cache warmup policies for vector and scalar data ChunkedSegmentSealedImpl (`internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:333`) - Added `LoadColumnGroups(const std::string& manifest_path)`: Main entry point for manifest-based loading - Creates milvus-storage Reader from manifest file - Parallelizes column group loading using thread pool - Aggregates loading exceptions and reports errors - Added `LoadColumnGroup()`: Loads individual column group - Extracts field IDs from column group metadata - Creates ManifestGroupTranslator for each column group - Builds ProxyChunkColumn for field access - Special handling for timestamp field index construction SegmentGrowingImpl (`internal/core/src/segcore/SegmentGrowingImpl.cpp`) - Added similar `LoadColumnGroups()` and `LoadColumnGroup()` methods for growing segments - Maintains consistency with sealed segment loading path Storage FFI Utilities loon_ffi/util (`internal/core/src/storage/loon_ffi/util.cpp`) - Added `MakeInternalPropertiesFromStorageConfig()`: Converts C storage config to internal Properties - Maps all storage configuration fields (S3, GCS, Azure, local) - Handles SSL, IAM, virtual host settings - Configures connection timeouts and max connections - Added `MakeInternalLocalProperies()`: Creates local filesystem properties - Added `ToCStorageConfig()`: Converts Go StorageConfig to C representation - Added `GetColumnGroups()`: Extracts column groups from manifest file using Transaction API Protocol Buffer Changes segcore.proto (`pkg/proto/segcore.proto:121`) - Added `manifest_path` field to `SegmentLoadInfo` message - Enables passing manifest file path from Go layer to C++ core Go Integration segment.go (`internal/util/segcore/segment.go:372`) - Updated `ConvertToSegcoreSegmentLoadInfo()` to propagate `ManifestPath` field - Bridges QueryNode segment load info to Segcore format --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-25 17:27:07 +08:00
Buqian Zheng	7078f403f1	enhance: add vector reserve to improve memory allocation in segcore (#45757 ) This commit optimizes std::vector usage across segcore by adding reserve() calls where the size is known in advance, reducing memory reallocations during push_back operations. Changes: - TimestampIndex.cpp: Reserve space for prefix_sums and timestamp_barriers - SegmentGrowingImpl.cpp: Reserve space for binlog info vectors - ChunkedSegmentSealedImpl.cpp: Reserve space for futures and field data vectors - storagev2translator/GroupChunkTranslator.cpp: Reserve space for metadata vectors This improves performance by avoiding multiple memory reallocations when the vector size is predictable. issue: https://github.com/milvus-io/milvus/issues/45679 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-25 14:19:07 +08:00
Buqian Zheng	2cf1e0e452	enhance: optimize pk search to use binary search, and 2 pointers for in expr (#45328 ) issue: #44935 this is somewhat related to #44935, but on pk instead of stl_sort index Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-21 19:01:05 +08:00
Buqian Zheng	e00ad1098f	enhance: add ScalarFieldProto& overload to avoid unnecessary copies (#45743 ) 1. Array.h: Add output_data(ScalarFieldProto&) overload for both Array and ArrayView classes 2. Use std::string_view instead of std::string for VARCHAR and GEOMETRY types to avoid extra string copies 3. Call Reserve(length_) before writing to proto objects to reduce memory reallocations a simple test shows those optimizations improve the Array of Varchar bulk_subscript performance by 20% issue: https://github.com/milvus-io/milvus/issues/45679 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-21 18:35:05 +08:00
Bingyi Sun	a3add6a391	fix: Fix json indices can not be loaded (#45620 ) issue: https://github.com/milvus-io/milvus/issues/45575 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-11-20 10:41:06 +08:00
congqixia	0a208d7224	enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488 ) Related to #45060 Refactor segment loading architecture to make segments autonomously manage their own loading process, moving the orchestration logic from Go (segment_loader.go) to C++ (segcore). C++ Layer (segcore): - Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and implementations - Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading strategy: - Separates indexed fields from non-indexed fields - Loads indexes concurrently using thread pools - Loads field data for non-indexed fields in parallel - Implemented `SegmentGrowingImpl::Load()` to convert and load field data - Extracted `LoadIndexData()` as a reusable utility function in `Utils.cpp` - Added `SegmentLoad()` C binding in `segment_c.cpp` Go Layer: - Added `Load()` method to segment interfaces - Updated mock implementations and test interfaces - Integrated new C++ `SegmentLoad()` binding in Go segment wrapper --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-11-14 11:21:37 +08:00
Chun Han	406fa7b694	fix: failed to get raw data for hybrid index(#45318 ) (#45411 ) related: #45318 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-11-13 10:17:37 +08:00
sparknack	9d75d0393e	enhance: some optimization of scalar field fetching in tiered storage scenarios (#45360 ) issue: #43611 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-11 17:17:41 +08:00
Buqian Zheng	515a939edf	enhance: remove obsolete code (#45307 ) issue: #44452 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-07 16:07:35 +08:00
cai.zhang	ed8ba4a28c	enhance: Make GeometryCache an optional configuration (#45192 ) issue: #45187 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-03 19:59:32 +08:00
zhagnlu	ae19c93c14	enhance: remove timestamp filter for search_ids to optimize performance (#44634 ) #44352 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-17 16:10:01 +08:00
sparknack	4bd30a74ca	enhance: cachinglayer: add mmap and eviction support for TextMatchIndex (#44806 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-17 14:42:02 +08:00
Bingyi Sun	26d06c6340	feat: load skip index using parquet statistics (#44252 ) #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-15 19:16:00 +08:00
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736 ) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
sparknack	c8a4d6e2ef	enhance: add cachinglayer management for TextMatchIndex (#44741 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 14:37:58 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 to milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Currently only brute-force search is supported. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
zhagnlu	eac16a577c	enhance:support cachelayer for json stats (#44446 ) #42533 Signed-off-by: zhagnlu <lu.zhang@zilliz.com>	2025-09-24 15:30:04 +08:00
sparknack	ab64afba2f	enhance: add storage resource usage for scalar search (#44414 ) issue: #44212 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-22 14:28:06 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics in go side(result reduce), only record storage usage in vector search C++ path. Need to be implemented in query c++ path in next prs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
congqixia	98d23de36c	enhance: [StorageV2] Make load info contains child info (#44384 ) Related to #44257 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-16 16:14:00 +08:00
zhagnlu	baa84e0b2b	fix: avoid mvcc when doing pk compare expr (#44353 ) #44352 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-15 10:17:59 +08:00
Bingyi Sun	1931dcd9b5	fix: Fix initialize timestamp index concurrently (#44317 ) #issue: https://github.com/milvus-io/milvus/issues/44341 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-12 14:25:57 +08:00
sparknack	4a01c726f3	enhance: cachinglayer: some metric and params update (#44276 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-10 11:03:57 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171 ) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Bingyi Sun	0c0630cc38	feat: support dropping index without releasing collection (#42941 ) issue: #42942 This pr includes the following changes: 1. Added checks for index checker in querycoord to generate drop index tasks 2. Added drop index interface to querynode 3. To avoid search failure after dropping the index, the querynode allows the use of lazy mode (warmup=disable) to load raw data even when indexes contain raw data. 4. In segcore, loading the index no longer deletes raw data; instead, it evicts it. 5. In expr, the index is pinned to prevent concurrent errors. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-02 16:17:52 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
Buqian Zheng	6b22661c06	fix: use tbb::concurrent_unordered_map for ChunkedSegmentSealedImpl::fields_ (#44084 ) issue: https://github.com/milvus-io/milvus/issues/44078 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-08-29 10:01:51 +08:00
congqixia	e3b3502287	fix: Use correct regex for cppcheck (#44077 ) Related to #44076 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-27 20:57:50 +08:00
marcelo-cjl	e13e19cd2c	enhance: add sparse_u32_f32 data type for sparse vector (#43974 ) issue: #43973 Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-08-27 16:47:50 +08:00
Chun Han	da156981c6	feat: milvus support posix-compatible mode(milvus-io#43942) (#43944 ) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
Spade A	8456f824be	feat: impl StructArray -- miscellaneous stuff for struct array (#43960 ) Ref https://github.com/milvus-io/milvus/issues/42148 1. enable storage v2 2. implement some missing stuff 3. fix some bugs and add tests --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-26 21:35:53 +08:00
Tianx	c0d62268ac	feat: add timestamptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
Gao	e97a618630	enhance: support readAt interface for remote input stream (#43997 ) #42032 Also, fix the cacheoptfield method to work in storagev2. Also, change the sparse related interface for knowhere version bump #43974 . Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric lost. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-26 11:19:58 +08:00
zhagnlu	d904c4e677	enhance: optimize compare expr performance for pk field (#43154 ) #43153 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-21 10:59:46 +08:00
congqixia	7963b17ac1	fix: Revert "fix: Use `folly::SharedMutex` preventing starvation (#43937 )" (#43959 ) Related to #43958 This reverts commit 580350495ab40b3c0a2ec473882258edf6d7dd08. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-21 10:09:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
congqixia	580350495a	fix: Use `folly::SharedMutex` preventing starvation (#43937 ) Related to #43936 This PR: - Use `folly::SharedMutex` instead of `std::shared_mutex` preventing starvation - Use `folly::SharedMutex::WriteHolder/ReadHolder` instead of std::shared_lock and std::unique_lock to get better performance Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-19 20:05:46 +08:00
Gao	81a0915c29	enhance: add milvus-common module to decouple knowhere & segcore (#43624 ) issue: https://github.com/milvus-io/milvus/issues/42032 https://github.com/milvus-io/milvus/issues/41435 based on pr: https://github.com/milvus-io/milvus/pull/42124 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-11 14:09:42 +08:00
congqixia	b6199acb05	enhance: Utilize `search_batch_pks` for `search_ids` of PkTerm (#43751 ) Related to #43660 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-07 14:19:40 +08:00
congqixia	d414f6bd4d	enhance: Add assertion preventing reload same field (#43736 ) Related to #43725 This patch adds an assertion that prevents a segment from reloading the same field column. It also improves the message shown when the pk already exists. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-05 19:35:39 +08:00
Chun Han	d826d6ac91	fix: try to get span raw data for variable length data type(#43544 ) (#43705 ) related: #43544 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-04 11:15:38 +08:00
congqixia	4aff581007	enhance: Pass callback in search batch pks to avoid large result (#43695 ) Related to #43660 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-02 17:57:37 +08:00
congqixia	5f2f4eb3d6	enhance: Ignore entry with same ts when DeleteRecord search pks (#43669 ) Related to #43660 This patch reduces the unwanted offset&ts entries having same timestamp of delete record. Under large amount of upsert, this false hit could increase large amount of memory usage while applying delete. The next step could be passing a callback to `search_pk_func_` to handle hit entry streamingly. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-01 10:15:36 +08:00

129 Commits