milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
congqixia	22098c1785	fix: add null check for packed_writer_ in JsonStatsParquetWriter::Close() (#45158 ) Related to #45157 Fix a bug where DataNode panics when building json stats index throws an exception before the writer is initialized. The destructor would call Close() on an uninitialized packed_writer_ pointer, causing a null pointer dereference. Changes: - Add null check for packed_writer_ before calling Flush() and Close() - Prevents null pointer dereference in edge cases - Ignore close status as this is a cleanup operation This ensures safe cleanup even when initialization fails due to exceptions. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-30 17:40:09 +08:00
cqy123456	35d8213a00	fix: fail to mmap emb_list_meta in embedding list (#45127 ) issue: https://github.com/milvus-io/milvus/issues/44965 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-10-30 11:00:09 +08:00
aoiasd	ad9a0cae48	enhance: add global analyzer options (#44684 ) relate: https://github.com/milvus-io/milvus/issues/43687 Add global analyzer options, avoid having to merge some milvus params into user's analyzer params. Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-28 14:52:10 +08:00
congqixia	fd0ef09e97	fix: Handle all-null data in StringIndexSort to prevent load timeout (#45100 ) Related to #45081 StringIndexSort now properly handles collections with all-null string fields by: - Removing the error thrown when unique_count is 0 in ParseBinaryData - Adding alignment and padding support in mmap serialization (similar to ScalarIndexSort) - Separating data_size_ from mmap_size_ to correctly parse data without reading padding This fixes load collection timeout failures when all string field data is null, particularly affecting STL_SORT and TRIE index types. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 18:04:09 +08:00
congqixia	36a887b38b	enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061 ) This commit introduces the foundation for enabling segments to manage their own loading process by passing load information during segment creation. Changes: C++ Layer: - Add NewSegmentWithLoadInfo() C API to create segments with serialized load info - Add SetLoadInfo() method to SegmentInterface for storing load information - Refactor segment creation logic into shared CreateSegment() helper function - Add comprehensive documentation for the new API Go Layer: - Extend CreateCSegmentRequest to support optional LoadInfo field - Update segment creation in querynode to pass SegmentLoadInfo when available - Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto translation Proto Definitions: - Add segcorepb.SegmentLoadInfo message with essential loading metadata - Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo, TextIndexStats, JsonKeyStats - Remove dependency on data_coord.proto by creating segcore-specific definitions Testing: - Add comprehensive unit tests for proto conversion functions - Test edge cases including nil inputs, empty data, and nil array/map elements This is the first step toward issue #45060 - enabling segments to autonomously manage their loading process, which will: - Clarify responsibilities between Go and C++ layers - Reduce cross-language call overhead - Enable precise resource management at the C++ level - Support better integration with caching layer - Enable proactive schema evolution handling Related to #45060 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 15:28:12 +08:00
congqixia	7c627260f3	enhance: Optimize ScalarIndexSort bitmap initialization for range queries (#45085 ) Optimize bitmap initialization in ScalarIndexSort range queries by using adaptive strategy based on result density. When more than 50% of elements match the range condition, initialize bitmap with all true values and clear non-matching elements. Otherwise, use the original approach of initializing with false and setting matching elements. Also defer bitmap allocation until after early return checks to avoid unnecessary memory allocation. This optimization reduces bit operations for high-selectivity queries while maintaining the same performance for low-selectivity queries. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-27 10:08:06 +08:00
Buqian Zheng	c284e8c4a8	enhance: some minor code cleanup, prepare for scalar benchmark (#45008 ) issue: https://github.com/milvus-io/milvus/issues/44452 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-10-24 14:22:05 +08:00
congqixia	199f6d936e	fix: Update milvus-storage to fix duplicate AWS SDK initialization (#45062 ) Update milvus-storage version from aa189ad to e5f5b4c to include the fix for duplicate AWS SDK initialization that was causing init conflicts. This update removes the redundant arrow::fs::InitializeS3() call that was resulting in duplicate Aws::InitAPI() initialization. The duplicate initialization was causing AWS SDK to ignore custom configurations, particularly affecting GCP Workload Identity authentication. Changes in milvus-storage e5f5b4c: - Remove redundant arrow::fs::InitializeS3() call - Keep only the extended S3 initialization with custom AWS SDK options - Ensure GCP IAM authentication via custom HTTP client factory works correctly Relates to #44745 Reference: milvus-io/milvus-storage#285 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-24 11:32:05 +08:00
Buqian Zheng	22995cea3f	fix: Remove debug logging from JsonFlatIndex (#44807 ) issue: https://github.com/milvus-io/milvus/issues/44452 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com> Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>	2025-10-23 16:08:06 +08:00
Bingyi Sun	52270701ce	feat: use namespace skip index when search (#44888 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-23 12:04:04 +08:00
Spade A	6077178553	enhance: enable STL_SORT to support VARCHAR (#44401 ) issue: https://github.com/milvus-io/milvus/issues/44399 This PR implements STL_SORT for VARCHAR data type for both RAM and MMAP mode. The general idea is that we deduplicate field values and maintain a posting list for each unique value. The serialization format of the index is: ``` [unique_count][string_offsets][string_data][post_list_offsets][post_list_data][magic_code] string_offsets: array of offsets into string_data section string_data: str_len1, str1, str_len2, str2, ... post_list_offsets: array of offsets into post_list_data section post_list_data: post_list_len1, row_id1, row_id2, ..., post_list_len2, row_id1, row_id2, ... ``` --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-10-23 11:00:05 +08:00
cai.zhang	3d11ba06ef	fix: Double check to avoid iter has been erased by other thread (#45013 ) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-21 23:36:04 +08:00
zhagnlu	730308b1eb	fix: fix not equal not include None (#44959 ) #44816 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-21 17:08:03 +08:00
cai.zhang	b23d75a032	fix: Fix bug for gis function to filter geometry (#44966 ) issue: #44961 This PR fixes 3 geometry related bugs: 1. Implement `ToString` interface for GisFunctionFilter. 2. Ignore GisFunctionFilter `MoveCursor` for growing segment. 3. Don't skip null geometry when building the R-Tree index; it should be recorded in null_offsets. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-21 09:52:04 +08:00
cai.zhang	a35a3b7c69	fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975 ) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-20 23:32:03 +08:00
zhagnlu	05df48fbe4	fix: remove duplicated '/' in jsonstats path (#44939 ) #44950 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-20 14:06:03 +08:00
Zhen Ye	f98d02b3e1	fix: use short debug string to avoid newline in debug logs (#44925 ) issue: #44924 Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-20 10:16:03 +08:00
Bingyi Sun	3ddf9154ab	fix: Fix exists expr for json flat index (#44910 ) issue: https://github.com/milvus-io/milvus/issues/44915 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-19 19:46:07 +08:00
congqixia	27dbb8e75d	fix: support JSON default value in `CreateArrowScalarFromDefaultValue` (#44912 ) Related to #44897 Add missing JSON data type handling in CreateArrowScalarFromDefaultValue to fix query failures when dynamic fields are enabled. JSON default values are now properly converted to arrow::BinaryScalar using bytes_data(). Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-17 18:22:00 +08:00
cai.zhang	b0f642fb4c	fix: Fix the geometry return POINT(0 0) when growing mmap is enabled (#44889 ) issue: #44802 After a Geometry object is serialized into WKB, the resulting binary may contain '\0' bytes. When growing mmap is enabled, the append data logic uses strcpy, which stops copying at the first '\0' byte. This causes only part of the WKB, typically the portion up to the geometry type field, to be copied, leading to corrupted data. As a result, during parsing, all POINT geometries are incorrectly interpreted as POINT(0 0). To fix this issue, memcpy will be used instead of strcpy. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-17 17:10:11 +08:00
zhagnlu	b7935557e1	fix: unified json exists path semantic (#44916 ) #44927 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-17 16:40:02 +08:00
zhagnlu	ae19c93c14	enhance: remove timestamp filter for search_ids to optimize performance (#44634 ) #44352 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-17 16:10:01 +08:00
sparknack	4bd30a74ca	enhance: cachinglayer: add mmap and eviction support for TextMatchIndex (#44806 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-17 14:42:02 +08:00
Bingyi Sun	633cae9461	enhance: add namespace for query and search request (#44343 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-16 17:52:01 +08:00
congqixia	684018ca7b	fix: ensure deterministic search result ordering when scores are equal (#44870 ) Related to #44819 This fix addresses an issue(#44819) where the offset parameter did not work correctly during searches when multiple results had identical scores. The problem occurred because results with equal scores were not consistently ordered, leading to unpredictable pagination behavior. The solution adds a new sorting step (SortEqualScoresByPks) in the reduce phase that sorts results with identical scores by their primary keys in ascending order. This ensures deterministic ordering and enables proper offset functionality. Changes: - Add SortEqualScoresByPks() to sort results with equal scores by PK - Add SortEqualScoresOneNQ() to handle per-query sorting logic - Invoke sorting step after FillPrimaryKey() in Reduce() workflow --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-16 10:04:00 +08:00
Bingyi Sun	26d06c6340	feat: load skip index using parquet statistics (#44252 ) #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-15 19:16:00 +08:00
cqy123456	822588302a	enhance: embedding_list support mmap in MemVectorIndex (#44764 ) issue: https://github.com/milvus-io/milvus/issues/44702 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-10-15 15:22:00 +08:00
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736 ) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
Spade A	b8df1c0cc5	enhance: improve observability in trace for segcore scalar expression (#44260 ) Ref https://github.com/milvus-io/milvus/issues/44259 This PR connects the trace between go and segcore, and adds full traces for the scalar expression calling chain. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-10-14 17:15:59 +08:00
Bingyi Sun	6cb1f7d7c6	enhance: optimize the performance of bitmap reverse lookup (#44804 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-14 11:57:58 +08:00
zhagnlu	2f178f810f	fix: fix json_contains(path, int) bug (#44814 ) #44816 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-14 00:19:59 +08:00
sparknack	df6a4dc1a0	fix: cachinglayer: avoid eviction during json handling (#44812 ) issue: #44797 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 22:07:58 +08:00
aoiasd	1b17e16fc7	fix: expr filter return wrong result when skipped (#44778 ) relate: https://github.com/milvus-io/milvus/issues/44777 The filter should return a result of false when skipped, but it currently returns valid[0], which is almost always true. Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-13 18:33:59 +08:00
zhagnlu	3dd5deb70a	fix: disable using shredding for json_path contains digital (#44724 ) #44132 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-10-13 17:25:59 +08:00
sparknack	c8a4d6e2ef	enhance: add cachinglayer management for TextMatchIndex (#44741 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 14:37:58 +08:00
aoiasd	09865a5da5	fix: BM25 with boost return result not ordered. (#44744 ) relate: https://github.com/milvus-io/milvus/issues/44758 The comparison should be `(result.seg_offsets_[i] >= 0 && result.seg_offsets_[j] < 0)`, but was written as `(result.seg_offsets_[j] >= 0 && result.seg_offsets_[j] < 0)`. All placeholders, which have offset -1, are filled with the worst distance value. For IP, L2 or COSINE this is +inf or -inf, so sorting by distance alone was enough. With BM25 the value is NaN, which causes the sorted result to be out of order. Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-11 17:17:58 +08:00
congqixia	5ece760d73	fix: Pass fs via `FileManagerContext` when loading index (#44733 ) Related to #44615 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-11 09:55:57 +08:00
sparknack	7e750190b6	enhance: add a size getter for tantivy inverted index (#44609 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-10 17:43:57 +08:00
congqixia	8a443c699e	fix: Make aws credential provider singleton (#44687 ) Related to #44647 This patch makes milvus-storage use a singleton credential provider to avoid a data race when concurrent index build tasks are received. See also milvus-io/milvus-storage#44647 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-09 16:11:58 +08:00
congqixia	1d85b83215	enhance: [backlog] Fix unittest and remove fs fallback logic (#44615 ) Related to #44535 This PR: - Fix the unittest creating `DiskFileManagerImpl` without `filesystem` - Add comments for methods that need `fs_` - Remove fallback creation and add assertion for nullptr fs Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-09 10:41:57 +08:00
cai.zhang	9d1bb8497c	fix: Get R-Tree index correct for growing segment (#44612 ) issue: #43427 The R-Tree index is built for the entire segment, not per chunk. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-09-29 21:34:54 +08:00
cai.zhang	aecb46a08b	fix: Skip empty loop for process growing segment (#44606 ) issue: #43427 The GISFunction asserts that the segment_offsets cannot be nullptr. When size is 0, the segment_offsets is nullptr, so the loop is skipped. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-09-29 21:15:05 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 into milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Currently only brute-force search is supported. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
aoiasd	1b20e956be	enhance: support random score for boost function score (#44214 ) Also support setting the function mode and boost mode when running a search with boost. RandomScore returns a random function score in [0, weight). FunctionMode decides how the boost score is calculated from multiple boost function scores. BoostMode decides how the final score is calculated from the original score and the boost score. relate: https://github.com/milvus-io/milvus/issues/43867 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-09-24 17:50:04 +08:00
foxspy	13c3b0b909	enhance: add autoindex configuration for the int8 vector type (#44554 ) issue: #38666 Add int8 support for autoindex to ensure it can be independently configured. At the same time, remove the restriction on int8 type for vectorDiskIndex (note that vectorDiskIndex only determines the building and loading method of the index, not the index type). Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-09-24 17:48:04 +08:00
sparknack	0145dc8c06	fix: refund loaded resource usage in Insert/DeleteRecord destructor (#44555 ) issue: #44528 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-24 16:16:04 +08:00
zhagnlu	eac16a577c	enhance: support cachelayer for json stats (#44446 ) #42533 Signed-off-by: zhagnlu <lu.zhang@zilliz.com>	2025-09-24 15:30:04 +08:00
sparknack	14c085374e	fix: set mmap_file_raii_ to nullptr when mmap is disabled (#44516 ) issue: #44510 related: #44501 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-24 11:50:03 +08:00
congqixia	ea307ea3c9	fix: [StorageV2] Make DiskFileManager use fs from context (#44535 ) Related to #44534 Datanode shall not use singleton fs after 2.6+. This patch makes the disk file manager use the filesystem passed by fileManagerContext instead of the erroneous singleton one. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-24 10:12:03 +08:00
Bingyi Sun	f0446fd9a0	enhance: optimize the performance of binary_search_string (#44469 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-23 10:52:13 +08:00
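
Commit 7c627260f3 above describes an adaptive bitmap initialization for range queries in ScalarIndexSort. The following is a minimal Python sketch of that idea, not the actual C++ code: the function name `range_bitmap`, the `(value, row_id)` entry layout, and the assumption that every row has exactly one entry (no nulls) are all illustrative.

```python
from bisect import bisect_left, bisect_right


def range_bitmap(sorted_entries, lo, hi):
    """Return a list of bools marking rows whose value is in [lo, hi].

    sorted_entries: list of (value, row_id) pairs sorted by value.
    Assumption (hypothetical layout): every row has exactly one entry.
    """
    num_rows = len(sorted_entries)
    values = [v for v, _ in sorted_entries]
    start = bisect_left(values, lo)
    end = bisect_right(values, hi)

    # Early return before building the result via per-row updates.
    if start >= end:
        return [False] * num_rows

    matched = end - start
    if matched * 2 > num_rows:
        # Dense result: start from all-true and clear the non-matching rows.
        bitmap = [True] * num_rows
        for _, row in sorted_entries[:start]:
            bitmap[row] = False
        for _, row in sorted_entries[end:]:
            bitmap[row] = False
    else:
        # Sparse result: start from all-false and set the matching rows.
        bitmap = [False] * num_rows
        for _, row in sorted_entries[start:end]:
            bitmap[row] = True
    return bitmap
```

Either branch produces the same bitmap; the choice only changes how many individual bits are touched, which is the saving the commit message describes when more than 50% of the elements match.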

1827 Commits
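
Commit 684018ca7b in the list above adds a step that sorts results with identical scores by primary key so that `offset` pagination becomes deterministic. A rough Python sketch of that tie-breaking step follows; the function name `sort_equal_scores_by_pk` and the `(score, pk)` tuple layout are assumptions for illustration and do not mirror the signatures of `SortEqualScoresByPks()` or `SortEqualScoresOneNQ()`.

```python
def sort_equal_scores_by_pk(hits):
    """Reorder each run of equal scores by ascending primary key.

    hits: list of (score, pk) tuples for one query, already ordered
    best-first by score (hypothetical layout). The relative order of
    different scores is left untouched.
    """
    out = []
    i = 0
    while i < len(hits):
        # Find the end of the run that shares hits[i]'s score.
        j = i
        while j < len(hits) and hits[j][0] == hits[i][0]:
            j += 1
        # Within the run, order by primary key ascending.
        out.extend(sorted(hits[i:j], key=lambda h: h[1]))
        i = j
    return out


hits = [(0.9, 7), (0.9, 3), (0.8, 5), (0.8, 1), (0.8, 4), (0.5, 2)]
ordered = sort_equal_scores_by_pk(hits)
page = ordered[1:3]  # offset=1, limit=2 now always selects the same rows
```

Because ties are resolved by primary key, two runs of the same search return the same page for a given offset, regardless of the order in which equal-score hits arrived.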