milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
congqixia	e3b3502287	fix: Use correct regex for cppcheck (#44077 ) Related to #44076 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-27 20:57:50 +08:00
Chun Han	da156981c6	feat: milvus support posix-compatible mode(milvus-io#43942) (#43944 ) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Tianx	c0d62268ac	feat: add timesatmptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
Gao	e97a618630	enhance: support readAt interface for remote input stream (#43997 ) #42032 Also, fix the cacheoptfield method to work in storagev2. Also, change the sparse related interface for knowhere version bump #43974 . Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric lost. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-26 11:19:58 +08:00
Gao	81a0915c29	enhance: add milvus-common module to decouple knwhere & segcore (#43624 ) issue: https://github.com/milvus-io/milvus/issues/42032 https://github.com/milvus-io/milvus/issues/41435 based on pr: https://github.com/milvus-io/milvus/pull/42124 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-11 14:09:42 +08:00
sthuang	5cebc9f7f6	fix: [StorageV2] handle correct cid with multiple files and add storage v2 prefix logs (#43539 ) related: #43372 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-25 11:22:54 +08:00
sthuang	59bbdd93f5	fix: [StorageV2] fill the correct group chunk into cell (#43486 ) The root cause of the issue lies in the fact that when a sealed segment contains multiple row groups, the get_cells function may receive unordered cids. This can result in row groups being written into incorrect cells during data retrieval. Previously, this issue was hard to reproduce because the old Storage V2 writer had a bug that caused it to write row groups larger than 1MB. These large row groups could lead to uncontrolled memory usage and eventually an OOM (Out of Memory) error. Additionally, compaction typically produced a single large row group, which avoided the incorrect cell-filling issue during query execution. related: https://github.com/milvus-io/milvus/issues/43388, https://github.com/milvus-io/milvus/issues/43372, https://github.com/milvus-io/milvus/issues/43464, #43446, #43453 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-22 22:22:53 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384 ) issue: #41435 this is to prevent AI from thinking of our exception throwing as a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
congqixia	ae48f0e484	fix: [StorageV2] Handle missing column creating index (#43292 ) Related to #43250 Use FieldIDList to check missing field. If column is missing, return empty resultset Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-14 17:06:50 +08:00
sthuang	276c52490d	fix: [StorageV2] missing arrow fs when building index (#43162 ) fix: https://github.com/milvus-io/milvus/issues/43150, https://github.com/milvus-io/milvus/issues/43149 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-07 15:26:46 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803 ) Ref #42053 This is the first PR for optimizing `LIKE` with ngram inverted index. Now, only VARCHAR data type is supported and only InnerMatch LIKE (%xxx%) query is supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram controls how we tokenize the documents. For example, for min_gram=2 and max_gram=4, we will tokenize each document with 2-gram, 3-gram and 4-gram. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
sthuang	238bd30f42	fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948 ) Fix issues in end-to-end tests: 1. Split column groups based on schema, rather than estimating by average chunk row size. Ensure column group consistency within a segment, to avoid errors caused by loading multiple column group chunks simultaneously. 2. Use sorted segmentId when generating the stats binlog path, to ensure consistent and correct file path resolution. 3. Determine field IDs as follows: For multi-column column groups, retrieve the field ID list from metadata. For single-column column groups, use the column group ID directly as the field ID. related: #39173 fix: #42862 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-27 14:44:42 +08:00
XuanYang-cn	0dfe5308e1	enhance: Tidy Download and decode in segcore storage (#42902 ) 1. Unify calling from GetObjectData 2. Move SetData inside Deserialize See also: #40013 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-06-25 11:10:43 +08:00
sthuang	0d57acb13a	enhance: [StorageV2] field id as meta path for wide column when load (#42863 ) related: #42862 #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-25 11:08:48 +08:00
Xianhui Lin	b902960057	fix: revert remote jsonstats path (#42882 ) fix: revert remote jsonstats path relate-pr:https://github.com/milvus-io/milvus/pull/42676 issue:https://github.com/milvus-io/milvus/issues/42872 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-21 13:24:39 +08:00
sthuang	4a0a2441f2	enhance: [StorageV2] field id as meta path for wide column (#42787 ) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-19 15:00:38 +08:00
Spade A	80f1d707f7	fix: tidy up path for scalar index (#42676 ) Ref #42626 This path tidy up path for scalar index including path for loading index from remote storage and temporary path for buliding index. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-18 00:42:38 +08:00
Chun Han	001619aef9	feat: supporing load priority for loading (#42413 ) related: #40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-17 15:22:38 +08:00
congqixia	f01ff57f3f	fix: [StorageV2] Use correct offset filling null bitmap (#42774 ) Related to #39173 `null_bitmap_data()` returns raw pointer of null bitmap of Array. While after slicing, this bitmap is not rewritten due to zero copy implementation, so the current start pos maybe non-zero while FillFieldData generating column `valid_data` array. This PR add `offset` param for `FillFieldData` method, and force all invocation pass correct offset of `null_bitmap_data` ptr. Also update milvus-storage commit fixing reader failed to return data when buffer size smaller than row group size problem. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-17 10:08:38 +08:00
Spade A	9873e0ee78	fix: fix text match index / json key stats index leak when segment released (#42655 ) Ref https://github.com/milvus-io/milvus/issues/42626 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-13 04:28:37 +08:00
congqixia	c9bc70f272	fix: [AddField] Use shared_ptr of schema in plan fixing dangling ref (#42693 ) Related to #42640 The search/query plan holded a reference to schema, which could be destructed after schema change. This PR make plan hold a shared ptr to it fixing dangling reference problem under concurrent read & schema change. This PR also remove field binlog check for loading index for old segment with old schema may have binlog lack. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-12 20:46:36 +08:00
Spade A	911a8df17c	feat: impl StructArray -- data storage support in segcore (#42406 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR mainly enables segcore to support array of vector (read and write, but not indexing). Now only float vector as the element type is supported. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-06-12 14:38:35 +08:00
congqixia	499e9a0a73	fix: [AddField] Use corresponding datatype for int8/int16 def val (#42633 ) Related to #42629 This PR handles converting default value to int8/int18 scalar with int32 default value definition Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-11 11:54:34 +08:00
Bingyi Sun	fbf5cb4e62	feat: Add json flat index (#39917 ) issue: https://github.com/milvus-io/milvus/issues/35528 This PR introduces a JSON flat index that allows indexing JSON fields and dynamic fields in the same way as other field types. In a previous PR (#36750), we implemented a JSON index that requires specifying a JSON path and casting a type. The only distinction lies in the json_cast_type parameter. When json_cast_type is set to JSON type, Milvus automatically creates a JSON flat index. For details on how Tantivy interprets JSON data, refer to the [tantivy documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases). Limitations Array handling: Arrays do not function as nested objects. See the [limitations section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object) for more details. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-10 19:14:35 +08:00
congqixia	cc42d49769	fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186 ) Related to #39173 #39718 In storage v2, the `lack_bin_rows` cannot be used since field id is not column group id, which will not be matched forever. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-31 02:44:30 +08:00
sthuang	b9b554676c	fix: storage v2 get field data with correct column group files (#42107 ) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-05-27 15:26:28 +08:00
congqixia	f2a8330f87	fix: [StorageV2] Use correct group building index (#41925 ) Related to #39173 #41534 This pr fixes an issue that building mem index may report datatype not match error when collection split fields into multiple groups --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-20 13:26:23 +08:00
sthuang	6c377b6e86	feat: Storage v2 index and stats raw data (#41534 ) related: #39173 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-30 08:48:54 +08:00
Buqian Zheng	3de904c7ea	feat: add cachinglayer to sealed segment (#41436 ) issue: https://github.com/milvus-io/milvus/issues/41435 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-04-28 10:52:40 +08:00
Chun Han	016920b023	fix: solve incompitable problem for none-encoding index(#40838 ) (#41369 ) related: #40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-20 22:56:44 +08:00
Chun Han	59b14d38f5	enhance: Optimize index format for improved load performance(#40838 ) (#40839 ) related: https://github.com/milvus-io/milvus/issues/40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-15 03:10:30 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
Patrick Weizhi Xu	04fff74a56	feat: introduce Text data type (#39874 ) issue: https://github.com/milvus-io/milvus/issues/39818 This PR mimics Varchar data type, allows insert, search, query, delete, full-text search and others. Functionalities related to filter expressions are disabled temporarily. Storage changes for Text data type will be in the following PRs. Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2025-02-19 11:04:51 +08:00
congqixia	844df76cc0	enhance: Rectify run_clang_format grep command (#39534 ) Previously the grep with regex does not work and failed to match lots of .cpp files This PR: - use "-E" flag to use regex match - commit the fixed result of current cpp code Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-23 17:07:05 +08:00
Cai Yudong	341d6c1eb7	feat: Update segcore for VECTOR_INT8 (#39415 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-21 11:03:03 +08:00
Gao	75d7978a18	enhance: pass partition key scalar info if enable for vector mem index (#39123 ) issue: #34332 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-01-16 14:33:03 +08:00
smellthemoon	9b6dd23f8e	fix: wrong path spelling when use rootpath in segcore (#37453 ) #36532 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-11-07 11:22:25 +08:00
zhenshan.cao	63843dce33	fix: Fix conan gdal building problem (#37338 ) issue:https://github.com/milvus-io/milvus/issues/27576 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-10-31 21:04:16 +08:00
Hao Tan	67c4340565	feat: Geospatial Data Type and GIS Function Support for milvus server (#35990 ) issue:https://github.com/milvus-io/milvus/issues/27576 # Main Goals 1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields. 2. Insert geospatial data as payload values in the insert binlog, and print the values for verification. 3. Load segments containing geospatial data into memory. 4. Ensure query outputs can display geospatial data. 5. Support filtering on GIS functions for geospatial columns. # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. 6. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing.Check the modification in pymilvus. --------- Signed-off-by: tasty-gumi <1021989072@qq.com>	2024-10-31 20:58:20 +08:00
Bingyi Sun	a75bb85f3a	feat: support chunked column for sealed segment (#35764 ) This PR splits sealed segment to chunked data to avoid unnecessary memory copy and save memory usage when loading segments so that loading can be accelerated. To support rollback to previous version, we add an option `multipleChunkedEnable` which is false by default. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-10-12 15:04:52 +08:00
Rijin-N	a05a37a583	enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214 ) Native support for Google cloud storage using the Google Cloud Storage libraries. Authentication is performed using GCS service account credentials JSON. Currently, Milvus supports Google Cloud Storage using S3-compatible APIs via the AWS SDK. This approach has the following limitations: 1. Overhead: Translating requests between S3-compatible APIs and GCS can introduce additional overhead. 2. Compatibility Limitations: Some features of the original S3 API may not fully translate or work as expected with GCS. To address these limitations, This enhancement is needed. Related Issue: #36212	2024-09-30 13:23:32 +08:00
Jiquan Long	89bf226f0b	feat: support keyword text match (#35923 ) fix: #35922 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-09-10 15:11:08 +08:00
Zhen Ye	a773836b89	enhance: optimize milvus core building (#35610 ) issue: #35549,#35611,#35633 - remove milvus_segcore milvus_indexbuilder..., add libmilvus_core - core building only link once - move opendal compilation into cmake - fix odr --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-08-23 12:35:02 +08:00
smellthemoon	80dbe87759	enhance: support null value in index (#35238 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-16 15:30:54 +08:00
Jiquan Long	91df03afe8	feat: put inverted index into ram (#35222 ) fix: https://github.com/milvus-io/milvus/issues/35224 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-08-06 11:54:16 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
smellthemoon	5616b7e8d2	enhance: support null in c data_datacodec and load null value (#32183 ) 1. support read and write null in segcore will store valid_data(use uint8_t type to save memory) in fieldData. 2. support load null binlog reader read and write data into column(sealed segment), insertRecord(growing segment). In sealed segment, store valid_data directly. In growing segment, considering prior implementation and easy code reading, it covert uint8_t to fbvector<bool>, which may optimize in future. 3. retrieve valid_data. parse valid_data in search/query. #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-23 16:07:51 +08:00
zhagnlu	0d7ea8ec42	enhance: Enhance and correct exception module (#33705 ) #33704 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-23 21:22:01 +08:00
Buqian Zheng	8cb350598c	enhance: Improve GetVectorById of Sparse Float Vector (#33209 ) issue: #29419 * sparse float vector to support raw data mmap For get vector from chunk cache, I added a unit test but marking it as skipped due to a known issue. I have tested it locally. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-06-12 10:09:55 +08:00

1 2

84 Commits