milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Patrick Weizhi Xu	04fff74a56	feat: introduce Text data type (#39874 ) issue: https://github.com/milvus-io/milvus/issues/39818 This PR mimics Varchar data type, allows insert, search, query, delete, full-text search and others. Functionalities related to filter expressions are disabled temporarily. Storage changes for Text data type will be in the following PRs. Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2025-02-19 11:04:51 +08:00
zhagnlu	316534e065	enhance: optimize delete init construct code (#39327 ) #39326 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-02-17 21:05:26 +08:00
Bingyi Sun	b59555057d	feat: support json index (#36750 ) https://github.com/milvus-io/milvus/issues/35528 This PR adds json index support for json and dynamic fields. For now, this index only supports unary queries like 'a["b"] > 1'. We will support more filter types later. basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` There are some limits on using this index: 1. If a record does not have the json path you specify, it will be ignored and there will not be an error. 2. If a value of the json path fails to be cast to the type you specify, it will be ignored and there will not be an error. 3. A specific json path can have only one json index. 4. If you try to create more than one json index for one json field, sdk(pymilvus<=2.4.7) may return immediately because of internal implementation. This will be fixed in a later version. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-15 14:06:15 +08:00
Cai Yudong	341d6c1eb7	feat: Update segcore for VECTOR_INT8 (#39415 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-21 11:03:03 +08:00
congqixia	45d49df89b	fix: Skip load extra indexes for sorted segment pk field (#39389 ) Related to #39339 Extra indexes can be ignored for most cases since sorted pk column already provided indexing features --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-20 18:13:15 +08:00
congqixia	7cac87caca	fix: Skip erase field if index build on PK field (#39370 ) Related to #39339 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-17 20:31:02 +08:00
congqixia	da1b786ef8	enhance: Utilize "find0" in segment.find_first (#39229 ) Related to #39003 Previous PR #39004 had to clone & flip the bitset because bitset did not support the find0 operator. #39176 added this feature, so clone & flip can be removed now. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-14 14:14:58 +08:00
Zhen Ye	5f94954bb4	fix: data race when accessing field_ when retrieving (#39151 ) issue: #39148 Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-13 11:23:04 +08:00
Ted Xu	3dc95153b7	fix: build break under debug mode (#38790 ) See #38435 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-01-07 17:36:56 +08:00
Chun Han	3739446a33	enhance: refine array view to optimize memory usage(#38736 ) (#38808 ) related: #38736 700m data, array_length=10 non-mmap_offsets_uint64: 2.0G mmap_offsets_uint64: 1.1G mmap_offsets_uint32: 880MB Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-01-07 13:26:55 +08:00
congqixia	72f5b85c05	enhance: Accelerate `find_first` by utilizing bitset simd methods (#39004 ) Related to #39003 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-07 10:34:54 +08:00
aoiasd	bc15ad24f2	fix: sealed segment get empty index params when brute force search for bm25 (#38707 ) relate: https://github.com/milvus-io/milvus/issues/38236 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-12-25 19:06:51 +08:00
Ted Xu	acc8fb7af6	enhance: eliminate compile warnings (part2) (#38535 ) See #38435 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-25 15:30:50 +08:00
zhagnlu	01de0afc4e	enhance: refactor delete mvcc function (#38066 ) #37413 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-12-15 18:02:43 +08:00
Gao	994fc544e7	enhance: support iterative filter execution (#37363 ) issue: #37360 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-12-11 11:32:44 +08:00
aoiasd	e9391acf80	fix: bm25 brute force search need index params k1 and b (#37721 ) relate: https://github.com/milvus-io/milvus/issues/35853 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-11-18 15:44:31 +08:00
zhagnlu	e4b6773d0a	fix: fix create text index dir conflict bug (#37693 ) #37623 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-11-15 18:26:30 +08:00
smellthemoon	3389a6b500	enhance: support null in text match index (#37517 ) #37508 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-11-13 11:08:29 +08:00
aoiasd	12951f0abb	enhance: rename tokenizer to analyzer and check analyzer params (#37478 ) relate: https://github.com/milvus-io/milvus/issues/35853 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-11-10 16:12:26 +08:00
aoiasd	d67853fa89	feat: Tokenizer support build with params and clone for concurrency (#37048 ) relate: https://github.com/milvus-io/milvus/issues/35853 https://github.com/milvus-io/milvus/issues/36751 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-11-06 17:48:24 +08:00
cai.zhang	625b6176cd	fix: Search for pk using raw data to reduce the overhead caused by views (#37202 ) issue: #37152 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-11-05 20:36:24 +08:00
zhenshan.cao	63843dce33	fix: Fix conan gdal building problem (#37338 ) issue: https://github.com/milvus-io/milvus/issues/27576 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-10-31 21:04:16 +08:00
Hao Tan	67c4340565	feat: Geospatial Data Type and GIS Function Support for milvus server (#35990 ) issue: https://github.com/milvus-io/milvus/issues/27576 # Main Goals 1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields. 2. Insert geospatial data as payload values in the insert binlog, and print the values for verification. 3. Load segments containing geospatial data into memory. 4. Ensure query outputs can display geospatial data. 5. Support filtering on GIS functions for geospatial columns. # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. 6. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: tasty-gumi <1021989072@qq.com>	2024-10-31 20:58:20 +08:00
cai.zhang	86687bd8ed	enhance: Refine code for get_deleted_bitmap (#36819 ) issue: #33744 Check whether the PK is truly sorted in debug mode. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-10-28 15:19:30 +08:00
smellthemoon	eb3e4583ec	enhance: all op(Null) is false in expr (#35527 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-10-17 21:14:30 +08:00
cqy123456	b474374ea5	enhance: use growingMmapEnabled to control the behavior of interim index, not vectorField (#36500 ) issue: https://github.com/milvus-io/milvus/issues/36392 related pr: https://github.com/milvus-io/milvus/pull/36391 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-10-17 20:25:24 +08:00
Bingyi Sun	a75bb85f3a	feat: support chunked column for sealed segment (#35764 ) This PR splits sealed segment to chunked data to avoid unnecessary memory copy and save memory usage when loading segments so that loading can be accelerated. To support rollback to previous version, we add an option `multipleChunkedEnable` which is false by default. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-10-12 15:04:52 +08:00
SimFG	130a923dec	enhance: the estimate method when loading the collection (#36307 ) - issue: #36530 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Signed-off-by: xianliang.li <xianliang.li@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2024-10-09 17:35:19 +08:00
Buqian Zheng	8495bc6bbc	fix: fix broken Sparse Float Vector raw data mmap (#36183 ) issue: https://github.com/milvus-io/milvus/issues/36182 * improved `Column.h` to make the code much more readable and maintainable, and added detailed comments. * fixed an issue where `ArrayColumn::NumRows()` always returns 0 when the mmap backing storage is a file. * removed unused `ColumnBase` constructors and unnecessary members so we don't get confused. * Updated `test_chunk_cache.cpp` to make the tests parameterized: to test both mmap enabled and disabled. Added sparse field in the test to add coverage. * re-enabled test `Sealed::GetSparseVectorFromChunkCache`. * But 2 other disabled tests `Sealed::WarmupChunkCache` and `Sealed::GetVectorFromChunkCache` remain disabled; there seem to be errors. @bigsheeper PTAL. --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-09-25 18:59:13 +08:00
yihao.dai	8cda48a96a	enhance: Use mmap.scalarIndex config for text index (#36400 ) issue: https://github.com/milvus-io/milvus/issues/35273 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-09-24 12:21:13 +08:00
zhagnlu	489087d18b	enhance: refactor executor framework V2 (#35251 ) #32636 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-09-13 20:57:09 +08:00
congqixia	58d3200986	enhance: Filter out non-hit delete records during load delta (#36207 ) Related to #35303 This PR utilizes the pk index in the segment to exclude non-hit delete records while loading delete records. This ability is crucial when the l0/delete forward policy relies only on the segment itself (without BF filtering). --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-09-13 19:05:08 +08:00
Jiquan Long	89bf226f0b	feat: support keyword text match (#35923 ) fix: #35922 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-09-10 15:11:08 +08:00
cqy123456	560e8e70b0	enhance: reduce mmap_rss after chunkcache warmup (#35974 ) related pr: https://github.com/milvus-io/milvus/pull/35965 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-09-05 18:07:05 +08:00
zhagnlu	74048ce34f	fix: rename mmap file path to avoid directory conflict (#35810 ) #35784 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-09-03 16:05:03 +08:00
cai.zhang	2c9bb4dfa3	feat: Support stats task to sort segment by PK (#35054 ) issue: #33744 This PR includes the following changes: 1. Added a new task type to the task scheduler in datacoord: stats task, which sorts segments by primary key. 2. Implemented segment sorting in indexnode. 3. Added a new field `FieldStatsLog` to SegmentInfo to store token index information. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-09-02 14:19:03 +08:00
yihao.dai	f2b83d316b	enhance: Support memory mode chunk cache (#35347 ) Chunk cache supports loading raw vectors into memory. issue: https://github.com/milvus-io/milvus/issues/35273 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-08-25 15:42:58 +08:00
zhagnlu	3107701fe8	enhance: optimize retrieve on dynamic field (#35580 ) #35514 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com> Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-22 14:24:56 +08:00
smellthemoon	80dbe87759	enhance: support null value in index (#35238 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-16 15:30:54 +08:00
zhagnlu	4b553b0333	enhance: revert remove duplicated pk function (#35103 ) issue: #34778 Revert "fix: fix query count(*) concurrently" Revert "enhance: mark duplicated pk as deleted " Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-08-05 10:48:17 +08:00
smellthemoon	475c333fa2	enhance: add valid_data in span (#35030 ) #31728 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-02 15:40:14 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
zhagnlu	dd0c26cf58	enhance: redefine variable column block size (#35040 ) #35013 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-07-30 19:23:50 +08:00
smellthemoon	5616b7e8d2	enhance: support null in c data_datacodec and load null value (#32183 ) 1. support read and write null: segcore will store valid_data (using uint8_t type to save memory) in fieldData. 2. support load null: binlog reader reads and writes data into column (sealed segment), insertRecord (growing segment). In sealed segment, store valid_data directly. In growing segment, considering the prior implementation and code readability, it converts uint8_t to fbvector<bool>, which may be optimized in the future. 3. retrieve valid_data. parse valid_data in search/query. #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-23 16:07:51 +08:00
zhagnlu	804dd5409a	enhance: mark duplicated pk as deleted (#34586 ) fix #34247 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-07-16 14:25:39 +08:00
congqixia	4850336ca3	fix: Write padding at end of mmap file not chunk (#34529 ) Related to #34508 The padding bytes shall be written only at the end of the mmap file not the chunk of each field data file. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-10 11:12:14 +08:00
zhagnlu	3030e4625e	enhance: refactor variable column to reduce memory cost (#33875 ) #33874 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-30 20:16:06 +08:00
zhagnlu	03a3f50892	enhance: skip using array index in some situations (#33947 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-23 21:26:02 +08:00
cqy123456	dc4437ff82	enhance: use segment id and type to register in MmapChunkManager and opt malloc in variableChunk (#33993 ) issue: https://github.com/milvus-io/milvus/issues/32984 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-06-20 17:42:02 +08:00
cqy123456	b460862537	fix: can't find Chunk struct after growing support mmap (#33951 ) issue: https://github.com/milvus-io/milvus/issues/32984 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-06-18 18:37:58 +08:00


211 Commits