milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-04 18:02:08 +08:00

Author	SHA1	Message	Date
cai.zhang	3d11ba06ef	fix: Double check to avoid using an iter that has been erased by another thread (#45013 ) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-21 23:36:04 +08:00
cai.zhang	a35a3b7c69	fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975 ) issue: #44974 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-20 23:32:03 +08:00
congqixia	27dbb8e75d	fix: support JSON default value in `CreateArrowScalarFromDefaultValue` (#44912 ) Related to #44897 Add missing JSON data type handling in CreateArrowScalarFromDefaultValue to fix query failures when dynamic fields are enabled. JSON default values are now properly converted to arrow::BinaryScalar using bytes_data(). Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-17 18:22:00 +08:00
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736 ) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
congqixia	5ece760d73	fix: Pass fs via `FileManagerContext` when loading index (#44733 ) Related to #44615 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-11 09:55:57 +08:00
congqixia	8a443c699e	fix: Make aws credential provider singleton (#44687 ) Related to #44647 This patch makes milvus-storage use a singleton credential provider to avoid a data race when concurrent index build tasks are received. See also milvus-io/milvus-storage#44647 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-09 16:11:58 +08:00
congqixia	1d85b83215	enhance: [backlog] Fix unittest and remove fs fallback logic (#44615 ) Related to #44535 This PR: - Fix the unittest creating `DiskFileManagerImpl` without `filesystem` - Add comments for methods need `fs_` - Remove fallback creation and add assertion for nullptr fs Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-09 10:41:57 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 to milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Currently only brute-force search is supported. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. See the corresponding changes in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
foxspy	13c3b0b909	enhance: add autoindex configuration for the int8 vector type (#44554 ) issue: #38666 Add int8 support for autoindex to ensure it can be independently configured. At the same time, remove the restriction on int8 type for vectorDiskIndex (note that vectorDiskIndex only determines the building and loading method of the index, not the index type). Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-09-24 17:48:04 +08:00
congqixia	ea307ea3c9	fix: [StorageV2] Make DiskFileManager use fs from context (#44535 ) Related to #44534 Datanode shall not use singleton fs after 2.6+. This patch makes the disk file manager use the filesystem passed by fileManagerContext instead of the erroneous singleton one. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-24 10:12:03 +08:00
Tianx	2c0c5ef41e	feat: timestamptz expression & index & timezone (#44080 ) issue: https://github.com/milvus-io/milvus/issues/27467 >My plan is as follows. >- [x] M1 Create collection with timestamptz field >- [x] M2 Insert timestamptz field data >- [x] M3 Retrieve timestamptz field data >- [x] M4 Implement handoff >- [x] M5 Implement compare operator >- [x] M6 Implement extract operator >- [x] M8 Support database/collection level default timezone >- [x] M7 Support STL-SORT index for datatype timestamptz --- The third PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M5, M6, M7, M8 described above. ## M8 Default Timezone We will be able to use alter_collection() and alter_database() in a future Python SDK release to modify the default timezone at the collection or database level. For insert requests, the timezone will be resolved using the following order of precedence: String Literal-> Collection Default -> Database Default. For retrieval requests, the timezone will be resolved in this order: Query Parameters -> Collection Default -> Database Default. In both cases, the final fallback timezone is UTC. ## M5: Comparison Operators We can now use the following expression format to filter on the timestamptz field: - `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string' ` - The interval_string follows the ISO 8601 duration format, for example: P1Y2M3DT1H2M3S. - The iso_string follows the ISO 8601 timestamp format, for example: 2025-01-03T00:00:00+08:00. - Example expressions: "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" or "tsz != ISO '2025-01-03T00:00:00+08:00'". ## M6: Extract We will be able to extract specific time fields via kwargs in a future Python SDK release. The key is `time_fields`, and the value should be one or more of "year, month, day, hour, minute, second, microsecond", separated by commas or spaces. The result for each record is then an array of int64.
## M7: Indexing Support Expressions without interval arithmetic can be accelerated using an STL-SORT index. However, expressions that include interval arithmetic cannot be indexed. This is because the result of an interval calculation depends on the specific timestamp value. For example, adding one month to a date in February results in a different number of added days than adding one month to a date in March. --- After this PR, the input / output type of timestamptz is an ISO string. Timestamptz values are stored as timestamptz data, which is ultimately int64_t. > for more information, see https://en.wikipedia.org/wiki/ISO_8601 --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-23 10:24:12 +08:00
congqixia	7b83314bf3	enhance: [StorageV2] Make datanode use non-singleton fs (#44418 ) Related to #39173 According to the current design, datanode shall create fs from storage config in request instead of using singleton fs. This PR upgrades milvus-storage and makes the packed reader/writer compose a new fs from storage config. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-18 20:06:00 +08:00
sthuang	2f70a73258	fix: turn on azure by default (#44377 ) related: #44354, #44138, #43869 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-09-17 10:12:01 +08:00
sthuang	b38013352d	enhance: [StorageV2] enable build with azure (#44177 ) related: #43869 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-09-14 08:05:58 +08:00
congqixia	f5618d5153	enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282 ) Related to #44257 This PR: - Utilize configurable split policy for storage v2, enabling system field policy - Store split result in field binlog struct - Adapt legacy binlog without child fields --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-10 14:47:57 +08:00
sparknack	4a01c726f3	enhance: cachinglayer: some metric and params update (#44276 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-10 11:03:57 +08:00
Buqian Zheng	9bf2b5c10c	enhance: moved more segcore unit test files (#44210 ) issue: https://github.com/milvus-io/milvus/issues/43931 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-08 10:21:55 +08:00
Buqian Zheng	b76bf13fc3	enhance: move c++ unit test file to aside of the production code (#43932 ) issue: https://github.com/milvus-io/milvus/issues/43931 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-03 23:45:53 +08:00
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534 ) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
congqixia	e3b3502287	fix: Use correct regex for cppcheck (#44077 ) Related to #44076 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-27 20:57:50 +08:00
marcelo-cjl	e13e19cd2c	enhance: add sparse_u32_f32 data type for sparse vector (#43974 ) issue: #43973 Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-08-27 16:47:50 +08:00
Chun Han	da156981c6	feat: milvus support posix-compatible mode(milvus-io#43942) (#43944 ) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Tianx	c0d62268ac	feat: add timestamptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
Gao	e97a618630	enhance: support readAt interface for remote input stream (#43997 ) #42032 Also, fix the cacheoptfield method to work in storagev2. Also, change the sparse related interface for knowhere version bump #43974 . Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric lost. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-26 11:19:58 +08:00
sparknack	4fae074d56	enhance: add write rate limit for disk file writer (#43912 ) issue: #43040 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-25 10:27:47 +08:00
Gao	b602b4187d	enhance: upgrade aws-sdk from 1.9.234 to 1.11.352 (#43916 ) issue: #43908 Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-08-19 11:11:45 +08:00
foxspy	647c2bca2d	enhance: Support streaming read and write of vector index files (#43824 ) issue: #42032 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-15 23:41:43 +08:00
Gao	81a0915c29	enhance: add milvus-common module to decouple knowhere & segcore (#43624 ) issue: https://github.com/milvus-io/milvus/issues/42032 https://github.com/milvus-io/milvus/issues/41435 based on pr: https://github.com/milvus-io/milvus/pull/42124 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-11 14:09:42 +08:00
zhagnlu	c04d678ad4	enhance: make segcore params effective without restarting milvus (#43231 ) #43230 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-08 10:33:48 +08:00
congqixia	0b860b4aec	fix: Revert "enhance: DataCodec to release ownership of input_data after initialization (#43542 )" (#43571 )	2025-07-25 20:48:16 +08:00
Buqian Zheng	d23205b718	enhance: DataCodec to release ownership of input_data after initialization (#43542 ) issue: https://github.com/milvus-io/milvus/issues/43088 issue: https://github.com/milvus-io/milvus/issues/43038 see also https://github.com/milvus-io/milvus/pull/43533. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-25 14:24:54 +08:00
sthuang	5cebc9f7f6	fix: [StorageV2] handle correct cid with multiple files and add storage v2 prefix logs (#43539 ) related: #43372 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-25 11:22:54 +08:00
sthuang	59bbdd93f5	fix: [StorageV2] fill the correct group chunk into cell (#43486 ) The root cause of the issue lies in the fact that when a sealed segment contains multiple row groups, the get_cells function may receive unordered cids. This can result in row groups being written into incorrect cells during data retrieval. Previously, this issue was hard to reproduce because the old Storage V2 writer had a bug that caused it to write row groups larger than 1MB. These large row groups could lead to uncontrolled memory usage and eventually an OOM (Out of Memory) error. Additionally, compaction typically produced a single large row group, which avoided the incorrect cell-filling issue during query execution. related: https://github.com/milvus-io/milvus/issues/43388, https://github.com/milvus-io/milvus/issues/43372, https://github.com/milvus-io/milvus/issues/43464, #43446, #43453 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-22 22:22:53 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384 ) issue: #41435 this is to prevent AI from thinking of our exception throwing as a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
Buqian Zheng	f7b262a702	feat: make storagev1 to support eviction (#43219 ) issue: https://github.com/milvus-io/milvus/issues/41435 turns out we have per file binlog size in golang code, by passing it into segcore we can support eviction in storage v1 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 02:02:52 +08:00
congqixia	ae48f0e484	fix: [StorageV2] Handle missing column creating index (#43292 ) Related to #43250 Use FieldIDList to check missing field. If column is missing, return empty resultset Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-14 17:06:50 +08:00
sthuang	276c52490d	fix: [StorageV2] missing arrow fs when building index (#43162 ) fix: https://github.com/milvus-io/milvus/issues/43150, https://github.com/milvus-io/milvus/issues/43149 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-07 15:26:46 +08:00
congqixia	1d9a9a993d	fix: [StorageV2] Use correct template typename for `cache_raw_data_to_disk_common` (#43104 ) Related to #43099 Previously `cache_raw_data_to_disk_common` used `milvus::DataType` template typename, which shall be `knowhere::bf16` or other actual datatype. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-03 18:50:46 +08:00
Zhen Ye	bbbc7d4517	enhance: collect all cgo calling into metric and log slow cgo call (#43035 ) issue: #42833 - also fix the error metric for async cgo. - also make sure the roles can be seen when node startup, #43041. Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-03 15:00:44 +08:00
sparknack	7e855f1046	enhance: add disk file writer with Direct IO support (#42665 ) issue: #43040 This patch introduces a disk file writer that supports Direct IO. Currently, it is exclusively utilized during the QueryNode load process. Its parameters are: 1. `common.diskWriteMode` This parameter controls the write mode of the local disk, which is used to write temporary data downloaded from remote storage. Currently, only QueryNode uses 'common.diskWrite*' parameters. Support for other components will be added in the future. The options include 'direct' and 'buffered'. The default value is 'buffered'. 2. `common.diskWriteBufferSizeKb` Disk write buffer size in KB, only used when disk write mode is 'direct', default is 64KB. Current valid range is [4, 65536]. If the value is not aligned to 4KB, it will be rounded up to the nearest multiple of 4KB. 3. `common.diskWriteNumThreads` This parameter controls the number of writer threads used for disk write operations. The valid range is [0, hardware_concurrency]. It is designed to limit the maximum concurrency of disk write operations to reduce the impact on disk read performance. For example, if you want to limit the maximum concurrency of disk write operations to 1, you can set this parameter to 1. The default value is 0, which means the caller will perform write operations directly without using an additional writer thread pool. In this case, the maximum concurrency of disk write operations is determined by the caller's thread pool size. Both parameters can be updated during runtime. --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-07-02 22:18:44 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803 ) Ref #42053 This is the first PR for optimizing `LIKE` with ngram inverted index. Now, only VARCHAR data type is supported and only InnerMatch LIKE (%xxx%) query is supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram control how we tokenize the documents. For example, for min_gram=2 and max_gram=4, we will tokenize each document with 2-gram, 3-gram and 4-gram. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
sthuang	238bd30f42	fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948 ) Fix issues in end-to-end tests: 1. Split column groups based on schema, rather than estimating by average chunk row size. Ensure column group consistency within a segment, to avoid errors caused by loading multiple column group chunks simultaneously. 2. Use sorted segmentId when generating the stats binlog path, to ensure consistent and correct file path resolution. 3. Determine field IDs as follows: For multi-column column groups, retrieve the field ID list from metadata. For single-column column groups, use the column group ID directly as the field ID. related: #39173 fix: #42862 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-27 14:44:42 +08:00
XuanYang-cn	0dfe5308e1	enhance: Tidy Download and decode in segcore storage (#42902 ) 1. Unify calling from GetObjectData 2. Move SetData inside Deserialize See also: #40013 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-06-25 11:10:43 +08:00
sthuang	0d57acb13a	enhance: [StorageV2] field id as meta path for wide column when load (#42863 ) related: #42862 #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-25 11:08:48 +08:00
Xianhui Lin	b902960057	fix: revert remote jsonstats path (#42882 ) fix: revert remote jsonstats path relate-pr:https://github.com/milvus-io/milvus/pull/42676 issue:https://github.com/milvus-io/milvus/issues/42872 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-21 13:24:39 +08:00
sthuang	4a0a2441f2	enhance: [StorageV2] field id as meta path for wide column (#42787 ) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-19 15:00:38 +08:00
Spade A	e2c85eec81	fix: load stats index based on mmap config (#42788 ) ref https://github.com/milvus-io/milvus/issues/42626 This PR makes text match index and json key stats index be loaded based on mmap config. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-19 10:10:39 +08:00
Spade A	80f1d707f7	fix: tidy up path for scalar index (#42676 ) Ref #42626 This patch tidies up paths for scalar index, including the path for loading index from remote storage and the temporary path for building index. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-18 00:42:38 +08:00
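
Note on 26ec841feb (n-gram `LIKE`): the message says min_gram and max_gram control tokenization, with min_gram=2 and max_gram=4 yielding 2-, 3- and 4-grams. A minimal sketch of that idea in plain Python, assuming character-level grams; the helper name `char_ngrams` is hypothetical and this is not the Milvus implementation, which may differ in how it handles bytes or Unicode.

```python
def char_ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Return every character n-gram of `text` with min_gram <= n <= max_gram.

    Hypothetical helper for illustration only; not part of Milvus or pymilvus.
    """
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    grams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n over the text; empty if n > len(text).
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


# 2-grams first, then 3-grams, each in left-to-right order.
print(char_ngrams("milvus", 2, 3))
```

An inner-match query such as `%lvu%` could then be answered by looking up the grams of the pattern itself, which is presumably why the commit limits support to InnerMatch for now.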


251 Commits
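
Note on 7e855f1046 (Direct IO writer): the message states that `common.diskWriteBufferSizeKb` has a valid range of [4, 65536] and is rounded up to the nearest multiple of 4KB when unaligned. A small sketch of that arithmetic follows. The function name is hypothetical, and raising an error for out-of-range values is an assumption, since the commit message does not say how such values are handled.

```python
ALIGN_KB = 4               # alignment stated in the commit message
MIN_KB, MAX_KB = 4, 65536  # valid range stated in the commit message


def normalize_buffer_size_kb(value_kb: int) -> int:
    """Round a buffer size in KB up to the next multiple of 4KB.

    Hypothetical helper; rejecting out-of-range input is an assumption,
    not documented behavior.
    """
    if not MIN_KB <= value_kb <= MAX_KB:
        raise ValueError(f"buffer size must be in [{MIN_KB}, {MAX_KB}] KB")
    # Integer ceiling division, then scale back to KB.
    return ((value_kb + ALIGN_KB - 1) // ALIGN_KB) * ALIGN_KB


print(normalize_buffer_size_kb(64))  # default value, already aligned
print(normalize_buffer_size_kb(70))  # unaligned, rounded up
```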