milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
aoiasd	cfeb095ad7	enhance: forbid build analyzer at proxy (#44067 ) relate: https://github.com/milvus-io/milvus/issues/43687 We used to run the temporary analyzer and validate analyzer on the proxy, but the proxy should not be a computation-heavy node. This PR move all analyzer calculations to the streaming node. --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-10-23 10:58:12 +08:00
Ted Xu	196006b4ce	enhance: update delta log serialization APIs to integrate storage V2 (#44998 ) See #39173 In this PR: - Adjusted the delta log serialization APIs. - Refactored the stats collector to improve the collection and digest of primary key and BM25 statistics. - Introduced new tests for the delta log reader/writer and stats collectors to ensure functionality and correctness. --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-10-22 15:58:12 +08:00
congqixia	8f16afd5e7	fix: Handle JSON field default values in storage layer (#44999 ) Related to #44995 Added missing case for JSON data type in GetDefaultValue function to properly retrieve default values for JSON fields. This prevents crashes when enabling dynamic fields with default values during concurrent insert operations. Changes: - Added JSON data type case in GetDefaultValue to return BytesData - Added comprehensive tests for fillMissingFields covering JSON and other data types with default values - Added tests for nullable fields, required fields validation, and edge cases Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-21 15:14:03 +08:00
cai.zhang	d5ecb63f53	enhance: Support import geometry data by json/csv (#44826 ) issue: #44787 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-17 17:08:02 +08:00
Spade A	6c8e353439	feat: impl StructArray -- ban non-float-vector for now (#44875 ) ref https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-10-17 10:26:09 +08:00
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736 ) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
cai.zhang	7a93cfe890	fix: Fix bug for nullable geometry (#44732 ) issue: #44648 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-10-11 11:27:57 +08:00
aoiasd	78ee76f018	enhance: support preload sealed segment bm25 stats and optimize bm25 stats serialize (#44279 ) relate: https://github.com/milvus-io/milvus/issues/41424 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-09-29 16:35:05 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This pr's main goal is merge #37417 to milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search can display geospatial data 5. Support using GIS funtions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions.Now only support brutal search 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing.Check the modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
Tianx	2c0c5ef41e	feat: timestamptz expression & index & timezone (#44080 ) issue: https://github.com/milvus-io/milvus/issues/27467 >My plan is as follows. >- [x] M1 Create collection with timestamptz field >- [x] M2 Insert timestamptz field data >- [x] M3 Retrieve timestamptz field data >- [x] M4 Implement handoff >- [x] M5 Implement compare operator >- [x] M6 Implement extract operator >- [x] M8 Support database/collection level default timezone >- [x] M7 Support STL-SORT index for datatype timestamptz --- The third PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M5, M6, M7, M8 described above. ## M8 Default Timezone We will be able to use alter_collection() and alter_database() in a future Python SDK release to modify the default timezone at the collection or database level. For insert requests, the timezone will be resolved using the following order of precedence: String Literal-> Collection Default -> Database Default. For retrieval requests, the timezone will be resolved in this order: Query Parameters -> Collection Default -> Database Default. In both cases, the final fallback timezone is UTC. ## M5: Comparison Operators We can now use the following expression format to filter on the timestamptz field: - `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string' ` - The interval_string follows the ISO 8601 duration format, for example: P1Y2M3DT1H2M3S. - The iso_string follows the ISO 8601 timestamp format, for example: 2025-01-03T00:00:00+08:00. - Example expressions: "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" or "tsz != ISO '2025-01-03T00:00:00+08:00'". ## M6: Extract We will be able to extract sepecific time filed by kwargs in a future Python SDK release. The key is `time_fields`, and value should be one or more of "year, month, day, hour, minute, second, microsecond", seperated by comma or space. Then the result of each record would be an array of int64. ## M7: Indexing Support Expressions without interval arithmetic can be accelerated using an STL-SORT index. However, expressions that include interval arithmetic cannot be indexed. This is because the result of an interval calculation depends on the specific timestamp value. For example, adding one month to a date in February results in a different number of added days than adding one month to a date in March. --- After this PR, the input / output type of timestamptz would be iso string. Timestampz would be stored as timestamptz data, which is int64_t finally. > for more information, see https://en.wikipedia.org/wiki/ISO_8601 --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-23 10:24:12 +08:00
congqixia	6f7318a731	enhance: [StorageV2] Use compressed size as log file size (#44402 ) Related to #39173 backlog issue that memory size and log size shared same value. This patch add `GetFileSize` api to get remote compressed binlog size as meta log file size to calculate usage more accurate. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-16 21:20:02 +08:00
congqixia	aa861f55e6	enhance: [StorageV2] Reverts #44232 bucket name change (#44390 ) Related to #39173 - Put bucket name concatenation logic back for azure support This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-16 10:10:00 +08:00
Spade A	eb793531b9	feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201 ) Ref https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-15 20:41:59 +08:00
congqixia	9cfa013ec6	enhance: [StorageV2] Store column group info in compaction result (#44327 ) Related to #44257 This PR store split result in compaction segment binlog struct to make querynode could utilize it later. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-11 19:49:57 +08:00
congqixia	fc968ff1c2	enhance: [StorageV2] Pass args for avg size split policy (#44301 ) Related to #44257 This PR - Pass column stats for avg size split policy - Add param items for policy configuration --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-11 10:43:57 +08:00
congqixia	f5618d5153	enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282 ) Related to #44257 This PR: - Utilize configurable split policy for storage v2, enabling system field policy - Store split result in field binlog struct - Adapt legacy binlog without child fields --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-10 14:47:57 +08:00
cai.zhang	fb43651a74	fix: Fix MultiSegmentWrite only write one segment (#44256 ) issue: #44254 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-09-09 14:28:10 +08:00
Bingyi Sun	c5bb4f7483	fix: Fix merge sort out of range (#44230 ) issue: https://github.com/milvus-io/milvus/issues/44227 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-08 18:01:55 +08:00
congqixia	8f97eb355f	enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232 ) Related to #39173 This PR: - Bump milvus-storage commit to handle bucket name concatenation logic in multipart s3 fs - Remove all user-side bucket name concatenation code Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-08 10:15:55 +08:00
Bingyi Sun	ef392bb1b2	enhance: merge sort support multiple fields (#44191 ) issue: #44011 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-04 19:53:53 +08:00
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
Bingyi Sun	6624011927	enhance: storage sort can sort by multiple fields (#43994 ) https://github.com/milvus-io/milvus/issues/44011 this is to support compaction that sorts records by partition key and pk in the future --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-03 10:11:52 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	8456f824be	feat: impl StructArray -- miscellaneous staffs for struct array (#43960 ) Ref https://github.com/milvus-io/milvus/issues/42148 1. enable storage v2 2. implement some missing staffs 3. fix some bugs and add tests --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-26 21:35:53 +08:00
Tianx	c0d62268ac	feat: add timesatmptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
Ted Xu	e37cd19da2	enhance: enable storage v2 by default (#43652 ) Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-08-01 08:59:36 +08:00
sthuang	a2c7ed2780	fix: [StorageV2] sort field binlogs paths for packed reader and writer (#43585 ) key changes: * fix unstable storage v2 compaction unit test by guaranteeing the order of paths during sync. * bump milvus-storage version, include https://github.com/milvus-io/milvus-storage/pull/222 https://github.com/milvus-io/milvus-storage/pull/223 https://github.com/milvus-io/milvus-storage/pull/224 https://github.com/milvus-io/milvus-storage/pull/225 https://github.com/milvus-io/milvus-storage/pull/226 * Also fix the below related oom issue. related: https://github.com/milvus-io/milvus/issues/43310 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-30 08:09:36 +08:00
congqixia	34d3f0c0f8	enhance: Reserve builder space for ValueSerializer (#43570 ) Add `arrowBuild.Reserve` call for `ValueSerializer` to reduce repeated resizing buffer when write size is large Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-28 11:02:55 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578 ) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
sthuang	a0c9f499ee	fix: [StorageV2] sync panic with nullable add field (#43142 ) related: https://github.com/milvus-io/milvus/pull/42932 fix: https://github.com/milvus-io/milvus/issues/43072 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-07-25 10:08:53 +08:00
congqixia	1cf8ed505f	fix: Implement `NeededFields` feature in `RecordReader` (#43523 ) Related to #43522 Currently, passing partial schema to storage v2 packed reader may trigger SEGV during clustering compaction unit test. This patch implement `NeededFields` differently in each `RecordReader` imlementation. For now, v2 will implemented as no-op. This will be supported after packed reader support this API. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-24 00:22:54 +08:00
congqixia	563e2935c5	enhance: [StorageV2] Fill ts range default values for `PackedBinlogRecordWriter` (#43454 ) This PR fill default value for `PackedBinlogRecordWriter` timestamp range so target segment meta will contains correct timestamp range Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-22 12:04:53 +08:00
yihao.dai	b69e601fe1	fix: [StorageV2] Correct read and write buffer size (#43335 ) Correct read and buffer size to 64MB to prevent OOM during clustering compaction. issue: https://github.com/milvus-io/milvus/issues/43310 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-16 14:28:52 +08:00
yihao.dai	1984be646c	fix: Fix storagev2 binlog import (#43221 ) issue: https://github.com/milvus-io/milvus/issues/43218 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-13 22:52:49 +08:00
congqixia	5a9efb3f81	enhance: [StorageV2] Refine storage rw option usage & validation (#43175 ) Related to #39173 This PR: - Make all datanode task passes storage config via storage config option - Remove legacy comments, rootPath & bucketName parameters - Fix clustering compaction option behavior - Add validation logic for `rwOptions` - Use correct storageType from storageConfig - Add storage config in sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-11 01:14:48 +08:00
cai.zhang	95e767611a	fix: Fix merge sort loss data when last row in a record is deleted (#43216 ) issue: #43207 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-09 22:18:48 +08:00
cai.zhang	8720feeb79	fix: Fix enqueuing when current batch is fully deleted (#43174 ) issue: #43045 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-08 12:20:46 +08:00
congqixia	ab818dcbca	fix: [StorageV2] Pass storage config for compaction rw (#43167 ) Related to #43148 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-07 15:32:46 +08:00
congqixia	d09764508a	fix: [Storagev2] Close segment readers in mergeSort (#43116 ) Related to #43062 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-04 23:56:44 +08:00
cai.zhang	4133e3b8fd	fix: Enable merge sort and fix sort bug (#43080 ) issue: #42980, #43034 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-04 10:18:44 +08:00
congqixia	8962b0058d	fix: [StorageV2] Check writer nil when closing not written one (#43056 ) Related to #43047 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-02 14:22:43 +08:00
congqixia	9b06ecb72f	enhance: [StorageV2] Release record and close reader (#42983 ) Related to #39173 This PR - Close packed reader after sort - Release arrow.Record preventing memory leakage - Invoke `pack_reader->Close()` for CloseReader --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-27 14:46:43 +08:00
sthuang	238bd30f42	fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948 ) Fix issues in end-to-end tests: 1. Split column groups based on schema, rather than estimating by average chunk row size. Ensure column group consistency within a segment, to avoid errors caused by loading multiple column group chunks simultaneously. 2. Use sorted segmentId when generating the stats binlog path, to ensure consistent and correct file path resolution. 3. Determine field IDs as follows: For multi-column column groups, retrieve the field ID list from metadata. For single-column column groups, use the column group ID directly as the field ID. related: #39173 fix: #42862 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-27 14:44:42 +08:00
cai.zhang	ebe1c95bb1	enhance: Add Size interface to FileReader to eliminate the StatObject call during Read (#42908 ) issue: #42907 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-25 14:36:41 +08:00
congqixia	ee056f0bff	fix: [AddField] Fill default value in serde logic when field missing (#42891 ) Related to #42856 Default value will be missing after segment get sorted/compacted. This PR is a temp workaround since in long term default value shall be filled with storage engine instead. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-23 14:20:41 +08:00
sthuang	4a0a2441f2	enhance: [StorageV2] field id as meta path for wide column (#42787 ) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-19 15:00:38 +08:00
congqixia	f01ff57f3f	fix: [StorageV2] Use correct offset filling null bitmap (#42774 ) Related to #39173 `null_bitmap_data()` returns raw pointer of null bitmap of Array. While after slicing, this bitmap is not rewritten due to zero copy implementation, so the current start pos maybe non-zero while FillFieldData generating column `valid_data` array. This PR add `offset` param for `FillFieldData` method, and force all invocation pass correct offset of `null_bitmap_data` ptr. Also update milvus-storage commit fixing reader failed to return data when buffer size smaller than row group size problem. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-17 10:08:38 +08:00
sthuang	ed5dbf3eaa	enhance: [StorageV2] sync separate vector datatype into its own column group (#42638 ) related: #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-16 11:48:37 +08:00

1 2 3 4 5 ...

598 Commits