milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 01:28:27 +08:00

Author	SHA1	Message	Date
Tianx	2c0c5ef41e	feat: timestamptz expression & index & timezone (#44080 ) issue: https://github.com/milvus-io/milvus/issues/27467 >My plan is as follows. >- [x] M1 Create collection with timestamptz field >- [x] M2 Insert timestamptz field data >- [x] M3 Retrieve timestamptz field data >- [x] M4 Implement handoff >- [x] M5 Implement compare operator >- [x] M6 Implement extract operator >- [x] M8 Support database/collection level default timezone >- [x] M7 Support STL-SORT index for datatype timestamptz --- The third PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M5, M6, M7, M8 described above. ## M8 Default Timezone We will be able to use alter_collection() and alter_database() in a future Python SDK release to modify the default timezone at the collection or database level. For insert requests, the timezone will be resolved using the following order of precedence: String Literal -> Collection Default -> Database Default. For retrieval requests, the timezone will be resolved in this order: Query Parameters -> Collection Default -> Database Default. In both cases, the final fallback timezone is UTC. ## M5: Comparison Operators We can now use the following expression format to filter on the timestamptz field: - `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string' ` - The interval_string follows the ISO 8601 duration format, for example: P1Y2M3DT1H2M3S. - The iso_string follows the ISO 8601 timestamp format, for example: 2025-01-03T00:00:00+08:00. - Example expressions: "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" or "tsz != ISO '2025-01-03T00:00:00+08:00'". ## M6: Extract We will be able to extract specific time fields via kwargs in a future Python SDK release. The key is `time_fields`, and the value should be one or more of "year, month, day, hour, minute, second, microsecond", separated by commas or spaces. The result for each record is then an array of int64.
## M7: Indexing Support Expressions without interval arithmetic can be accelerated using an STL-SORT index. However, expressions that include interval arithmetic cannot be indexed. This is because the result of an interval calculation depends on the specific timestamp value. For example, adding one month to a date in February results in a different number of added days than adding one month to a date in March. --- After this PR, the input / output type of timestamptz is an ISO string. Timestamptz values are stored as timestamptz data, which is ultimately an int64_t. > for more information, see https://en.wikipedia.org/wiki/ISO_8601 --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-23 10:24:12 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics in go side(result reduce), only record storage usage in vector search C++ path. Need to be implemented in query c++ path in next prs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
congqixia	942055fa7d	fix: Use task timestamp to calculate TTL timestamp (#42920 ) Related to #42918 Previously the `CollectionTtlTimestamp` could overflow when guarantee_ts==1, which means the `Eventually` consistency level is in use. This patch uses the task timestamp, allocated by the scheduler, to generate the TTL timestamp, so a potentially very small guarantee timestamp is ignored. It also adds an overflow check for the calculated TTL timestamp. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-25 20:48:42 +08:00
Xianhui Lin	6a0e182e13	enhance: support TTL expiration with queries returning no results (#42086 ) support TTL expiration with queries returning no results issue:https://github.com/milvus-io/milvus/issues/41959 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-05-27 18:28:27 +08:00
congqixia	6f4e0d8e38	enhance: [AddField] Use schema update ts as guarantee ts (#41430 ) Related to #39718 Use the schema update ts when it is greater than the calculated guarantee timestamp, so that every read request using the updated schema waits until all schema change events have been processed. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-22 17:12:45 +08:00
Xianhui Lin	f9febe3bae	enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006 ) Merge RootCoord, DataCoord And QueryCoord into MixCoord Make Session into one issue : https://github.com/milvus-io/milvus/issues/37764 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-11 16:36:30 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
zhenshan.cao	ecc2d80915	enhance: Add primary field name in SearchResult and QueryResults (#39220 ) issue: https://github.com/milvus-io/milvus/issues/39219 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-04-09 10:48:25 +08:00
congqixia	96eca2531f	fix: Avoid update original search/query request (#41126 ) Related to #41034 Recent PR #40842 introduced logic to avoid requerying the pk column. That logic updates the original request, so the request is no longer equivalent to the original one. When a retry happens due to an incomplete request error, this change makes the final result set lack the pk column even when the user specifies it explicitly. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-08 14:26:28 +08:00
Buqian Zheng	7a056aff9d	enhance: avoid re-query if hybrid search requested only pk as output field (#40842 ) proxy to always remove pk field from output field when forwarding request to QN, and if user requested pk, fill it from IDs. issue: https://github.com/milvus-io/milvus/issues/40833 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-03-28 14:32:18 +08:00
cai.zhang	13aff35a83	enhance: Add metrics for parse expression (#39654 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-02-28 10:07:58 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
congqixia	a774f05ea7	fix: Add sub task pool for multi-stage tasks (#40079 ) Related to #40078 Add a subTaskPool to execute sub task in case of logic deadlock described in issue. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 16:37:54 +08:00
Zhen Ye	21724ab52c	enhance: generate guaranteets at delegator if local wal (#39799 ) issue: #38399, #39892 - use mvcc timestamp of wal as guaranteets if the wal and delegator are located on the same node. - fix: the ignore growing option is lost in hybrid search --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-02-17 15:22:15 +08:00
Chun Han	ed31a5a4bf	enhance: fix inconsistency of alias and db for query iterator(#39045 ) (#39216 ) related: #39045 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-01-15 09:48:59 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
congqixia	7128e36eb0	enhance: Use mvcc timestamp as guarantee ts if set (#38980 ) When MvccTimestamp is set, it could be used as guarantee timestamp directly instead of new ts allocated by scheduler reducing the waiting time when delegator has tsafe lag Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-01-05 19:02:54 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080 ) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
aoiasd	d67853fa89	feat: Tokenizer support build with params and clone for concurrency (#37048 ) relate: https://github.com/milvus-io/milvus/issues/35853 https://github.com/milvus-io/milvus/issues/36751 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-11-06 17:48:24 +08:00
zhenshan.cao	63843dce33	fix: Fix conan gdal building problem (#37338 ) issue:https://github.com/milvus-io/milvus/issues/27576 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-10-31 21:04:16 +08:00
Hao Tan	67c4340565	feat: Geospatial Data Type and GIS Function Support for milvus server (#35990 ) issue:https://github.com/milvus-io/milvus/issues/27576 # Main Goals 1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields. 2. Insert geospatial data as payload values in the insert binlog, and print the values for verification. 3. Load segments containing geospatial data into memory. 4. Ensure query outputs can display geospatial data. 5. Support filtering on GIS functions for geospatial columns. # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. 6. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: tasty-gumi <1021989072@qq.com>	2024-10-31 20:58:20 +08:00
cai.zhang	2ef6cbbf59	feat: The expression supports filling elements through templates (#37033 ) issue: #36672 The expression supports filling elements through templates, which helps to reduce the overhead of parsing the elements. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-10-31 14:20:22 +08:00
Patrick Weizhi Xu	43ad9af529	fix: use max MvccTs for iterator (#37247 ) issue: #37158 Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-10-30 13:58:20 +08:00
Patrick Weizhi Xu	fc69df44a1	fix: set guarantee ts for search/query iterator (#37180 ) issue: #37158 Return the GuaranteeTS so that the subsequent requests follow the correct TS. BeginTS is the current timestamp when the task is created. The GuaranteeTS is the one parsed based on both consistency level and beginTS, in PreExecute of the task on Proxy. The delegator will wait until GuaranteeTS is met. In PostExecute of the task on Proxy, the TS of the first iterator request will be returned to the SDK, which adds it to the subsequent requests. Hence, if the default consistency level is Eventually or Bounded, the order of TS will be > Guarantee TS < BeginTS If it returns the BeginTS, the second request will need to catch up and result in extra 200ms max of latency, which results in something like \| Call \| Latency \| \| --- \| --- \| \| first call on `Next()` \| 30ms \| \| second call on `Next()` \| 210ms \| \| third call on `Next()` \| 10ms \| \| fourth call on `Next()` \| 11 ms \| \| ... \| ... \| where we expect \| Call \| Latency \| \| --- \| --- \| \| first call on `Next()` \| 30ms \| \| second call on `Next()` \| 10ms \| \| third call on `Next()` \| 10ms \| \| fourth call on `Next()` \| 11 ms \| \| ... \| ... \| Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-10-28 15:57:35 +08:00
Chun Han	903450f5c6	enhance: add ts support for iterator(#22718 ) (#36572 ) related: #22718 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-10-16 18:51:23 +08:00
Chun Han	df7ae08851	fix: iterator cursor progress too fast(#36179 ) (#36180 ) related: #36179 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-09-24 11:45:13 +08:00
Ted Xu	363004fd44	enhance: simplify reduction on single search result (#36334 ) See: #36122 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-09-20 11:59:10 +08:00
zhagnlu	3107701fe8	enhance: optimize retrieve on dynamic field (#35580 ) #35514 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com> Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-22 14:24:56 +08:00
wei liu	c45f38aa61	enhance: Update protobuf-go to protobuf-go v2 (#34394 ) issue: #34252 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-29 11:31:51 +08:00
congqixia	e2f40fc2a8	fix: Check legacy guarantee ts when skipping alloc ts (#34981 ) See also #34980 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-25 10:17:45 +08:00
Jiquan Long	a2ac84bd64	feat: record the duration waiting in the proxy queue (#34744 ) fix: https://github.com/milvus-io/milvus/issues/34743 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-07-23 14:23:52 +08:00
wei liu	b49862d4f3	enhance: Optimize grow slice cost during query (#34253 ) issue: #32252 This PR tries to pre-allocate FieldData for Reduce operations in the Query chain using typeutil.PrepareResultFieldData, to avoid the overhead of dynamically growing the slice during the appendFieldData process. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 15:18:11 +08:00
aoiasd	186757e622	enhance: support mark error as user error (#33498 ) relate: https://github.com/milvus-io/milvus/issues/33492 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-07-01 14:56:12 +08:00
Chun Han	416a2cf507	fix: query iterator lack results(#33137 ) (#33422 ) related: #33137 adding has_more_result_tag for various level's reduce to rectify reduce_stop_for_best Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-05-30 17:51:44 +08:00
Jiquan Long	9f81290c63	fix: try best to get enough query results (#33178 ) issue: https://github.com/milvus-io/milvus/issues/33137 Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-05-21 11:57:51 +08:00
SimFG	8594b55ad5	enhance: add `max insert request size` and `must use partition key` configs (#32433 ) issue: https://github.com/milvus-io/milvus/issues/30577 /kind improvement Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-04-19 10:31:20 +08:00
Xiaofan	dbab9c5096	fix: reduce didn't handle offset without limit and reduceStopForBest correctly (#32089 ) fix https://github.com/milvus-io/milvus/issues/32059 This PR fixes two issues: (1) offset is not handled correctly when no limit is specified; (2) reduceStopForBest doesn't guarantee returning limit results, even if there are more results, when there is a small segment. Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-10 16:01:18 +08:00
SimFG	90bed1caf9	enhance: add the related data size for the read apis (#31816 ) issue: #30436 origin pr: #30438 related pr: #31772 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-04-10 15:07:17 +08:00
Cai Yudong	00438f408f	enhance: Unify data type check APIs for go (#31887 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-07 14:27:22 +08:00
SimFG	b1a1cca10b	feat: add more operation detail info for better allocation (#30438 ) issue: #30436 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-03-28 06:33:11 +08:00
Chun Han	c3264ca3e3	feat: support segment pruner (#31003 ) related: #30376	2024-03-22 13:57:06 +08:00
congqixia	9b3005f1be	enhance: Avoid create schema helper for each read task (#30981 ) See also #30806 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-04 19:39:00 +08:00
cai.zhang	40ca98f57f	enhance: Skip timestamp allocation when search/query consistency level is eventually (#29773 ) issue: #29772 1. Skip timestamp allocation when search/query consistency level is eventually. Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-02-21 09:52:59 +08:00
aoiasd	c863b82476	enhance: Return parse expression failed error with reason (#30548 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-02-20 16:02:52 +08:00
congqixia	8e8ac213aa	enhance: Utilize partition key optimization in reQuery (#30253 ) See also #30250 This PR add requery flag in query task. When reQuery flag is true, query task shall skip partition name conversion and use pre-calculated partitionIDs passed from search task. TODO: hybrid search does not have partition id information. we shall apply same logic for hybrid search later. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-25 11:05:07 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue: https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
zhenshan.cao	60e88fb833	fix: Restore the MVCC functionality. (#29749 ) When the TimeTravel functionality was previously removed, it inadvertently affected the MVCC functionality within the system. This PR aims to reintroduce the internal MVCC functionality as follows: 1. Add MvccTimestamp to the requests of Search/Query and the results of Search internally. 2. When the delegator receives a Query/Search request and there is no MVCC timestamp set in the request, set the delegator's current tsafe as the MVCC timestamp of the request. If the request already has an MVCC timestamp, do not modify it. 3. When the Proxy handles Search and triggers the second phase ReQuery, divide the ReQuery into different shards and pass the MVCC timestamp to the corresponding Query requests. issue: #29656 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-01-09 11:38:48 +08:00
congqixia	4f8c540c77	enhance: cache collection schema attributes to reduce proxy cpu (#29668 ) See also #29113 The collection schema is crucial when performing search/query but some of the information is calculated for every request. This PR change schema field of cached collection info into a utility `schemaInfo` type to store some stable result, say pk field, partitionKeyEnabled, etc. And provided field name to id map for search/query services. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 17:28:46 +08:00
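
The filter format and timezone precedence described in commit 2c0c5ef41e can be sketched in Python as follows. This is a minimal illustration and not code from the repository: the helper names `resolve_timezone` and `build_tsz_filter`, the set of accepted comparison operators, and the duration regex are all assumptions. Only the expression shape (`field [+/- INTERVAL '...'] op ISO '...'`) and the fallback order ending in UTC come from the commit message.

```python
import re
from typing import Optional

# Assumed operator set; the commit message only says {comparison_op}.
_COMPARISON_OPS = {"==", "!=", "<", "<=", ">", ">="}

# Simplified ISO 8601 duration check (integer components only), e.g. P1Y2M3DT1H2M3S.
_ISO_DURATION = re.compile(
    r"P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?"
)


def resolve_timezone(
    explicit: Optional[str] = None,
    collection_default: Optional[str] = None,
    database_default: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the first timezone that is set.

    `explicit` stands for the string-literal offset (insert) or the query
    parameter (retrieval); the final fallback is UTC, as in the commit message.
    """
    for tz in (explicit, collection_default, database_default):
        if tz:
            return tz
    return "UTC"


def build_tsz_filter(
    field: str,
    op: str,
    iso_string: str,
    interval: Optional[str] = None,
    sign: str = "+",
) -> str:
    """Hypothetical helper: build a timestamptz filter expression string."""
    if op not in _COMPARISON_OPS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    lhs = field
    if interval is not None:
        if sign not in ("+", "-"):
            raise ValueError(f"sign must be '+' or '-': {sign!r}")
        if not _ISO_DURATION.fullmatch(interval):
            raise ValueError(f"not an ISO 8601 duration: {interval!r}")
        lhs = f"{field} {sign} INTERVAL '{interval}'"
    return f"{lhs} {op} ISO '{iso_string}'"


# Plain comparison: per the commit message, this form can use an STL-SORT index.
plain = build_tsz_filter("tsz", "!=", "2025-01-03T00:00:00+08:00")
# With interval arithmetic: per the commit message, this form cannot be indexed.
shifted = build_tsz_filter("tsz", "!=", "2025-01-03T00:00:00+08:00", interval="P0D")
```

The two strings built at the end reproduce the example expressions quoted in the commit message.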


126 Commits