milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
sparknack	c8a4d6e2ef	enhance: add cachinglayer management for TextMatchIndex (#44741 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 14:37:58 +08:00
sparknack	6d5b41644b	enhance: remove logical usage checks during segment loading (#44743 ) issue: #41435 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 14:21:58 +08:00
congqixia	5ece760d73	fix: Pass fs via `FileManagerContext` when loading index (#44733 ) Related to #44615 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-11 09:55:57 +08:00
wei liu	33d1e7de83	fix: Replace incorrect log import with milvus v2 log package (#44731 ) issue: #44730 Fix the issue where logs were not outputting as expected due to incorrect log package imports across multiple components. Changes include: - Add golangci-lint rule to forbid github.com/pingcap/log usage - Replace github.com/pingcap/log with github.com/milvus-io/milvus/pkg/v2/log Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-10-10 20:27:57 +08:00
Zhen Ye	a110d8cc49	fix: don't use logical resource for metrics of quota center on streaming node (#44613 ) issue: #44599 Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-29 21:34:13 +08:00
aoiasd	78ee76f018	enhance: support preload sealed segment bm25 stats and optimize bm25 stats serialize (#44279 ) relate: https://github.com/milvus-io/milvus/issues/41424 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-09-29 16:35:05 +08:00
Zhen Ye	b6b59bd222	fix: remove redundant initialization of storage v2 (#44597 ) issue: #44596 - querynode already initializes storage v2 and segcore, so streamingnode should not do this again. - It also fixes the GCP object storage access denied error. Signed-off-by: chyezh <chyezh@outlook.com>	2025-09-29 10:17:04 +08:00
zhagnlu	eac16a577c	enhance:support cachelayer for json stats (#44446 ) #42533 Signed-off-by: zhagnlu <lu.zhang@zilliz.com>	2025-09-24 15:30:04 +08:00
Tianx	2c0c5ef41e	feat: timestamptz expression & index & timezone (#44080 ) issue: https://github.com/milvus-io/milvus/issues/27467 >My plan is as follows. >- [x] M1 Create collection with timestamptz field >- [x] M2 Insert timestamptz field data >- [x] M3 Retrieve timestamptz field data >- [x] M4 Implement handoff >- [x] M5 Implement compare operator >- [x] M6 Implement extract operator >- [x] M8 Support database/collection level default timezone >- [x] M7 Support STL-SORT index for datatype timestamptz --- The third PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M5, M6, M7, M8 described above. ## M8 Default Timezone We will be able to use alter_collection() and alter_database() in a future Python SDK release to modify the default timezone at the collection or database level. For insert requests, the timezone will be resolved using the following order of precedence: String Literal -> Collection Default -> Database Default. For retrieval requests, the timezone will be resolved in this order: Query Parameters -> Collection Default -> Database Default. In both cases, the final fallback timezone is UTC. ## M5: Comparison Operators We can now use the following expression format to filter on the timestamptz field: - `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string' ` - The interval_string follows the ISO 8601 duration format, for example: P1Y2M3DT1H2M3S. - The iso_string follows the ISO 8601 timestamp format, for example: 2025-01-03T00:00:00+08:00. - Example expressions: "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" or "tsz != ISO '2025-01-03T00:00:00+08:00'". ## M6: Extract We will be able to extract specific time fields via kwargs in a future Python SDK release. The key is `time_fields`, and the value should be one or more of "year, month, day, hour, minute, second, microsecond", separated by commas or spaces. The result for each record is then an array of int64. 
## M7: Indexing Support Expressions without interval arithmetic can be accelerated using an STL-SORT index. However, expressions that include interval arithmetic cannot be indexed. This is because the result of an interval calculation depends on the specific timestamp value. For example, adding one month to a date in February results in a different number of added days than adding one month to a date in March. --- After this PR, the input / output type of timestamptz is an ISO string. Timestamptz values are stored as timestamptz data, which is ultimately int64_t. > for more information, see https://en.wikipedia.org/wiki/ISO_8601 --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-23 10:24:12 +08:00
jiaqizho	338ed2fed4	enhance: Introduce sparse filter in query (#44347 ) issue: #44373 The current commit implements sparse filtering in query tasks using the statistical information (Bloom filter/MinMax) of the Primary Key (PK). The statistical information of the PK is bound to the segment during the segment loading phase. A new filter has been added to the segment filter to enable the sparse filtering functionality. Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>	2025-09-23 09:58:09 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics in go side(result reduce), only record storage usage in vector search C++ path. Need to be implemented in query c++ path in next prs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
zhenshan.cao	691a8df953	feat: Add RESTful api for rolling upgrade support (#44381 ) issue: https://github.com/milvus-io/milvus/issues/43968 Co-authored-by: chyezh <ye.zhen@zilliz.com>	2025-09-16 20:08:00 +08:00
yihao.dai	51f69f32d0	feat: Add CDC support (#44124 ) This PR implements a new CDC service for Milvus 2.6, providing log-based cross-cluster replication. issue: https://github.com/milvus-io/milvus/issues/44123 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com> Signed-off-by: chyezh <chyezh@outlook.com> Co-authored-by: chyezh <chyezh@outlook.com>	2025-09-16 16:32:01 +08:00
congqixia	aa861f55e6	enhance: [StorageV2] Reverts #44232 bucket name change (#44390 ) Related to #39173 - Put bucket name concatenation logic back for azure support This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-16 10:10:00 +08:00
sthuang	9140201b8f	fix: add init fs check for querynode and streaming node (#44360 ) related: #44354 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-09-13 10:31:58 +08:00
congqixia	abe22b95c7	enhance: Utilize group info estimating logic usage as well (#44356 ) Related to #44257 #44334 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-12 22:47:57 +08:00
congqixia	9d2ff48d63	enhance: Utilize group split info to estimate usage (#44338 ) Related to #44257 #44334 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-12 14:49:57 +08:00
aoiasd	9add663a08	fix: idf oracle use wrong dir (#44266 ) relate: https://github.com/milvus-io/milvus/issues/44264 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-09-10 14:41:56 +08:00
congqixia	8f97eb355f	enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232 ) Related to #39173 This PR: - Bump milvus-storage commit to handle bucket name concatenation logic in multipart s3 fs - Remove all user-side bucket name concatenation code Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-08 10:15:55 +08:00
zhagnlu	d67f1ea0ab	enhance: add param to modify dump snapshot batch size (#44215 ) issue: #44216 Signed-off-by: luzhang <luzhang@zilliz.com>	2025-09-05 14:29:54 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171 ) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Bingyi Sun	0c0630cc38	feat: support dropping index without releasing collection (#42941 ) issue: #42942 This pr includes the following changes: 1. Added checks for index checker in querycoord to generate drop index tasks 2. Added drop index interface to querynode 3. To avoid search failure after dropping the index, the querynode allows the use of lazy mode (warmup=disable) to load raw data even when indexes contain raw data. 4. In segcore, loading the index no longer deletes raw data; instead, it evicts it. 5. In expr, the index is pinned to prevent concurrent errors. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-02 16:17:52 +08:00
congqixia	7721edf32a	enhance: Add mutex and range check preventing concurrent del (#44128 ) This PR adds a mutex to prevent concurrent application of deletes on the same segment and checks latestDeltaTimestamp to skip overlapping delete ranges. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-01 14:29:52 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534 ) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
Buqian Zheng	6420d72391	enhance: print as storage size unit MB with 2 digits only, so the log is easier to read (#44085 ) issue: https://github.com/milvus-io/milvus/issues/41435 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-08-27 19:47:50 +08:00
Chun Han	da156981c6	feat: milvus support posix-compatible mode(milvus-io#43942) (#43944 ) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722 ) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	8456f824be	feat: impl StructArray -- miscellaneous stuff for struct array (#43960 ) Ref https://github.com/milvus-io/milvus/issues/42148 1. enable storage v2 2. implement some missing stuff 3. fix some bugs and add tests --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-26 21:35:53 +08:00
Tianx	c0d62268ac	feat: add timestamptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
Gao	e97a618630	enhance: support readAt interface for remote input stream (#43997 ) #42032 Also, fix the cacheoptfield method to work in storagev2. Also, change the sparse related interface for knowhere version bump #43974 . Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric lost. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-26 11:19:58 +08:00
zhagnlu	8934c18792	enhance: support cache result cache for expr (#43923 ) issue: #43878 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-26 10:55:52 +08:00
sparknack	4fae074d56	enhance: add write rate limit for disk file writer (#43912 ) issue: #43040 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-25 10:27:47 +08:00
wei liu	399f63300c	enhance: Implement dynamic interval updates for ticker components (#43865 ) issue: #43858 Enable dynamic configuration updates for ticker intervals without restart. This enhancement allows runtime configuration changes to take effect immediately for better operational flexibility. Changes include: - Apply "drain+Reset only when interval changed" pattern across all ticker components to preserve existing timing phases - Fix goroutine variable capture issue in CheckerController.Start() - Remove unnecessary ticker.Stop() in manual trigger paths - Add dynamic interval checking in QueryCoordV2 components: * checkers/controller.go: Various checker intervals * dist/dist_handler.go: DistPullInterval, CheckExecutedFlagInterval * session/cluster.go: CheckNodeSessionInterval * server.go: CheckAutoBalanceConfigInterval * observers/target_observer.go: UpdateNextTargetInterval * observers/collection_observer.go: CollectionObserverInterval - Add dynamic interval checking in QueryNodeV2 components: * segments/disk_usage_fetcher.go: DiskSizeFetchInterval - Ensure thread safety by performing all ticker operations in same goroutine with proper drain before Reset to avoid spurious triggers --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-21 10:07:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports create index for vector array (now, only for `DataType.FLOAT_VECTOR`) and search on it. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey offsets of the vector array to knowhere where vectors of multiple vector arrays are concatenated and we need offsets to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
Xianhui Lin	c7d8dc100a	fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex (#43811 ) fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex issue:https://github.com/milvus-io/milvus/issues/43572 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-08-18 01:17:52 +08:00
congqixia	de3e5c285b	enhance: Add downgrade tsafe switch param item (#43874 ) Related to #43873 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-15 12:31:43 +08:00
congqixia	f032044125	enhance: Refine segcore param change callback (#43838 ) Related to #43230 This PR - Move segcore setup function to `initcore` package to remove cgo dependency from pkg - Register core callback only for components depends on segcore - Rectify `UpdateLogLevel` implementation Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-13 19:31:44 +08:00
zhagnlu	c04d678ad4	enhance: make segcore params effective without restarting milvus (#43231 ) #43230 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-08 10:33:48 +08:00
wei liu	715b5153b8	enhance: Improve delegator serviceable check logic in PinReadableSegments (#43768 ) issue: #43767 - Enhance serviceable check logic to properly handle full vs partial result requirements - For full result (requiredLoadRatio >= 1.0): check queryView.Serviceable() - For partial result (requiredLoadRatio < 1.0): check load ratio satisfaction - Add comprehensive unit tests covering all serviceable check scenarios This enhancement ensures delegator correctly validates serviceability based on the requested result completeness, improving reliability of query operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-07 12:13:40 +08:00
Zhen Ye	5551d99425	enhance: remove old arch non-streaming arch code (#43651 ) issue: #41609 - remove all dml dead code at proxy - remove dead code at l0_write_buffer - remove msgstream dependency at proxy - remove timetick reporter from proxy - remove replicate stream implementation --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-08-06 14:41:40 +08:00
sparknack	544c7c0600	enhance: update cachinglayer default cache ratio to 0.3 (#43723 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-05 01:35:39 +08:00
zhagnlu	f14c7d598c	fix: skip load raw data when loading index for storagev2 (#43720 ) #43653 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-04 21:17:39 +08:00
Chun Han	d826d6ac91	fix: try to get span raw data for variable length data type(#43544 ) (#43705 ) related: #43544 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-04 11:15:38 +08:00
sparknack	bdd65871ea	enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323 ) issue: #41435 After introducing the caching layer's lazy loading and eviction mechanisms, most parts of a segment won't be loaded into memory or disk immediately, even if the segment is marked as LOADED. This means physical resource usage may be very low. However, we still need to reserve enough resources for the segments marked as LOADED. Thus, the logic of resource usage estimation during segment loading, which for now is based on physical resource usage only, should be changed. To address this issue, we introduced the concept of logical resource usage in this patch. This can be thought of as the base reserved resource for each LOADED segment. A segment’s logical resource usage is derived from its final evictable and inevictable resource usage and calculated as follows: ``` SLR = SFPIER + evitable_cache_ratio * SFPER ``` which is equivalent to ``` SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER ``` `SLR`: The logical resource usage of a segment. `SFPIER`: The final physical inevictable resource usage of a segment. `SFPER`: The final physical evictable resource usage of a segment. `evitable_cache_ratio`: The ratio of a segment's evictable resources that can be cached locally. The higher the ratio, the more physical memory is reserved for evictable memory. When loading a segment, two types of resource usage are taken into account. First is the estimated maximum physical resource usage: ``` PPR = HPR + CPR + SMPR - SFPER ``` `PPR`: The predicted physical resource usage after the current segment is allowed to load. `HPR`: The physical resource usage obtained from hardware information. `CPR`: The total physical resource usage of segments that have been committed but not yet loaded. When one new segment is allowed to load, `CPR' = CPR + (SMR - SER)`. When one of the committed segments is loaded, `CPR' = CPR - (SMR - SER)`. 
`SMPR`: The maximum physical resource usage of the current segment. `SFPER`: The final physical evictable resource usage of the current segment. Second is the estimated logical resource usage; this check applies only when eviction is enabled: ``` PLR = LLR + CLR + SLR ``` `PLR`: The predicted logical resource usage after the current segment is allowed to load. `LLR`: The total logical resource usage of all loaded segments. When a new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`. `CLR`: The total logical resource usage of segments that have been committed but not yet loaded. When one new segment is allowed to load, `CLR' = CLR + SLR`. When one of the committed segments is loaded, `CLR' = CLR - SLR`. `SLR`: The logical resource usage of the current segment. The segment is allowed to load only when `PPR < PRL && PLR < PRL` (`PRL`: physical resource limit of the querynode). --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-01 21:31:37 +08:00
Buqian Zheng	21cec95fe8	fix: fix disk path sent to cachinglayer (#43685 ) `localDataRootPath` is used to init local chunk manager and has `querynode` appended to it, thus is incorrect #41435 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-08-01 13:19:36 +08:00
zhagnlu	2594250906	fix: fix miss loading index for storagev2 (#43674 ) #43653 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-01 13:07:36 +08:00
Chun Han	d72c0357ff	fix: empty hybridsearch result due to one-sub-search empty(#43537 ) (#43647 ) related: #43537 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-07-31 19:47:37 +08:00
Buqian Zheng	052fb6c562	feat: add time based eviction to data managed by cachinglayer (#43490 ) issue: https://github.com/milvus-io/milvus/issues/41435 also added disk capacity protection --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-29 16:17:35 +08:00
wei liu	7b8bf6393b	enhance: Improve partial result evaluation with row count based strategy (#43361 ) issue: #43360 Enhance the partial result evaluation mechanism in delegator to use row count based data ratio instead of simple segment count ratio for better accuracy. Key improvements: - Introduce PartialResultEvaluator interface for flexible evaluation strategy - Implement NewRowCountBasedEvaluator using sealed segment row count data - Replace segment count based ratio with row count based data ratio calculation - Update PinReadableSegments to return sealedRowCount information - Modify executeSubTasks to use configurable evaluator for partial result decisions - Add comprehensive unit tests for the new row count based evaluation logic This change provides more accurate partial result evaluation by considering the actual data volume rather than just segment quantity, leading to better query performance and consistency when some segments are unavailable. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-28 10:18:55 +08:00
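
Note on 7b8bf6393b: the difference between a segment count ratio and a row count ratio can be illustrated with a small Python sketch. This is not the delegator's Go implementation; `Segment`, `segment_count_ratio`, `row_count_ratio`, and `allow_partial_result` are hypothetical names chosen for illustration, and the handling of an empty segment list is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a sealed segment seen by the delegator."""
    segment_id: int
    row_count: int
    available: bool


def segment_count_ratio(segments: List[Segment]) -> float:
    """Old-style ratio: fraction of segments that are available."""
    if not segments:
        return 1.0  # assumption: nothing to serve counts as fully served
    return sum(1 for s in segments if s.available) / len(segments)


def row_count_ratio(segments: List[Segment]) -> float:
    """Row-count-based ratio: fraction of rows held by available segments."""
    total = sum(s.row_count for s in segments)
    if total == 0:
        return 1.0  # assumption: no rows counts as fully served
    available = sum(s.row_count for s in segments if s.available)
    return available / total


def allow_partial_result(segments: List[Segment], required_ratio: float) -> bool:
    """Accept a partial result when enough of the data (by rows) is readable."""
    return row_count_ratio(segments) >= required_ratio


segments = [
    Segment(1, 900_000, True),   # one large segment is available
    Segment(2, 50_000, False),   # two small segments are not
    Segment(3, 50_000, False),
]
print(segment_count_ratio(segments))  # about one third of the segments
print(row_count_ratio(segments))      # but most of the rows
```

With these made-up numbers only one of three segments is readable, yet it holds 90% of the rows, so a request that tolerates a required ratio of 0.8 passes under the row count measure and fails under the segment count measure.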

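
Note on 2c0c5ef41e: the M7 remark that interval arithmetic depends on the specific timestamp value can be checked with plain calendar arithmetic. The sketch below uses only the Python standard library and does not reflect how Milvus evaluates `INTERVAL` expressions; `add_months` is a hypothetical helper, and clamping to the last day of the month is an assumption made for this example.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last_day))


feb = date(2025, 2, 1)
mar = date(2025, 3, 1)

# The same one-month interval adds a different number of days
# depending on the starting date.
print((add_months(feb, 1) - feb).days)
print((add_months(mar, 1) - mar).days)
```

Because the shift in days is not constant across rows, a sorted index over the raw stored values cannot answer a comparison on the shifted values directly, which is consistent with the commit's statement that such expressions cannot be indexed.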

805 Commits
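
Note on bdd65871ea: the admission check described in that commit message can be written out as a minimal Python sketch. It follows the formulas quoted in the message (`SLR`, `PPR`, `PLR`, `PRL`) but is not the querynode code; the function names, the parameter `evictable_cache_ratio` (spelled `evitable_cache_ratio` in the message), and the example numbers are assumptions for illustration.

```python
def logical_usage(sfpier: float, sfper: float, evictable_cache_ratio: float) -> float:
    """SLR = SFPIER + evictable_cache_ratio * SFPER."""
    return sfpier + evictable_cache_ratio * sfper


def can_load_segment(
    hpr: float,      # HPR: physical usage reported by hardware
    cpr: float,      # CPR: physical usage committed but not yet loaded
    smpr: float,     # SMPR: maximum physical usage of this segment
    sfpier: float,   # SFPIER: final inevictable usage of this segment
    sfper: float,    # SFPER: final evictable usage of this segment
    llr: float,      # LLR: logical usage of all loaded segments
    clr: float,      # CLR: logical usage committed but not yet loaded
    evictable_cache_ratio: float,
    prl: float,      # PRL: physical resource limit of the node
    eviction_enabled: bool = True,
) -> bool:
    """Return True when both the physical and logical predictions fit under PRL."""
    ppr = hpr + cpr + smpr - sfper  # PPR = HPR + CPR + SMPR - SFPER
    if ppr >= prl:
        return False
    if eviction_enabled:
        # The logical check applies only when eviction is enabled.
        plr = llr + clr + logical_usage(sfpier, sfper, evictable_cache_ratio)
        if plr >= prl:
            return False
    return True


# Made-up numbers in MB: a segment with 100 MB inevictable and 1000 MB evictable data.
print(logical_usage(100, 1000, 0.3))
print(can_load_segment(hpr=2000, cpr=0, smpr=1200, sfpier=100, sfper=1000,
                       llr=3000, clr=0, evictable_cache_ratio=0.3, prl=4000))
```

The identity quoted in the message holds term by term: `SFPIER + r * SFPER` equals `(SFPIER + SFPER) - (1.0 - r) * SFPER` for any ratio `r`.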