milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2025-12-07 17:48:29 +08:00

Author	SHA1	Message	Date
congqixia	8f97eb355f	enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232) Related to #39173 This PR: - Bump milvus-storage commit to handle bucket name concatenation logic in multipart s3 fs - Remove all user-side bucket name concatenation code Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-08 10:15:55 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Bingyi Sun	0c0630cc38	feat: support dropping index without releasing collection (#42941) issue: #42942 This PR includes the following changes: 1. Added checks to the index checker in querycoord to generate drop index tasks 2. Added drop index interface to querynode 3. To avoid search failure after dropping the index, the querynode allows the use of lazy mode (warmup=disable) to load raw data even when indexes contain raw data. 4. In segcore, loading the index no longer deletes raw data; instead, it evicts it. 5. In expr, the index is pinned to prevent concurrent errors. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-02 16:17:52 +08:00
congqixia	7721edf32a	enhance: Add mutex and range check preventing concurrent del (#44128) This PR adds a mutex to prevent concurrently applying deletes on the same segment and checks latestDeltaTimestamp to skip overlapping delete ranges Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-01 14:29:52 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
Buqian Zheng	6420d72391	enhance: print storage sizes in MB with 2 digits only, so the log is easier to read (#44085) issue: https://github.com/milvus-io/milvus/issues/41435 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-08-27 19:47:50 +08:00
Chun Han	da156981c6	feat: Milvus supports POSIX-compatible mode (milvus-io#43942) (#43944) related: #43942 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-27 16:29:50 +08:00
XuanYang-cn	37a447d166	feat: Add CMEK cipher plugin (#43722) 1. Enable Milvus to read cipher configs 2. Enable cipher plugin in binlog reader and writer 3. Add a testCipher for unittests 4. Support pooling for datanode 5. Add encryption in storagev2 See also: #40321 Signed-off-by: yangxuan <xuan.yang@zilliz.com> --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-08-27 11:15:52 +08:00
Spade A	8456f824be	feat: impl StructArray -- miscellaneous stuff for struct array (#43960) Ref https://github.com/milvus-io/milvus/issues/42148 1. enable storage v2 2. implement some missing stuff 3. fix some bugs and add tests --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-26 21:35:53 +08:00
Tianx	c0d62268ac	feat: add timestamptz data type (#44005) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
zhagnlu	8934c18792	enhance: support result cache for expr (#43923) issue: #43878 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-26 10:55:52 +08:00
wei liu	399f63300c	enhance: Implement dynamic interval updates for ticker components (#43865) issue: #43858 Enable dynamic configuration updates for ticker intervals without restart. This enhancement allows runtime configuration changes to take effect immediately for better operational flexibility. Changes include: - Apply "drain+Reset only when interval changed" pattern across all ticker components to preserve existing timing phases - Fix goroutine variable capture issue in CheckerController.Start() - Remove unnecessary ticker.Stop() in manual trigger paths - Add dynamic interval checking in QueryCoordV2 components: * checkers/controller.go: Various checker intervals * dist/dist_handler.go: DistPullInterval, CheckExecutedFlagInterval * session/cluster.go: CheckNodeSessionInterval * server.go: CheckAutoBalanceConfigInterval * observers/target_observer.go: UpdateNextTargetInterval * observers/collection_observer.go: CollectionObserverInterval - Add dynamic interval checking in QueryNodeV2 components: * segments/disk_usage_fetcher.go: DiskSizeFetchInterval - Ensure thread safety by performing all ticker operations in the same goroutine with proper drain before Reset to avoid spurious triggers --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-21 10:07:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports creating an index for vector arrays (currently only for `DataType.FLOAT_VECTOR`) and searching on them. The index type supported in this PR is `EMB_LIST_HNSW` and the metric type is `MAX_SIM` only. The way to use it: ```python milvus_client = MilvusClient("xxx:19530") schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True) ... struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field") ... struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000) ... schema.add_struct_array_field(struct_schema) index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128}) ... milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params) ``` Note: This PR uses `Lims` to convey the offsets of vector arrays to knowhere: vectors of multiple vector arrays are concatenated, so offsets are needed to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
Xianhui Lin	c7d8dc100a	fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex (#43811) fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex issue: https://github.com/milvus-io/milvus/issues/43572 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-08-18 01:17:52 +08:00
sparknack	544c7c0600	enhance: update cachinglayer default cache ratio to 0.3 (#43723) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-05 01:35:39 +08:00
zhagnlu	f14c7d598c	fix: skip loading raw data when loading index for storagev2 (#43720) #43653 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-04 21:17:39 +08:00
Chun Han	d826d6ac91	fix: try to get span raw data for variable length data type (#43544) (#43705) related: #43544 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-08-04 11:15:38 +08:00
sparknack	bdd65871ea	enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323) issue: #41435 After introducing the caching layer's lazy loading and eviction mechanisms, most parts of a segment won't be loaded into memory or disk immediately, even if the segment is marked as LOADED. This means physical resource usage may be very low. However, we still need to reserve enough resources for the segments marked as LOADED. Thus, the logic of resource usage estimation during segment loading, which is currently based on physical resource usage only, should be changed. To address this issue, we introduced the concept of logical resource usage in this patch. This can be thought of as the base reserved resource for each LOADED segment. A segment’s logical resource usage is derived from its final evictable and inevictable resource usage and calculated as follows: `SLR = SFPIER + evitable_cache_ratio * SFPER`, which is equivalent to `SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER`. `SLR`: The logical resource usage of a segment. `SFPIER`: The final physical inevictable resource usage of a segment. `SFPER`: The final physical evictable resource usage of a segment. `evitable_cache_ratio`: The ratio of a segment's evictable resources that can be cached locally. The higher the ratio, the more physical memory is reserved for evictable memory. When loading a segment, two types of resource usage are taken into account. First is the estimated maximum physical resource usage: `PPR = HPR + CPR + SMPR - SFPER`. `PPR`: The predicted physical resource usage after the current segment is allowed to load. `HPR`: The physical resource usage obtained from hardware information. `CPR`: The total physical resource usage of segments that have been committed but not yet loaded. When a new segment is allowed to load, `CPR' = CPR + (SMR - SER)`. When one of the committed segments is loaded, `CPR' = CPR - (SMR - SER)`. `SMPR`: The maximum physical resource usage of the current segment. `SFPER`: The final physical evictable resource usage of the current segment. Second is the estimated logical resource usage; this check applies only when eviction is enabled: `PLR = LLR + CLR + SLR`. `PLR`: The predicted logical resource usage after the current segment is allowed to load. `LLR`: The total logical resource usage of all loaded segments. When a new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`. `CLR`: The total logical resource usage of segments that have been committed but not yet loaded. When a new segment is allowed to load, `CLR' = CLR + SLR`. When one of the committed segments is loaded, `CLR' = CLR - SLR`. `SLR`: The logical resource usage of the current segment. Only when `PPR < PRL && PLR < PRL` (`PRL`: physical resource limit of the querynode) is the segment allowed to be loaded. --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-01 21:31:37 +08:00
zhagnlu	2594250906	fix: fix missed index loading for storagev2 (#43674) #43653 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-01 13:07:36 +08:00
Chun Han	d72c0357ff	fix: empty hybridsearch result due to one sub-search being empty (#43537) (#43647) related: #43537 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-07-31 19:47:37 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 implements the segcore part of storage for handling VectorArray. This PR: 1. implements the Go part of storage for VectorArray 2. implements collection creation with StructArrayField and VectorArray 3. inserts and retrieves data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Buqian Zheng	f7b262a702	feat: make storagev1 support eviction (#43219) issue: https://github.com/milvus-io/milvus/issues/41435 It turns out we have per-file binlog sizes in the Go code; by passing them into segcore we can support eviction in storage v1 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 02:02:52 +08:00
Zhen Ye	3aacd179f7	fix: balance channel before balance segment when upgrading (#43346) issue: #43117, #42966, #43373 - also fix channel balance possibly not working in 2.6. - fix error being lost on the delete path - add mvcc into s/q log - change the log level for TestCoordDownSearch Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-17 20:16:52 +08:00
Chun Han	07745439b5	fix: empty search groupby result causing crash (#43137) (#43214) related: #43137 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-07-10 12:04:48 +08:00
congqixia	f027eea545	enhance: [AddField] Add log for segcore segment schema change (#43215) Related to #39178 This PR adds logs for segment schema change operations. It also fixes the nit comments from PR #42490 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-10 10:22:47 +08:00
congqixia	7bc7b18ed5	fix: [AddField] Prevent concurrent load during UpdateSchema (#43043) Related to #43028 This PR: - Add mutex to prevent concurrent segment load & schema change - Add schema version field in load meta - Update schema in PutOrRef if schema version is larger --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-02 17:38:44 +08:00
wei liu	c381bf3e41	enhance: add logs for count(*) (#43001) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-01 19:36:43 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803) Ref #42053 This is the first PR for optimizing `LIKE` with an n-gram inverted index. Currently, only the VARCHAR data type and only InnerMatch LIKE (%xxx%) queries are supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram control how we tokenize the documents. For example, with min_gram=2 and max_gram=4, each document is tokenized into 2-grams, 3-grams and 4-grams. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
sthuang	0d57acb13a	enhance: [StorageV2] field id as meta path for wide column when loading (#42863) related: #42862 #39173 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-06-25 11:08:48 +08:00
congqixia	4ba177cd2c	enhance: [StorageV2] Handle narrow column group resource estimation (#42842) Related to #39173 In storage v2, a "narrow" column group could have a group id not mapped to the schema, which causes loading to fail or resource estimation to be inaccurate. This PR handles this case by mapping binlogs from the index instead of vice versa. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-19 14:44:39 +08:00
Spade A	e2c85eec81	fix: load stats index based on mmap config (#42788) ref https://github.com/milvus-io/milvus/issues/42626 This PR makes the text match index and the JSON key stats index load according to the mmap config. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-19 10:10:39 +08:00
Chun Han	001619aef9	feat: support load priority for loading (#42413) related: #40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-17 15:22:38 +08:00
congqixia	c9bc70f272	fix: [AddField] Use shared_ptr of schema in plan fixing dangling ref (#42693) Related to #42640 The search/query plan held a reference to the schema, which could be destructed after a schema change. This PR makes the plan hold a shared_ptr to it, fixing the dangling reference problem under concurrent read & schema change. This PR also removes the field binlog check when loading an index, since an old segment with an old schema may lack binlogs. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-12 20:46:36 +08:00
XuanYang-cn	83877b9faf	enhance: remove extra get collection (#42042) Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2025-06-10 18:34:35 +08:00
Zhen Ye	508264f953	fix: querynode upgrade from 2.5 gets stuck (#42502) issue: #42492 - consider the old RO query node (not streaming node) when balancing channels. - querynode graceful stop can complete if only L0 segments exist. Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-04 11:20:30 +08:00
congqixia	b76478378a	feat: [Tiered] Make load list work as warmup hint (#42490) Related to #42489 See also #41435 This PR's main target is to make the partial load field list work as a caching layer warmup policy hint. If the user specifies a load field list, the fields not included in the list shall use the `disabled` warmup policy and can be lazily loaded if any read op uses them. The major changes are listed here: - Pass load list to segcore when creating collection & schema - Add util functions to check whether a field shall be proactively loaded - Adapt storage v2 column group, which may cause the hint to fail if columns share the same group --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-04 10:28:32 +08:00
congqixia	cc42d49769	fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186) Related to #39173 #39718 In storage v2, `lack_bin_rows` cannot be used since a field id is not a column group id, so it will never be matched. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-31 02:44:30 +08:00
congqixia	6d2ad519b1	enhance: [StorageV2] Adapt local storage & other minor issues (#42167) Related to #39173 This PR - Handle storage v2 log path in local storage mode on querynode - Ignore field info check when appending index for loaded sealed segment when using storage v2 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-30 10:22:29 +08:00
cqy123456	5fe7015f63	enhance: InterimIndex supports more index types and data types (#41021) issue: https://github.com/milvus-io/milvus/issues/27678 cherry-pick from: https://github.com/milvus-io/milvus/pull/39180, https://github.com/milvus-io/milvus/pull/40429 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-05-28 08:40:28 +08:00
wei liu	dad43a3894	fix: cost metrics collection logic for replica selection (#41965) issue: #41621 - Deprecate EnableWorkerSQCostMetrics parameter - Always collect cost metrics from all search and retrieve results - Update code with comments explaining the rationale for the changes Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-22 10:20:25 +08:00
Ted Xu	ae32203d3a	fix: support group by with nullable grouping keys (#41797) See #36264 In this PR: - Enhanced error handling when parsing the grouping field. - Fixed null handling in reduce tasks in proxy nodes. - Updated tests to reflect changes in error handling and data processing logic. --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-05-17 20:54:22 +08:00
Buqian Zheng	b0260d8676	feat: manually evict cache after building interim index (#41836) issue: https://github.com/milvus-io/milvus/issues/41435 This PR also makes HasRawData of ChunkedSegmentSealedImpl return based on metadata, without needing to load the cache just to answer this simple question. --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-16 16:34:23 +08:00
congqixia	a6d09ff4cd	enhance: [StorageV2] fix issues integrating basic RW operations (#41834) Related to #39173 This PR: - Upgrade milvus-storage commit to fix filesystem finalized issue - Add bucket-name as prefix for all fs style access io - Initialize arrow fs on querynode startup - Fix timestamp access when loading sealed segment --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-15 09:52:23 +08:00
congqixia	c45c1fadb2	enhance: [AddField] Keep all binlog when loading (#41809) Related to #41726 #41736 The load field list blocks the new field from being loaded. `load_fields` shall work as a hint after tiered storage supports an API to specify this behavior. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-14 17:30:21 +08:00
Buqian Zheng	ff5c2770e5	feat: cachinglayer: various improvements (#41546) issue: https://github.com/milvus-io/milvus/issues/41435 This PR is based on https://github.com/milvus-io/milvus/pull/41436. Improvements include: - Lazy Load support for Storage v1 - Use Low/High watermark to control eviction - Caching Layer related config changes - Removed ChunkCache related configs and code in golang - Add `PinAllCells` helper method to CacheSlot class - Modified ValueAt, RawAt, PrimitiveRawAt to Bulk versions, to reduce caching layer overhead - Removed some unclear templated bulk_subscript methods - CachedSearchIterator now stores PinWrapper when searching on ChunkedColumn, and the unused constructor was removed. --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-10 09:19:16 +08:00
Buqian Zheng	3de904c7ea	feat: add cachinglayer to sealed segment (#41436) issue: https://github.com/milvus-io/milvus/issues/41435 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-04-28 10:52:40 +08:00
congqixia	b5443ddbd0	enhance: [AddField] Reopen loaded segments after AddField (#41529) Related to #39718 This PR: - Add reopen logic for growing & sealed segments - Lazy reopen when schema version increases - Add FinishLoad api for loading progress --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-26 08:48:39 +08:00
Zhen Ye	5fd47c3c89	fix: mockery tool unavailable after upgrading golang version (#41481) issue: #41291 pr: #41318 Signed-off-by: chyezh <chyezh@outlook.com>	2025-04-24 10:46:43 +08:00
SimFG	91d40fa558	fix: Update logging context and upgrade dependencies (#41318) - issue: #41291 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-04-23 10:52:38 +08:00

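Commit 7721edf32a (#44128) serializes delete application per segment with a mutex and uses latestDeltaTimestamp to skip overlapping delete ranges. The Python sketch below illustrates that idea only. It assumes deletes arrive as `(pk, timestamp)` records and that any record at or below the last applied timestamp has already been applied. `SegmentDeletes` and `apply_delete` are hypothetical names, not Milvus code; the real implementation lives in the Go/C++ code base.

```python
import threading


class SegmentDeletes:
    """Hypothetical model of one segment's delete state (not a Milvus API)."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes delete application on this segment
        self.latest_delta_ts = 0       # highest delete timestamp applied so far
        self.deleted = {}              # pk -> latest delete timestamp

    def apply_delete(self, records):
        """Apply (pk, ts) records, skipping anything already covered.

        Returns the number of records that were actually applied.
        """
        with self._lock:
            applied = 0
            max_ts = self.latest_delta_ts
            for pk, ts in records:
                if ts <= self.latest_delta_ts:
                    continue  # overlaps a range that was applied earlier
                self.deleted[pk] = max(ts, self.deleted.get(pk, 0))
                max_ts = max(max_ts, ts)
                applied += 1
            self.latest_delta_ts = max_ts
            return applied


seg = SegmentDeletes()
print(seg.apply_delete([(1, 10), (2, 20)]))  # 2
print(seg.apply_delete([(2, 20), (3, 30)]))  # 1: (2, 20) overlaps the applied range
```

Holding the lock across both the range check and the update is the point of the design. If the check ran outside the lock, two concurrent callers could both see the old timestamp and apply the same range twice.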
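Commit d6a428e880 (#43726) notes that vectors of many embedding lists are concatenated and `Lims` offsets record which vectors belong to which list, with `MAX_SIM` as the only metric. The brute-force Python sketch below shows how such offsets can be consumed. Two assumptions apply: `lims` has n + 1 entries, and `MAX_SIM` is scored ColBERT-style (for each query vector, take the best inner product against the document's vectors, then sum). Neither detail is confirmed by the commit message, the function names are hypothetical, and no HNSW index is involved.

```python
def split_by_lims(flat, lims):
    """Recover per-document embedding lists from a concatenated buffer.

    Assumes rows lims[i] .. lims[i + 1] - 1 of `flat` form the i-th list.
    """
    return [flat[lims[i]:lims[i + 1]] for i in range(len(lims) - 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_list, doc_list):
    """ColBERT-style MaxSim: best inner product per query vector, summed."""
    return sum(max(dot(q, d) for d in doc_list) for q in query_list)


def brute_force_search(query_list, flat, lims, k):
    """Score every embedding list and return the top-k (score, doc_id) pairs."""
    docs = split_by_lims(flat, lims)
    scored = [(max_sim(query_list, doc), i) for i, doc in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by id
    return scored[:k]


flat = [[1, 0], [0, 1], [1, 1], [0, 2]]  # two lists of two 2-d vectors each
lims = [0, 2, 4]
query = [[1, 0], [0, 1]]
print(brute_force_search(query, flat, lims, k=2))  # [(3, 1), (2, 0)]
```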
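The admission rule in commit bdd65871ea (#43323) can be written as a short Python function. The sketch below applies the commit's formulas directly: `SLR = SFPIER + ratio * SFPER`, `PPR = HPR + CPR + SMPR - SFPER`, `PLR = LLR + CLR + SLR`, and the segment loads only if `PPR < PRL` and, when eviction is enabled, `PLR < PRL`. Several details are assumptions, not Milvus code: all quantities share one unit (for example bytes), `CPR`, `LLR` and `CLR` are passed in instead of being tracked, and the function and field names are hypothetical. The ratio parameter is spelled `evictable_cache_ratio` here, while the commit writes `evitable_cache_ratio`.

```python
from dataclasses import dataclass


@dataclass
class SegmentEstimate:
    sfpier: float  # SFPIER: final physical inevictable usage
    sfper: float   # SFPER: final physical evictable usage
    smpr: float    # SMPR: maximum physical usage of this segment


def logical_usage(seg, evictable_cache_ratio):
    """SLR = SFPIER + evictable_cache_ratio * SFPER."""
    return seg.sfpier + evictable_cache_ratio * seg.sfper


def can_load(seg, *, hpr, cpr, llr, clr, prl,
             evictable_cache_ratio, eviction_enabled):
    """Admission check: PPR < PRL and, with eviction enabled, PLR < PRL."""
    ppr = hpr + cpr + seg.smpr - seg.sfper  # predicted physical usage
    if not ppr < prl:
        return False
    if eviction_enabled:
        # predicted logical usage
        plr = llr + clr + logical_usage(seg, evictable_cache_ratio)
        if not plr < prl:
            return False
    return True


seg = SegmentEstimate(sfpier=2, sfper=10, smpr=12)
print(logical_usage(seg, 0.5))  # 7.0
print(can_load(seg, hpr=50, cpr=10, llr=80, clr=10, prl=100,
               evictable_cache_ratio=0.5, eviction_enabled=True))  # True
```

With `llr=90` the same call returns False, because `PLR = 90 + 10 + 7 = 107`. If eviction is disabled, it returns True again, since only the physical check (`PPR = 62`) applies.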
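Commit 26ec841feb (#41803) says `min_gram` and `max_gram` determine which n-grams each document is split into, and that only InnerMatch `LIKE '%xxx%'` is accelerated. The Python sketch below shows the general technique: build an inverted index over every gram length in `[min_gram, max_gram]`, intersect the posting lists for the pattern's grams to get candidates, then verify each candidate, because gram hits are necessary but not sufficient. Two choices here are hypothetical, not Milvus's actual query plan: the pattern is decomposed using the longest gram length that fits, and patterns shorter than `min_gram` fall back to a full scan.

```python
def ngrams(text, min_gram, max_gram):
    """All substrings of length min_gram..max_gram (inclusive)."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def build_index(docs, min_gram, max_gram):
    """Inverted index: gram -> set of document ids."""
    index = {}
    for doc_id, text in enumerate(docs):
        for g in ngrams(text, min_gram, max_gram):
            index.setdefault(g, set()).add(doc_id)
    return index


def inner_match(docs, index, pattern, min_gram, max_gram):
    """Evaluate LIKE '%pattern%': prefilter with grams, then verify."""
    if len(pattern) < min_gram:
        # Too short to form any indexed gram: fall back to a full scan.
        return [i for i, t in enumerate(docs) if pattern in t]
    n = min(len(pattern), max_gram)  # longest indexed gram length that fits
    candidates = None
    for i in range(len(pattern) - n + 1):
        ids = index.get(pattern[i:i + n], set())
        candidates = ids if candidates is None else candidates & ids
    # Every gram matching does not imply a contiguous match, so confirm it.
    return sorted(i for i in candidates if pattern in docs[i])


docs = ["hello milvus", "vector database", "milvus vector db", "mil vus", "vus usmil"]
index = build_index(docs, 2, 3)
print(inner_match(docs, index, "milvus", 2, 3))  # [0, 2]
print(inner_match(docs, index, "vusmil", 2, 3))  # []: doc 4 has every gram but no match
```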
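Commit ff5c2770e5 (#41546) lists "Use Low/High watermark to control eviction" without further detail. The Python sketch below shows one common reading of that phrase, and it is only an assumption about the caching layer's behavior. An LRU cache does nothing while usage stays at or below the high watermark. Once usage exceeds it, the cache evicts least recently used cells until usage falls to the low watermark. `WatermarkCache` and its methods are hypothetical names.

```python
from collections import OrderedDict


class WatermarkCache:
    """Hypothetical LRU cache with low/high watermark eviction (sizes in bytes)."""

    def __init__(self, low, high):
        if not 0 <= low <= high:
            raise ValueError("expected 0 <= low <= high")
        self.low, self.high = low, high
        self.cells = OrderedDict()  # key -> size, least recently used first
        self.usage = 0

    def get(self, key):
        if key in self.cells:
            self.cells.move_to_end(key)  # mark as most recently used
            return True
        return False

    def put(self, key, size):
        if key in self.cells:
            self.usage -= self.cells.pop(key)
        self.cells[key] = size
        self.usage += size
        if self.usage > self.high:
            self._evict()

    def _evict(self):
        # Evict down to the low watermark, not just below the high one, so
        # eviction runs in batches instead of on nearly every insert.
        while self.usage > self.low and self.cells:
            _, size = self.cells.popitem(last=False)
            self.usage -= size


cache = WatermarkCache(low=60, high=100)
for key in ("a", "b", "c"):
    cache.put(key, 30)  # usage 90: still under the high watermark
cache.get("a")          # "a" becomes most recently used
cache.put("d", 20)      # usage 110 > 100, so evict "b" and "c"
print(list(cache.cells), cache.usage)  # ['a', 'd'] 50
```

The gap between the two watermarks is the design choice. A wider gap means fewer, larger eviction rounds. A narrower gap keeps more data cached but triggers eviction more often.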

405 Commits