Related to #45333
Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.
The solution is to move the FinishLoad() call so that it executes after
raw data loading but before text index loading (a sketch follows the
list below). This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations
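A minimal sketch of the corrected ordering; the segment type and method
names here are hypothetical stand-ins for the real loader:
```go
package main

import (
	"errors"
	"fmt"
)

type segment struct{ rawDataLoaded bool }

func (s *segment) loadRawData() error { s.rawDataLoaded = true; return nil }

// finishLoad finalizes segment state; raw data must already be resident.
func (s *segment) finishLoad() error { return nil }

func (s *segment) loadTextIndexes() error {
	if !s.rawDataLoaded {
		return errors.New("failed to create text index, neither raw data nor index are found")
	}
	return nil
}

// loadSegment applies the corrected ordering: raw data, then FinishLoad,
// then text indexes, so text index creation can read the raw data.
func loadSegment(s *segment) error {
	if err := s.loadRawData(); err != nil {
		return err
	}
	if err := s.finishLoad(); err != nil {
		return err
	}
	return s.loadTextIndexes()
}

func main() {
	fmt.Println(loadSegment(&segment{})) // <nil>
}
```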
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Currently, RAM usage estimation for the AiSAQ index type is not
calculated correctly.
An AiSAQ index consumes less RAM while loading than a DISKANN index
does, and the query node module is missing an implementation of RAM
usage estimation for the AiSAQ index type.
We suggest that the AiSAQ RAM usage estimation calculation should be as
follows:
UsedDiskMemoryRatioAisaq = 1024 (contrary to the UsedDiskMemoryRatio,
which is 4)
neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq
neededDiskSize = indexInfo.IndexSize
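A minimal Go sketch of the suggested calculation, with hypothetical
names mirroring the formula above:
```go
package main

import "fmt"

// Contrary to UsedDiskMemoryRatio (4, used for DISKANN), AiSAQ divides
// the index size by 1024 to estimate resident memory.
const UsedDiskMemoryRatioAisaq = 1024

// estimateAisaqResource returns the suggested memory and disk estimates
// for an AiSAQ index of the given size in bytes.
func estimateAisaqResource(indexSize int64) (neededMemSize, neededDiskSize int64) {
	return indexSize / UsedDiskMemoryRatioAisaq, indexSize
}

func main() {
	mem, disk := estimateAisaqResource(8 << 30) // an 8 GiB index
	fmt.Println(mem, disk)                      // 8 MiB of RAM, 8 GiB of disk
}
```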
Reported issue is #45247
---------
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: friedl <lior.friedman@kioxia.com>
Co-authored-by: friedl <lior.friedman@kioxia.com>
Related to #43028
Initialize the schema version field when creating a new collection
instance in QueryNode. The schema version is extracted from loadMetaInfo
and assigned to the collection, ensuring proper schema version tracking
and consistency across the distributed system.
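A minimal sketch of the assignment; the types and field names are
hypothetical stand-ins:
```go
package querynode

type loadMetaInfo struct{ schemaVersion uint64 }

type collection struct {
	schemaVersion uint64
	// ... other collection state
}

// newCollection copies the schema version carried in loadMetaInfo onto
// the new collection instance, so the QueryNode tracks the same schema
// version as the rest of the cluster.
func newCollection(meta *loadMetaInfo) *collection {
	return &collection{schemaVersion: meta.schemaVersion}
}
```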
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43897
- Alter collection/database is now implemented by the WAL-based DDL
framework.
- AlterCollection/AlterDatabase are now supported in the WAL.
- Alter operations can now be synced by the new CDC.
- Refactor some unit tests for alter DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44509
Fix a bug where QueryNodeNumEntities metrics were not updated for
collections with zero segments, causing stale metrics when all segments
are flushed or compacted.
The previous implementation used separate loops: one to update size
metrics for all collections, and another to update num entities metrics
only for collections present in the grouped segments map. Collections
with no segments were skipped in the second loop, leaving their
NumEntities metrics stale.
Changes:
- Consolidate size and num entities metric updates into a single loop
- Iterate over all collections instead of grouped segments
- Get collection metadata from the manager instead of segment instances
- Correctly set NumEntities to 0 for collections with no segments
- Apply the same fix to both growing and sealed segment processing
- Add a nil check for collection metadata before processing
This ensures all collection metrics are updated consistently, even when
segment count drops to zero.
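A minimal sketch of the consolidated loop; the types and metric setter
are hypothetical stand-ins:
```go
package main

import "fmt"

type segment struct{ rowNum, memSize int64 }

// updateCollectionMetrics iterates over every loaded collection rather
// than only the keys of the grouped-segments map, so a collection whose
// segments were all flushed or compacted has its gauges reset to zero.
func updateCollectionMetrics(collectionIDs []int64, segsByColl map[int64][]segment,
	set func(collID, numEntities, size int64)) {
	for _, collID := range collectionIDs {
		var numEntities, size int64
		for _, seg := range segsByColl[collID] { // a missing key yields an empty slice
			numEntities += seg.rowNum
			size += seg.memSize
		}
		set(collID, numEntities, size) // zero when no segments remain
	}
}

func main() {
	set := func(collID, n, sz int64) {
		fmt.Printf("collection %d: %d entities, %d bytes\n", collID, n, sz)
	}
	segs := map[int64][]segment{1: {{100, 4096}, {50, 2048}}}
	updateCollectionMetrics([]int64{1, 2}, segs, set) // collection 2 reported as zero
}
```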
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options to avoid having to merge some Milvus params
into the user's analyzer params.
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.
Changes:
C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API
Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available (see the sketch after this list)
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation
Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions
Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements
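A minimal sketch of the extended Go-side flow; the request fields and
cgo wrappers below are simplified stand-ins:
```go
package segments

import "google.golang.org/protobuf/proto"

// createCSegmentRequest mirrors the extended request: LoadInfo is
// optional, and nil keeps the legacy creation path.
type createCSegmentRequest struct {
	collectionID int64
	segmentID    int64
	loadInfo     proto.Message // e.g. a *segcorepb.SegmentLoadInfo
}

func createSegment(req *createCSegmentRequest) error {
	if req.loadInfo == nil {
		return newSegment(req.collectionID, req.segmentID)
	}
	// Serialize the load info and hand it to the C++ layer at creation
	// time so the segment can drive its own loading later.
	blob, err := proto.Marshal(req.loadInfo)
	if err != nil {
		return err
	}
	return newSegmentWithLoadInfo(req.collectionID, req.segmentID, blob)
}

// Stand-ins for the cgo wrappers around NewSegment / NewSegmentWithLoadInfo.
func newSegment(collID, segID int64) error                          { return nil }
func newSegmentWithLoadInfo(collID, segID int64, blob []byte) error { return nil }
```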
This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling
Related to #45060
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and analyzer validation on the
proxy, but the proxy should not be a computation-heavy node. This PR
moves all analyzer computation to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: #44730
Fix the issue where logs were not output as expected due to incorrect
log package imports across multiple components.
Changes include:
- Add golangci-lint rule to forbid github.com/pingcap/log usage
- Replace github.com/pingcap/log with
github.com/milvus-io/milvus/pkg/v2/log
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44596
- The querynode already initializes storage v2 and segcore, so the
streamingnode should not do this again.
- This also fixes the GCP object storage access-denied issue.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/27467
>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz
---
The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.
## M8 Default Timezone
We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.
For insert requests, the timezone will be resolved using the following
order of precedence: String Literal -> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
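A minimal sketch of the precedence; the function and parameter names
are illustrative only:
```go
package main

import "fmt"

// resolveTimezone returns the first non-empty candidate, falling back
// to UTC. For inserts the candidates are: string literal, collection
// default, database default; for retrieval: query parameters,
// collection default, database default.
func resolveTimezone(candidates ...string) string {
	for _, tz := range candidates {
		if tz != "" {
			return tz
		}
	}
	return "UTC"
}

func main() {
	fmt.Println(resolveTimezone("", "Asia/Shanghai", "")) // Asia/Shanghai
	fmt.Println(resolveTimezone("", "", ""))              // UTC
}
```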
## M5: Comparison Operators
We can now use the following expression format to filter on the
timestamptz field:
- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string'`
- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.
- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.
- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
## M6: Extract
We will be able to extract specific time fields via kwargs in a future
Python SDK release.
The key is `time_fields`, and the value should be one or more of "year,
month, day, hour, minute, second, microsecond", separated by commas or
spaces. The result for each record is then an array of int64.
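An illustrative sketch of the expected result shape, not the server
implementation:
```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// time_fields: "year, month, day" -> one int64 per requested field,
	// so each record yields an array of int64.
	t := time.Date(2025, 1, 3, 0, 0, 0, 0, time.UTC)
	fmt.Println([]int64{int64(t.Year()), int64(t.Month()), int64(t.Day())}) // [2025 1 3]
}
```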
## M7: Indexing Support
Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
---
After this PR, the input/output type of timestamptz is an ISO string.
Timestamptz values are stored as timestamptz data, which is ultimately
an int64_t.
> for more information, see https://en.wikipedia.org/wiki/ISO_8601
---------
Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
issue: #44373
This commit implements sparse filtering in query tasks using the
Primary Key (PK) statistics (Bloom filter/MinMax). The PK statistics
are bound to the segment during the segment loading phase. A new filter
has been added to the segment filter to enable the sparse filtering
functionality.
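A minimal sketch of the pruning idea, assuming a Bloom filter plus
MinMax stats per segment; the types and helper names are hypothetical:
```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/bits-and-blooms/bloom/v3"
)

// pkStats stands in for the PK statistics bound to a segment at load time.
type pkStats struct {
	filter   *bloom.BloomFilter
	min, max int64
}

// mayContain reports whether the segment could hold the given PK; false
// means the segment can be skipped entirely for this query.
func (s *pkStats) mayContain(pk int64) bool {
	if pk < s.min || pk > s.max {
		return false // MinMax prunes first
	}
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	return s.filter.Test(buf[:]) // Bloom filters may yield false positives
}

func main() {
	stats := &pkStats{filter: bloom.NewWithEstimates(10000, 0.01), min: 100, max: 200}
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], 150)
	stats.filter.Add(buf[:])
	fmt.Println(stats.mayContain(150), stats.mayContain(999)) // true false
}
```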
Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
issue: #44212
Implement search/query storage usage statistics on the Go side (result
reduce); for now, storage usage is only recorded in the vector search
C++ path. The query C++ path will be covered in follow-up PRs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Related to #39173
- Put the bucket name concatenation logic back for Azure support
This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39173
This PR:
- Bump the milvus-storage commit to handle bucket name concatenation
logic in the multipart S3 fs
- Remove all user-side bucket name concatenation code
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42032
- Use bytes to estimate load resources throughout the estimation
procedure
- Add num_rows and dim info for vector indexes to improve estimation
- Disable eviction for the tiered index's meta
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
issue: #42942
This PR includes the following changes:
1. Added checks to the index checker in querycoord to generate drop
index tasks
2. Added a drop index interface to querynode
3. To avoid search failures after dropping an index, the querynode
allows lazy mode (warmup=disable) to load raw data even when indexes
contain raw data.
4. In segcore, loading an index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrency errors.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
This PR adds a mutex to prevent concurrently applying deletes on the
same segment, and checks latestDeltaTimestamp to skip overlapping
delete ranges.
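A minimal sketch of the guard; the names are hypothetical and the real
delta records differ:
```go
package segments

import "sync"

type segmentDeleter struct {
	mu                   sync.Mutex
	latestDeltaTimestamp uint64
}

// applyDelete serializes delete application on the segment and skips
// records whose timestamps are already covered by latestDeltaTimestamp.
func (s *segmentDeleter) applyDelete(pks []int64, tss []uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, ts := range tss {
		if ts <= s.latestDeltaTimestamp {
			continue // overlapping delete range; already applied
		}
		_ = pks[i] // apply the delete for pks[i] at ts here
	}
	if n := len(tss); n > 0 && tss[n-1] > s.latestDeltaTimestamp {
		s.latestDeltaTimestamp = tss[n-1] // delta records arrive in ts order
	}
}
```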
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
#42032
Also fix the cacheoptfield method to work in storagev2.
Also change the sparse-related interfaces for the knowhere version bump
(#43974).
Also include https://github.com/milvus-io/milvus/pull/44046 to fix the
metric loss.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports creating an index on vector arrays (currently only
`DataType.FLOAT_VECTOR`) and searching on them.
The index type supported in this PR is `EMB_LIST_HNSW` and the only
metric type is `MAX_SIM`.
The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: This PR uses `Lims` to convey the offsets of each vector array to
knowhere: the vectors of multiple vector arrays are concatenated, and
the offsets specify which vectors belong to which vector array.
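An illustrative sketch of the offset layout, assuming a prefix-sum
style `Lims` as described above:
```go
package main

import "fmt"

func main() {
	// Three vector arrays of 3, 2, and 4 vectors concatenated into one
	// buffer of 9 vectors; lims marks where each array starts and ends.
	lims := []uint64{0, 3, 5, 9}
	for i := 0; i+1 < len(lims); i++ {
		fmt.Printf("vector array %d owns vectors [%d, %d)\n", i, lims[i], lims[i+1])
	}
}
```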
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #43230
This PR:
- Moves the segcore setup function to the `initcore` package to remove
the cgo dependency from pkg
- Registers the core callback only for components that depend on
segcore
- Rectifies the `UpdateLogLevel` implementation
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>