milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 11:21:52 +08:00

Author	SHA1	Message	Date
marcelo-cjl	3b599441fd	feat: Add nullable vector support for proxy and querynode (#46305 ) related: #45993 This commit extends nullable vector support to the proxy layer, querynode, and adds comprehensive validation, search reduce, and field data handling for nullable vectors with sparse storage. Proxy layer changes: - Update validate_util.go checkAligned() with getExpectedVectorRows() helper to validate nullable vector field alignment using valid data count - Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for nullable vector validation with proper row count expectations - Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical index translation during search reduce operations - Update search_reduce_util.go reduceSearchResultData to use idxComputers for correct field data indexing with nullable vectors - Update task.go, task_query.go, task_upsert.go for nullable vector handling - Update msg_pack.go with nullable vector field data processing QueryNode layer changes: - Update segments/result.go for nullable vector result handling - Update segments/search_reduce.go with nullable vector offset translation Storage and index changes: - Update data_codec.go and utils.go for nullable vector serialization - Update indexcgowrapper/dataset.go and index.go for nullable vector indexing Utility changes: - Add FieldDataIdxComputer struct with Compute() method for efficient logical-to-physical index mapping across multiple field data - Update EstimateEntitySize() and AppendFieldData() with fieldIdxs parameter - Update funcutil.go with nullable vector support functions ## Summary by CodeRabbit * New Features * Full support for nullable vector fields (float, binary, float16, bfloat16, int8, sparse) across ingest, storage, indexing, search and retrieval; logical↔physical offset mapping preserves row semantics. * Client: compaction control and compaction-state APIs. * Bug Fixes * Improved validation for adding vector fields (nullable + dimension checks) and corrected search/query behavior for nullable vectors. * Chores * Persisted validity maps with indexes and on-disk formats. * Tests * Extensive new and updated end-to-end nullable-vector tests. --------- Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>	2025-12-24 10:13:19 +08:00
Buqian Zheng	e379b1f0f4	enhance: moved query optimization to proxy, added various optimizations (#45526 ) issue: https://github.com/milvus-io/milvus/issues/45525 see added README.md for added optimizations ## Summary by CodeRabbit * New Features * Added query expression optimization feature with a new `optimizeExpr` configuration flag to enable automatic simplification of filter predicates, including range predicate optimization, merging of IN/NOT IN conditions, and flattening of nested logical operators. * Bug Fixes * Adjusted delete operation behavior to correctly handle expression evaluation. --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-12-24 00:39:19 +08:00
Spade A	f6f716bcfd	feat: impl StructArray -- support embedding searches embeddings in embedding list with element level filter expression (#45830 ) issue: https://github.com/milvus-io/milvus/issues/42148 For a vector field inside a STRUCT, since a STRUCT can only appear as the element type of an ARRAY field, the vector field in STRUCT is effectively an array of vectors, i.e. an embedding list. Milvus already supports searching embedding lists with metrics whose names start with the prefix MAX_SIM_. This PR allows Milvus to search embeddings inside an embedding list using the same metrics as normal embedding fields. Each embedding in the list is treated as an independent vector and participates in ANN search. Further, since STRUCT may contain scalar fields that are highly related to the embedding field, this PR introduces an element-level filter expression to refine search results. The grammar of the element-level filter is: element_filter(structFieldName, $[subFieldName] == 3) where $[subFieldName] refers to the value of subFieldName in each element of the STRUCT array structFieldName. It can be combined with existing filter expressions, for example: "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3)" A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...
filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)
```
TODO: 1. When an `element_filter` expression is used, a regular filter expression must also be present. Remove this restriction. 2. Implement `element_filter` expressions in the `query`. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-12-15 12:01:15 +08:00
zhagnlu	a86b8b7a12	enhance: move jsonshredding meta from parquet to meta.json (#46130 ) #42533 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-12-11 14:01:13 +08:00
cai.zhang	bb486c0db3	fix: Fix path concatenation error when rootPath = "." in minio (#46220 ) issue: #46219 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-12-10 13:53:13 +08:00
Buqian Zheng	1372e84d7f	fix: move cursor after skip index skipped a chunk (#46054 ) issue: #46053 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-12-05 10:47:11 +08:00
Xinyi7	59752f216d	fix: add check in batch_score function to prevent query node seg fault (#46025 ) previously we saw that when doing reranker with phrase matching, the query node throws a segmentation fault error. github issue link: https://github.com/milvus-io/milvus/issues/45990 --------- Signed-off-by: Xinyi Jiang <xinyi.jiang@reddit.com> Co-authored-by: Xinyi Jiang <xinyi.jiang@reddit.com>	2025-12-04 17:35:17 +08:00
Ted Xu	20ce9fdc23	feat: bump loon version (#46029 ) See: #44956 This PR upgrades loon to the latest version and resolves building conflicts. --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-04 10:57:12 +08:00
zhagnlu	d5bd17315c	enhance: remove some meta cache for json shredding (#45888 ) #42533 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-12-01 11:57:09 +08:00
sparknack	0392db6976	enhance: add cancellation checking in each operator and expr (#45354 ) issue: #45353 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-11-26 10:15:07 +08:00
Gao	09a3195867	enhance: support max_connections config for remote storage (#45225 ) related: https://github.com/milvus-io/milvus/issues/45344 Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-11-13 15:37:38 +08:00
Spade A	929dc65882	fix: fix index compatibility after upgrade (#45373 ) issue: https://github.com/milvus-io/milvus/issues/45380 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-13 12:59:38 +08:00
Buqian Zheng	515a939edf	enhance: remove obsolete code (#45307 ) issue: #44452 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-11-07 16:07:35 +08:00
cai.zhang	7527ddf50f	enhance: [test] Move R-Tree index tests into the implementation package (#45355 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-11-07 10:03:33 +08:00
Spade A	cd0b36c39e	feat: impl StructArray -- support diskann index (#45223 ) issue: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-11-04 11:57:33 +08:00
zhagnlu	653e95aaad	fix: fix bug for shredding json when empty json but not null (#45221 ) #45157 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-11-04 11:11:33 +08:00
Buqian Zheng	c284e8c4a8	enhance: some minor code cleanup, prepare for scalar benchmark (#45008 ) issue: https://github.com/milvus-io/milvus/issues/44452 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-10-24 14:22:05 +08:00
cqy123456	822588302a	enhance: embedding_list support mmap in MemVectorIndex (#44764 ) issue: https://github.com/milvus-io/milvus/issues/44702 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-10-15 15:22:00 +08:00
Spade A	c4f3f0ce4c	feat: impl StructArray -- support more types of vector in STRUCT (#44736 ) ref: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-10-15 10:25:59 +08:00
congqixia	5ece760d73	fix: Pass fs via `FileManagerContext` when loading index (#44733 ) Related to #44615 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-11 09:55:57 +08:00
Gao	d3dfb90587	enhance: move tracer test to milvus-common (#44605 ) #43931 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-30 15:07:06 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 to milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Currently only brute-force search is supported. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
sparknack	14c085374e	fix: set mmap_file_raii_ to nullptr when mmap is disabled (#44516 ) issue: #44510 related: #44501 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-24 11:50:03 +08:00
Gao	539f17f1ad	enhance: tiered index updates (#44433 ) issue: #42032 #44212 - special case for warmup param and cell storage size for tiered index - add a config to enable/disable storage usage tracking --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-22 21:34:11 +08:00
sparknack	ab64afba2f	enhance: add storage resource usage for scalar search (#44414 ) issue: #44212 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-22 14:28:06 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics in go side(result reduce), only record storage usage in vector search C++ path. Need to be implemented in query c++ path in next prs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
congqixia	b532a3e026	enhance: Move c API unittest aside to src files (#44458 ) Related to #43931 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-19 10:30:01 +08:00
congqixia	7b83314bf3	enhance: [StorageV2] Make datanode use non-singleton fs (#44418 ) Related to #39173 According to the current design, datanode shall create fs from storage config in request instead of using singleton fs. This PR upgrade milvus-storage and make packed reader/writer compose new fs from storage config. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-18 20:06:00 +08:00
zhagnlu	9b6703626d	fix: fix unescaped bug for json stats (#44421 ) #42533 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-17 20:54:01 +08:00
sthuang	2f70a73258	fix: turn on azure by default (#44377 ) related: #44354, #44138, #43869 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-09-17 10:12:01 +08:00
sthuang	b38013352d	enhance: [StorageV2] enable build with azure (#44177 ) related: #43869 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-09-14 08:05:58 +08:00
zhagnlu	16e6b6aa8a	fix: fix build json stats bug for nested object (#44303 ) issue: #44132 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-11 14:13:56 +08:00
sparknack	4a01c726f3	enhance: cachinglayer: some metric and params update (#44276 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-10 11:03:57 +08:00
Spade A	45adf2d426	fix: load resource considers ngram index (#44237 ) fix https://github.com/milvus-io/milvus/issues/44236 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-10 10:27:56 +08:00
Chun Han	26a024625d	feat: support search by on json field and dynamic field(#43124 ) (#43203 ) related: #43124 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-09-09 21:51:56 +08:00
Spade A	575d490af6	fix: ngram index is mistakenly used for unsupported operations 2 (#44142 ) issue: https://github.com/milvus-io/milvus/issues/44020 https://github.com/milvus-io/milvus/pull/43955 only fixed unary expressions. This fixes all expressions and adds more tests. --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-09 19:05:56 +08:00
Buqian Zheng	9bf2b5c10c	enhance: moved more segcore unit test files (#44210 ) issue: https://github.com/milvus-io/milvus/issues/43931 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-08 10:21:55 +08:00
Spade A	ba4cd68edb	fix: adjust params to make CPP UT run faster (#44223 ) fix: https://github.com/milvus-io/milvus/issues/44224 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-06 14:13:54 +08:00
cqy123456	1d4d721859	test: Reduce the run time of interim index cpp ut (#44200 ) issue: https://github.com/milvus-io/milvus/issues/44176 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-09-05 16:45:53 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171 ) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Buqian Zheng	b76bf13fc3	enhance: move c++ unit test file to aside of the production code (#43932 ) issue: https://github.com/milvus-io/milvus/issues/43931 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-03 23:45:53 +08:00
Spade A	825a134739	feat: impl StructArray -- reject json types for struct (#44190 ) issue: https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 19:33:53 +08:00
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
foxspy	d55bf49bf1	enhance: update knowhere version (#44144 ) issue: #42937 --------- Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-09-03 01:31:53 +08:00
Bingyi Sun	0c0630cc38	feat: support dropping index without releasing collection (#42941 ) issue: #42942 This pr includes the following changes: 1. Added checks for index checker in querycoord to generate drop index tasks 2. Added drop index interface to querynode 3. To avoid search failure after dropping the index, the querynode allows the use of lazy mode (warmup=disable) to load raw data even when indexes contain raw data. 4. In segcore, loading the index no longer deletes raw data; instead, it evicts it. 5. In expr, the index is pinned to prevent concurrent errors. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-02 16:17:52 +08:00
Bingyi Sun	c420e7bd27	enhance: align the behavior of exist expr between brute force and index (#44030 ) https://github.com/milvus-io/milvus/issues/44031 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-01 15:03:52 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534 ) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
congqixia	e3b3502287	fix: Use correct regex for cppcheck (#44077 ) Related to #44076 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-27 20:57:50 +08:00
marcelo-cjl	e13e19cd2c	enhance: add sparse_u32_f32 data type for sparse vector (#43974 ) issue: #43973 Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-08-27 16:47:50 +08:00
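
Note on commit 3b599441fd: its message describes translating logical row offsets to physical offsets for nullable vectors kept in sparse storage. The sketch below only illustrates that idea, assuming that null rows occupy no physical slot so a valid row's physical offset equals the number of valid rows before it. The class `IdxComputer` and its `compute` method are hypothetical names for illustration and are not the actual `FieldDataIdxComputer` code from typeutil/schema.go.

```python
from itertools import accumulate
from typing import List, Optional


class IdxComputer:
    """Hypothetical helper, not the Milvus implementation.

    Maps a logical row offset to a physical offset, assuming only
    valid (non-null) vectors are physically stored.
    """

    def __init__(self, valid: List[bool]) -> None:
        self.valid = valid
        # prefix[i] = number of valid rows strictly before logical row i
        self.prefix = [0] + list(accumulate(int(v) for v in valid))

    def compute(self, logical: int) -> Optional[int]:
        # A null row has no physical slot.
        if not self.valid[logical]:
            return None
        return self.prefix[logical]


# Rows 1 and 4 are null, so three vectors are physically stored.
computer = IdxComputer([True, False, True, True, False])
offsets = [computer.compute(i) for i in range(5)]
```

Precomputing the prefix counts once makes each lookup constant time, which matters when a reduce step resolves many offsets against the same field data.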


787 Commits