milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
sparknack	c8a4d6e2ef	enhance: add cachinglayer management for TextMatchIndex (#44741 ) issue: #41435, #44502 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-13 14:37:58 +08:00
congqixia	5ece760d73	fix: Pass fs via `FileManagerContext` when loading index (#44733 ) Related to #44615 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-10-11 09:55:57 +08:00
sparknack	7e750190b6	enhance: add a size getter for tantivy inverted index (#44609 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-10-10 17:43:57 +08:00
cai.zhang	19346fa389	feat: Geospatial Data Type and GIS Function support for milvus (#44547 ) issue: #43427 This PR's main goal is to merge #37417 into milvus 2.5 without conflicts. # Main Goals 1. Create and describe collections with geospatial type 2. Insert geospatial data into the insert binlog 3. Load segments containing geospatial data into memory 4. Enable query and search to display geospatial data 5. Support using GIS functions like ST_EQUALS in query 6. Support R-Tree index for geometry type # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. Only brute-force search is supported for now. 7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modification in pymilvus. --------- Signed-off-by: Yinwei Li <yinwei.li@zilliz.com> Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>	2025-09-28 19:43:05 +08:00
foxspy	13c3b0b909	enhance: add autoindex configuration for the int8 vector type (#44554 ) issue: #38666 Add int8 support for autoindex to ensure it can be independently configured. At the same time, remove the restriction on int8 type for vectorDiskIndex (note that vectorDiskIndex only determines the building and loading method of the index, not the index type). Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-09-24 17:48:04 +08:00
zhagnlu	eac16a577c	enhance:support cachelayer for json stats (#44446 ) #42533 Signed-off-by: zhagnlu <lu.zhang@zilliz.com>	2025-09-24 15:30:04 +08:00
sparknack	14c085374e	fix: set mmap_file_raii_ to nullptr when mmap is disabled (#44516 ) issue: #44510 related: #44501 Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-24 11:50:03 +08:00
congqixia	ea307ea3c9	fix: [StorageV2] Make DiskFileManager use fs from context (#44535 ) Related to #44534 Datanode shall not use singleton fs after 2.6+. This patch makes the disk file manager use the filesystem passed by fileManagerContext instead of the erroneous singleton one. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-24 10:12:03 +08:00
Tianx	2c0c5ef41e	feat: timestamptz expression & index & timezone (#44080 ) issue: https://github.com/milvus-io/milvus/issues/27467 >My plan is as follows. >- [x] M1 Create collection with timestamptz field >- [x] M2 Insert timestamptz field data >- [x] M3 Retrieve timestamptz field data >- [x] M4 Implement handoff >- [x] M5 Implement compare operator >- [x] M6 Implement extract operator >- [x] M8 Support database/collection level default timezone >- [x] M7 Support STL-SORT index for datatype timestamptz --- The third PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M5, M6, M7, M8 described above. ## M8 Default Timezone We will be able to use alter_collection() and alter_database() in a future Python SDK release to modify the default timezone at the collection or database level. For insert requests, the timezone will be resolved using the following order of precedence: String Literal -> Collection Default -> Database Default. For retrieval requests, the timezone will be resolved in this order: Query Parameters -> Collection Default -> Database Default. In both cases, the final fallback timezone is UTC. ## M5: Comparison Operators We can now use the following expression format to filter on the timestamptz field: - `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string' ` - The interval_string follows the ISO 8601 duration format, for example: P1Y2M3DT1H2M3S. - The iso_string follows the ISO 8601 timestamp format, for example: 2025-01-03T00:00:00+08:00. - Example expressions: "tsz + INTERVAL 'P0D' != ISO '2025-01-03T00:00:00+08:00'" or "tsz != ISO '2025-01-03T00:00:00+08:00'". ## M6: Extract We will be able to extract specific time fields via kwargs in a future Python SDK release. The key is `time_fields`, and the value should be one or more of "year, month, day, hour, minute, second, microsecond", separated by commas or spaces. The result for each record will then be an array of int64.
## M7: Indexing Support Expressions without interval arithmetic can be accelerated using an STL-SORT index. However, expressions that include interval arithmetic cannot be indexed. This is because the result of an interval calculation depends on the specific timestamp value. For example, adding one month to a date in February results in a different number of added days than adding one month to a date in March. --- After this PR, the input / output type of timestamptz will be an ISO string. Timestamptz values are stored as timestamptz data, which is ultimately an int64_t. > for more information, see https://en.wikipedia.org/wiki/ISO_8601 --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-09-23 10:24:12 +08:00
Buqian Zheng	75557f3eb8	enhance: Use std::shared_lock and std::unique_lock for mutexes (#44459 ) issue: https://github.com/milvus-io/milvus/issues/44452 Signed-off-by: zhengbuqian <zhengbuqian@gmail.com> Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>	2025-09-22 18:02:09 +08:00
sparknack	ab64afba2f	enhance: add storage resource usage for scalar search (#44414 ) issue: #44212 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-22 14:28:06 +08:00
Gao	d3784c6515	enhance: add storage resource usage for vector search (#44308 ) issue: #44212 Implement search/query storage usage statistics on the Go side (result reduce); storage usage is only recorded in the vector search C++ path. It still needs to be implemented in the query C++ path in follow-up PRs. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-09-19 20:20:02 +08:00
congqixia	b532a3e026	enhance: Move c API unittest aside to src files (#44458 ) Related to #43931 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-09-19 10:30:01 +08:00
zhagnlu	9b6703626d	fix:fix unescaped bug for json stats (#44421 ) #42533 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-17 20:54:01 +08:00
zhagnlu	16e6b6aa8a	fix:fix build json stats bug for nested object (#44303 ) issue: #44132 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-11 14:13:56 +08:00
zhagnlu	77f7d19400	fix:avoid mmap rewrite by multi json fields (#44299 ) issue: #44127 Signed-off-by: zhagnlu <lu.zhang@zilliz.com>	2025-09-11 10:13:57 +08:00
sparknack	4a01c726f3	enhance: cachinglayer: some metric and params update (#44276 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-09-10 11:03:57 +08:00
Spade A	45adf2d426	fix: load resource considers ngram index (#44237 ) fix https://github.com/milvus-io/milvus/issues/44236 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-10 10:27:56 +08:00
Spade A	575d490af6	fix: ngram index is mistakenly used for unsupported operations 2 (#44142 ) issue: https://github.com/milvus-io/milvus/issues/44020 https://github.com/milvus-io/milvus/pull/43955 only fixed unary expressions. This fixes all expressions and adds more tests. --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-09 19:05:56 +08:00
Buqian Zheng	9bf2b5c10c	enhance: moved more segcore unit test files (#44210 ) issue: https://github.com/milvus-io/milvus/issues/43931 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-08 10:21:55 +08:00
Gao	2e98cb0103	enhance: load resource estimation for tiered index (#44171 ) issue: https://github.com/milvus-io/milvus/issues/42032 - Use bytes to estimate load resource in the whole estimation procedure - Add num_rows and dim info for vector index to better estimate - Disable eviction for tiered index's meta --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2025-09-04 19:41:53 +08:00
Buqian Zheng	b76bf13fc3	enhance: move c++ unit test file to aside of the production code (#43932 ) issue: https://github.com/milvus-io/milvus/issues/43931 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-09-03 23:45:53 +08:00
Spade A	7cb15ef141	feat: impl StructArray -- optimize vector array serialization (#44035 ) issue: https://github.com/milvus-io/milvus/issues/42148 Optimized from Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory to Go VectorArray → Arrow ListArray → Memory --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-03 16:39:53 +08:00
zhagnlu	fc876639cf	enhance: support json stats with shredding design (#42534 ) #42533 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-09-01 10:49:52 +08:00
sparknack	70c8114e85	enhance: cachinglayer: resource management for segment loading (#43846 ) issue: #41435 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-29 11:37:50 +08:00
congqixia	e3b3502287	fix: Use correct regex for cppcheck (#44077 ) Related to #44076 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-27 20:57:50 +08:00
marcelo-cjl	e13e19cd2c	enhance: add sparse_u32_f32 data type for sparse vector (#43974 ) issue: #43973 Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>	2025-08-27 16:47:50 +08:00
Spade A	8456f824be	feat: impl StructArray -- miscellaneous stuff for struct array (#43960 ) Ref https://github.com/milvus-io/milvus/issues/42148 1. enable storage v2 2. implement some missing stuff 3. fix some bugs and add tests --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-26 21:35:53 +08:00
Gao	e97a618630	enhance: support readAt interface for remote input stream (#43997 ) #42032 Also, fix the cacheoptfield method to work in storagev2. Also, change the sparse related interface for knowhere version bump #43974 . Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric lost. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>	2025-08-26 11:19:58 +08:00
zhagnlu	8934c18792	enhance: support cache result cache for expr (#43923 ) issue: #43878 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-26 10:55:52 +08:00
sparknack	4fae074d56	enhance: add write rate limit for disk file writer (#43912 ) issue: #43040 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-25 10:27:47 +08:00
Spade A	d6a428e880	feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR supports creating an index on a vector array (currently only for `DataType.FLOAT_VECTOR`) and searching on it. The only index type supported in this PR is `EMB_LIST_HNSW`, and the only metric type is `MAX_SIM`. Usage:

```python
from pymilvus import MilvusClient, DataType

milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey the offsets of the vector array to knowhere, where the vectors of multiple vector arrays are concatenated and offsets are needed to specify which vectors belong to which vector array. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-08-20 10:27:46 +08:00
foxspy	647c2bca2d	enhance: Support streaming read and write of vector index files (#43824 ) issue: #42032 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-15 23:41:43 +08:00
Gao	81a0915c29	enhance: add milvus-common module to decouple knowhere & segcore (#43624 ) issue: https://github.com/milvus-io/milvus/issues/42032 https://github.com/milvus-io/milvus/issues/41435 based on pr: https://github.com/milvus-io/milvus/pull/42124 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-11 14:09:42 +08:00
Bingyi Sun	b59bc5e2c0	fix: make json path index non exists offsets compatible with 2.5 (#43691 ) issue: https://github.com/milvus-io/milvus/issues/43666 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-08-01 23:22:23 +08:00
Xianhui Lin	0f0edff7f0	fix: increment offset for null data rows in JsonKeyStats (#43679 ) fix: increment offset for null data rows in JsonKeyStatsInvertedIndex issue: https://github.com/milvus-io/milvus/issues/43151 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-08-01 15:53:37 +08:00
congqixia	f29964bd17	fix: Add padding for sorted index preventing 0 length mmap (#43663 ) Related to #43655 This patch adds padding when writing the mmap file for ScalarSortedIndex, to avoid an mmap failure caused by a zero mmap length. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-31 18:53:36 +08:00
Bingyi Sun	a765cd1eaa	enhance: unlink mmap file when chunk and index are destructed (#43524 ) issue: https://github.com/milvus-io/milvus/issues/41636 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-29 16:05:36 +08:00
Spade A	864d1b93b1	enhance: enable stlsort with mmap support (#43359 ) issue: https://github.com/milvus-io/milvus/issues/43358 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-28 15:32:55 +08:00
Bingyi Sun	742d72a6c2	fix: Fix wrong null offsets for json path index (#43390 ) issue: https://github.com/milvus-io/milvus/issues/43315 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-26 17:26:54 +08:00
Bingyi Sun	a89e579485	fix: use tantivy version to make json index compatible with milvus 2.5 (#43563 ) issue: https://github.com/milvus-io/milvus/issues/43562 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-26 17:18:55 +08:00
Spade A	10fe53ff59	feat: support json for ngram (#43170 ) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enable ngram to support json data type. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-25 10:28:54 +08:00
Buqian Zheng	d367770649	enhance: greatly reduce the loading memory overhead - by up to 25% (#43533 ) issue: #43088 issue: #43038 The current loading process: * When loading an index, we first download the index files into a list of buffers, say A * then construct (copy) them into a vector of FieldDatas (each file is a FieldData), say B * assemble them together as a huge BinarySet, say C * lastly, copy into the actual index data structure, say D The problem: * After each step, the data from the previous step is no longer needed. * But currently, we release the memory of A, B, and C only after we have finished constructing D * This leads to a peak memory usage of up to 4x the raw index size during the loading process * This PR allows B to be released promptly once C has been assembled. So after this PR, the peak memory usage during loading will be up to 3x the raw index size. I will create another PR to release A after B is created; that seems more complicated and needs more work. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-24 11:26:54 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384 ) issue: #41435 this is to prevent AI from thinking of our exception throwing as a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
Spade A	8612a2c946	enhance: optimize in by batch-in (#43268 ) fix: https://github.com/milvus-io/milvus/issues/43267 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-17 19:40:52 +08:00
sparknack	9b4081e110	enhance: cachinglayer: some performance optimization (#42858 ) issue: #41435 We compared the performance using the modified test_sealed.cpp, which randomly accesses all rows in all chunks and counts the number of runs within 3s.

## performance data comparison (ops/second)

chunk config: 1x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer | w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 82428 | -63.6% (29983) | +2.7% (84675) |
| Int8 field | 82228 | -63.3% (30166) | +2.4% (84163) |
| Int16 field | 82572 | -63.8% (29867) | +1.8% (84036) |
| Int32 field | 82797 | -63.7% (30031) | +1.5% (84043) |
| Int64 field | 81077 | -62.9% (30107) | +0.6% (81604) |
| Float field | 82678 | -63.4% (30266) | +1.8% (84146) |
| Double field | 81925 | -63.4% (29974) | +0.2% (82097) |
| Varchar field | 19933 | -19.6% (16027) | +18.9% (23690) |
| JSON field | 16519 | -96.8% (533) | +2.5% (16927) |
| Int array field | 7325 | -13.7% (6321) | -1.4% (7220) |
| Long array field | 6347 | -8.9% (5781) | -0.1% (6344) |
| Bool array field | 8275 | -14.0% (7116) | +0.4% (8311) |
| String array field | 2281 | -5.0% (2168) | +0.2% (2287) |
| Double array field | 6427 | -13.3% (5574) | -2.0% (6301) |
| Float array field | 7291 | -13.0% (6346) | -1.5% (7183) |
| Vector field | 27487 | -40.4% (16371) | -4.7% (26192) |
| Float16 vector field | 49773 | -54.6% (22601) | -5.9% (46834) |
| BFloat16 vector field | 49783 | -53.1% (23350) | -5.7% (46934) |
| Int8 vector field | 63871 | -59.0% (26179) | -6.2% (59926) |

---

chunk config: 10x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer | w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 3659 | -48.6% (1879) | +110.1% (7686) |
| Int8 field | 3410 | -45.3% (1864) | +123.9% (7636) |
| Int16 field | 3647 | -48.6% (1874) | +110.1% (7661) |
| Int32 field | 3647 | -48.8% (1866) | +109.6% (7645) |
| Int64 field | 3645 | -48.9% (1863) | +107.8% (7573) |
| Float field | 3647 | -49.0% (1861) | +109.5% (7639) |
| Double field | 3640 | -45.1% (1998) | +108.4% (7586) |
| Varchar field | 1594 | -23.9% (1213) | +20.6% (1922) |
| JSON field | 1202 | -26.5% (884) | +16.1% (1396) |
| Int array field | 602 | -12.3% (528) | +12.7% (678) |
| Long array field | 529 | -12.2% (465) | +7.5% (569) |
| Double array field | 537 | -13.0% (467) | +6.4% (571) |
| Vector field | 1520 | -37.9% (943) | -5.5% (1437) |
| Float16 vector field | 2607 | -47.0% (1382) | +6.4% (2774) |
| BFloat16 vector field | 2586 | -46.5% (1383) | +8.8% (2813) |
| Int8 vector field | 3101 | -47.3% (1633) | +41.9% (4400) |

--------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-07-17 11:20:51 +08:00
Spade A	d750816ba0	fix: remove std::string support for stlsort index (#43355 ) fix: https://github.com/milvus-io/milvus/issues/43354 The current implementation of the stlsort index does not support std::string. Remove the code. Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-16 17:46:51 +08:00
Bingyi Sun	1b8c958cff	enhance: fix tantivy wrapper is freed after json flat executor is destructed (#43233 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-16 10:58:50 +08:00
Spade A	db91d85dbc	feat: more types of matches for ngram (#43081 ) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enable ngram to support more kinds of matches such as prefix and postfix match. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-14 20:34:50 +08:00
Spade A	e14a52721e	enhance: use stl sort with high cardinality for data_type int (#43305 ) fix: https://github.com/milvus-io/milvus/issues/43304 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-14 18:40:50 +08:00


404 Commits