milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
foxspy	647c2bca2d	enhance: Support streaming read and write of vector index files (#43824 ) issue: #42032 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-15 23:41:43 +08:00
Gao	81a0915c29	enhance: add milvus-common module to decouple knwhere & segcore (#43624 ) issue: https://github.com/milvus-io/milvus/issues/42032 https://github.com/milvus-io/milvus/issues/41435 based on pr: https://github.com/milvus-io/milvus/pull/42124 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com> Co-authored-by: xianliang.li <xianliang.li@zilliz.com>	2025-08-11 14:09:42 +08:00
Bingyi Sun	b59bc5e2c0	fix: make json path index non exists offsets compatible with 2.5 (#43691 ) issue: https://github.com/milvus-io/milvus/issues/43666 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-08-01 23:22:23 +08:00
Xianhui Lin	0f0edff7f0	fix: increment offset for null data rows in JsonKeyStats (#43679 ) fix: increment offset for null data rows in JsonKeyStatsInvertedIndex issue: https://github.com/milvus-io/milvus/issues/43151 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-08-01 15:53:37 +08:00
congqixia	f29964bd17	fix: Add padding for sorted index preventing 0 length mmap (#43663 ) Related to #43655 This patch add a padding when writing mmap file for ScalarSortedIndex in case of mmap falure due to 0 mmap length. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-31 18:53:36 +08:00
Bingyi Sun	a765cd1eaa	enhance: unlink mmap file when chunk and index are destructed (#43524 ) issue: https://github.com/milvus-io/milvus/issues/41636 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-29 16:05:36 +08:00
Spade A	864d1b93b1	enhance: enable stlsort with mmap support (#43359 ) issue: https://github.com/milvus-io/milvus/issues/43358 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-28 15:32:55 +08:00
Bingyi Sun	742d72a6c2	fix: Fix wrong null offsets for json path index (#43390 ) issue: https://github.com/milvus-io/milvus/issues/43315 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-26 17:26:54 +08:00
Bingyi Sun	a89e579485	fix: use tantivy version to make json index compatible with milvus 2.5 (#43563 ) issue: https://github.com/milvus-io/milvus/issues/43562 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-26 17:18:55 +08:00
Spade A	10fe53ff59	feat: support json for ngram (#43170 ) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enable ngram to support json data type. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-25 10:28:54 +08:00
Buqian Zheng	d367770649	enhance: greatly reduce the loading memory overhead - by up to 25% (#43533 ) issue: #43088 issue: #43038 The current loading process: * When loading an index, we first download the index files into a list of buffers, say A * then constructing(copying) them into a vector of FieldDatas(each file is a FieldData), say B * assembles them together as a huge BinarySet, say C * lastly, copy into the actual index data structure, say D The problem: * We can see that, after each step, we don't need the data in previous step. * But currently, we release the memory of A, B, C only after we have finished constructing D * This leads to a up to 4x peak memory usage comparing with the raw index size, during the loading process * This PR allows timely releasing of B after we assembled C. So after this PR, the peak memory usage during loading will be up to 3x of the raw index size. I will create another PR to release A after we created B, that seems more complicated and need more work. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-24 11:26:54 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384 ) issue: #41435 this is to prevent AI from thinking of our exception throwing as a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
Spade A	8612a2c946	enhance: optimize in by batch-in (#43268 ) fix: https://github.com/milvus-io/milvus/issues/43267 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-17 19:40:52 +08:00
sparknack	9b4081e110	enhance: cachinglayer: some performance optimization (#42858 ) issue: #41435 We compared the performance using the modified test_sealed.cpp, which randomly accesses all rows in all chunks and counts the number of runs within 3s. ## performance data comparison (ops/second) chunk config: 1x1000 \| Field Type \| w/o cachinglayer (commit 640f526301) \| w/ cachinglayer \| w/ cachinglayer + opt \| \|---\|---\|---\|---\| \| Bool field \| 82428 \| -63.6% (29983) \| +2.7% (84675) \| \| Int8 field \| 82228 \| -63.3% (30166) \| +2.4% (84163) \| \| Int16 field \| 82572 \| -63.8% (29867) \| +1.8% (84036) \| \| Int32 field \| 82797 \| -63.7% (30031) \| +1.5% (84043) \| \| Int64 field \| 81077 \| -62.9% (30107) \| +0.6% (81604) \| \| Float field \| 82678 \| -63.4% (30266) \| +1.8% (84146) \| \| Double field \| 81925 \| -63.4% (29974) \| +0.2% (82097) \| \| Varchar field \| 19933 \| -19.6% (16027) \| +18.9% (23690) \| \| JSON field \| 16519 \| -96.8% (533) \| +2.5% (16927) \| \| Int array field \| 7325 \| -13.7% (6321) \| -1.4% (7220) \| \| Long array field \| 6347 \| -8.9% (5781) \| -0.1% (6344) \| \| Bool array field \| 8275 \| -14.0% (7116) \| +0.4% (8311) \| \| String array field \| 2281 \| -5.0% (2168) \| +0.2% (2287) \| \| Double array field \| 6427 \| -13.3% (5574) \| -2.0% (6301) \| \| Float array field \| 7291 \| -13.0% (6346) \| -1.5% (7183) \| \| Vector field \| 27487 \| -40.4% (16371) \| -4.7% (26192) \| \| Float16 vector field \| 49773 \| -54.6% (22601) \| -5.9% (46834) \| \| BFloat16 vector field \| 49783 \| -53.1% (23350) \| -5.7% (46934) \| \| Int8 vector field \| 63871 \| -59.0% (26179) \| -6.2% (59926) \| --- chunk config: 10x1000 \| Field Type \| w/o cachinglayer (commit 640f526301) \| w/ cachinglayer \| w/ cachinglayer + opt \| \|---\|---\|---\|---\| \| Bool field \| 3659 \| -48.6% (1879) \| +110.1% (7686) \| \| Int8 field \| 3410 \| -45.3% (1864) \| +123.9% (7636) \| \| Int16 field \| 3647 \| -48.6% (1874) \| +110.1% (7661) \| \| Int32 field \| 3647 \| -48.8% (1866) \| +109.6% (7645) \| \| Int64 field \| 3645 \| -48.9% (1863) \| +107.8% (7573) \| \| Float field \| 3647 \| -49.0% (1861) \| +109.5% (7639) \| \| Double field \| 3640 \| -45.1% (1998) \| +108.4% (7586) \| \| Varchar field \| 1594 \| -23.9% (1213) \| +20.6% (1922) \| \| JSON field \| 1202 \| -26.5% (884) \| +16.1% (1396) \| \| Int array field \| 602 \| -12.3% (528) \| +12.7% (678) \| \| Long array field \| 529 \| -12.2% (465) \| +7.5% (569) \| \| Double array field \| 537 \| -13.0% (467) \| +6.4% (571) \| \| Vector field \| 1520 \| -37.9% (943) \| -5.5% (1437) \| \| Float16 vector field \| 2607 \| -47.0% (1382) \| +6.4% (2774) \| \| BFloat16 vector field \| 2586 \| -46.5% (1383) \| +8.8% (2813) \| \| Int8 vector field \| 3101 \| -47.3% (1633) \| +41.9% (4400) \| --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-07-17 11:20:51 +08:00
Spade A	d750816ba0	fix: remove std::string support for stlsort index (#43355 ) fix: https://github.com/milvus-io/milvus/issues/43354 The current implementation of stdsort index is not supported for std::string. Remove the code. Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-16 17:46:51 +08:00
Bingyi Sun	1b8c958cff	enhance: fix tantivy wrapper is freed after json flat executor is destructed (#43233 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-16 10:58:50 +08:00
Spade A	db91d85dbc	feat: more types of matches for ngram (#43081 ) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enable ngram to support more kinds of matches such as prefix and postfix match. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-14 20:34:50 +08:00
Spade A	e14a52721e	enhance: use stl sort with high cardinality for data_type int (#43305 ) fix: https://github.com/milvus-io/milvus/issues/43304 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-14 18:40:50 +08:00
Spade A	fce0bbe2ae	fix: remove redundant locks for null_offset (#43103 ) Ref: https://github.com/milvus-io/milvus/issues/40308 https://github.com/milvus-io/milvus/pull/40363 add lock for protecting concurrent read/write for null offset. But we don't need this for sealed segment. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-04 10:10:45 +08:00
sparknack	7e855f1046	enhance: add disk file writer with Direct IO support (#42665 ) issue: #43040 This patch introduces a disk file writer that supports Direct IO. Currently, it is exclusively utilized during the QueryNode load process. Below is its parameters: 1. `common.diskWriteMode` This parameter controls the write mode of the local disk, which is used to write temporary data downloaded from remote storage. Currently, only QueryNode uses 'common.diskWrite*' parameters. Support for other components will be added in the future. The options include 'direct' and 'buffered'. The default value is 'buffered'. 2. `common.diskWriteBufferSizeKb` Disk write buffer size in KB, only used when disk write mode is 'direct', default is 64KB. Current valid range is [4, 65536]. If the value is not aligned to 4KB, it will be rounded up to the nearest multiple of 4KB. 3. `common.diskWriteNumThreads` This parameter controls the number of writer threads used for disk write operations. The valid range is [0, hardware_concurrency]. It is designed to limit the maximum concurrency of disk write operations to reduce the impact on disk read performance. For example, if you want to limit the maximum concurrency of disk write operations to 1, you can set this parameter to 1. The default value is 0, which means the caller will perform write operations directly without using an additional writer thread pool. In this case, the maximum concurrency of disk write operations is determined by the caller's thread pool size. Both parameters can be updated during runtime. --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-07-02 22:18:44 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803 ) Ref #42053 This is the first PR for optimizing `LIKE` with ngram inverted index. Now, only VARCHAR data type is supported and only InnerMatch LIKE (%xxx%) query is supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram controls how we tokenize the documents. For example, for min_gram=2 and max_gram=4, we will tokenize each document with 2-gram, 3-gram and 4-gram. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
zhagnlu	69872f45ad	fix: fix is_not_in for trie index (#42716 ) #42604 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-06-25 16:52:42 +08:00
Spade A	50f7579d8f	fix: fix some bugs discovered by chaos tests (#42906 ) fix: https://github.com/milvus-io/milvus/issues/42870 This PR fixes: 1. SetBitset fn shuold consider growing segments with concurrent write 2. avoid using from_raw_parts directly --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-24 16:32:42 +08:00
Bingyi Sun	669ea51ce5	enhance: Make json index compatible with caching layer (#42484 ) issue: https://github.com/milvus-io/milvus/issues/42483 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-24 15:16:41 +08:00
Spade A	e2c85eec81	fix: load stats index based on mmap config (#42788 ) ref https://github.com/milvus-io/milvus/issues/42626 This PR makes text match index and json key stats index be loaded based on mmap config. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-19 10:10:39 +08:00
Spade A	80f1d707f7	fix: tidy up path for scalar index (#42676 ) Ref #42626 This path tidy up path for scalar index including path for loading index from remote storage and temporary path for buliding index. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-18 00:42:38 +08:00
Chun Han	001619aef9	feat: supporing load priority for loading (#42413 ) related: #40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-17 15:22:38 +08:00
Spade A	911a8df17c	feat: impl StructArray -- data storage support in segcore (#42406 ) Ref https://github.com/milvus-io/milvus/issues/42148 This PR mainly enables segcore to support array of vector (read and write, but not indexing). Now only float vector as the element type is supported. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-06-12 14:38:35 +08:00
Buqian Zheng	8511ede5f8	feat: add back queryNode.cache.warmup for compatibility (#42621 ) issue: https://github.com/milvus-io/milvus/issues/41435 also make ChunkTranslator to load in parallel --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-06-12 10:56:40 +08:00
Bingyi Sun	6c16d3dbee	enhance: Add bulk api for json data (#42407 ) issue: https://github.com/milvus-io/milvus/issues/42409 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-12 10:40:39 +08:00
Bingyi Sun	fbf5cb4e62	feat: Add json flat index (#39917 ) issue: https://github.com/milvus-io/milvus/issues/35528 This PR introduces a JSON flat index that allows indexing JSON fields and dynamic fields in the same way as other field types. In a previous PR (#36750), we implemented a JSON index that requires specifying a JSON path and casting a type. The only distinction lies in the json_cast_type parameter. When json_cast_type is set to JSON type, Milvus automatically creates a JSON flat index. For details on how Tantivy interprets JSON data, refer to the [tantivy documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases). Limitations Array handling: Arrays do not function as nested objects. See the [limitations section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object) for more details. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-10 19:14:35 +08:00
Bingyi Sun	6404e02d99	fix: Check cast type is array for json contains expr (#42184 ) issue: https://github.com/milvus-io/milvus/issues/42181 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-09 17:04:33 +08:00
congqixia	f1188b6781	enhance: [storagev2] Support partition key isolation index (#42574 ) Related to #39173 This patch make storage v2 support partition key isolation index feature Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-09 14:02:33 +08:00
Xianhui Lin	7e46fc6618	feat: implement batch commit for JSON Stats (#42494 ) implement batch commit for JSON Stats issue:https://github.com/milvus-io/milvus/issues/41616 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-08 19:58:33 +08:00
Bingyi Sun	cc5ac1c220	enhance: Support cast function for json index (#41949 ) issue: #41948 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-05 19:42:32 +08:00
zhagnlu	0c4b12565e	fix: fix is null bug for marisa index (#42420 ) #42255 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-06-05 16:40:32 +08:00
congqixia	cc42d49769	fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186 ) Related to #39173 #39718 In storage v2, the `lack_bin_rows` cannot be used since field id is not column group id, which will not be matched forever. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-31 02:44:30 +08:00
cqy123456	5fe7015f63	enhance: InterimIndex support more index type and data type (#41021 ) issue: https://github.com/milvus-io/milvus/issues/27678 cherry pick from : https://github.com/milvus-io/milvus/pull/39180, https://github.com/milvus-io/milvus/pull/40429 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2025-05-28 08:40:28 +08:00
junjiejiangjjj	0b2ecb7632	fix: Solve clang compilation errors (#42041 ) https://github.com/milvus-io/milvus/issues/42040 Signed-off-by: junjiejiangjjj <junjie.jiang@zilliz.com>	2025-05-27 20:32:29 +08:00
foxspy	3dbad0306a	fix: Add bypass thread pool mode to avoid growing indexes blocking insert/load (#41012 ) issue: #40825 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-05-20 14:30:24 +08:00
congqixia	f2a8330f87	fix: [StorageV2] Use correct group building index (#41925 ) Related to #39173 #41534 This pr fixes an issue that building mem index may report datatype not match error when collection split fields into multiple groups --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-20 13:26:23 +08:00
Buqian Zheng	cae0091071	feat: make SkipIndex lazyload (#41826 ) issue: https://github.com/milvus-io/milvus/issues/41435 this PR also: 1. fixed the skip index for VARCHAR. before this PR, skip index of VARCHAR uses the minmax of the entire column as the minmax of chunk 0, and provides no minmax for other chunks. 2. refactored some skip index loading related code 3. partly fixed a bug in test_expr.cpp --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-15 01:30:23 +08:00
zhagnlu	f094d026f8	fix: add params to ignore config type exception (#41776 ) #41707 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-13 13:48:56 +08:00
Buqian Zheng	ff5c2770e5	feat: cachinglayer: various improvements (#41546 ) issue: https://github.com/milvus-io/milvus/issues/41435 this PR is based on https://github.com/milvus-io/milvus/pull/41436. Improvements include: - Lazy Load support for Storage v1 - Use Low/High watermark to control eviction - Caching Layer related config changes - Removed ChunkCache related configs and code in golang - Add `PinAllCells` helper method to CacheSlot class - Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce caching layer overhead - Removed some unclear templated bulk_subscript methods - CachedSearchIterator to store PinWrapper when searching on ChunkedColumn, and removed unused contrustor. --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-10 09:19:16 +08:00
zhagnlu	f674e232b9	fix: GetValueFromConfig return nullopt instead of exception for null value (#41709 ) #41707 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-09 11:18:53 +08:00
zhagnlu	e3c81ba1cc	enhance: use scan mode for like although inverted index exists (#41325 ) #41065 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-09 10:36:54 +08:00
zhagnlu	39e7ad33d7	enhance: add optimize for like expr (#41066 ) #41065 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-08 14:28:52 +08:00
foxspy	e2ddbe4962	feat: add cachinglayer to index (#41653 ) issue: #41435 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-05-08 10:12:54 +08:00
Bingyi Sun	4c08090687	feat: Add json index support for json contains expr (#41478 ) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-05-06 11:44:52 +08:00
Buqian Zheng	73bbf4c674	fix: error when lack_binlog_rows = 0 (#41644 ) issue: https://github.com/milvus-io/milvus/issues/41643 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-04 00:24:56 +08:00

1 2 3 4 5 ...

372 Commits