milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Spade A	8e1ce15146	fix: ngram index is mistakenly used for unsupported operations (#43955) issue: https://github.com/milvus-io/milvus/issues/43917 1. fix the ngram index being mistakenly used for unsupported operations 2. fix a potential use-after-free (UAF) problem --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-08-21 14:41:46 +08:00
zhagnlu	d904c4e677	enhance: optimize compare expr performance for pk field (#43154) #43153 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-21 10:59:46 +08:00
Spade A	864d1b93b1	enhance: enable stlsort with mmap support (#43359) issue: https://github.com/milvus-io/milvus/issues/43358 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-28 15:32:55 +08:00
Spade A	10fe53ff59	feat: support json for ngram (#43170) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enables ngram to support the JSON data type. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-25 10:28:54 +08:00
Xianhui Lin	c13393418c	fix: invalid string error when json stats is enabled (#43380) fix: invalid string error when json stats is enabled issue: https://github.com/milvus-io/milvus/issues/43151 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-07-20 23:38:53 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384) issue: #41435 This is to prevent AI from mistaking our exception throwing for a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
zhagnlu	ee43954534	fix: fix text_match bug caused by not adapting to the multi-chunk model (#43303) https://github.com/milvus-io/milvus/issues/43296 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-07-17 10:32:51 +08:00
Spade A	db91d85dbc	feat: more types of matches for ngram (#43081) Ref https://github.com/milvus-io/milvus/issues/42053 This PR enables ngram to support more kinds of matches, such as prefix and postfix matches. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-14 20:34:50 +08:00
Spade A	26ec841feb	feat: optimize `Like` query with n-gram (#41803) Ref #42053 This is the first PR for optimizing `LIKE` with an ngram inverted index. For now, only the VARCHAR data type and only InnerMatch LIKE (%xxx%) queries are supported. How to use it: ``` milvus_client = MilvusClient("http://localhost:19530") schema = milvus_client.create_schema() ... schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000) ... index_params = milvus_client.prepare_index_params() index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3) milvus_client.create_collection(COLLECTION_NAME, ...) ``` min_gram and max_gram control how documents are tokenized. For example, with min_gram=2 and max_gram=4, each document is tokenized into 2-grams, 3-grams and 4-grams. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-07-01 10:08:44 +08:00
Spade A	911a8df17c	feat: impl StructArray -- data storage support in segcore (#42406) Ref https://github.com/milvus-io/milvus/issues/42148 This PR mainly enables segcore to support arrays of vectors (read and write, but not indexing). For now, only float vector is supported as the element type. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-06-12 14:38:35 +08:00
Bingyi Sun	6c16d3dbee	enhance: Add bulk api for json data (#42407) issue: https://github.com/milvus-io/milvus/issues/42409 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-12 10:40:39 +08:00
Bingyi Sun	fbf5cb4e62	feat: Add json flat index (#39917) issue: https://github.com/milvus-io/milvus/issues/35528 This PR introduces a JSON flat index that allows indexing JSON fields and dynamic fields in the same way as other field types. In a previous PR (#36750), we implemented a JSON index that requires specifying a JSON path and a cast type. The JSON flat index is created the same way; the only difference lies in the json_cast_type parameter: when json_cast_type is set to the JSON type, Milvus automatically creates a JSON flat index. For details on how Tantivy interprets JSON data, refer to the [tantivy documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases). Limitations: Array handling: arrays do not function as nested objects. See the [limitations section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object) for more details. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-10 19:14:35 +08:00
Xianhui Lin	26cbc74478	fix: support infix and suffix match types in JsonStats (#41720) fix: support infix and suffix match types in JsonStats issue: https://github.com/milvus-io/milvus/issues/41386 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-05-09 10:42:53 +08:00
zhagnlu	e3c81ba1cc	enhance: use scan mode for like even when an inverted index exists (#41325) #41065 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-09 10:36:54 +08:00
zhagnlu	39e7ad33d7	enhance: add optimization for like expr (#41066) #41065 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-08 14:28:52 +08:00
foxspy	e2ddbe4962	feat: add cachinglayer to index (#41653) issue: #41435 Signed-off-by: xianliang.li <xianliang.li@zilliz.com>	2025-05-08 10:12:54 +08:00
Buqian Zheng	3de904c7ea	feat: add cachinglayer to sealed segment (#41436) issue: https://github.com/milvus-io/milvus/issues/41435 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-04-28 10:52:40 +08:00
Xianhui Lin	3d4889586d	fix: JsonStats filter by conjunctExpr and improve the task slot calculation logic (#41459) Optimized JSON filter execution by introducing ProcessJsonStatsChunkPos() for unified position calculation and GetNextBatchSize() for better batch processing. Improved JSON key generation by replacing manual path joining with milvus::Json::pointer(), and adjusted the slot size calculation for JSON key index jobs. Updated the task slot calculation logic in calculateStatsTaskSlot() to handle the increased resource needs of JSON key index jobs. issue: https://github.com/milvus-io/milvus/issues/41378 https://github.com/milvus-io/milvus/issues/41218 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-23 16:30:37 +08:00
Bingyi Sun	bf617115ca	enhance: Remove single chunk segment related code (#39249) https://github.com/milvus-io/milvus/issues/39112 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 18:56:29 +08:00
Spade A	9ce3e3cb44	enhance: add documents in batch for json key stats (#41228) issue: https://github.com/milvus-io/milvus/issues/40897 With this change, the scheduling duration of document-add operations drops from roughly 6s to 0.9s for the case in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-11 14:08:26 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
zhagnlu	6c55db44f1	enhance: reorder sub expr for conjunct expr (#39872) Two points: (1) reorder the sub-exprs of a conjunct expr to postpone heavy operations, in the sequence int(column) -> index(column) -> string(column) -> light conjunct ...... -> json(column) -> heavy conjunct -> two_column_compare (2) support pre-filtering during expr execution: skip scanning raw data that was already excluded by a preceding expr's result. #39869 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-03-19 14:50:14 +08:00
Bingyi Sun	d3adab15ac	fix: Build double index for all json numeric fields (#40619) issue: #35528 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-14 16:52:11 +08:00
Bingyi Sun	db4769281c	fix: Fall back to a brute-force search if the json index type does not match (#40076) issue: https://github.com/milvus-io/milvus/issues/35528 If the query data type does not match the index type, fall back to a brute-force search. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-24 16:25:57 +08:00
congqixia	7ccde3300e	fix: Use `text_log` prefix for TextMatchIndex null offset file (#39935) Related to #39933 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-17 20:17:25 +08:00
zhagnlu	8a9f02ef71	enhance: optimize expr performance in several places (#39695) 1. stop fetching expr arguments on every batch execution, since each fetch deserializes the proto. 2. replace unordered_set with a sorted array, which performs better for small sets. #39688 Co-authored-by: luzhang <luzhang@zilliz.com>	2025-02-16 20:32:14 +08:00
Bingyi Sun	b59555057d	feat: support json index (#36750) https://github.com/milvus-io/milvus/issues/35528 This PR adds json index support for json and dynamic fields. For now, only unary queries like 'a["b"] > 1' can use this index; more filter types will be supported later. Basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` There are some limits on using this index: 1. If a record does not have the json path you specify, it will be ignored and no error will be raised. 2. If a value at the json path fails to be cast to the type you specify, it will be ignored and no error will be raised. 3. A specific json path can have only one json index. 4. If you try to create more than one json index for one json field, the SDK (pymilvus<=2.4.7) may return immediately because of its internal implementation. This will be fixed in a later version. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-15 14:06:15 +08:00
Spade A	032292a432	feat: support phrase match query (#38869) The relevant issue: https://github.com/milvus-io/milvus/issues/38930 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-12 20:24:58 +08:00
smellthemoon	907fc24f85	enhance: support null expr (#38772) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-01-02 14:16:54 +08:00
Ted Xu	4919ccf543	enhance: eliminate compile warnings (#38420) See: #38435 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-12-16 09:58:43 +08:00
Gao	994fc544e7	enhance: support iterative filter execution (#37363) issue: #37360 --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2024-12-11 11:32:44 +08:00
Xianhui Lin	6d0a4fdb31	fix: Fix bug where Search fails when the filter expression contains an underscore (#38085) Enhance the matching for elements within the UnaryRangeArray https://github.com/milvus-io/milvus/issues/38068 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2024-12-05 10:18:40 +08:00
smellthemoon	b8492498ac	fix: mask with valid data when preCheckOverflow (#37221) #37175 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-10-31 10:44:26 +08:00
smellthemoon	eb3e4583ec	enhance: every op(Null) evaluates to false in expr (#35527) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-10-17 21:14:30 +08:00
Bingyi Sun	a75bb85f3a	feat: support chunked column for sealed segment (#35764) This PR splits sealed segments into chunked data to avoid unnecessary memory copies and reduce memory usage when loading segments, so that loading can be accelerated. To support rollback to a previous version, we add an option `multipleChunkedEnable`, which is false by default. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-10-12 15:04:52 +08:00
Jiquan Long	89bf226f0b	feat: support keyword text match (#35923) fix: #35922 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-09-10 15:11:08 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
zhagnlu	f77f5364b2	fix: disable use_index for some array exprs (#34894) #34797 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-07-29 00:17:46 +08:00
zhagnlu	804ec24c02	fix: fix retrieving raw data from bitmap array index (#34848) #34795 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-07-27 01:53:47 +08:00
zhagnlu	03a3f50892	enhance: skip using array index in some situations (#33947) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-06-23 21:26:02 +08:00
cqy123456	32f685ff12	enhance: growing segments support mmap (#32633) issue: https://github.com/milvus-io/milvus/issues/32984 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-06-18 14:42:00 +08:00
Jiquan Long	ecf2bcee42	enhance: speed up array-equal operator via inverted index (#33633) fix: #33632 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-06-11 14:13:54 +08:00
Jiquan Long	ccce1e928a	fix: regex query can't handle text with newline (#32569) issue: https://github.com/milvus-io/milvus/issues/32482 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-04-26 12:01:26 +08:00
Alexander Guzhva	c4b37fb285	enhance: Custom bitset and bitsetview prototypes (#30454) Issue: #31285 Basically, I've replaced `FixedVector<bool>` and `boost::dynamic_bitset` with custom bitset and bitsetview in order to reduce the memory bandwidth & increase performance for the filtering. This PR is for internal use only. Current progress (numbers are for GCC 9.5.0 on Ubuntu 22.04 LTS; clang-17 produces better performance numbers): Baseline: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 17903us [ OK ] CApiTest.AssembeChunkPerfTest (183 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 1391us cost: 5us cost: 4us cost: 4us cost: 6us cost: 4us cost: 4us cost: 4us cost: 4us cost: 4us 143 cost: 10us cost: 8us cost: 10us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 9us 8 /home/ubuntu/zilliz/milvus4/milvus/internal/core/unittest/test_expr.cpp:1561: Failure Expected: (cost_op) < (cost_no_op), actual: 143 vs 8 [ FAILED ] Expr.TestMultiLogicalExprsOptimization (7 ms) [ RUN ] Expr.TestExprs start test 3cost: 889us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 3us start test 100cost: 7us start test 200cost: 16us [ OK ] Expr.TestExprs (9 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 124.8us start test type:3 cost: 163.1us start test type:4 cost: 275.9us start test type:5 cost: 590.9us start test type:10 cost: 62.7us start test type:11 cost: 65.9us [ OK ] Expr.TestUnaryBenchTest (1153 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 151.4us start test type:3 cost: 198.4us start test type:4 cost: 361.9us start test type:5 cost: 753.9us start test type:10 cost: 64.6us start test type:11 cost: 62.2us [ OK ] Expr.TestBinaryRangeBenchTest (1151 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 121.14us start test type:3 cost: 156.84us start test type:4 cost: 249.76us start test type:5 cost: 534.44us start test type:10 cost: 82.2us start test type:11 cost: 83.52us [ OK ] Expr.TestLogicalUnaryBenchTest (1202 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 80.64us start test type:3 cost: 78.22us start test type:4 cost: 255.76us start test type:5 cost: 532.04us start test type:10 cost: 89.26us start test type:11 cost: 90us [ OK ] Expr.TestBinaryLogicalBenchTest (1198 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 401.7us start test type:3 cost: 420.96us start test type:4 cost: 418.04us start test type:5 cost: 470.54us start test type:10 cost: 250.32us start test type:11 cost: 850.08us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1273 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 162us start test type:3 cost: 142us start test type:4 cost: 374us start test type:5 cost: 674us start test type:10 cost: 366us start test type:11 cost: 645us [ OK ] Expr.TestCompareExprBenchTest (1214 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 1253us start test 10cost: 1060us start test 20cost: 681us start test 30cost: 522us start test 50cost: 511us start test 100cost: 506us start test 200cost: 497us [ OK ] Expr.TestRefactorExprs (1142 ms) ``` Candidate: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 6099us [ OK ] CApiTest.AssembeChunkPerfTest (153 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 42us cost: 15us cost: 15us cost: 14us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us 17 cost: 41us cost: 39us cost: 33us cost: 33us cost: 33us cost: 33us cost: 34us cost: 41us cost: 34us cost: 34us 35 [ OK ] Expr.TestMultiLogicalExprsOptimization (6 ms) [ RUN ] Expr.TestExprs start test 3cost: 20us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 4us start test 100cost: 8us start test 200cost: 15us [ OK ] Expr.TestExprs (8 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 55.7us start test type:3 cost: 79.8us start test type:4 cost: 177.6us start test type:5 cost: 337.2us start test type:10 cost: 16.9us start test type:11 cost: 15.7us [ OK ] Expr.TestUnaryBenchTest (1140 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 57.1us start test type:3 cost: 87us start test type:4 cost: 177.5us start test type:5 cost: 342.7us start test type:10 cost: 17.9us start test type:11 cost: 16.7us [ OK ] Expr.TestBinaryRangeBenchTest (1152 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 34.58us start test type:3 cost: 68.86us start test type:4 cost: 151.38us start test type:5 cost: 286.8us start test type:10 cost: 16.54us start test type:11 cost: 16.7us [ OK ] Expr.TestLogicalUnaryBenchTest (1165 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 20us start test type:3 cost: 17.1us start test type:4 cost: 154.12us start test type:5 cost: 286.1us start test type:10 cost: 19.6us start test type:11 cost: 19.24us [ OK ] Expr.TestBinaryLogicalBenchTest (1188 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 125.7us start test type:3 cost: 111.34us start test type:4 cost: 148.02us start test type:5 cost: 306.7us start test type:10 cost: 149.3us start test type:11 cost: 282.94us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1221 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 89us start test type:3 cost: 79us start test type:4 cost: 323us start test type:5 cost: 629us start test type:10 cost: 313us start test type:11 cost: 591us [ OK ] Expr.TestCompareExprBenchTest (1228 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 874us start test 10cost: 611us start test 20cost: 290us start test 30cost: 294us start test 50cost: 272us start test 100cost: 278us start test 200cost: 279us [ OK ] Expr.TestRefactorExprs (1149 ms) ``` Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>	2024-03-24 21:49:07 +08:00
Jiquan Long	e2f35954d4	enhance: support pattern matching on json field (#30779) issue: https://github.com/milvus-io/milvus/issues/30714 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-28 18:31:00 +08:00
Jiquan Long	e549148a19	enhance: full support for wildcard pattern matching (#30288) issue: #29988 This PR adds full end-to-end support for wildcard pattern matching. Before this PR, users could only use prefix matching in their expressions, for example, "like 'prefix%'". With this PR, more flexible patterns can be combined. To do so, this PR makes these changes: - 1. support regex queries on both index and raw data; - 2. translate pattern matching into a regex query, so that it can be handled by the regex query logic; - 3. loosen the limits of expression parsing, which allows general pattern-matching syntax; With regex query support in the segcore backend, MySQL-like `REGEXP` syntax can also easily be added later. --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-01 12:37:04 +08:00
zhagnlu	79c417b14e	fix: pass active count to query context instead of timestamp (#29541) #29319 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-12-31 16:08:48 +08:00
yah01	8f89e9cf75	enhance: remove all unnecessary string formatting (#29323) done with two regular expressions: - `PanicInfo\((.+),[. \n]+fmt::format\(([.\s\S]+?)\)\)` - `AssertInfo\((.+),[. \n]+fmt::format\(([.\s\S]+?)\)\)` related: #28811 --------- Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-12-20 10:04:43 +08:00
zhagnlu	a602171d06	enhance: Refactor runtime and expr framework (#28166) #28165 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-12-18 12:04:42 +08:00

49 Commits
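
Commit 26ec841feb tokenizes each document into every n-gram with length from min_gram to max_gram, then uses an inverted index over those grams for InnerMatch `LIKE '%xxx%'` queries. The Python sketch below illustrates that idea. It is a minimal illustration, not Milvus's implementation: the functions `ngrams`, `build_index` and `inner_match` are hypothetical. The query strategy is one common way to use such an index: intersect the posting lists of the literal's grams, then re-check the raw text.

```python
# Minimal sketch (hypothetical names): n-gram tokenization controlled by
# min_gram/max_gram, plus an inverted index used to pre-filter an
# InnerMatch LIKE '%literal%' query. Not the Milvus implementation.
from collections import defaultdict


def ngrams(text: str, min_gram: int, max_gram: int) -> set:
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def build_index(docs: list, min_gram: int, max_gram: int) -> dict:
    """Inverted index: gram -> set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for gram in ngrams(doc, min_gram, max_gram):
            index[gram].add(doc_id)
    return index


def inner_match(docs: list, index: dict, literal: str,
                min_gram: int, max_gram: int) -> list:
    """Evaluate LIKE '%literal%' and return the matching document ids."""
    if len(literal) < min_gram:
        # No indexed gram is short enough to cover the literal: scan everything.
        candidates = set(range(len(docs)))
    else:
        # Look up the literal's longest indexed grams and intersect postings.
        n = min(max_gram, len(literal))
        candidates = None
        for i in range(len(literal) - n + 1):
            postings = index.get(literal[i:i + n], set())
            candidates = postings if candidates is None else candidates & postings
    # Gram hits are necessary but not sufficient, so verify on the raw text.
    return sorted(d for d in candidates if literal in docs[d])


docs = ["hello world", "yellow", "word play", "hi"]
index = build_index(docs, min_gram=2, max_gram=3)
print(inner_match(docs, index, "ello", 2, 3))  # [0, 1]
```

The final containment check is required. A document can contain every gram of the literal without containing the literal itself: "abc bcd" contains both 3-grams of "abcd" but not "abcd".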
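
Commit 6c55db44f1 describes two ideas for conjunct (AND) expressions:

1. Order sub-expressions from cheap to heavy.
2. Skip rows that an earlier sub-expression already rejected.

The Python sketch below illustrates both, under these assumptions:

* The `SubExpr` type, the `kind` labels and the numeric ranks are hypothetical stand-ins for the order listed in the commit message.
* Rows are plain dicts rather than segment columns.

```python
# Minimal sketch (hypothetical types and ranks): evaluate an AND of
# sub-expressions cheapest-first, skipping rows already rejected.
from dataclasses import dataclass
from typing import Callable, Dict, List

# Lower rank runs earlier; the order follows the commit message.
COST_RANK = {
    "int_column": 0,
    "index_column": 1,
    "string_column": 2,
    "light_conjunct": 3,
    "json_column": 4,
    "heavy_conjunct": 5,
    "two_column_compare": 6,
}


@dataclass
class SubExpr:
    kind: str                          # a key of COST_RANK
    predicate: Callable[[Dict], bool]


def eval_conjunct(rows: List[Dict], exprs: List[SubExpr]) -> List[bool]:
    """Return one bool per row: True if the row passes every sub-expression."""
    ordered = sorted(exprs, key=lambda e: COST_RANK[e.kind])
    alive = [True] * len(rows)
    for expr in ordered:
        for i, row in enumerate(rows):
            # Pre-filter: rows rejected by an earlier sub-expression are skipped.
            if alive[i] and not expr.predicate(row):
                alive[i] = False
        if not any(alive):
            break  # every row is already rejected; later exprs never run
    return alive


rows = [{"a": 1, "j": {"k": 3}}, {"a": 5, "j": {"k": 1}}, {"a": 7, "j": {"k": 2}}]
exprs = [
    SubExpr("json_column", lambda r: r["j"]["k"] >= 2),  # listed first...
    SubExpr("int_column", lambda r: r["a"] > 4),         # ...but runs first
]
print(eval_conjunct(rows, exprs))  # [False, False, True]
```

Because `sorted` is stable, sub-expressions of the same kind keep their original relative order.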
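
Point 2 of commit 8a9f02ef71 replaces an `unordered_set` with a sorted array when checking membership in a small set of values. Below is a minimal Python sketch of the sorted-array side, using the standard `bisect` module. The class name `SortedInSet` is hypothetical. The sketch shows only the data structure and does not reproduce the speedup. In C++, that speedup plausibly comes from a small contiguous array being cache-friendly and needing no hashing.

```python
# Minimal sketch (hypothetical class): membership in a small constant set,
# e.g. the right-hand side of `field in [...]`, via a sorted array and
# binary search instead of a hash set.
from bisect import bisect_left


class SortedInSet:
    def __init__(self, values):
        # Deduplicate and sort once, when the expression is prepared.
        self._vals = sorted(set(values))

    def __contains__(self, x) -> bool:
        # Binary search for the first element >= x, then check equality.
        i = bisect_left(self._vals, x)
        return i < len(self._vals) and self._vals[i] == x


terms = SortedInSet([5, 1, 3, 3])
print([v for v in range(7) if v in terms])  # [1, 3, 5]
```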
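
Commit c4b37fb285 replaces `FixedVector<bool>` and `boost::dynamic_bitset` with a custom bitset to reduce memory bandwidth during filtering. The Python sketch below shows the general idea of a word-packed bitset. Each machine word holds 64 row flags, so combining two filter results costs one AND per 64 rows. The `Bitset` class is hypothetical and does not reflect the layout or optimizations of the actual Milvus bitset.

```python
# Minimal sketch (hypothetical class): a word-packed bitset for filter
# results, 64 row flags per word, combined word-at-a-time.
WORD_BITS = 64


class Bitset:
    def __init__(self, size: int):
        self.size = size
        self.words = [0] * ((size + WORD_BITS - 1) // WORD_BITS)

    @classmethod
    def from_bools(cls, flags):
        bs = cls(len(flags))
        for i, flag in enumerate(flags):
            if flag:
                bs.words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
        return bs

    def __getitem__(self, i: int) -> bool:
        return bool((self.words[i // WORD_BITS] >> (i % WORD_BITS)) & 1)

    def __iand__(self, other: "Bitset") -> "Bitset":
        assert self.size == other.size, "bitsets must have the same size"
        # One AND per 64 rows instead of one per row.
        for k, w in enumerate(other.words):
            self.words[k] &= w
        return self

    def count(self) -> int:
        return sum(bin(w).count("1") for w in self.words)


# Combine two filter results over 100 rows.
a = Bitset.from_bools([i < 70 for i in range(100)])       # rows 0..69 pass filter A
b = Bitset.from_bools([i % 2 == 0 for i in range(100)])  # even rows pass filter B
a &= b
print(a.count())  # 35
```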
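
Two commits concern wildcard and regex matching:

* Commit e549148a19 handles wildcard patterns by translating them into regex queries.
* Commit ccce1e928a fixes regex queries on text that contains newlines.

Below is a minimal Python sketch of that translation. It assumes SQL-style semantics: `%` matches any run of characters, `_` matches exactly one, and a backslash escapes the next character. The function names are hypothetical, and the escape rule is an assumption rather than Milvus's confirmed grammar.

```python
# Minimal sketch (hypothetical names): translate a LIKE pattern to a regex.
# Assumes '%' = any run of characters, '_' = exactly one character, and a
# backslash escapes the next character.
import re


def like_to_regex(pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if c == "%":
            out.append(".*")
        elif c == "_":
            out.append(".")
        else:
            out.append(re.escape(c))  # regex metacharacters stay literal
        i += 1
    return "".join(out)


def like_match(text: str, pattern: str) -> bool:
    # fullmatch anchors both ends; DOTALL lets '.' also match '\n'.
    return re.fullmatch(like_to_regex(pattern), text, flags=re.DOTALL) is not None


print(like_to_regex("prefix%"))              # prefix.*
print(like_match("line1\nline2", "%line2"))  # True
```

Without `re.DOTALL`, `.` does not match a newline, so `like_match("line1\nline2", "%line2")` would return False.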