milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
Alexander Guzhva	c4b37fb285	enhance: Custom bitset and bitsetview prototypes (#30454 ) Issue: #31285 Basically, I've replaced `FixedVector<bool>` and `boost::dynamic_bitset` with custom bitset and bitsetview in order to reduce the memory bandwidth & increase performance for the filtering. This PR is for internal use only. Current progress (numbers are for GCC 9.5.0 on Ubuntu 22.04 LTS; clang-17 produces better performance numbers): Baseline: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 17903us [ OK ] CApiTest.AssembeChunkPerfTest (183 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 1391us cost: 5us cost: 4us cost: 4us cost: 6us cost: 4us cost: 4us cost: 4us cost: 4us cost: 4us 143 cost: 10us cost: 8us cost: 10us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 9us 8 /home/ubuntu/zilliz/milvus4/milvus/internal/core/unittest/test_expr.cpp:1561: Failure Expected: (cost_op) < (cost_no_op), actual: 143 vs 8 [ FAILED ] Expr.TestMultiLogicalExprsOptimization (7 ms) [ RUN ] Expr.TestExprs start test 3cost: 889us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 3us start test 100cost: 7us start test 200cost: 16us [ OK ] Expr.TestExprs (9 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 124.8us start test type:3 cost: 163.1us start test type:4 cost: 275.9us start test type:5 cost: 590.9us start test type:10 cost: 62.7us start test type:11 cost: 65.9us [ OK ] Expr.TestUnaryBenchTest (1153 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 151.4us start test type:3 cost: 198.4us start test type:4 cost: 361.9us start test type:5 cost: 753.9us start test type:10 cost: 64.6us start test type:11 cost: 62.2us [ OK ] Expr.TestBinaryRangeBenchTest (1151 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 121.14us start test type:3 cost: 156.84us start test type:4 cost: 249.76us start test type:5 cost: 534.44us start test type:10 cost: 82.2us start test type:11 cost: 
83.52us [ OK ] Expr.TestLogicalUnaryBenchTest (1202 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 80.64us start test type:3 cost: 78.22us start test type:4 cost: 255.76us start test type:5 cost: 532.04us start test type:10 cost: 89.26us start test type:11 cost: 90us [ OK ] Expr.TestBinaryLogicalBenchTest (1198 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 401.7us start test type:3 cost: 420.96us start test type:4 cost: 418.04us start test type:5 cost: 470.54us start test type:10 cost: 250.32us start test type:11 cost: 850.08us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1273 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 162us start test type:3 cost: 142us start test type:4 cost: 374us start test type:5 cost: 674us start test type:10 cost: 366us start test type:11 cost: 645us [ OK ] Expr.TestCompareExprBenchTest (1214 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 1253us start test 10cost: 1060us start test 20cost: 681us start test 30cost: 522us start test 50cost: 511us start test 100cost: 506us start test 200cost: 497us [ OK ] Expr.TestRefactorExprs (1142 ms) ``` Candidate: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 6099us [ OK ] CApiTest.AssembeChunkPerfTest (153 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 42us cost: 15us cost: 15us cost: 14us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us 17 cost: 41us cost: 39us cost: 33us cost: 33us cost: 33us cost: 33us cost: 34us cost: 41us cost: 34us cost: 34us 35 [ OK ] Expr.TestMultiLogicalExprsOptimization (6 ms) [ RUN ] Expr.TestExprs start test 3cost: 20us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 4us start test 100cost: 8us start test 200cost: 15us [ OK ] Expr.TestExprs (8 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 55.7us start test type:3 cost: 79.8us start test type:4 cost: 177.6us start test type:5 cost: 337.2us start test 
type:10 cost: 16.9us start test type:11 cost: 15.7us [ OK ] Expr.TestUnaryBenchTest (1140 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 57.1us start test type:3 cost: 87us start test type:4 cost: 177.5us start test type:5 cost: 342.7us start test type:10 cost: 17.9us start test type:11 cost: 16.7us [ OK ] Expr.TestBinaryRangeBenchTest (1152 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 34.58us start test type:3 cost: 68.86us start test type:4 cost: 151.38us start test type:5 cost: 286.8us start test type:10 cost: 16.54us start test type:11 cost: 16.7us [ OK ] Expr.TestLogicalUnaryBenchTest (1165 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 20us start test type:3 cost: 17.1us start test type:4 cost: 154.12us start test type:5 cost: 286.1us start test type:10 cost: 19.6us start test type:11 cost: 19.24us [ OK ] Expr.TestBinaryLogicalBenchTest (1188 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 125.7us start test type:3 cost: 111.34us start test type:4 cost: 148.02us start test type:5 cost: 306.7us start test type:10 cost: 149.3us start test type:11 cost: 282.94us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1221 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 89us start test type:3 cost: 79us start test type:4 cost: 323us start test type:5 cost: 629us start test type:10 cost: 313us start test type:11 cost: 591us [ OK ] Expr.TestCompareExprBenchTest (1228 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 874us start test 10cost: 611us start test 20cost: 290us start test 30cost: 294us start test 50cost: 272us start test 100cost: 278us start test 200cost: 279us [ OK ] Expr.TestRefactorExprs (1149 ms) ``` Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>	2024-03-24 21:49:07 +08:00
Bingyi Sun	66d679ecbb	fix: clear binlog files in CleanData (#31039 ) issue: https://github.com/milvus-io/milvus/issues/31042 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-03-20 11:11:07 +08:00
Chun Han	6939ad15f2	fix:possible out-of-bound due to groupby when reducing(#30711 ) (#31200 ) related: #30711 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-03-14 13:07:03 +08:00
Buqian Zheng	7fc3094a42	fix: fix growing index data race and properly handle build error (#31170 ) issue: https://github.com/milvus-io/milvus/issues/31169 also properly handle index build errors by re-creating a new index so that nothing is left over from the previous failed index build attempt. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-13 20:19:04 +08:00
Buqian Zheng	96cfae55a5	feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629 ) This PR adds the ability to search/get sparse float vectors in segcore, and added unit tests by modifying lots of existing tests into parameterized ones. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-12 09:16:30 -07:00
Buqian Zheng	070dfc77bf	feat: [Sparse Float Vector] segcore basics and index building (#30357 ) This commit adds sparse float vector support to segcore with the following: 1. data type enum declarations 2. Adds corresponding data structures for handling sparse float vectors in various scenarios, including: * FieldData as a bridge between the binlog and the in memory data structures * mmap::Column as the in memory representation of a sparse float vector column of a sealed segment; * ConcurrentVector as the in memory representation of a sparse float vector of a growing segment which supports inserts. 3. Adds logic in payload reader/writer to serialize/deserialize from/to binlog 4. Adds the ability to allow the index node to build sparse float vector index 5. Adds the ability to allow the query node to build growing index for growing segment and temp index for sealed segment without index built This commit also includes some code cleanness, comment improvement, and some unit tests for sparse vector. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-11 14:45:02 +08:00
Jiquan Long	16b785e149	enhance: optimize the memory usage and speed up loading variable length data (#30787 ) /kind improvement this removes the 1x copying while loading variable length data, also avoids constructing std::string, which could lead to memory fragmentation --------- Signed-off-by: yah01 <yah2er0ne@outlook.com> Signed-off-by: longjiquan <jiquan.long@zilliz.com> Co-authored-by: yah01 <yah2er0ne@outlook.com>	2024-02-28 16:45:00 +08:00
Cai Yudong	8a219e0102	feat: Support knowhere trace using OpenTelemetry (#30750 ) Issue: #21508 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-02-28 12:29:00 +08:00
yah01	57397b1307	enhance: add new LRU cache impl (#30360 ) - remove the unused LRU cache - add new LRU cache impl which wraps github.com/karlseguin/ccache related #30361 --------- Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-02-27 20:58:40 +08:00
MrPresent-Han	77eb6defb1	feat: support groupby on growing and non-indexed sealed segment(#30307 ) (#30644 ) related: #30308 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-02-21 14:02:53 +08:00
zhagnlu	976b6fc0e4	enhance: change opendal as compile configurable (#30384 ) #30373 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-20 19:16:52 +08:00
yah01	b74673c147	enhance: calculate the accurate memory usage while loading segment (#30473 ) the old version of Knowhere would copy the index data while loading, so we need to account for this to avoid OOM. Knowhere provides a util function to indicate whether it will load the index with disk; if not, we need to double the memory usage prediction for index data Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-02-20 14:52:51 +08:00
zhagnlu	e8a6f1ea2b	fix: erase pk empty check when pk index replace raw data (#30432 ) #30350 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-07 14:56:47 +08:00
cqy123456	5449e862d5	fix: safely access unordered_map and remove some useless code execution (#30504 ) issue: https://github.com/milvus-io/milvus/issues/30358 and https://github.com/milvus-io/milvus/issues/30491 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-02-05 22:03:09 +08:00
cqy123456	74cfba0249	enhance:limit binlog index rows num (#30173 ) issue: https://github.com/milvus-io/milvus/issues/27678 also relate issue: https://github.com/milvus-io/milvus/issues/30065 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-01-29 19:49:02 +08:00
xige-16	e9fdd2475d	fix: fix searchPlan metricType modified concurrently (#30227 ) issue: #30225 /kind bug Signed-off-by: xige-16 <xi.ge@zilliz.com> --------- Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-01-26 14:03:09 +08:00
yihao.dai	c02fb64ad6	enhance: Allows proactive warming up of chunk cache (#30182 ) Allows proactive warming up of chunk cache. Original vector data will be asynchronously loaded into the chunk cache during the load process. It has the potential to significantly reduce query/search latency for a certain duration after the load, albeit with a concurrent increase in disk usage. issue: https://github.com/milvus-io/milvus/issues/30181 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-25 19:55:39 +08:00
MrPresent-Han	4436effdc3	enhance: support groupby based on scalar-index(#29965 ) (#30091 ) related: #29965 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-22 10:50:54 +08:00
yah01	f542bdbf3c	enhance: calc the accurate mem size of segment (#30093 ) this stats the real memory size of segment, also reduces the memory usage in mmap mode resolve #30095 Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-19 12:32:53 +08:00
xige-16	fa7cf587b0	enhance: Opt metric type does not match error message (#29927 ) issue: #29791 /kind improvement Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-01-17 20:25:03 +08:00
yah01	1185e4dcd5	fix: written file size is over the int32 range and raises error (#30057 ) we sum the total data size in int32, which could lead to an overflow error related #30056 Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-17 16:42:54 +08:00
chyezh	def717af55	fix: SealedIndexingEntry in SealedIndexingRecord may leak without smart pointer protect. (#29932 ) may related issue: #29828 Signed-off-by: chyezh <ye.zhen@zilliz.com>	2024-01-14 10:28:51 +08:00
Bingyi Sun	e1258b8cad	feat: integrate storagev2 into loading segment (#29336 ) issue: #29335 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-01-12 18:10:51 +08:00
yah01	6c477ce3a7	enhance: optimize the loading strategy (#29910 ) as we have the pool size limit so we don't need to limit the concurrency manually Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-12 14:26:50 +08:00
yah01	aba2656e68	fix: missing field data after appending scalar index to loaded segment (#29912 ) related #29843 Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-12 14:04:54 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue: https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
congqixia	d6429933a7	enhance: make Load process traceable in querynode & segcore (#29858 ) See also #29803 This PR: - Add trace span for `LoadIndex` & `LoadFieldData` in segment loader - Add `TraceCtx` parameter for `Index.Load` in segcore - Add span for ReadFiles & Engine Load for Memory/Disk Vector index --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-10 21:58:51 +08:00
zhenshan.cao	60e88fb833	fix: Restore the MVCC functionality. (#29749 ) When the TimeTravel functionality was previously removed, it inadvertently affected the MVCC functionality within the system. This PR aims to reintroduce the internal MVCC functionality as follows: 1. Add MvccTimestamp to the requests of Search/Query and the results of Search internally. 2. When the delegator receives a Query/Search request and there is no MVCC timestamp set in the request, set the delegator's current tsafe as the MVCC timestamp of the request. If the request already has an MVCC timestamp, do not modify it. 3. When the Proxy handles Search and triggers the second phase ReQuery, divide the ReQuery into different shards and pass the MVCC timestamp to the corresponding Query requests. issue: #29656 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-01-09 11:38:48 +08:00
xige-16	9702cef2b5	feat: Support multiple vector search (#29433 ) issue #25639 Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-01-08 15:34:48 +08:00
cai.zhang	5dc300c4a9	fix: Fix bug for pk index doesn't have raw data (#29711 ) issue: #29697 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-01-07 19:36:48 +08:00
MrPresent-Han	9e2e7157e9	feat: support search_group_by for milvus(#25324 ) (#28983 ) related: #25324 Search GroupBy function, used to aggregate result entities based on a specific scalar column. Several points to mention: 1. Temporarily, the whole groupby is implemented separately from the iterative expr framework for the first period 2. In the long term, the groupBy operation will be incorporated into the iterative expr framework: https://github.com/milvus-io/milvus/pull/28166 3. This PR includes some unrelated mocked interfaces regarding alterIndex due to some unworth-to-mention reasons. All this unassociated content will be removed before the final PR is merged. This version of the PR is only for review 4. All other related details were commented in the files comparison Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-05 15:50:47 +08:00
Jiquan Long	6f4791da0b	fix: panic in concurrent insert/query scenario (#29408 ) issue: https://github.com/milvus-io/milvus/issues/29405 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-26 15:10:48 +08:00
yah01	b8318fcd7d	enhance: improve the handling for segcore error (#29471 ) - fix lost exception details in segcore - improve the logs of handling errors from segcore Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-12-26 14:06:46 +08:00
cqy123456	4c979538a4	enhance: update cagra index params in config and add params check (#29045 ) issue: https://github.com/milvus-io/milvus/issues/29230 this PR does two things for the cagra index: a. milvus yaml config supports gpu memory settings b. add cagra-params check Signed-off-by: cqy123456 <qianya.cheng@zilliz.com> Co-authored-by: yusheng.ma <yusheng.ma@zilliz.com>	2023-12-26 11:04:47 +08:00
yah01	aef483806d	enhance: improve the segcore logs (#29372 ) - remove the streaming logging - refine existing logs fix #29366 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-12-23 21:52:43 +08:00
chyezh	be87c18b44	fix: fixup data race at generate binlog index (#29370 ) issue: #29339 Signed-off-by: chyezh <ye.zhen@zilliz.com>	2023-12-21 14:58:49 +08:00
Gao	9b52cb6417	enhance: improve reducing results when many segments are filtered (#29073 ) Do not fill the invalid ids for the empty results, it will incur useless memory overhead and reduce overhead when nq and topk is large. --------- Signed-off-by: chasingegg <chao.gao@zilliz.com>	2023-12-20 12:56:42 +08:00
MrPresent-Han	bfca0a7926	fix: refine skipIndex to resolve cyclic dependency(#29132 ) (#29189 ) related: #29132 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2023-12-19 10:26:40 +08:00
zhagnlu	a602171d06	enhance: Refactor runtime and expr framework (#28166 ) #28165 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2023-12-18 12:04:42 +08:00
MrPresent-Han	464bc9e8f4	fix: fix reduce precision for search(#27325 ) (#29031 ) related: #27325 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2023-12-08 10:04:37 +08:00
congqixia	dcb662d9ed	enhance: Refine C.NewSegment response and handle exception (#28952 ) See also #28795 Original `C.NewSegment` may panic if some condition is not met, this PR changes the response struct to `CNewSegmentResult`, which contains `C.CStatus` and may return the caught exception --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-12-07 13:34:35 +08:00
cai.zhang	fb089cda8b	enhance: Load raw data while scalar index doesn't have raw data (#28888 ) issue: #28886 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2023-12-06 20:36:36 +08:00
Bingyi Sun	36f69ea031	feat: integrate storagev2 in building index of segcore (#28768 ) issue: https://github.com/milvus-io/milvus/issues/28655 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2023-12-05 16:48:54 +08:00
yihao.dai	f5856812a2	fix: Fix get binary vector from chunk cache (#28866 ) The way of getting binary vector size is wrong. This PR will fix it. issue: https://github.com/milvus-io/milvus/issues/28865 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2023-12-01 14:40:32 +08:00
yah01	d69440524b	fix: bypass growing index if no index meta (#28791 ) we shouldn't panic if no index meta, just skip building it fix #28022 Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-11-30 14:10:27 +08:00
congqixia	1dc086496f	fix: schema->size() check logic with system field (#28802 ) Now segcore load system field info as well, the growing segment assertion shall not pass with "+ 2" value This will cause all growing segments load failure Fix #28801 Related to #28478 See also #28524 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-11-29 22:40:28 +08:00
cqy123456	3b1b14dd78	fix: update binlog index memory usage before loading segments (#28528 ) issue: #27678 when interimIndex = true, the memory prediction should be updated with the memory usage of the binlog index build process. Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2023-11-29 16:42:27 +08:00
yah01	02c5a649cf	enhance: store system fields in segcore (#28524 ) we need the system fields info for some use cases fix: #28523 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-21 09:28:22 +08:00
yah01	f7d2ab6677	enhance: reduce 1x copy for variable length field while retrieving (#28345 ) - Reduce 1x copy for varchar/string/JSON/array types while retrieving - Reduce 1x copy for int8/int16 while retrieving Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-15 18:08:20 +08:00
MrPresent-Han	836f300536	support skip-index based on chunk-metrics to accelerate expr filter(#27925 ) (#28297 ) related: #27925 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2023-11-15 11:20:19 +08:00

521 Commits