302 Commits

Author SHA1 Message Date
smellthemoon
475c333fa2
enhance: add valid_data in span (#35030)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-02 15:40:14 +08:00
congqixia
a642a26ed4
enhance: Resolve ChunkFileWriter lint issue (#35166)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-01 16:52:13 +08:00
Bingyi Sun
f229f244d2
enhance: add chunk basic impl (#34634)
https://github.com/milvus-io/milvus/issues/35112
This pr would not affect milvus functionality by now.
It implments a Chunk memory layout that looks like 
```
VariableColumn
|offset|offset|offset|
|data|data|data|

```
We maybe move offsets to the beginning and add null bitmaps later but
not in this PR.
And mmap test will also be added in another PR.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-08-01 10:29:51 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
congqixia
972752258a
enhance: Support otlp http exporter (#35053)
See also #35052

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-29 17:43:49 +08:00
smellthemoon
5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support read and write null in segcore
    will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3.  retrieve valid_data.
    parse valid_data in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
Chun Han
ed057e6fce
fix: non-init seg_offset for growing raw-data when doing groupby (#34748)
related:  #34713

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-19 17:01:40 +08:00
zhagnlu
bd9727a1f7
fix: fix bug that set incorrect info to columnbase (#34428)
#34427

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-14 22:27:46 +08:00
Chun Han
f00c529aea
feat: support group_size for search_group_by(#33544) (#33720)
related: #33544

mainly changes in three aspects:

1. enable setting group_size for group by function
2. separate normal reduce and group by reduce
3. eleminate uncessary padding in search result for reducing

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-12 10:17:36 +08:00
smellthemoon
ef3ced8138
fix: descriptor event in previous version not has nullable to parse error (#34235)
#34176

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-01 16:38:06 +08:00
zhagnlu
3030e4625e
enhance: refactor variable column to reduce memory cost (#33875)
#33874

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-30 20:16:06 +08:00
Cai Yudong
ad90360162
enhance: Update knowhere commit (#34223)
Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-06-27 18:20:06 +08:00
zhagnlu
0d7ea8ec42
enhance: Enhance and correct exception module (#33705)
#33704

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-23 21:22:01 +08:00
cqy123456
dc4437ff82
enhance: use segment id and type to register in MmapChunkManager and opt malloc in variableChunk (#33993)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-20 17:42:02 +08:00
Gao
0d20303e54
fix: fix binary vector data size (#33750)
issue: https://github.com/milvus-io/milvus/issues/22837

- fix byte size wrong for binary vectors
- fix the expect/actual error msg

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-06-18 21:39:59 +08:00
cqy123456
32f685ff12
enhance: growing segment support mmap (#32633)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 14:42:00 +08:00
zhagnlu
d43ec4db0b
enhance: support array bitmap index (#33527)
#32900

---------

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-16 21:51:58 +08:00
Jiquan Long
ecf2bcee42
enhance: speed up array-equal operator via inverted index (#33633)
fix: #33632

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-06-11 14:13:54 +08:00
chyezh
f53ab54c5d
enhance: async cgo utility (#33133)
issue: #30926, #33132

- implement future-based cgo utility.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-06-09 22:55:53 +08:00
cai.zhang
27cc9f2630
enhance: Support analyze data (#33651)
issue: #30633

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>
2024-06-06 17:37:51 +08:00
Gao
545d4725fb
fix: correct get vector data size for bf16/fp16/binary vector (#33377)
related #22837

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-06-05 14:31:57 +08:00
congqixia
2b285e5573
fix: Wrap init segcore tracing with golang timeout (#33494)
See also #33483

Wrap `C.InitTrace` & `C.SetTrace` with timeout preventing otlp
initializtion hangs forever when endpoint is not set correctly

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-03 19:25:51 +08:00
Chun Han
416a2cf507
fix: query iterator lack results(#33137) (#33422)
related: #33137 
adding has_more_result_tag for various level's reduce to rectify
reduce_stop_for_best

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-05-30 17:51:44 +08:00
zhagnlu
589d4dfd82
enhance: optimize bitmap index (#33358)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-05-30 13:09:43 +08:00
yihao.dai
5cf4161394
fix: Fix exception info is missing (#33393)
Replace based std::exception to prevent "object slicing"

issue: https://github.com/milvus-io/milvus/issues/33392

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-27 14:33:41 +08:00
Buqian Zheng
7c60d725cc
fix: validate sparse vector in search request (#32856)
issue: #32368

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-05-15 15:39:33 +08:00
aoiasd
54a51b1236
enhance: Support dynamic config for opentelemetry trace (#32169)
relate: https://github.com/milvus-io/milvus/issues/31940

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-05-09 17:43:30 +08:00
Jiquan Long
ccce1e928a
fix: regex query can't handle text with newline (#32569)
issue: https://github.com/milvus-io/milvus/issues/32482

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-04-26 12:01:26 +08:00
Jiquan Long
c002745902
enhance: retrieve output fields after local reduce (#32346)
issue: #31822

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-04-25 09:49:26 +08:00
Gao
0fab265eed
enhance: update knowhere and some header changes (#32468)
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-04-22 15:47:26 +08:00
chyezh
e19d17076f
fix: delete may lost when enable lru cache, some field should be reset when ReleaseData (#32012)
issue: #30361

- Delete may be lost when segment is not data-loaded status in lru
cache. skip filtering to fix it.

- `stats_` and `variable_fields_avg_size_` should be reset when
`ReleaseData`

- Remove repeat load delta log operation in lru.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-16 11:17:20 +08:00
Cai Yudong
a0a4ec8b67
enhance: make range search param check message more meaningful (#32006)
Issue: #31970

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-09 16:17:26 +08:00
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() == IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
Cai Yudong
675a5dc822
fix: Save traceID and spanID as std::vector into search config (#31278)
Issue: #30961

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-03-29 14:29:11 +08:00
SimFG
b1a1cca10b
feat: add more operation detail info for better allocation (#30438)
issue: #30436

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-28 06:33:11 +08:00
zhagnlu
659ad81ab7
fix: remove deprecated ut test (#31499)
#31498

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-26 14:01:07 +08:00
Alexander Guzhva
c4b37fb285
enhance: Custom bitset and bitsetview prototypes (#30454)
Issue: #31285 

Basically, I've replaced `FixedVector<bool>` and `boost::dynamic_bitset`
with custom bitset and bitsetview in order to reduce the memory
bandwidth & increase performance for the filtering.

This PR is for internal use only. 

Current progress (numbers are for GCC 9.5.0 on Ubuntu 22.04 LTS;
clang-17 produces better performance numbers):
Baseline:
```
[ RUN      ] CApiTest.AssembeChunkPerfTest
start test
cost: 17903us
[       OK ] CApiTest.AssembeChunkPerfTest (183 ms)

[ RUN      ] Expr.TestMultiLogicalExprsOptimization
cost: 1391us
cost: 5us
cost: 4us
cost: 4us
cost: 6us
cost: 4us
cost: 4us
cost: 4us
cost: 4us
cost: 4us
143
cost: 10us
cost: 8us
cost: 10us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 9us
8
/home/ubuntu/zilliz/milvus4/milvus/internal/core/unittest/test_expr.cpp:1561: Failure
Expected: (cost_op) < (cost_no_op), actual: 143 vs 8
[  FAILED  ] Expr.TestMultiLogicalExprsOptimization (7 ms)
[ RUN      ] Expr.TestExprs
start test
3cost: 889us
start test
10cost: 2us
start test
20cost: 2us
start test
30cost: 2us
start test
50cost: 3us
start test
100cost: 7us
start test
200cost: 16us
[       OK ] Expr.TestExprs (9 ms)

[ RUN      ] Expr.TestUnaryBenchTest
start test type:2
 cost: 124.8us
start test type:3
 cost: 163.1us
start test type:4
 cost: 275.9us
start test type:5
 cost: 590.9us
start test type:10
 cost: 62.7us
start test type:11
 cost: 65.9us
[       OK ] Expr.TestUnaryBenchTest (1153 ms)
[ RUN      ] Expr.TestBinaryRangeBenchTest
start test type:2
 cost: 151.4us
start test type:3
 cost: 198.4us
start test type:4
 cost: 361.9us
start test type:5
 cost: 753.9us
start test type:10
 cost: 64.6us
start test type:11
 cost: 62.2us
[       OK ] Expr.TestBinaryRangeBenchTest (1151 ms)
[ RUN      ] Expr.TestLogicalUnaryBenchTest
start test type:2
 cost: 121.14us
start test type:3
 cost: 156.84us
start test type:4
 cost: 249.76us
start test type:5
 cost: 534.44us
start test type:10
 cost: 82.2us
start test type:11
 cost: 83.52us
[       OK ] Expr.TestLogicalUnaryBenchTest (1202 ms)
[ RUN      ] Expr.TestBinaryLogicalBenchTest
start test type:2
 cost: 80.64us
start test type:3
 cost: 78.22us
start test type:4
 cost: 255.76us
start test type:5
 cost: 532.04us
start test type:10
 cost: 89.26us
start test type:11
 cost: 90us
[       OK ] Expr.TestBinaryLogicalBenchTest (1198 ms)
[ RUN      ] Expr.TestBinaryArithOpEvalRangeBenchExpr
start test type:2
 cost: 401.7us
start test type:3
 cost: 420.96us
start test type:4
 cost: 418.04us
start test type:5
 cost: 470.54us
start test type:10
 cost: 250.32us
start test type:11
 cost: 850.08us
[       OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1273 ms)
[ RUN      ] Expr.TestCompareExprBenchTest
start test type:2
 cost: 162us
start test type:3
 cost: 142us
start test type:4
 cost: 374us
start test type:5
 cost: 674us
start test type:10
 cost: 366us
start test type:11
 cost: 645us
[       OK ] Expr.TestCompareExprBenchTest (1214 ms)
[ RUN      ] Expr.TestRefactorExprs
start test
3cost: 1253us
start test
10cost: 1060us
start test
20cost: 681us
start test
30cost: 522us
start test
50cost: 511us
start test
100cost: 506us
start test
200cost: 497us
[       OK ] Expr.TestRefactorExprs (1142 ms)

```

Candidate:
```
[ RUN      ] CApiTest.AssembeChunkPerfTest
start test
cost: 6099us
[       OK ] CApiTest.AssembeChunkPerfTest (153 ms)

[ RUN      ] Expr.TestMultiLogicalExprsOptimization
cost: 42us
cost: 15us
cost: 15us
cost: 14us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
17
cost: 41us
cost: 39us
cost: 33us
cost: 33us
cost: 33us
cost: 33us
cost: 34us
cost: 41us
cost: 34us
cost: 34us
35
[       OK ] Expr.TestMultiLogicalExprsOptimization (6 ms)
[ RUN      ] Expr.TestExprs
start test
3cost: 20us
start test
10cost: 2us
start test
20cost: 2us
start test
30cost: 2us
start test
50cost: 4us
start test
100cost: 8us
start test
200cost: 15us
[       OK ] Expr.TestExprs (8 ms)

[ RUN      ] Expr.TestUnaryBenchTest
start test type:2
 cost: 55.7us
start test type:3
 cost: 79.8us
start test type:4
 cost: 177.6us
start test type:5
 cost: 337.2us
start test type:10
 cost: 16.9us
start test type:11
 cost: 15.7us
[       OK ] Expr.TestUnaryBenchTest (1140 ms)
[ RUN      ] Expr.TestBinaryRangeBenchTest
start test type:2
 cost: 57.1us
start test type:3
 cost: 87us
start test type:4
 cost: 177.5us
start test type:5
 cost: 342.7us
start test type:10
 cost: 17.9us
start test type:11
 cost: 16.7us
[       OK ] Expr.TestBinaryRangeBenchTest (1152 ms)
[ RUN      ] Expr.TestLogicalUnaryBenchTest
start test type:2
 cost: 34.58us
start test type:3
 cost: 68.86us
start test type:4
 cost: 151.38us
start test type:5
 cost: 286.8us
start test type:10
 cost: 16.54us
start test type:11
 cost: 16.7us
[       OK ] Expr.TestLogicalUnaryBenchTest (1165 ms)
[ RUN      ] Expr.TestBinaryLogicalBenchTest
start test type:2
 cost: 20us
start test type:3
 cost: 17.1us
start test type:4
 cost: 154.12us
start test type:5
 cost: 286.1us
start test type:10
 cost: 19.6us
start test type:11
 cost: 19.24us
[       OK ] Expr.TestBinaryLogicalBenchTest (1188 ms)
[ RUN      ] Expr.TestBinaryArithOpEvalRangeBenchExpr
start test type:2
 cost: 125.7us
start test type:3
 cost: 111.34us
start test type:4
 cost: 148.02us
start test type:5
 cost: 306.7us
start test type:10
 cost: 149.3us
start test type:11
 cost: 282.94us
[       OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1221 ms)
[ RUN      ] Expr.TestCompareExprBenchTest
start test type:2
 cost: 89us
start test type:3
 cost: 79us
start test type:4
 cost: 323us
start test type:5
 cost: 629us
start test type:10
 cost: 313us
start test type:11
 cost: 591us
[       OK ] Expr.TestCompareExprBenchTest (1228 ms)
[ RUN      ] Expr.TestRefactorExprs
start test
3cost: 874us
start test
10cost: 611us
start test
20cost: 290us
start test
30cost: 294us
start test
50cost: 272us
start test
100cost: 278us
start test
200cost: 279us
[       OK ] Expr.TestRefactorExprs (1149 ms)

```

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2024-03-24 21:49:07 +08:00
Patrick Weizhi Xu
982dd2834b
enhance: add materialized view search info (#30888)
issue: #29892 

This PR
1. Pass Materialized View (MV) search information obtained from the
expression parsing planning procedure to Knowhere. It only performs when
MV is enabled and the partition key is involved in the expression. The
search information includes:
1. Touched field_id and the count of related categories in the
expression. E.g., `color == red && color == blue` yields `field_id ->
2`.
2. Whether the expression only includes AND (&&) logical operator,
default `true`.
    3. Whether the expression has NOT (!) operator, default `false`.
4. Store if turning on MV on the proxy to eliminate reading from
paramtable for every search request.
5. Renames to MV.

## Rebuttals
1. Did not write in `ExtractInfoPlanNodeVisitor` since the new scalar
framework was introduced and this part might be removed in the future.
2. Currently only interested in `==` and `in` expression, `string` data
type, anything else is a bonus.
3. Leave handling expressions like `F == A || F == A` for future works
of the optimizer.

## Detailed MV Info

![image](https://github.com/milvus-io/milvus/assets/6563846/b27c08a0-9fd3-4474-8897-30a3d6d6b36f)

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-03-21 11:19:07 +08:00
groot
c81909bfab
enhance: Support MinIO TLS connection (#31311)
issue: https://github.com/milvus-io/milvus/issues/30709
pr: #31292

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
Chun Han
6939ad15f2
fix:possible out-of-bound due to groupby when reduing(#30711) (#31200)
related: #30711

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-03-14 13:07:03 +08:00
Buqian Zheng
96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and added unit tests by modifying lots of existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
cai.zhang
6a83f16871
feat: Support for multiple forms of JSON (#31052)
issue: #31051

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-11 19:55:02 +08:00
Buqian Zheng
070dfc77bf
feat: [Sparse Float Vector] segcore basics and index building (#30357)
This commit adds sparse float vector support to segcore with the
following:

1. data type enum declarations
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in memory data
structures
* mmap::Column as the in memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in memory representation of a sparse float
vector of a growing segment which supports inserts.
3. Adds logic in payload reader/writer to serialize/deserialize from/to
binlog
4. Adds the ability to allow the index node to build sparse float vector
index
5. Adds the ability to allow the query node to build growing index for
growing segment and temp index for sealed segment without index built

This commit also includes some code cleanness, comment improvement, and
some unit tests for sparse vector.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-11 14:45:02 +08:00
Cai Yudong
a99143dd52
fix: Save traceID and spanID as hex string into search config (#31071)
Issue: #30961

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-03-11 14:21:01 +08:00
Jiquan Long
e2f35954d4
enhance: support pattern matching on json field (#30779)
issue: https://github.com/milvus-io/milvus/issues/30714

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-28 18:31:00 +08:00
Jiquan Long
16b785e149
enhance: optimize the memory usage and speed up loading variable length data (#30787)
/kind improvement
this removes the 1x copying while loading variable length data, also
avoids constructing std::string, which could lead to memory
fragmentation

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-02-28 16:45:00 +08:00
Cai Yudong
8a219e0102
feat: Support knowhere trace using OpenTelemetry (#30750)
Issue: #21508

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-02-28 12:29:00 +08:00
Jiquan Long
3e82d21ca1
enhance: reduce 1x memory copy when loading json (#30753)
/kind improvement

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-27 10:18:55 +08:00
MrPresent-Han
77eb6defb1
feat: support groupby on growing and non-indexed sealed egment(#30307) (#30644)
related: #30308

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-02-21 14:02:53 +08:00
zhagnlu
976b6fc0e4
enhance: change opendal as compile configurable (#30384)
#30373

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-20 19:16:52 +08:00