95 Commits

Author SHA1 Message Date
Bingyi Sun
7040ba1c12
enhance: make json path index support term filter (#40140)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-04 11:56:02 +08:00
Bingyi Sun
db4769281c
fix: Fall back to a brute-force search if json index type unmatched (#40076)
issue: https://github.com/milvus-io/milvus/issues/35528
If the query data type does not match the index type, fall back to a
brute-force search
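A minimal sketch of the fallback rule, assuming a hypothetical `JsonPathIndex` that maps values (already cast to one Python type) to row ids. It is not the segcore code. The index is consulted only when the query value has the same type as the index. Otherwise every row is scanned.

```python
from typing import Any, Dict, List, Optional


class JsonPathIndex:
    """Hypothetical stand-in: cast value -> ids of rows holding it."""

    def __init__(self, cast_type: type, postings: Dict[Any, List[int]]):
        self.cast_type = cast_type
        self.postings = postings


def filter_rows(rows: List[dict], path: List[str], value: Any,
                index: Optional[JsonPathIndex]) -> List[int]:
    """Return ids of rows whose value at `path` equals `value`."""
    # Use the index only when the query value's type matches the index type.
    if index is not None and type(value) is index.cast_type:
        return sorted(index.postings.get(value, []))
    # Type mismatch (or no index): fall back to a brute-force scan.
    hits = []
    for row_id, row in enumerate(rows):
        node: Any = row
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None  # path missing in this row
                break
            node = node[key]
        if node is not None and node == value:
            hits.append(row_id)
    return hits
```

With a string-typed index, `filter_rows(rows, ["a", "b"], "x", idx)` is answered from the postings, while an integer query such as `1` walks every row.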

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-24 16:25:57 +08:00
Bingyi Sun
b59555057d
feat: support json index (#36750)
https://github.com/milvus-io/milvus/issues/35528

This PR adds JSON index support for JSON and dynamic fields. For now, the
index can only serve unary queries such as 'a["b"] > 1'. We will support
more filter types later.

basic usage:
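```
from pymilvus import DataType

# `collection` is an existing pymilvus Collection with a JSON field "json_field"
collection.create_index("json_field", {"index_type": "INVERTED",
    "params": {"json_cast_type": DataType.STRING,
               "json_path": 'json_field["a"]["b"]'}})
```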

This index has the following limitations:
1. If a record does not contain the json path you specify, it is
ignored and no error is raised.
2. If a value at the json path cannot be cast to the type you specify,
it is ignored and no error is raised.
3. A specific json path can have only one json index.
4. If you try to create more than one json index for one json field,
the SDK (pymilvus<=2.4.7) may return immediately because of its internal
implementation. This will be fixed in a later version.
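Limitations 1 and 2 can be pictured with a small sketch. It is an assumption-laden illustration, not the Milvus INVERTED index: `build_json_path_index` and `greater_than` are hypothetical helpers. The sketch builds a sorted (value, row id) list for one json path and silently skips rows that lack the path or whose value fails the cast. It then answers a unary query such as `a["b"] > 1` with a binary search.

```python
import bisect
from typing import Any, Callable, List, Tuple


def build_json_path_index(rows: List[dict], path: List[str],
                          cast: Callable[[Any], Any]) -> List[Tuple[Any, int]]:
    """Hypothetical sketch: (cast_value, row_id) pairs sorted by value."""
    entries = []
    for row_id, row in enumerate(rows):
        node: Any = row
        found = True
        for key in path:
            if not isinstance(node, dict) or key not in node:
                found = False
                break
            node = node[key]
        if not found:
            continue  # limitation 1: missing path -> row skipped, no error
        try:
            entries.append((cast(node), row_id))
        except (TypeError, ValueError):
            continue  # limitation 2: failed cast -> row skipped, no error
    entries.sort()
    return entries


def greater_than(entries: List[Tuple[Any, int]], threshold: Any) -> List[int]:
    """Row ids whose indexed value is strictly greater than `threshold`."""
    values = [v for v, _ in entries]
    start = bisect.bisect_right(values, threshold)
    return sorted(row_id for _, row_id in entries[start:])
```

For example, with `cast=float`, a row holding `"oops"` at the path and a row without the path both end up outside the index, and neither raises an error.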

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-15 14:06:15 +08:00
Spade A
032292a432
feat: support phrase match query (#38869)
The relevant issue: https://github.com/milvus-io/milvus/issues/38930

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-12 20:24:58 +08:00
smellthemoon
907fc24f85
enhance: support null expr (#38772)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-01-02 14:16:54 +08:00
Gao
994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
smellthemoon
3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
smellthemoon
b8492498ac
fix: mask with valid data when preCheckOverflow (#37221)
#37175
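The title does not spell out the mechanism. The sketch below assumes that "preCheckOverflow" means short-circuiting a comparison whose constant lies outside the column type's range (here an int8 column compared with `1000`). Under that assumption, "mask with valid data" keeps null rows false instead of marking every row true. `less_than_int8` is a hypothetical helper.

```python
from typing import List, Optional

INT8_MIN, INT8_MAX = -128, 127


def less_than_int8(column: List[Optional[int]], value: int) -> List[bool]:
    """Evaluate `x < value` on an int8 column where None marks a null row."""
    valid = [v is not None for v in column]
    if value > INT8_MAX:
        # Overflow pre-check: every int8 is < value, so skip the scan,
        # but mask with the valid data so null rows stay False.
        return valid
    if value <= INT8_MIN:
        # No int8 can be smaller than value.
        return [False] * len(column)
    return [ok and v < value for ok, v in zip(valid, column)]
```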

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-31 10:44:26 +08:00
Yinzuo Jiang
3628593d20
feat: Implement custom function module in milvus expr (#36560)
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org

Solutions:

- parser (planparserv2)
    - add CallExpr in planparserv2/Plan.g4
    - update parser_visitor and show_visitor
- grpc protobuf
    - add CallExpr in plan.proto
- execution (`core/src/exec`)
    - add `CallExpr`, `ValueExpr`, and `ColumnExpr` (both logical and
      physical) for function calls and function parameters
- function factory (`core/src/exec/expression/function`)
    - create a global hashmap when starting milvus (see server.go)
    - the global hashmap stores function signatures and their function
      pointers; the CallExpr in the execution engine looks up the function
      pointer by function signature
- custom functions
    - empty(string)
    - starts_with(string, string)
- add cpp/go unittests and E2E tests
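The function factory described above can be sketched as a registry keyed by signature. This is a Python illustration, not the C++ implementation. The signature format `(name, parameter type names)` and the helpers `register` and `call_expr` are hypothetical. Only the two function names `empty` and `starts_with` come from the commit.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature format: (function name, parameter type names).
Signature = Tuple[str, Tuple[str, ...]]

# Global map from signature to callable, filled once at startup.
FUNCTION_REGISTRY: Dict[Signature, Callable[..., Any]] = {}


def register(name: str, *param_types: str):
    """Decorator that records `fn` under its signature."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        FUNCTION_REGISTRY[(name, tuple(param_types))] = fn
        return fn
    return decorator


@register("empty", "string")
def empty(s: str) -> bool:
    return len(s) == 0


@register("starts_with", "string", "string")
def starts_with(s: str, prefix: str) -> bool:
    return s.startswith(prefix)


def call_expr(name: str, args: Tuple[Tuple[str, Any], ...]) -> Any:
    """Resolve a call by (name, arg types), then invoke it with the values."""
    signature = (name, tuple(t for t, _ in args))
    fn = FUNCTION_REGISTRY.get(signature)
    if fn is None:
        raise KeyError(f"no function registered for {signature}")
    return fn(*(v for _, v in args))
```

Keying on the full signature rather than the name alone allows overloads, such as a hypothetical `empty(array)`, to coexist in the same map.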

closes: #36559

Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-10-25 15:25:30 +08:00
smellthemoon
eb3e4583ec
enhance: all op(Null) is false in expr (#35527)
#31728
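The rule in the title, that any operator applied to Null evaluates to false, differs from plain Python, where `None != 1` is `True`. A minimal sketch of the rule, using a hypothetical `eval_compare` helper and `None` as the null marker:

```python
import operator
from typing import Any, Callable, Optional


def eval_compare(op: Callable[[Any, Any], bool],
                 value: Optional[Any], operand: Any) -> bool:
    """Apply a comparison, treating a null (None) field value as always False."""
    if value is None:
        return False  # op(Null) is false for every operator, including !=
    return op(value, operand)
```

So both `eval_compare(operator.eq, None, 1)` and `eval_compare(operator.ne, None, 1)` return `False`.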

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 21:14:30 +08:00
zhagnlu
489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
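The commit does not describe the sort itself. The sketch below assumes column-oriented segment storage and shows only the core idea: every field is permuted by the same primary-key order. The binary-search lookup by PK is an assumed benefit, not one stated in the commit. `sort_segment_by_pk` and `find_pk` are hypothetical helpers.

```python
import bisect
from typing import Dict, List, Optional


def sort_segment_by_pk(segment: Dict[str, list], pk_field: str) -> Dict[str, list]:
    """Reorder every column of a segment by ascending primary key."""
    pks = segment[pk_field]
    order = sorted(range(len(pks)), key=lambda i: pks[i])
    return {name: [col[i] for i in order] for name, col in segment.items()}


def find_pk(sorted_segment: Dict[str, list], pk_field: str, pk) -> Optional[int]:
    """Row offset of `pk` in a PK-sorted segment, or None if absent."""
    pks: List = sorted_segment[pk_field]
    i = bisect.bisect_left(pks, pk)
    return i if i < len(pks) and pks[i] == pk else None
```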

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
zhagnlu
4b553b0333
enhance: revert remove duplicated pk function (#35103)
issue: #34778
 Revert "fix: fix query count(*) concurrently"
 Revert "enhance: mark duplicated pk as deleted "

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-05 10:48:17 +08:00
smellthemoon
5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support reading and writing null in segcore
    valid_data (stored as uint8_t to save memory) is kept in fieldData.
2. support loading null
    the binlog reader reads data and writes it into the column (sealed
    segment) or insertRecord (growing segment). A sealed segment stores
    valid_data directly. A growing segment, for consistency with the prior
    implementation and for easier code reading, converts uint8_t to
    fbvector<bool>; this may be optimized in the future.
3. retrieve valid_data
    parse valid_data in search/query.
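The commit only says that valid_data uses uint8_t to save memory. The sketch below assumes an Arrow-style bit-packed validity bitmap (bit i of byte i // 8, least significant bit first, where 1 means non-null). It shows the conversion to a per-row boolean vector, similar to the growing-segment path, and how retrieval can use that vector to return nulls. Both helpers are hypothetical.

```python
from typing import List


def unpack_valid_bitmap(valid_data: bytes, num_rows: int) -> List[bool]:
    """Assumed layout: bit (i & 7) of byte (i >> 3) is 1 when row i is non-null."""
    return [bool((valid_data[i >> 3] >> (i & 7)) & 1) for i in range(num_rows)]


def retrieve(values: list, valid_data: bytes, num_rows: int) -> list:
    """Return row values, replacing null rows with None."""
    mask = unpack_valid_bitmap(valid_data, num_rows)
    return [v if ok else None for v, ok in zip(values, mask)]
```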
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
zhagnlu
804dd5409a
enhance: mark duplicated pk as deleted (#34586)
fix #34247

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-16 14:25:39 +08:00
zhagnlu
589d4dfd82
enhance: optimize bitmap index (#33358)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-05-30 13:09:43 +08:00
Cai Yudong
246586be27
enhance: Unify data type check APIs under internal/core (#31800)
Issue: #22837 

Move and rename following C++ APIs:
datatype_sizeof() ==> GetDataTypeSize()
datatype_name() ==> GetDataTypeName()
datatype_is_vector() / IsVectorType() ==> IsVectorDataType()
datatype_is_variable() ==> IsVariableDataType()
datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType()
datatype_is_string() / IsString() ==> IsDataTypeString()
datatype_is_floating() / IsFloat() ==> IsDataTypeFloat()
datatype_is_binary() ==> IsDataTypeBinary()
datatype_is_json() ==> IsDataTypeJson()
datatype_is_array() ==> IsDataTypeArray()
datatype_is_variable() ==> IsDataTypeVariable()
datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger()

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-02 19:15:14 +08:00
zhagnlu
659ad81ab7
fix: remove deprecated ut test (#31499)
#31498

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-26 14:01:07 +08:00
Buqian Zheng
96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and adds unit tests by converting many existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
cai.zhang
1aa97a5c21
enhance: Support more relational operators for binary expressions (#30902)
issue: #30677

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-01 16:57:00 +08:00
Cai Yudong
8a219e0102
feat: Support knowhere trace using OpenTelemetry (#30750)
Issue: #21508

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-02-28 12:29:00 +08:00
zhagnlu
601a8b801b
fix: add move cursor function to physical expr (#29603)
#29570

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-09 17:08:48 +08:00
zhagnlu
79c417b14e
fix: pass active count to query context instead of timestamp (#29541)
#29319

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-31 16:08:48 +08:00
zhagnlu
a6eb7e5f9a
enhance: skip segment when using pk in (..) expr (#29394)
#29293

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-21 20:06:42 +08:00
zhagnlu
a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
Enwei Jiao
b80a3e19d3
Add code for PanicInfo (#27364)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-27 12:01:28 +08:00
foxspy
5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
cai.zhang
a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
Enwei Jiao
0afdfdb9af
Remove other Exceptions, keeps SegcoreError only (#27017)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-14 14:05:20 +08:00
cai.zhang
c073aa0dc3
Fix bug for json_contains_all has multiple array elements (#26446)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-18 22:36:19 +08:00
zhagnlu
709352f96c
Repeat multi times for performance compare test because of cpu cache (#26394)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-17 15:18:17 +08:00
cai.zhang
a0198ce8ae
Support json contains feature (#25384)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-08-11 17:09:30 +08:00
zhagnlu
9489e14000
Optimize multi logical exprs performance when meet some situations (#26265)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-11 15:31:29 +08:00
zhagnlu
65cb52d06b
Support dynamic simd framework and using term expr as example (#25260)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-07-13 16:22:30 +08:00
yah01
dd5f896dc8
Load batch by batch (#25212)
This significantly reduces memory usage while loading:
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)
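The memory claim above follows from keeping only one batch buffered at a time. The target column is 1x the data, and the buffer adds a bounded overhead. A minimal sketch of that pattern, assuming rows arrive as byte strings from some reader. `read_in_batches` and `load_column` are hypothetical helpers, not the loader in this commit.

```python
from typing import Iterable, Iterator, List


def read_in_batches(rows: Iterable[bytes], batch_rows: int) -> Iterator[List[bytes]]:
    """Yield rows in fixed-size batches so only one batch is buffered at a time."""
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    batch: List[bytes] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_rows:
            yield batch
            batch = []  # release the previous buffer before reading more
    if batch:
        yield batch


def load_column(rows: Iterable[bytes], batch_rows: int) -> bytes:
    """Append each batch into the target column as soon as it is read."""
    column = bytearray()
    for batch in read_in_batches(rows, batch_rows):
        for row in batch:
            column.extend(row)
    return bytes(column)
```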

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
Enwei Jiao
816158e4af
Remove outdated searchplan (#25282)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-04 18:30:25 +08:00
xige-16
04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00
zhagnlu
f60b839127
Support element in json array in segcore part(#24677) (#24829)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-06-14 14:38:37 +08:00
yah01
ceda0ed598
Optimize the performance of filter by JSON field (#24268)
- Construct JSON pointer only once
- Avoid copying nested path for each row
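Both optimizations move per-row work out of the loop. The nested path is turned into a JSON pointer string once, and each row is then only resolved against that pointer. A Python sketch, assuming RFC 6901 pointer syntax and object keys only (no array indices). The helper names are hypothetical.

```python
from typing import Any, List


def make_json_pointer(path: List[str]) -> str:
    """RFC 6901: escape '~' as '~0' and '/' as '~1' in each token."""
    return "".join("/" + p.replace("~", "~0").replace("/", "~1") for p in path)


def resolve(doc: Any, pointer: str) -> Any:
    """Follow a pointer through nested dicts; None if the path is missing."""
    if pointer == "":
        return doc
    node = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or token not in node:
            return None
        node = node[token]
    return node


def filter_gt(rows: List[Any], path: List[str], value: float) -> List[int]:
    """Ids of rows whose numeric value at `path` is greater than `value`."""
    pointer = make_json_pointer(path)  # built once, not once per row
    hits = []
    for i, row in enumerate(rows):
        v = resolve(row, pointer)
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v > value:
            hits.append(i)
    return hits
```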

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-05-22 00:47:25 +08:00
yah01
c75e7a5d05
Fix failed to compare int value with double value (#24229)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-05-19 12:57:23 +08:00
zhagnlu
113f9a0ebc
Support SIMD of several Expr (#23715) (#23717)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-05-12 14:11:20 +08:00
cai.zhang
9715a850fa
Support expr with json field (#23804)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-05-10 10:19:19 +08:00
yah01
62eea5286f
Support to filter with json expr (#23739)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-30 20:36:39 +08:00
yah01
60fdd7e4f4
Introduce simdjson (#23644)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-26 10:30:34 +08:00
foxspy
6f4ed517de
add growing segment index (#23615)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-04-26 10:14:41 +08:00
Jiquan Long
a36fefb009
Fix cpplint (#22657)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-03-10 09:47:54 +08:00
xige-16
8c9c1672ae
Assign different storage config for indexes (#19517)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-10-14 14:45:23 +08:00
xige-16
428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00
Cai Yudong
a001412e12
Replace faiss::MetricType with knowhere::MetricType (#17891)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2022-06-29 14:20:19 +08:00
xige-16
56778787be
Reverse data from scalar index (#17145)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-05-26 14:58:01 +08:00