99 Commits

Author SHA1 Message Date
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This PR's main goal is to merge #37417 into Milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search to display geospatial data
5. Support using GIS functions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions. Currently only brute-force search is
supported.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing. See the corresponding
changes in pymilvus.
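
The WKT-to-WKB conversion described in the data pipeline step can be
illustrated with a minimal sketch. It is not the proxy's actual code: the
helper name `point_wkt_to_wkb` is hypothetical, it handles only a 2D
`POINT`, and it always emits little-endian WKB. The byte layout follows
the standard WKB encoding (byte-order flag, geometry type, coordinates).

```python
import re
import struct


def point_wkt_to_wkb(wkt: str) -> bytes:
    """Encode a 2D WKT POINT as little-endian WKB (hypothetical helper)."""
    m = re.fullmatch(
        r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", wkt, re.IGNORECASE
    )
    if m is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    x, y = float(m.group(1)), float(m.group(2))
    # Layout: 1-byte order flag (1 = little-endian),
    # uint32 geometry type (1 = Point), then two float64 coordinates.
    return struct.pack("<BIdd", 1, 1, x, y)


wkb = point_wkt_to_wkb("POINT (1 2)")
print(wkb.hex())
```

A fixed-width binary form like this is cheaper to store and compare
downstream than re-parsing text on every access.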

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
congqixia
cc53b25ba4
fix: [skip e2e] Update unit test after hnsw support binary vector (#44575)
fix: #44574

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-26 18:21:04 +08:00
cqy123456
d50b365375
enhance: add autoindex config for deduplication case (#44186)
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-09-03 17:19:53 +08:00
Spade A
03c46e686f
fix: ngram index for json rejects type of non-varchar field (#44157)
issue: https://github.com/milvus-io/milvus/issues/43934

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:45:54 +08:00
Spade A
1b583e4b54
fix: fixing ngram index rejecting mmap (#44175)
issue: https://github.com/milvus-io/milvus/issues/44164

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 14:35:53 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports creating an index on a vector array (currently only for
`DataType.FLOAT_VECTOR`) and searching on it.
The only index type supported in this PR is `EMB_LIST_HNSW`, and the only
metric type is `MAX_SIM`.

The way to use it:
```python
from pymilvus import MilvusClient, DataType

milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey the offsets of each vector array to
knowhere. Vectors from multiple vector arrays are concatenated, so the
offsets are needed to specify which vectors belong to which vector array.
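
The offset idea in the note above can be sketched as follows. The exact
layout of `Lims` in knowhere is not described here, so the prefix-sum
layout below (one leading 0, then a running count) is an assumption, and
`concat_with_offsets` and `slice_array` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def concat_with_offsets(
    vector_arrays: Sequence[Sequence[Vector]],
) -> Tuple[List[Vector], List[int]]:
    """Flatten vector arrays and record where each one starts and ends."""
    flat: List[Vector] = []
    offsets = [0]  # assumed layout: offsets[i]..offsets[i + 1] bounds array i
    for arr in vector_arrays:
        flat.extend(arr)
        offsets.append(len(flat))
    return flat, offsets


def slice_array(flat: List[Vector], offsets: List[int], i: int) -> List[Vector]:
    """Recover the i-th vector array from the concatenated buffer."""
    return flat[offsets[i]:offsets[i + 1]]


flat, offsets = concat_with_offsets([[[1.0], [2.0]], [[3.0]], []])
print(offsets)
print(slice_array(flat, offsets, 1))
```

With this layout an empty vector array is represented by two equal
consecutive offsets, so no separate length field is needed.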

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Spade A
10fe53ff59
feat: support json for ngram (#43170)
Ref https://github.com/milvus-io/milvus/issues/42053

This PR enables the ngram index to support the JSON data type.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-25 10:28:54 +08:00
Bingyi Sun
6e38e9d18f
fix: Add json cast type for flat index (#42970)
issue: #42916

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-03 14:14:44 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with an ngram inverted index.
Currently, only the VARCHAR data type and only InnerMatch LIKE (%xxx%)
queries are supported.


How to use it:
```python
from pymilvus import MilvusClient, DataType

milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

`min_gram` and `max_gram` control how documents are tokenized. For
example, with min_gram=2 and max_gram=4, each document is tokenized into
2-grams, 3-grams and 4-grams.
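
The tokenization described above can be sketched in a few lines. This is
an illustration only: the `ngrams` helper is hypothetical, and splitting
on Python characters (rather than bytes or some other unit) is an
assumption about the real tokenizer.

```python
from typing import List


def ngrams(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Return every n-gram of `text` with min_gram <= n <= max_gram."""
    out: List[str] = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            out.append(text[i:i + n])
    return out


print(ngrams("abcd", 2, 4))
# ['ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']
```

Text shorter than `min_gram` yields no tokens at all in this sketch.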

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
cqy123456
317bbfbf81
enhance: milvus support minhash vector and mhjaccard metric (#42036)
issue:
https://github.com/issues/assigned?issue=milvus-io%7Cmilvus%7C41746

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-06-10 14:38:34 +08:00
Bingyi Sun
6404e02d99
fix: Check cast type is array for json contains expr (#42184)
issue: https://github.com/milvus-io/milvus/issues/42181

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-09 17:04:33 +08:00
Bingyi Sun
cc5ac1c220
enhance: Support cast function for json index (#41949)
issue: #41948

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-06-05 19:42:32 +08:00
presburger
e878fe588e
enhance: Set the default GPU version autoindex to use the CAGRA index (#41906)
issue: #41907

Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2025-05-23 01:20:28 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
Bingyi Sun
9676365af9
fix: Fix json index not equal filter (#40647)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-27 23:06:23 +08:00
Bingyi Sun
ecea17c576
enhance: Set field name as json path if not specified (#40419)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-21 11:26:13 +08:00
Bingyi Sun
a7cff3873b
fix: Verify json_cast_type (#40604)
issue: https://github.com/milvus-io/milvus/issues/40420

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-14 16:48:26 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Bingyi Sun
b59555057d
feat: support json index (#36750)
https://github.com/milvus-io/milvus/issues/35528

This PR adds JSON index support for JSON and dynamic fields. Currently the
index can only serve unary queries like 'a["b"] > 1'. More filter types
will be supported later.

Basic usage:
```python
collection.create_index(
    "json_field",
    {
        "index_type": "INVERTED",
        "params": {
            "json_cast_type": DataType.STRING,
            "json_path": 'json_field["a"]["b"]',
        },
    },
)
```

There are some limitations when using this index:
1. If a record does not have the JSON path you specify, it is ignored
and no error is raised.
2. If a value at the JSON path cannot be cast to the type you specify,
it is ignored and no error is raised.
3. A specific JSON path can have only one JSON index.
4. If you try to create more than one JSON index for one JSON field,
the SDK (pymilvus<=2.4.7) may return immediately because of its internal
implementation. This will be fixed in a later version.
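
Limitations 1 and 2 can be illustrated with a small sketch of how an
index build might skip rows. It is not the Milvus implementation: the
helpers `extract` and `build_index` are hypothetical, the path is given
as a list of keys, and Python's `float` stands in for the cast type.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def extract(record: Dict[str, Any], path: Sequence[str]) -> Tuple[Any, bool]:
    """Follow `path` through nested dicts; report whether it exists."""
    cur: Any = record
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return None, False
        cur = cur[key]
    return cur, True


def build_index(
    records: Sequence[Dict[str, Any]],
    path: Sequence[str],
    cast: Callable[[Any], Any],
) -> List[Tuple[Any, int]]:
    """Return sorted (value, row_id) pairs for rows that can be indexed."""
    entries: List[Tuple[Any, int]] = []
    for row_id, record in enumerate(records):
        value, found = extract(record, path)
        if not found:
            continue  # limitation 1: missing path is silently skipped
        try:
            entries.append((cast(value), row_id))
        except (TypeError, ValueError):
            continue  # limitation 2: failed cast is silently skipped
    return sorted(entries)


records = [
    {"a": {"b": 3}},
    {"a": {}},
    {"a": {"b": "not a number"}},
    {"a": {"b": 1}},
]
index = build_index(records, ["a", "b"], float)
# Unary filter a["b"] > 1 answered from the index alone.
matches = [row_id for value, row_id in index if value > 1]
print(index, matches)
```

Rows 1 and 2 never reach the index, so a filter served from it cannot
return them even though the rows exist in the collection.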

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-15 14:06:15 +08:00
congqixia
b3791a6f90
fix: Use param item formatter to avoid SetConfig to overlay (#39597)
Related to #39596

When updating the build param configuration, the `Formatter` can be used
instead, which completely avoids touching the `overlay` config items.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-27 10:51:07 +08:00
Cai Yudong
5bf1b2b929
feat: Support Int8Vector in go (#38990)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-14 20:43:06 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
Bingyi Sun
f0096ec292
fix: Fix IsMmapSupported for scalar index (#38135)
https://github.com/milvus-io/milvus/issues/38134

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-12-17 20:30:44 +08:00
foxspy
cf883b114e
enhance: update knowhere version (#37510)
issue: #36925

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-13 16:36:27 +08:00
foxspy
642a651f60
enhance: remove useless vector index param checker (#37329)
issue: #34298
All vector index config checks have been moved into
vector_index_checker, so the now-useless checkers can be removed.

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-01 06:20:21 +08:00
foxspy
d7b2ffe5aa
enhance: add a unified vector index config checker (#36844)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-28 10:11:37 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
Cai Yudong
ef63e64ded
Remove ANNOY index type (#23189)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-04-04 16:30:27 +08:00
Cai Yudong
9ad6d9f1a0
Rename RAFT_IVF_FLAT/RAFT_IVF_PQ to GPU_IVF_FLAT/GPU_IVF_PQ (#23194)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-04-04 11:44:26 +08:00
Cai Yudong
7612c75c47
Let RAFT_IVF_PQ param accept m=0 (#23134)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-31 11:22:22 +08:00
Cai Yudong
3febb5e45a
Remove gpu mode param check (#23074)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-29 14:24:01 +08:00
zhenshan.cao
d55c860383
Align the maximum dim of the diskann index and collection (#23039)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-03-28 00:06:00 +08:00
Cai Yudong
8aebc6f3b7
Remove faiss GPU index support (#22966)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-24 17:53:58 +08:00
Cai Yudong
ab3cbdfc61
Partial change to prepare for GPU index type support (#22591)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-14 23:21:56 +08:00
congqixia
732986aa04
Remove fmt.Print from internal package (#22722)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-03-14 17:36:05 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
Cai Yudong
d4e0b6e91b
Remove unused index param check (#22271)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-02-21 17:36:26 +08:00
xige-16
82570e057c
Limit MinDim to build disk index (#20724)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-11-21 17:43:10 +08:00
Jiquan Long
8d0cc4226c
Fix IVF_SQ nbits check (#20183)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2022-10-31 10:13:34 +08:00
SimFG
a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
xige-16
428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00
SimFG
d7f38a803d
Separate some proto files (#19218)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-09-16 16:56:49 +08:00
XuanYang-cn
a782ded0bd
Fix RHNSWPQ pqm divide by zero (#18700)
See also: #18671

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2022-08-18 19:24:51 +08:00
XuanYang-cn
43c7c1ff03
Fix indexcheck division by zero bug (#18482)
See also: #18479

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2022-08-02 16:04:34 +08:00
Jiquan Long
2fe8677cbf
Enable dimension check in Proxy when create index request received (#16718)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2022-04-29 18:01:49 +08:00
Jiquan Long
f8d9bc919d
Unify interface of vector index & scalar index. (#15959)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2022-03-21 14:23:24 +08:00
zhenshan.cao
543e348730
[skip e2e]Update license (#15131)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2022-01-11 00:09:34 +08:00
zhenshan.cao
0f7a04fc8c
[skip e2e]Update license (#15129)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2022-01-11 00:03:38 +08:00
jaime
828b9d399f
[skip e2e] Update license (#15123)
Signed-off-by: yun.zhang <yun.zhang@zilliz.com>

Co-authored-by: yun.zhang <yun.zhang@zilliz.com>
2022-01-10 22:45:37 +08:00