116 Commits

Author SHA1 Message Date
Chun Han
406fa7b694
fix: failed to get raw data for hybrid index(#45318) (#45411)
related: #45318

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-13 10:17:37 +08:00
sparknack
9d75d0393e
enhance: some optimization of scalar field fetching in tiered storage scenarios (#45360)
issue: #43611

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-11 17:17:41 +08:00
Buqian Zheng
515a939edf
enhance: remove obsolete code (#45307)
issue: #44452

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-11-07 16:07:35 +08:00
cai.zhang
ed8ba4a28c
enhance: Make GeometryCache an optional configuration (#45192)
issue: #45187

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-03 19:59:32 +08:00
zhagnlu
ae19c93c14
enhance: remove timestamp filter for search_ids to optimize performance (#44634)
#44352

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-10-17 16:10:01 +08:00
sparknack
4bd30a74ca
enhance: cachinglayer: add mmap and eviction support for TextMatchIndex (#44806)
issue: #41435, #44502

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-17 14:42:02 +08:00
Bingyi Sun
26d06c6340
feat: load skip index using parquet statistics (#44252)
#44011

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-10-15 19:16:00 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
sparknack
c8a4d6e2ef
enhance: add cachinglayer management for TextMatchIndex (#44741)
issue: #41435, #44502

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:37:58 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This pr's main goal is merge #37417 to milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display  geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
zhagnlu
eac16a577c
enhance:support cachelayer for json stats (#44446)
#42533

Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
2025-09-24 15:30:04 +08:00
sparknack
ab64afba2f
enhance: add storage resource usage for scalar search (#44414)
issue: #44212

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-22 14:28:06 +08:00
Gao
d3784c6515
enhance: add storage resource usage for vector search (#44308)
issue: #44212 

Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-09-19 20:20:02 +08:00
congqixia
98d23de36c
enhance: [StorageV2] Make load info contains child info (#44384)
Related to #44257

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 16:14:00 +08:00
zhagnlu
baa84e0b2b
fix: avoid mvcc when doing pk compare expr (#44353)
#44352

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-15 10:17:59 +08:00
Bingyi Sun
1931dcd9b5
fix: Fix initialize timestamp index concurrently (#44317)
#issue: https://github.com/milvus-io/milvus/issues/44341

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-12 14:25:57 +08:00
sparknack
4a01c726f3
enhance: cachinglayer: some metric and params update (#44276)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-10 11:03:57 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index (#44171)
issue: https://github.com/milvus-io/milvus/issues/42032

- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading (#43846)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
Buqian Zheng
6b22661c06
fix: use tbb::concurrent_unordered_map for ChunkedSegmentSealedImpl::fields_ (#44084)
issue: https://github.com/milvus-io/milvus/issues/44078

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-08-29 10:01:51 +08:00
congqixia
e3b3502287
fix: Use correct regex for cppcheck (#44077)
Related to #44076

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-27 20:57:50 +08:00
marcelo-cjl
e13e19cd2c
enhance: add sparse_u32_f32 data type for sparse vertor (#43974)
issue: #43973

Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-08-27 16:47:50 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Gao
e97a618630
enhance: support readAt interface for remote input stream (#43997)
#42032 

Also, fix the cacheoptfield method to work in storagev2.
Also, change the sparse related interface for knowhere version bump
#43974 .
Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric
lost.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-26 11:19:58 +08:00
zhagnlu
d904c4e677
enhance: optimize compare expr performance for pk field (#43154)
#43153

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-21 10:59:46 +08:00
congqixia
7963b17ac1
fix: Revert "fix: Use folly::SharedMutex preventing starvation (#43937)" (#43959)
Related to #43958

This reverts commit 580350495ab40b3c0a2ec473882258edf6d7dd08.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-21 10:09:47 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
congqixia
580350495a
fix: Use folly::SharedMutex preventing starvation (#43937)
Related to #43936

This PR:
- Use `folly::SharedMutex` instead of `std::shared_mutex` preventing
starvation
- Use `folly::SharedMutex::WriteHolder/ReadHolder` instead of
std::shared_lock and std::unique_lock to get better performance

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-19 20:05:46 +08:00
Gao
81a0915c29
enhance: add milvus-common module to decouple knwhere & segcore (#43624)
issue: https://github.com/milvus-io/milvus/issues/42032
https://github.com/milvus-io/milvus/issues/41435

based on pr: https://github.com/milvus-io/milvus/pull/42124

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-11 14:09:42 +08:00
congqixia
b6199acb05
enhance: Utilize search_batch_pks for search_ids of PkTerm (#43751)
Related to #43660

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-07 14:19:40 +08:00
congqixia
d414f6bd4d
enhance: Add assertion preventing reload same field (#43736)
Related to #43725

This patch add assertion preventing segment reloading same field column.
Also improve the message info when pk already exists.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-05 19:35:39 +08:00
Chun Han
d826d6ac91
fix: try to get span raw data for variable length data type(#43544) (#43705)
related: #43544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-04 11:15:38 +08:00
congqixia
4aff581007
enhance: Pass callback in search batch pks to void large result (#43695)
Related to #43660

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-02 17:57:37 +08:00
congqixia
5f2f4eb3d6
enhance: Ignore entry with same ts when DeleteRecord search pks (#43669)
Related to #43660

This patch reduces the unwanted offset&ts entries having same timestamp
of delete record. Under large amount of upsert, this false hit could
increase large amount of memory usage while applying delete.

The next step could be passing a callback to `search_pk_func_` to handle
hit entry streamingly.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-01 10:15:36 +08:00
congqixia
6a74a7de66
enhance: Make DeleteRecord search pks by batch and PinAll (#43640)
Related to #43592

When delete records are large, search pk one by one will result into
many `Pincells` call which creates lots of futures.

This patch make search pk execute in batch to reduce this cost.

Also add `GetAllChunks` API to utilize `PinAllCells` to reduce pins.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-30 19:15:36 +08:00
congqixia
268f1cdace
fix: Hold field shared_ptr in case of being released (#43614)
Related to #43584

Directly accessing `fields_` in `get_raw_data` may have race if load vec
index happens concurrently during getting raw data.

This PR make `bulk_subscript` hold shared_ptr of field column prevent
field column being release during reading it.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-29 12:15:36 +08:00
sthuang
5cebc9f7f6
fix: [StorageV2] handle correct cid with multiple files and add storage v2 prefix logs (#43539)
related: #43372

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 11:22:54 +08:00
Spade A
10fe53ff59
feat: support json for ngram (#43170)
Ref https://github.com/milvus-io/milvus/issues/42053

This PR enable ngram to support json data type.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-25 10:28:54 +08:00
Buqian Zheng
0599113a4b
enhance: add timeout to resource reservation (#43441)
issue: https://github.com/milvus-io/milvus/issues/41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-22 15:24:53 +08:00
sthuang
6c5f5f1e32
enhance: [StorageV2] refactor group chunk translator (#43406)
related: #43372

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-21 19:46:53 +08:00
Buqian Zheng
389104d200
enhance: rename PanicInfo to ThrowInfo (#43384)
issue: #41435

this is to prevent AI from thinking of our exception throwing as a
dangerous PANIC operation that terminates the program.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 20:22:52 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
sparknack
9b4081e110
enhance: cachinglayer: some performance optimization (#42858)
issue: #41435

We compared the performance using the modified test_sealed.cpp, which
randomly accesses all rows in all chunks and counts the number of runs
within 3s.

## performance data comparison (ops/second)

chunk config: 1x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer |
w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 82428 | -63.6% (29983) | +2.7% (84675) |
| Int8 field | 82228 | -63.3% (30166) | +2.4% (84163) |
| Int16 field | 82572 | -63.8% (29867) | +1.8% (84036) |
| Int32 field | 82797 | -63.7% (30031) | +1.5% (84043) |
| Int64 field | 81077 | -62.9% (30107) | +0.6% (81604) |
| Float field | 82678 | -63.4% (30266) | +1.8% (84146) |
| Double field | 81925 | -63.4% (29974) | +0.2% (82097) |
| Varchar field | 19933 | -19.6% (16027) | +18.9% (23690) |
| JSON field | 16519 | -96.8% (533) | +2.5% (16927) |
| Int array field | 7325 | -13.7% (6321) | -1.4% (7220) |
| Long array field | 6347 | -8.9% (5781) | -0.1% (6344) |
| Bool array field | 8275 | -14.0% (7116) | +0.4% (8311) |
| String array field | 2281 | -5.0% (2168) | +0.2% (2287) |
| Double array field | 6427 | -13.3% (5574) | -2.0% (6301) |
| Float array field | 7291 | -13.0% (6346) | -1.5% (7183) |
| Vector field | 27487 | -40.4% (16371) | -4.7% (26192) |
| Float16 vector field | 49773 | -54.6% (22601) | -5.9% (46834) |
| BFloat16 vector field | 49783 | -53.1% (23350) | -5.7% (46934) |
| Int8 vector field | 63871 | -59.0% (26179) | -6.2% (59926) |

---

chunk config: 10x1000

| Field Type | w/o cachinglayer (commit 640f526301) | w/ cachinglayer |
w/ cachinglayer + opt |
|---|---|---|---|
| Bool field | 3659 | -48.6% (1879) | +110.1% (7686) |
| Int8 field | 3410 | -45.3% (1864) | +123.9% (7636) |
| Int16 field | 3647 | -48.6% (1874) | +110.1% (7661) |
| Int32 field | 3647 | -48.8% (1866) | +109.6% (7645) |
| Int64 field | 3645 | -48.9% (1863) | +107.8% (7573) |
| Float field | 3647 | -49.0% (1861) | +109.5% (7639) |
| Double field | 3640 | -45.1% (1998) | +108.4% (7586) |
| Varchar field | 1594 | -23.9% (1213) | +20.6% (1922) |
| JSON field | 1202 | -26.5% (884) | +16.1% (1396) |
| Int array field | 602 | -12.3% (528) | +12.7% (678) |
| Long array field | 529 | -12.2% (465) | +7.5% (569) |
| Double array field | 537 | -13.0% (467) | +6.4% (571) |
| Vector field | 1520 | -37.9% (943) | -5.5% (1437) |
| Float16 vector field | 2607 | -47.0% (1382) | +6.4% (2774) |
| BFloat16 vector field | 2586 | -46.5% (1383) | +8.8% (2813) |
| Int8 vector field | 3101 | -47.3% (1633) | +41.9% (4400) |

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-17 11:20:51 +08:00
Spade A
db91d85dbc
feat: more types of matches for ngram (#43081)
Ref https://github.com/milvus-io/milvus/issues/42053

This PR enable ngram to support more kinds of matches such as prefix and
postfix match.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-14 20:34:50 +08:00
congqixia
6bbed3b019
fix: [AddField] Add shared_lock for insert prevent race (#43229)
Related to #43113

When schema change happens, insert shall not happen, otherwise:
- Data race may happen causing insertion failure
- Inconsistent data schema

This PR add shared_lock prevent this data race.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 21:26:48 +08:00
congqixia
f027eea545
enhance: [AddField] Add log for segcore segment schema change (#43215)
Related to #39178

This PR add logs for segment schema change operations.

Also fixes the nit comments from PR #42490

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 10:22:47 +08:00
sthuang
9f361a228e
enhance: storage v2 chunked column memory size from meta (#43130)
use meta to get chunked column memory size to avoid getting cells
actually from storage.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 14:24:46 +08:00