251 Commits

Author SHA1 Message Date
cai.zhang
3d11ba06ef
fix: Double check to avoid iter has been earsed by other thread (#45013)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-21 23:36:04 +08:00
cai.zhang
a35a3b7c69
fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-20 23:32:03 +08:00
congqixia
27dbb8e75d
fix: support JSON default value in CreateArrowScalarFromDefaultValue (#44912)
Related to #44897

Add missing JSON data type handling in CreateArrowScalarFromDefaultValue
to fix query failures when dynamic fields are enabled. JSON default
values are now properly converted to arrow::BinaryScalar using
bytes_data().

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-17 18:22:00 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
congqixia
8a443c699e
fix: Make aws credential provider singleton (#44687)
Related to #44647

This patch make milvus-storage using singleton credential provider in
case of data race when concurrent index build task recieved.

See also milvus-io/milvus-storage#44647

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 16:11:58 +08:00
congqixia
1d85b83215
enhance: [backlog] Fix unittest and remove fs fallback logic (#44615)
Related to #44535

This PR:
- Fix the unittest creating `DiskFileManagerImpl` without `filesystem`
- Add comments for methods need `fs_`
- Remove fallback creation and add assertion for nullptr fs

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 10:41:57 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This pr's main goal is merge #37417 to milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display  geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
foxspy
13c3b0b909
enhance: add autoindex configuration for the int8 vector type (#44554)
issue: #38666 

Add int8 support for autoindex to ensure it can be independently
configured. At the same time, remove the restriction on int8 type for
vectorDiskIndex (note that vectorDiskIndex only determines the building
and loading method of the index, not the index type).

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-09-24 17:48:04 +08:00
congqixia
ea307ea3c9
fix: [StorageV2] Make DiskFileManager use fs from context (#44535)
Related to #44534

Datanode shall not use singleton fs after 2.6+. This patch make disk
file manager use filesystem passed by fileManagerContext instead of
errorous singleton one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-24 10:12:03 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
 >- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".

## M6: Extract

We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.

--- 

After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
congqixia
7b83314bf3
enhance: [StorageV2] Make datanode use non-singleton fs (#44418)
Related to #39173

According to the current design, datanode shall create fs from storage
config in request instead of using singleton fs. This PR upgrade
milvus-storage and make packed reader/writer compose new fs from storage
config.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-18 20:06:00 +08:00
sthuang
2f70a73258
fix: turn on azure by default (#44377)
related: #44354, #44138, #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-17 10:12:01 +08:00
sthuang
b38013352d
enhance: [StorageV2] enable build with azure (#44177)
related: #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-14 08:05:58 +08:00
congqixia
f5618d5153
enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282)
Related to #44257

This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-10 14:47:57 +08:00
sparknack
4a01c726f3
enhance: cachinglayer: some metric and params update (#44276)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-10 11:03:57 +08:00
Buqian Zheng
9bf2b5c10c
enhance: moved more segcore unit test files (#44210)
issue: https://github.com/milvus-io/milvus/issues/43931

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-08 10:21:55 +08:00
Buqian Zheng
b76bf13fc3
enhance: move c++ unit test file to aside of the production code (#43932)
issue: https://github.com/milvus-io/milvus/issues/43931

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-03 23:45:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
zhagnlu
fc876639cf
enhance: support json stats with shredding design (#42534)
#42533

Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-01 10:49:52 +08:00
congqixia
e3b3502287
fix: Use correct regex for cppcheck (#44077)
Related to #44076

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-27 20:57:50 +08:00
marcelo-cjl
e13e19cd2c
enhance: add sparse_u32_f32 data type for sparse vertor (#43974)
issue: #43973

Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-08-27 16:47:50 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Gao
e97a618630
enhance: support readAt interface for remote input stream (#43997)
#42032 

Also, fix the cacheoptfield method to work in storagev2.
Also, change the sparse related interface for knowhere version bump
#43974 .
Also, includes https://github.com/milvus-io/milvus/pull/44046 for metric
lost.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-26 11:19:58 +08:00
sparknack
4fae074d56
enhance: add write rate limit for disk file writer (#43912)
issue: #43040

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-25 10:27:47 +08:00
Gao
b602b4187d
enhance: upgrade aws-sdk from 1.9.234 to 1.11.352 (#43916)
issue: #43908

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-08-19 11:11:45 +08:00
foxspy
647c2bca2d
enhance: Support streaming read and write of vector index files (#43824)
issue: #42032

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-15 23:41:43 +08:00
Gao
81a0915c29
enhance: add milvus-common module to decouple knwhere & segcore (#43624)
issue: https://github.com/milvus-io/milvus/issues/42032
https://github.com/milvus-io/milvus/issues/41435

based on pr: https://github.com/milvus-io/milvus/pull/42124

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-11 14:09:42 +08:00
zhagnlu
c04d678ad4
enhance: make segcore params effective without restarting milvus (#43231)
#43230

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-08 10:33:48 +08:00
congqixia
0b860b4aec
fix: Revert "enhance: DataCodec to release ownership of input_data after initialization (#43542)" (#43571) 2025-07-25 20:48:16 +08:00
Buqian Zheng
d23205b718
enhance: DataCodec to release ownership of input_data after initialization (#43542)
issue: https://github.com/milvus-io/milvus/issues/43088
issue: https://github.com/milvus-io/milvus/issues/43038

see also https://github.com/milvus-io/milvus/pull/43533.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-25 14:24:54 +08:00
sthuang
5cebc9f7f6
fix: [StorageV2] handle correct cid with multiple files and add storage v2 prefix logs (#43539)
related: #43372

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 11:22:54 +08:00
sthuang
59bbdd93f5
fix: [StorageV2] fill the correct group chunk into cell (#43486)
The root cause of the issue lies in the fact that when a sealed segment
contains multiple row groups, the get_cells function may receive
unordered cids. This can result in row groups being written into
incorrect cells during data retrieval.

Previously, this issue was hard to reproduce because the old Storage V2
writer had a bug that caused it to write row groups larger than 1MB.
These large row groups could lead to uncontrolled memory usage and
eventually an OOM (Out of Memory) error. Additionally, compaction
typically produced a single large row group, which avoided the incorrect
cell-filling issue during query execution.

related: https://github.com/milvus-io/milvus/issues/43388,
https://github.com/milvus-io/milvus/issues/43372,
https://github.com/milvus-io/milvus/issues/43464, #43446, #43453

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-22 22:22:53 +08:00
Buqian Zheng
389104d200
enhance: rename PanicInfo to ThrowInfo (#43384)
issue: #41435

this is to prevent AI from thinking of our exception throwing as a
dangerous PANIC operation that terminates the program.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 20:22:52 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
congqixia
ae48f0e484
fix: [StorageV2] Handle missing column creating index (#43292)
Related to #43250

Use FieldIDList to check missing field. If column is missing, return
empty resultset

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-14 17:06:50 +08:00
sthuang
276c52490d
fix: [StorageV2] missing arrow fs when building index (#43162)
fix: https://github.com/milvus-io/milvus/issues/43150,
https://github.com/milvus-io/milvus/issues/43149

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-07 15:26:46 +08:00
congqixia
1d9a9a993d
fix: [StorageV2] Use correct template typename for cache_raw_data_to_disk_common (#43104)
Related to #43099

Previously `cache_raw_data_to_disk_common` used `milvus::DataType`
template typename, which shall be `knowhere::bf16` or other actual
datatype.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-03 18:50:46 +08:00
Zhen Ye
bbbc7d4517
enhance: collect all cgo calling into metric and log slow cgo call (#43035)
issue: #42833

- also fix the error metric for async cgo.
- also make sure the roles can be seen when node startup, #43041.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-03 15:00:44 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Below is its parameters:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
sthuang
238bd30f42
fix: [StorageV2] end to end minor issues for sync, stats, and load (#42948)
Fix issues in end-to-end tests: 
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.

related: #39173 
fix: #42862

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-27 14:44:42 +08:00
XuanYang-cn
0dfe5308e1
enhance: Tidy Download and decode in segcore storage (#42902)
1. Unify calling from GetObjectData
2. Move SetData inside Deserialize

See also: #40013

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-06-25 11:10:43 +08:00
sthuang
0d57acb13a
enhance: [StorageV2] field id as meta path for wide column when load (#42863)
related: #42862 #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-25 11:08:48 +08:00
Xianhui Lin
b902960057
fix: revert remote jsonstats path (#42882)
fix: revert remote jsonstats path
relate-pr:https://github.com/milvus-io/milvus/pull/42676
issue:https://github.com/milvus-io/milvus/issues/42872

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-06-21 13:24:39 +08:00
sthuang
4a0a2441f2
enhance: [StorageV2] field id as meta path for wide column (#42787)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-19 15:00:38 +08:00
Spade A
e2c85eec81
fix: load stats index based on mmap config (#42788)
ref https://github.com/milvus-io/milvus/issues/42626

This PR makes text match index and json key stats index be loaded based
on mmap config.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-19 10:10:39 +08:00
Spade A
80f1d707f7
fix: tidy up path for scalar index (#42676)
Ref #42626

This path tidy up path for scalar index including path for loading index
from remote storage and temporary path for buliding index.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-06-18 00:42:38 +08:00