1843 Commits

Author SHA1 Message Date
sparknack
c8a4d6e2ef
enhance: add cachinglayer management for TextMatchIndex (#44741)
issue: #41435, #44502

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:37:58 +08:00
aoiasd
09865a5da5
fix: BM25 with boost return result not ordered. (#44744)
relate: https://github.com/milvus-io/milvus/issues/44758
The comparator should have checked `(result.seg_offsets_[i] >= 0 &&
result.seg_offsets_[j] < 0)`, but it checked `(result.seg_offsets_[j] >= 0 &&
result.seg_offsets_[j] < 0)`, which is always false.
The bug was masked because every placeholder (offset -1) is filled with
the worst distance value.
For IP, L2, or COSINE that value is +inf or -inf, so sorting by distance
alone was enough.
With BM25 the value is NaN, which leaves the sorted result out of order.
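
A minimal sketch of the ordering problem, written in Python rather than the C++ of the actual fix. The row layout and the name `sort_valid_first` are hypothetical, and larger scores are assumed to rank first. It only illustrates that a comparator which checks the offset before the score keeps NaN placeholders at the end without ever comparing them.

```python
import math
from functools import cmp_to_key

# Hypothetical result rows: (seg_offset, score). Offset -1 marks a placeholder
# whose score is NaN, as in the BM25 case described above.
rows = [(3, 1.5), (-1, math.nan), (7, 4.0), (-1, math.nan), (2, 2.5)]

def sort_valid_first(rows):
    """Order valid rows before placeholders, then by descending score."""
    def compare(a, b):
        a_valid, b_valid = a[0] >= 0, b[0] >= 0
        if a_valid != b_valid:
            return -1 if a_valid else 1  # a valid row always sorts first
        if not a_valid:
            return 0  # two placeholders: never compare their NaN scores
        return (b[1] > a[1]) - (b[1] < a[1])  # descending by score
    return sorted(rows, key=cmp_to_key(compare))

ordered = sort_valid_first(rows)
print([offset for offset, _ in ordered])
```

Every comparison involving NaN evaluates to false, so a comparator that looks only at the score cannot place a NaN row consistently.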

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-11 17:17:58 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
sparknack
7e750190b6
enhance: add a size getter for tantivy inverted index (#44609)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-10 17:43:57 +08:00
congqixia
8a443c699e
fix: Make aws credential provider singleton (#44687)
Related to #44647

This patch makes milvus-storage use a singleton credential provider to
avoid a data race when concurrent index build tasks are received.

See also milvus-io/milvus-storage#44647
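
As an illustration only, the singleton idea can be sketched in Python. `CredentialProvider` and `get_credential_provider` are hypothetical stand-ins, not the milvus-storage types; the sketch shows one shared instance being created at most once even when several tasks ask for it concurrently.

```python
import threading

class CredentialProvider:
    """Hypothetical stand-in for a credential provider."""
    def __init__(self):
        self.token = "placeholder-token"

_provider = None
_provider_lock = threading.Lock()

def get_credential_provider():
    """Return the one shared provider, creating it at most once."""
    global _provider
    if _provider is None:            # fast path once initialized
        with _provider_lock:
            if _provider is None:    # re-check while holding the lock
                _provider = CredentialProvider()
    return _provider

# Simulate concurrent index build tasks requesting the provider.
seen = []
threads = [threading.Thread(target=lambda: seen.append(get_credential_provider()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```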

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 16:11:58 +08:00
congqixia
1d85b83215
enhance: [backlog] Fix unittest and remove fs fallback logic (#44615)
Related to #44535

This PR:
- Fix the unit test that creates `DiskFileManagerImpl` without `filesystem`
- Add comments for methods that need `fs_`
- Remove fallback creation and add an assertion for nullptr fs

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 10:41:57 +08:00
cai.zhang
9d1bb8497c
fix: Get R-Tree index correct for growing segment (#44612)
issue: #43427 

The R-Tree index covers the entire segment, not a single chunk.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-29 21:34:54 +08:00
cai.zhang
aecb46a08b
fix: Skip empty loop for process growing segment (#44606)
issue: #43427 

The GISFunction asserts that segment_offsets cannot be nullptr. When
size is 0, segment_offsets is nullptr, so the loop is skipped.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-09-29 21:15:05 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

The main goal of this PR is to merge #37417 into Milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search to display geospatial data
5. Support using GIS functions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions. Only brute-force search is supported
for now.
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing. See the corresponding
changes in pymilvus.
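
To make the WKT to WKB step in the data pipeline concrete, here is a minimal sketch for a single 2D point using only the Python standard library. The helper name `point_wkt_to_wkb` is hypothetical, and the real proxy relies on a geometry library rather than hand-written parsing; the byte layout shown is the standard little-endian WKB encoding of a Point.

```python
import re
import struct

def point_wkt_to_wkb(wkt: str) -> bytes:
    """Encode a 2D 'POINT (x y)' WKT string as little-endian WKB."""
    match = re.fullmatch(
        r"\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*", wkt, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # Byte-order flag (1 = little endian), uint32 geometry type (1 = Point),
    # then the two coordinates as float64 values.
    return struct.pack("<BIdd", 1, 1, x, y)

wkb = point_wkt_to_wkb("POINT (30 10)")
print(wkb.hex())
```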

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
aoiasd
1b20e956be
enhance: support random score for boost function score (#44214)
Also supports setting the function mode and boost mode when running a
search with boost.

RandomScore produces a random function score in [0, weight).
FunctionMode decides how multiple boost function scores are combined
into the boost score.
BoostMode decides how the original score and the boost score are
combined into the final score.
relate: https://github.com/milvus-io/milvus/issues/43867
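
A rough sketch of how the three pieces could fit together, in Python. The mode names `"sum"` and `"multiply"` and all function names are assumptions made for illustration; the commit does not list the supported modes.

```python
import random

def random_score(weight: float, rng: random.Random) -> float:
    """Random function score in [0, weight)."""
    return rng.random() * weight

def combine(values, mode: str) -> float:
    """Fold several scores into one; the mode names are hypothetical."""
    if mode == "sum":
        return sum(values)
    if mode == "multiply":
        result = 1.0
        for v in values:
            result *= v
        return result
    raise ValueError(f"unknown mode: {mode}")

def final_score(origin, function_scores, function_mode, boost_mode):
    boost = combine(function_scores, function_mode)  # FunctionMode step
    return combine([origin, boost], boost_mode)      # BoostMode step

rng = random.Random(42)
scores = [random_score(2.0, rng), 1.5]
result = final_score(0.8, scores, function_mode="sum", boost_mode="multiply")
```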

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-24 17:50:04 +08:00
foxspy
13c3b0b909
enhance: add autoindex configuration for the int8 vector type (#44554)
issue: #38666 

Add int8 support for autoindex to ensure it can be independently
configured. At the same time, remove the restriction on int8 type for
vectorDiskIndex (note that vectorDiskIndex only determines the building
and loading method of the index, not the index type).

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-09-24 17:48:04 +08:00
sparknack
0145dc8c06
fix: refund loaded resource usage in Insert/DeleteRecord destructor (#44555)
issue: #44528

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-24 16:16:04 +08:00
zhagnlu
eac16a577c
enhance:support cachelayer for json stats (#44446)
#42533

Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
2025-09-24 15:30:04 +08:00
sparknack
14c085374e
fix: set mmap_file_raii_ to nullptr when mmap is disabled (#44516)
issue: #44510
related: #44501

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-24 11:50:03 +08:00
congqixia
ea307ea3c9
fix: [StorageV2] Make DiskFileManager use fs from context (#44535)
Related to #44534

Datanode shall not use the singleton fs after 2.6+. This patch makes the
disk file manager use the filesystem passed by fileManagerContext instead
of the erroneous singleton one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-24 10:12:03 +08:00
Bingyi Sun
f0446fd9a0
enhance: optimize the performance of binary_search_string (#44469)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-23 10:52:13 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M7 Support STL-SORT index for datatype timestamptz
>- [x] M8 Support database/collection level default timezone

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
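
The precedence rule can be sketched as a small Python function. `resolve_timezone` and its parameter names are hypothetical; `explicit` stands for the string literal's zone on insert or the query parameter on retrieval.

```python
from typing import Optional

def resolve_timezone(explicit: Optional[str],
                     collection_default: Optional[str],
                     database_default: Optional[str]) -> str:
    """Pick the first configured timezone, falling back to UTC."""
    for candidate in (explicit, collection_default, database_default):
        if candidate:
            return candidate
    return "UTC"

print(resolve_timezone(None, None, "Asia/Shanghai"))
```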


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
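
A small Python sketch of assembling such an expression on the client side. `build_tsz_filter` is a hypothetical helper, not part of any SDK, and `datetime.fromisoformat` validates only the subset of ISO 8601 that Python accepts.

```python
from datetime import datetime, timezone

def build_tsz_filter(field, op, iso_string, interval=None):
    """Format a filter in the shape described above."""
    datetime.fromisoformat(iso_string)  # raises ValueError if malformed
    left = field if interval is None else f"{field} + INTERVAL '{interval}'"
    return f"{left} {op} ISO '{iso_string}'"

expr = build_tsz_filter("tsz", "!=", "2025-01-03T00:00:00+08:00", interval="P0D")

# The same instant expressed in UTC, to show what the +08:00 offset means.
instant = datetime.fromisoformat("2025-01-03T00:00:00+08:00").astimezone(timezone.utc)
print(expr)
print(instant.isoformat())
```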

## M6: Extract

We will be able to extract specific time fields via kwargs in a future
Python SDK release.
The key is `time_fields`, and the value should be one or more of "year,
month, day, hour, minute, second, microsecond", separated by commas or
spaces. The result for each record is then an array of int64.
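
A sketch of what the extraction yields for one record, in plain Python. `extract_time_fields` is a hypothetical helper, and it reads the fields in the offset carried by the input string, which is an assumption; the commit does not say which timezone the server applies.

```python
import re
from datetime import datetime

ALLOWED = ("year", "month", "day", "hour", "minute", "second", "microsecond")

def extract_time_fields(iso_string: str, time_fields: str) -> list:
    """Return the requested fields of a timestamp as a list of ints."""
    names = [n for n in re.split(r"[,\s]+", time_fields.strip()) if n]
    for name in names:
        if name not in ALLOWED:
            raise ValueError(f"unsupported time field: {name}")
    moment = datetime.fromisoformat(iso_string)
    return [getattr(moment, name) for name in names]

fields = extract_time_fields("2025-01-03T04:05:06+08:00", "year, month day")
```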



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
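
The month example can be checked with a few lines of Python. `add_one_month` is a hypothetical helper, and clamping to the end of the target month is an assumed convention rather than the documented Milvus behaviour.

```python
import calendar
from datetime import date

def add_one_month(d: date) -> date:
    """Move to the same day of the next month, clamping to that month's end."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

feb = date(2025, 2, 1)
mar = date(2025, 3, 1)
feb_days = (add_one_month(feb) - feb).days  # days added in February 2025
mar_days = (add_one_month(mar) - mar).days  # days added in March 2025
```

Because the offset in days depends on the row's own value, the comparison cannot be rewritten as a fixed range over a sorted index.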

--- 

After this PR, the input/output type of timestamptz is an ISO
string. Timestamptz values are stored as timestamptz data, which is
ultimately int64_t.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
Gao
539f17f1ad
enhance: tiered index updates (#44433)
issue: #42032 #44212 

- special case for warmup param and cell storage size for tiered index
- add a config to enable/disable storage usage tracking

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-22 21:34:11 +08:00
Buqian Zheng
75557f3eb8
enhance: Use std::shared_lock and std::unique_lock for mutexes (#44459)
issue: https://github.com/milvus-io/milvus/issues/44452

Signed-off-by: zhengbuqian <zhengbuqian@gmail.com>
Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>
2025-09-22 18:02:09 +08:00
Buqian Zheng
846cf52a95
enhance: Remove unused vector plan node subclasses (#44453)
Remove redundant `VectorPlanNode` subclasses and simplify the visitor
pattern by consolidating to a single `VectorPlanNode`.

The previous design used distinct `VectorPlanNode` subclasses and a
templated `VectorVisitorImpl` for type-directed dispatch. However, the
template parameter was not functionally used to implement different
logic for each vector type, making the subclasses redundant for their
intended purpose.

This PR is created by Cursor Agent and manually moved from
https://github.com/zhengbuqian/milvus/pull/14.

Signed-off-by: zhengbuqian <zhengbuqian@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>
2025-09-22 18:00:27 +08:00
sparknack
ab64afba2f
enhance: add storage resource usage for scalar search (#44414)
issue: #44212

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-22 14:28:06 +08:00
Gao
d3784c6515
enhance: add storage resource usage for vector search (#44308)
issue: #44212 

Implement search/query storage usage statistics on the Go side (result
reduce). Storage usage is recorded only in the vector search C++ path;
the query C++ path still needs to be implemented in later PRs.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-09-19 20:20:02 +08:00
congqixia
b532a3e026
enhance: Move c API unittest aside to src files (#44458)
Related to #43931

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-19 10:30:01 +08:00
congqixia
7b83314bf3
enhance: [StorageV2] Make datanode use non-singleton fs (#44418)
Related to #39173

According to the current design, datanode shall create fs from the storage
config in the request instead of using the singleton fs. This PR upgrades
milvus-storage and makes the packed reader/writer compose a new fs from the
storage config.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-18 20:06:00 +08:00
zhagnlu
9b6703626d
fix:fix unescaped bug for json stats (#44421)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-17 20:54:01 +08:00
sthuang
2f70a73258
fix: turn on azure by default (#44377)
related: #44354, #44138, #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-17 10:12:01 +08:00
congqixia
6f7318a731
enhance: [StorageV2] Use compressed size as log file size (#44402)
Related to #39173

This addresses a backlog issue in which memory size and log size shared
the same value. The patch adds a `GetFileSize` API that returns the remote
compressed binlog size, which is used as the meta log file size so that
usage is calculated more accurately.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 21:20:02 +08:00
congqixia
98d23de36c
enhance: [StorageV2] Make load info contains child info (#44384)
Related to #44257

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 16:14:00 +08:00
zhagnlu
baa84e0b2b
fix: avoid mvcc when doing pk compare expr (#44353)
#44352

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-15 10:17:59 +08:00
zhagnlu
e9bbb6aa9b
fix: fix json_contains bug for stats (#44325)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-15 10:16:07 +08:00
sthuang
b38013352d
enhance: [StorageV2] enable build with azure (#44177)
related: #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-14 08:05:58 +08:00
Bingyi Sun
1931dcd9b5
fix: Fix initialize timestamp index concurrently (#44317)
issue: https://github.com/milvus-io/milvus/issues/44341

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-12 14:25:57 +08:00
zhagnlu
16e6b6aa8a
fix:fix build json stats bug for nested object (#44303)
issue: #44132

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-11 14:13:56 +08:00
zhagnlu
77f7d19400
fix:avoid mmap rewrite by multi json fields (#44299)
issue: #44127

Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
2025-09-11 10:13:57 +08:00
congqixia
f5618d5153
enhance: [StorageV2] Utilized advance split policy and persist in meta (#44282)
Related to #44257

This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-10 14:47:57 +08:00
sparknack
4a01c726f3
enhance: cachinglayer: some metric and params update (#44276)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-10 11:03:57 +08:00
zhagnlu
2f8620fa79
fix: fix like failed and add max columns limit (#44233)
#44137

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-10 10:33:57 +08:00
Spade A
45adf2d426
fix: load resource considers ngram index (#44237)
fix https://github.com/milvus-io/milvus/issues/44236

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-10 10:27:56 +08:00
Chun Han
26a024625d
feat: support search by on json field and dynamic field(#43124) (#43203)
related: #43124

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-09-09 21:51:56 +08:00
Spade A
575d490af6
fix: ngram index is mistakenly used for unsopported operations 2 (#44142)
issue: https://github.com/milvus-io/milvus/issues/44020
https://github.com/milvus-io/milvus/pull/43955 only fixed unary
expressions.
This PR fixes all expressions and adds more tests.

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-09 19:05:56 +08:00
Buqian Zheng
dae0fd0e90
enhance: removed unused map_c (#44183)
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-09 16:46:04 +08:00
Buqian Zheng
9bf2b5c10c
enhance: moved more segcore unit test files (#44210)
issue: https://github.com/milvus-io/milvus/issues/43931

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-08 10:21:55 +08:00
aoiasd
c71b47b52c
enhance: add internal core latency metric for rescore node (#44010)
For fetching latency of boost.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-05 17:37:54 +08:00
zhagnlu
d67f1ea0ab
enhance: add param to modify dump snapshot batch size (#44215)
issue: #44216

Signed-off-by: luzhang <luzhang@zilliz.com>
2025-09-05 14:29:54 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index (#44171)
issue: https://github.com/milvus-io/milvus/issues/42032

- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
Buqian Zheng
b76bf13fc3
enhance: move c++ unit test file to aside of the production code (#43932)
issue: https://github.com/milvus-io/milvus/issues/43931

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-03 23:45:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory
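
For intuition about the shorter path, a list-style columnar layout keeps all values in one flat buffer plus an offsets array, so no per-row proto decoding is needed. The pure-Python sketch below is an illustration under that assumption; `to_list_layout` and `read_row` are hypothetical, and how Milvus actually arranges the Arrow child array is not stated in this commit.

```python
def to_list_layout(vector_arrays):
    """Flatten rows of vectors into (offsets, values)."""
    offsets, values = [0], []
    for row in vector_arrays:
        for vector in row:
            values.extend(vector)
        offsets.append(len(values))  # end position of this row in `values`
    return offsets, values

def read_row(offsets, values, index, dim):
    """Recover row `index` as a list of `dim`-sized vectors."""
    flat = values[offsets[index]:offsets[index + 1]]
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]

rows = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
offsets, values = to_list_layout(rows)
```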

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
Buqian Zheng
ad16441aa0
enhance: removed unused VectorFunction (#44178)
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-03 14:37:53 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
congqixia
aa4ef9c996
feat: Support enabling dynamic schema on existing collection (#44151)
Related to #44150

This PR makes it possible to enable the `dynamic schema` feature on an
existing collection.

The API reuses `AlterCollection`; under the hood the request is
redirected to `adding nullable json field`.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-02 15:51:52 +08:00