1377 Commits

Author SHA1 Message Date
wei liu
1fae8f5ae3
enhance: Optimize FlushAll performance for multi-table scenarios (#43339)
Replace multiple per-table flush RPC calls with single FlushAll RPC to
improve performance in multi-table scenarios.
issue: #43338
- Implement server-side FlushAll request processing in
DataCoord/MixCoord
- Add flushAllTask to handle unified flush operations across all tables
- Replace proxy-side per-table flush iteration with single RPC call
- Support both streaming and non-streaming service execution paths
- Add comprehensive unit tests for new FlushAll implementation

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-30 15:37:37 +08:00
tinswzy
1718b0d141
enhance: update wp version v0.1.2 (#43636)
#43638 
Update wp to v0.1.2.
Fix a read failure when MinIO is killed during data reading. Related wp
commit: aabd1c4eb2

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-30 14:39:36 +08:00
Zhen Ye
3e3775fb81
fix: panics when describe collection internal failure (#43630)
issue: #43629

- also fix the scanner_switchable panic when the underlying wal scanner returns a
context error.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-29 20:33:36 +08:00
Zhen Ye
cd38d65417
fix: make savebinlogpath idempotent at binlog level (#43615)
issue: #43574

- update all binlogs every time savebinlogpath is called.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-29 19:47:36 +08:00
XuanYang-cn
0ccb95303e
feat: [CMEK] Add utils to load plugins (#42986)
See also: #40321

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-07-29 17:17:36 +08:00
Buqian Zheng
052fb6c562
feat: add time based eviction to data managed by cachinglayer (#43490)
issue: https://github.com/milvus-io/milvus/issues/41435

Also adds disk capacity protection.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-29 16:17:35 +08:00
tinswzy
173efe2b98
enhance: wp metrics and update deps to v0.1.0 (#43569)
#43574   #43604 #43431  #43603 
Fix wp metrics not being registered;
Update the wp dependency to v0.1.2-rc1;
Improve the advanced reader with concurrent block prefetching;
Add a segment rolling policy based on the number of blocks;
Improve concurrent compaction;
Fix a bug where releasing a lock failed.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-29 14:51:35 +08:00
Xiaofan
bd31b32167
fix: hybridsearch should support offset param in restful api (#43586)
Add support for the offset param in the RESTful API and refine some constant
usage. Related: #43556

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-07-28 22:15:36 +08:00
yihao.dai
a29b3272b0
fix: Improve import memory management to prevent OOM (#43568)
1. Use blocking memory allocation to wait until memory becomes available
2. Perform memory allocation at the file level instead of per task
3. Limit Parquet file reader batch size to prevent excessive memory
consumption
4. Reduce the import buffer size from 20% to 10% of total memory

issue: https://github.com/milvus-io/milvus/issues/43387,
https://github.com/milvus-io/milvus/issues/43131

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-28 21:25:35 +08:00
Zhen Ye
648994182f
fix: pulsar use more memory for queue (#43565)
issue: #43564

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-28 14:00:56 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 implements the segcore part of
storage for handling VectorArray.
This PR:
1. implements the Go part of storage for VectorArray
2. implements collection creation with StructArrayField and VectorArray
3. implements inserting and retrieving data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Zhen Ye
070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix data loss when streaming resumes from an old-architecture message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
tinswzy
83f6811dbd
fix: local fs incomplete block read bug (#43444)
#43340: fix log reader bug
#43370: list object goroutine leak; block flush bug
#43431 #43356: improve read latency
Other fixes: local FS block CRC fix; incomplete block read bugfix;
incomplete multi-segment rolling bug; local FS concurrent flush bug.
Other enhancements: EOF-based segment end detection in the log reader;
revisioned log/segment meta updates.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-25 10:30:54 +08:00
Zhen Ye
e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating a collection or altering the schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
yihao.dai
9fbd41a97d
fix: Adjust binlog and parquet reader buffer size for import (#43495)
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.

issue: https://github.com/milvus-io/milvus/issues/43387

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-23 21:28:54 +08:00
Buqian Zheng
0599113a4b
enhance: add timeout to resource reservation (#43441)
issue: https://github.com/milvus-io/milvus/issues/41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-22 15:24:53 +08:00
yihao.dai
a839017e81
fix: Handle retry state in import task (#43474)
issue: https://github.com/milvus-io/milvus/issues/43473

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-22 14:52:53 +08:00
congqixia
672a83f66b
enhance: Skip remove op if key in save set (#43425)
Related to #43407

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-18 17:37:39 +08:00
congqixia
8fc7069e1a
fix: Make MultiSaveAndRemove execute removal first (#43408)
Related to #43407

When `MultiSaveAndRemove`-like operations contain the same key in both
the save and removal sets, data may be lost if saves are executed before
removals.

This PR makes all kv implementations execute removals first and then save
the new values, so even when the same key appears in both saves and
removals, the new value is kept.
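The ordering problem can be reproduced with a toy map. This is a minimal sketch: `memKV` and both methods are hypothetical and only illustrate why applying removals before saves preserves a key that appears in both sets.

```go
package main

// memKV is a toy in-memory key-value store used only to illustrate ordering;
// it is not the Milvus kv implementation.
type memKV map[string]string

// multiSaveAndRemoveSaveFirst reproduces the buggy ordering: a key present in
// both sets is written and then immediately deleted.
func (kv memKV) multiSaveAndRemoveSaveFirst(saves map[string]string, removals []string) {
	for k, v := range saves {
		kv[k] = v
	}
	for _, k := range removals {
		delete(kv, k)
	}
}

// multiSaveAndRemove applies removals first and saves second, so a key that
// appears in both sets ends up holding the new value.
func (kv memKV) multiSaveAndRemove(saves map[string]string, removals []string) {
	for _, k := range removals {
		delete(kv, k)
	}
	for k, v := range saves {
		kv[k] = v
	}
}
```

The follow-up enhancement (#43425) skips the remove operation entirely when its key is also in the save set, which gives the same end state with one fewer write.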

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-18 15:41:40 +08:00
Zhen Ye
5aa7a116d2
fix: change maxTimeTickDelay from 5m to 20m (#43377)
issue: #43266

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:29:42 +08:00
tinswzy
26f2de4bcf
fix: fence failure and remove list API usage (#43365)
#43356 #43370: fence failure; goroutine leaks
#43313: record too large

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-18 11:22:51 +08:00
Buqian Zheng
d793def47c
feat: impose a physical memory limit when loading cells (#43222)
issue: #41435 

issue: https://github.com/milvus-io/milvus/issues/43038

This PR also:

1. Removes the ERROR state from ListNode.
2. Makes CacheSlot call reserveMemory once for all requested cells after
updating their state to LOADING, so a cell now transitions to LOADING
before its resources are reserved.
3. Rejects a resource reservation directly if size >= max_size.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-18 11:18:52 +08:00
Zhen Ye
07fa2cbdd3
enhance: wal balance consider the wal status on streamingnode (#43265)
issue: #42995

- don't balance a wal if its producing-consuming lag is too long.
- don't balance if rebalance is set to false.
- don't balance a wal that was balanced recently.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:10:51 +08:00
XuanYang-cn
4dcaa97682
fix: Use diskSegmentMaxSize for coll with sparse and dense vectors (#43194)
Previously, diskSegmentMaxSize was used if and only if all of the
collection's vector fields were indexed with the DiskANN index.

Since sparse vectors cannot be indexed with DiskANN, collections with
both dense and sparse vectors fell back to maxSize once sparse vectors
were introduced.

This PR changes the requirement for using diskSegmentMaxSize to: all dense
vector fields are indexed with DiskANN, ignoring sparse vector fields.
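The new rule can be expressed as a small selection function. This is a minimal sketch: `fieldInfo`, `useDiskSegmentMaxSize` and `segmentMaxSize` are hypothetical, and treating a collection with no dense vector fields as not qualifying is an assumption, not something this commit specifies.

```go
package main

// fieldInfo is a hypothetical summary of a vector field and its index type.
type fieldInfo struct {
	Name      string
	Sparse    bool
	IndexType string // e.g. "DISKANN", "HNSW", "SPARSE_INVERTED_INDEX"
}

// useDiskSegmentMaxSize reports whether every dense vector field uses
// DiskANN. Sparse fields are skipped because they cannot use DiskANN.
// Requiring at least one dense field is an assumption of this sketch.
func useDiskSegmentMaxSize(fields []fieldInfo) bool {
	dense := 0
	for _, f := range fields {
		if f.Sparse {
			continue
		}
		dense++
		if f.IndexType != "DISKANN" {
			return false
		}
	}
	return dense > 0
}

// segmentMaxSize picks the size limit for new segments.
func segmentMaxSize(fields []fieldInfo, maxSize, diskSegmentMaxSize int64) int64 {
	if useDiskSegmentMaxSize(fields) {
		return diskSegmentMaxSize
	}
	return maxSize
}
```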

See also: #43193

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-07-16 18:04:52 +08:00
congqixia
5d90b65342
enhance: [StorageV2] Add storage version in Data/Query view resp (#43348)
Related to #39173

Add `storage_version` in data/query view segment info response

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-16 15:52:51 +08:00
tinswzy
b5a1937699
fix: wp refuses to write only when both payload and properties are empty (#43319)
#43313

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-16 14:42:50 +08:00
liliu-z
003c348d6d
enhance: Upgrade go version to 1.24.4 to fix CVEs (#43019)
Signed-off-by: liliu-z <liliu-z@users.noreply.github.com>
Co-authored-by: liliu-z <liliu-z@users.noreply.github.com>
2025-07-16 11:28:50 +08:00
sthuang
4f17640598
enhance: [StorageV2] clean up legacy flag (#43290)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-15 10:18:49 +08:00
tinswzy
0aeac94f8a
fix: no such file error was reported when reading an empty segment in local mode (#43284)
#43185

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-14 19:30:49 +08:00
Ted Xu
07894b37b6
enhance: returning collection metadata from cache (#42823)
See #43187

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-14 10:54:50 +08:00
tinswzy
7da62698e0
enhance: improve WP parallel sync mechanism and fencing logic (#42892)
related: #42595 
improve WP parallel sync mechanism and fencing logic; remove redundant
metrics and labels

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-13 23:04:49 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag on the streaming node is reported to the coordinator.
- The consuming lag triggers write limiting and denial by the quota
center.
- Enable ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
Zhen Ye
f598ca2b4e
fix: block at msgpack adaptor and wrong metrics (#43235)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 10:14:49 +08:00
PjJinchen
a90694165b
feat: Supports tracing services that require header-based authentication. (#43211)
issue: https://github.com/milvus-io/milvus/issues/43082

Support tracing services that require header-based authentication,
for example Aliyun SLS, Volcengine LogService, etc.

[Aliyun SLS](https://help.aliyun.com/zh/sls/import-trace-data-from-golang-applications-to-log-service-by-using-opentelemetry-sdk-for-golang?spm=a2c4g.11186623.help-menu-search-28958.d_1#section-ktk-xxz-8om)

Add a `headers` entry to the trace config:

```yaml
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  milvus-cn-beijing-pre.cn-beijing.log.aliyuncs.com:10010
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

The `headers` value is base64-encoded; the raw data is JSON:
```json
{
    "x-sls-otel-project": "milvus-cn-beijing-pre",
    "x-sls-otel-instance-id": "milvus-cn-beijing-pre",
    "x-sls-otel-ak-id": "xxx",
    "x-sls-otel-ak-secret": "xxx"
}
```

[Volcengine TLS](https://www.volcengine.com/docs/6470/812322#grpc-%E5%8D%8F%E8%AE%AE%E5%88%9D%E5%A7%8B%E5%8C%96%E7%A4%BA%E4%BE%8B)

Add a `headers` entry to the trace config:

```yaml
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  xxx
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

The `headers` value is base64-encoded; the raw data is JSON:
```json
{
    "x-tls-otel-region": "cn-beijing",
    "x-tls-otel-tracetopic": "milvus-cn-beijing-pre",
    "x-tls-otel-ak": "xxx",
    "x-tls-otel-sk": "xxx"
}
```
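The `headers` value can be produced with a few lines of Go. This is a minimal sketch, assuming standard (padded) base64 over the JSON object. `encodeTraceHeaders` and `decodeTraceHeaders` are hypothetical helpers, not part of Milvus, and the header keys and `xxx` placeholders come from the examples above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// encodeTraceHeaders turns a header map into the base64-encoded JSON string
// placed under trace.otlp.headers. StdEncoding is an assumption.
func encodeTraceHeaders(headers map[string]string) (string, error) {
	raw, err := json.Marshal(headers)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeTraceHeaders reverses encodeTraceHeaders, e.g. to check a config value.
func decodeTraceHeaders(encoded string) (map[string]string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	var headers map[string]string
	if err := json.Unmarshal(raw, &headers); err != nil {
		return nil, err
	}
	return headers, nil
}
```

Because `json.Marshal` writes map keys in sorted order, the same header map always yields the same encoded string, which keeps config diffs stable.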

Signed-off-by: PjJinchen <6268414+pj1987111@users.noreply.github.com>
2025-07-10 17:32:48 +08:00
wei liu
b2597c6329
enhance: apply load config changes after QueryCoord restart (#43108)
issue: #43107 
- Add checkLoadConfigChanges() to apply load config during startup
- Call config check in startQueryCoord() after restart
- Skip auto-updates for collections with user-specified replica numbers
- Add is_user_specified_replica_mode field to preserve user settings
- Add comprehensive unit tests with mockey

Ensures existing collections use latest cluster-level config after
restart.
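The skip rule for user-specified replica numbers can be sketched as follows. `checkLoadConfigChanges()` and `is_user_specified_replica_mode` come from the commit message. `collectionLoadInfo` and `applyLoadConfig` are hypothetical simplifications that only handle the replica number.

```go
package main

// collectionLoadInfo is a hypothetical record of a loaded collection.
type collectionLoadInfo struct {
	ID                         int64
	ReplicaNumber              int32
	IsUserSpecifiedReplicaMode bool
}

// applyLoadConfig moves collections to the cluster-level replica number after
// a restart, skipping those whose replica number the user set explicitly.
// It returns the IDs of the collections it changed.
func applyLoadConfig(collections []*collectionLoadInfo, clusterReplicaNumber int32) []int64 {
	var updated []int64
	for _, c := range collections {
		if c.IsUserSpecifiedReplicaMode {
			continue // preserve the user's explicit setting
		}
		if c.ReplicaNumber != clusterReplicaNumber {
			c.ReplicaNumber = clusterReplicaNumber
			updated = append(updated, c.ID)
		}
	}
	return updated
}
```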

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-10 14:28:48 +08:00
cai.zhang
3ffd44f302
fix: Fix remaining issues with Datanode pooling and StorageV2 (#43147)
issue: #43146

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-10 14:26:48 +08:00
Chun Han
07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
Zhen Ye
490c5d5088
fix: lost message version after compatible message modification (#43217)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-10 10:36:48 +08:00
tinswzy
c4634d861e
fix: v2.6 WebUI metrics response schema change bug (#42957)
#42919  
fix metrics response schema incompatibility with WebUI v2.6

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-08 22:56:47 +08:00
cai.zhang
6989e18599
enhance: Move sort stats task to sort compaction (#42562)
issue: #42560

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 20:22:47 +08:00
Zhen Ye
ed9aa1d4db
fix: limit GC concurrency as CPU number (#43165)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-08 10:46:46 +08:00
Ted Xu
6153272d4b
enhance: disabling max entry limit by default (#43166)
See: #43055

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-08 10:10:46 +08:00
yihao.dai
9cbd194c6b
fix: Prevent import from generating small binlogs (#43132)
- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count

issue: https://github.com/milvus-io/milvus/issues/43131

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-07 21:32:47 +08:00
Zhen Ye
46b6f1b9e2
fix: panic when logging an old message should be skipped (#43076)
issue: #43074

- fix: panic when logging an old message should be skipped, #43074
- fix: make the ack of broadcaster idempotent, #43026
- fix: lost dropping collection when upgrading, #43092
- fix: panic when DropPartition happen after DropCollection, #43027,
#43078

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 16:04:44 +08:00
cai.zhang
4133e3b8fd
fix: Enable merge sort and fix sort bug (#43080)
issue: #42980, #43034

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-04 10:18:44 +08:00
Zhen Ye
e97e44d56e
enhance: limit the gc concurrency when cpu is high (#43059)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 09:22:43 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Its parameters are:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.
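The validation and rounding rules described for `common.diskWriteBufferSizeKb` and `common.diskWriteNumThreads` can be sketched in Go. This is a minimal illustration: the function names are hypothetical, and validating the range before rounding up (rather than after) is an assumption. Because 65536 is a multiple of 4, rounding never pushes a valid value out of range.

```go
package main

import "fmt"

const diskWriteAlignKB = 4

// normalizeDiskWriteBufferSizeKB checks the [4, 65536] range and rounds the
// value up to the nearest multiple of 4KB.
func normalizeDiskWriteBufferSizeKB(kb int) (int, error) {
	if kb < 4 || kb > 65536 {
		return 0, fmt.Errorf("diskWriteBufferSizeKb %d out of range [4, 65536]", kb)
	}
	return (kb + diskWriteAlignKB - 1) / diskWriteAlignKB * diskWriteAlignKB, nil
}

// diskWriterConcurrency returns the maximum number of concurrent disk writes.
// 0 means callers write directly, so concurrency is bounded by the caller's
// pool size; any other value caps concurrency at that many writer threads.
func diskWriterConcurrency(numThreads, callerPoolSize, hardwareConcurrency int) (int, error) {
	if numThreads < 0 || numThreads > hardwareConcurrency {
		return 0, fmt.Errorf("diskWriteNumThreads %d out of range [0, %d]", numThreads, hardwareConcurrency)
	}
	if numThreads == 0 {
		return callerPoolSize, nil
	}
	return numThreads, nil
}
```

For example, a configured buffer of 5KB becomes 8KB, and `diskWriteNumThreads: 1` serializes all disk writes regardless of how many loader threads are running.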

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
congqixia
7bc7b18ed5
fix: [AddField] Prevent concurrent load during UpdateSchema (#43043)
Related to #43028

This PR:
- Add a mutex to prevent concurrent segment loading and schema changes
- Add a schema version field in load meta
- Update the schema in PutOrRef if the incoming schema version is larger

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 17:38:44 +08:00
Zhen Ye
08fff353af
fix: Revert "enhance: Enable mergeSort by default starting from version 2.6.0 (#42981)" (#43046)
issue: #43034

- The implementation of mergeSortMultipleSegments is wrong.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-01 17:30:29 +08:00
cai.zhang
c82943dca1
enhance: Enable mergeSort by default starting from version 2.6.0 (#42981)
issue: #42980 

Enable mergeSort for mix compaction to reduce sort stats tasks.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-30 21:46:43 +08:00