730 Commits

Author SHA1 Message Date
Zhen Ye
0d5e0ca795
fix: close timetick protection by default (#43650)
issue: #43266

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-30 19:51:37 +08:00
Buqian Zheng
052fb6c562
feat: add time based eviction to data managed by cachinglayer (#43490)
issue: https://github.com/milvus-io/milvus/issues/41435

also added disk capacity protection

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-29 16:17:35 +08:00
tinswzy
173efe2b98
enhance: wp metrics and update deps to v0.1.0 (#43569)
#43574   #43604 #43431  #43603 
Fix wp metrics not registered bug;
Update the version dependent on wp to v0.1.2-rc1;
improve advanced reader with concurrent prefetch blks;
add the segment rolling policy based on the number of blocks;
improve concurrent compaction
release lock failed bug

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-29 14:51:35 +08:00
yihao.dai
a29b3272b0
fix: Improve import memory management to prevent OOM (#43568)
1. Use blocking memory allocation to wait until memory becomes available
2. Perform memory allocation at the file level instead of per task
3. Limit Parquet file reader batch size to prevent excessive memory
consumption
4. Limit import buffer size from 20% to 10% of total memory

issue: https://github.com/milvus-io/milvus/issues/43387,
https://github.com/milvus-io/milvus/issues/43131

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-28 21:25:35 +08:00
yihao.dai
9fbd41a97d
fix: Adjust binlog and parquet reader buffer size for import (#43495)
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.

issue: https://github.com/milvus-io/milvus/issues/43387

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-23 21:28:54 +08:00
Zhen Ye
5aa7a116d2
fix: change maxTimeTickDelay from 5m into 20m (#43377)
issue: #43266

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:29:42 +08:00
tinswzy
26f2de4bcf
fix: fence failure and remove list API usage (#43365)
#43356  #43370 fence fail ; goroutine leaks
#43313 record too large

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-18 11:22:51 +08:00
Buqian Zheng
d793def47c
feat: impose a physical memory limit when loading cells (#43222)
issue: #41435 

issue: https://github.com/milvus-io/milvus/issues/43038

This PR also:


1. removed ERROR state from ListNode
2. CacheSlot will do reserveMemory once for all requested cells after
updating the state to LOADING, so now we transit a cell to LOADING
before its resource reservation
3. reject resource reservation directly if size >= max_size

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-18 11:18:52 +08:00
Zhen Ye
07fa2cbdd3
enhance: wal balance consider the wal status on streamingnode (#43265)
issue: #42995

- don't balance the wal if the producing-consuming lag is too long.
- don't balance if the rebalance is set as false.
- don't balance if the wal is balanced recently.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:10:51 +08:00
sthuang
4f17640598
enhance: [StorageV2] clean up legacy flag (#43290)
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-15 10:18:49 +08:00
Zhen Ye
15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
PjJinchen
a90694165b
feat: Supports tracing services that require header-based authentication. (#43211)
issue: https://github.com/milvus-io/milvus/issues/43082

support tracing services that require header-based authentication.
for example: aliyun SLS, volcengine LogService etc...

[aliyun
SLS](https://help.aliyun.com/zh/sls/import-trace-data-from-golang-applications-to-log-service-by-using-opentelemetry-sdk-for-golang?spm=a2c4g.11186623.help-menu-search-28958.d_1#section-ktk-xxz-8om)

Add a headers config in trace config

```
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  milvus-cn-beijing-pre.cn-beijing.log.aliyuncs.com:10010
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

it is encoded as base64, raw data is json
```
{
    "x-sls-otel-project": "milvus-cn-beijing-pre",
    "x-sls-otel-instance-id": "milvus-cn-beijing-pre",
    "x-sls-otel-ak-id": "xxx",
    "x-sls-otel-ak-secret": "xxx"
}
```

[volcengine
tls](https://www.volcengine.com/docs/6470/812322#grpc-%E5%8D%8F%E8%AE%AE%E5%88%9D%E5%A7%8B%E5%8C%96%E7%A4%BA%E4%BE%8B)

Add a headers config in trace config

```
trace:
  exporter: otlp
  sampleFraction: 1
  otlp:
    endpoint:  xxx
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
    headers:  # base64
  initTimeoutSeconds: 10
```

it is encoded as base64, raw data is json
```
{
    "x-tls-otel-region": "cn-beijing",
    "x-tls-otel-tracetopic": "milvus-cn-beijing-pre",
    "x-tls-otel-ak": "xxx",
    "x-tls-otel-sk": "xxx"
}
```

Signed-off-by: PjJinchen <6268414+pj1987111@users.noreply.github.com>
2025-07-10 17:32:48 +08:00
cai.zhang
6989e18599
enhance: Move sort stats task to sort compaction (#42562)
issue: #42560

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-08 20:22:47 +08:00
Zhen Ye
ed9aa1d4db
fix: limit GC concurrency as CPU number (#43165)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-08 10:46:46 +08:00
Ted Xu
6153272d4b
enhance: disabling max entry limit by default (#43166)
See: #43055

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-08 10:10:46 +08:00
yihao.dai
9cbd194c6b
fix: Prevent import from generating small binlogs (#43132)
- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count

issue: https://github.com/milvus-io/milvus/issues/43131

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-07 21:32:47 +08:00
cai.zhang
4133e3b8fd
fix: Enable merge sort and fix sort bug (#43080)
issue: #42980, #43034

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-04 10:18:44 +08:00
Zhen Ye
e97e44d56e
enhance: limit the gc concurrency when cpu is high (#43059)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 09:22:43 +08:00
sparknack
7e855f1046
enhance: add disk file writer with Direct IO support (#42665)
issue: #43040 

This patch introduces a disk file writer that supports Direct IO.

Currently, it is exclusively utilized during the QueryNode load process.

Below is its parameters:

1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.

2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.

3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.

Both parameters can be updated during runtime.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-02 22:18:44 +08:00
Zhen Ye
08fff353af
fix: Revert "enhance: Enable mergeSort by default starting from version 2.6.0 (#42981)" (#43046)
issue: #43034

- implementation of mergeSortMultipleSegments is wrong.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-01 17:30:29 +08:00
cai.zhang
c82943dca1
enhance: Enable mergeSort by default starting from version 2.6.0 (#42981)
issue: #42980 

Enable mergeSort for mix compaction to reduce sort stats tasks.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-30 21:46:43 +08:00
Zhen Ye
8367e4ec6a
fix: set 72h for wal retention (#42910)
issue: #42706

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-27 17:36:43 +08:00
aoiasd
e2566c0e92
enhance: bm25 stats local cache use local storage path (#42923)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-06-25 13:44:46 +08:00
Zhen Ye
a081906fb4
enhance: smaller backoff configuration for wal balancer to make faster recovery (#42869)
issue: #42835

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-23 10:32:40 +08:00
cai.zhang
d122e6d1e2
enhance: Make Web UI toggleable via config (#42814)
issue: #42813

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-06-18 13:02:38 +08:00
tinswzy
754ca58469
enhance: improve WP parameters according to performance testing (#42666) 2025-06-13 20:33:41 +08:00
Zhen Ye
1f66b650e9
fix: pulsar cannot work properly if backlog exceed (#42653)
issue: #42649

- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 14:28:37 +08:00
tinswzy
36a4b74fc0
enhance: Adjust the default parameters of WP according to performance tests (#42598)
#42595

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-06-10 16:30:35 +08:00
yihao.dai
837349dead
enhance: Adjust default import buffer size (#42541)
Increase insert buffer size from 16MB to 64MB, while keeping delete
buffer size at 16MB.

issue: https://github.com/milvus-io/milvus/issues/42518

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-09 13:02:33 +08:00
yihao.dai
ee9d08375d
enhance: Accelerate dispatcher building (#42500)
Reduce check interval to accelerate dispatcher building.

issue: https://github.com/milvus-io/milvus/issues/42067

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-06-05 16:36:32 +08:00
Zhen Ye
0567f512b3
fix: streamingnode get stucked when stop (#42501)
issue: #42498

- fix: sealed segment cannot be flushed after upgrading
- fix: get mvcc panic when upgrading
- ignore the L0 segment when graceful stop of querynode.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-05 12:22:31 +08:00
Ted Xu
35c17523de
feat: limit search result entries (#42522)
See: #42521

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-06-05 12:08:33 +08:00
tinswzy
f55f900c85
fix insert hang caused by WAL writer writing to a closing logfile (#42078)
related issue #42049 
wp commit
[94de4](94de4cbc60)

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-06-03 09:58:36 +08:00
Zhen Ye
b94cee2413
fix: growing segment from old arch is not flushed after upgrading (#42164)
issue: #42162

- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stucked
- fix: growing segment from old arch is not flushed after upgrading

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:00:28 +08:00
junjiejiangjjj
4202c775ba
feat: Support vllm and tei rerank (#41947)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-05-28 19:18:28 +08:00
Buqian Zheng
7243c1d0ce
feat: remove async warmup policy (#42123)
issue: https://github.com/milvus-io/milvus/issues/41993

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-28 10:30:28 +08:00
cqy123456
5fe7015f63
enhance: InterimIndex support more index type and data type (#41021)
issue: https://github.com/milvus-io/milvus/issues/27678
cherry pick from : https://github.com/milvus-io/milvus/pull/39180,
https://github.com/milvus-io/milvus/pull/40429

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-05-28 08:40:28 +08:00
wei liu
54619eaa2c
feat: Implement partial result support on node down (#42009)
issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:

- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support partial result required
data ratio
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic

These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 00:12:28 +08:00
Zhen Ye
212e17c4c5
fix: modify param to use less memory when flush and sync (#42102)
issue: #42097

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-27 10:12:27 +08:00
aoiasd
0fafb706ba
enhance: add segment bm25 stats local cache (#41775)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-26 18:44:27 +08:00
wei liu
f84650ece0
enhance: Reduce session TTL from 30s to 10s for faster failure detection (#42050)
Optimize session management by reducing the TTL (Time To Live) value for
service registration from 30 seconds to 10 seconds. This change improves
the system's ability to detect service failures more quickly and
enhances overall cluster responsiveness.

Changes include:
- Update default session TTL from 30s to 10s in milvus.yaml
- Adjust DefaultSessionTTL constant from 30 to 10 seconds
- Update SessionTTL default value from 60 to 10 seconds
- Maintain consistent TTL values across configuration files

This optimization reduces the time required for the system to detect
when services become unavailable, leading to faster failover and
improved cluster stability during node failures or network issues.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-26 12:04:26 +08:00
Chun Han
d1cfa58a0a
feature: support compact expiry data(#41336) (#42056)
related: #41336

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-05-25 16:46:31 +08:00
Buqian Zheng
2e3539319d
feat: vector field raw data to mmap by default (#41975)
issue: https://github.com/milvus-io/milvus/issues/41435

should address https://github.com/milvus-io/milvus/issues/41774

this PR also: 
* added caching layer memory overhead metric
* re-enable TextMatch.GrowingLoadData test

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-22 11:56:25 +08:00
yihao.dai
142bd2fc05
enhance: Pooling for data tasks (#41256)
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.

issue: https://github.com/milvus-io/milvus/issues/41123

Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
2025-05-20 21:06:24 +08:00
aoiasd
b6f251718c
enhance: support print nq and params for search and query (#41802)
relate: https://github.com/milvus-io/milvus/issues/41801

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-20 14:10:23 +08:00
tinswzy
4edb1bc6f1
fix: resolve wp WALImpls concurrent read/write bug (#41763)
#41563 #41579 #41842 #41846 #41758
Upgraded the wp dependency to incorporate recent fixes addressing
multiple concurrency bugs in WALImpls.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-05-16 12:02:27 +08:00
wei liu
2d0ae3a709
fix: unexpected password for root user (#41817)
issue: #41816 
pr #37983 introduced an issue, if doesn't specified
`defaultRootPassword` in milvus.yaml, then `"Milvus"` will be used as
default password for root user, instead of `Milvus`.

This PR fix the unexpected password for root, and add comment for case
which use large numeric password requires double quotes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-14 19:42:22 +08:00
Zhen Ye
7beafe99a7
enhance: implement wal garbage collector with truncate api (#41770)
issue: #41544

- add a truncator implementation into wal recovery storage.
- add metrics for recovery storage.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 22:08:56 +08:00
zhagnlu
f094d026f8
fix: add params to ignore config type exception (#41776)
#41707

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-13 13:48:56 +08:00
Zhen Ye
61b6ca5b73
enhance: add in mem shard manager (#41749)
issue: #41544

- Implement in-memory shard manager to maintain the shard state at write
ahead.
- Remove all rpc and meta operation at write ahead, make the segment
assignment logic only use wal and memory.
- Refactor global stats management, add node-level flush policy.
- Fix the recovery storage inconsistency bug when graceful close.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 12:04:56 +08:00