48 Commits

Author SHA1 Message Date
Zhen Ye
bb913dd837
fix: simplify go ut (#46606)
issue: #46500

- simplify run_go_codecov.sh so that `set -e` catches any subcommand
failure.
- remove all embedded etcd from tests so that the full test suite can be
run locally.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## PR Summary: Simplify Go Unit Tests by Removing Embedded etcd and
Async Startup Scaffolding

**Core Invariant:**
This PR assumes that unit tests can be simplified by running without
embedded etcd servers (delegating to environment-based or external etcd
instances via `kvfactory.GetEtcdAndPath()` or `ETCD_ENDPOINTS`) and by
removing goroutine-based async startup scaffolding in favor of
synchronous component initialization. Tests remain functionally
equivalent while becoming simpler to run and debug locally.

**What is Removed or Simplified:**

1. **Embedded etcd test infrastructure deleted**: Removes
`EmbedEtcdUtil` type and its public methods (SetupEtcd,
TearDownEmbedEtcd) from `pkg/util/testutils/embed_etcd.go`, removes the
`StartTestEmbedEtcdServer()` helper from `pkg/util/etcd/etcd_util.go`,
and removes etcd embedding from test suites (e.g., `TaskSuite`,
`EtcdSourceSuite`, `mixcoord/client_test.go`). Tests now either skip
etcd-dependent tests (via `MILVUS_UT_WITHOUT_KAFKA=1` environment flag
in `kafka_test.go`) or source etcd from external configuration (via
`kvfactory.GetEtcdAndPath()` in `task_test.go`, or `ETCD_ENDPOINTS`
environment variable in `etcd_source_test.go`). This eliminates the
overhead of spinning up temporary etcd servers for unit tests.

2. **Async startup scaffolding replaced with synchronous
initialization**: In `internal/proxy/proxy_test.go` and
`proxy_rpc_test.go`, the `startGrpc()` method signature removes the
`sync.WaitGroup` parameter; components are now created, prepared, and
run synchronously in-place rather than in goroutines (e.g., `go
testServer.startGrpc(ctx, &p)` becomes `testServer.startGrpc(ctx, &p)`
running synchronously). Readiness checks (e.g., `waitForGrpcReady()`)
remain in place to ensure startup safety without concurrency constructs.
This simplifies control flow and reduces debugging complexity.

3. **Shell script orchestration unified with proper error handling**: In
`scripts/run_go_codecov.sh` and `scripts/run_intergration_test.sh`,
per-package inline test invocations are consolidated into a single
`test_cmd()` function with unified `TEST_CMD_WITH_ARGS` array containing
race, coverage, verbose, and other flags. The problematic `set -ex` is
replaced with `set -e` alone (removing debug output noise while
preserving strict error semantics), ensuring the scripts fail fast on
any command failure.

**Why No Regression:**
- Test assertions and code paths remain unchanged; only deployment
source of etcd (embedded → external) and startup orchestration (async →
sync) change.
- Readiness verification (e.g., `waitForGrpcReady()`) is retained,
ensuring components are initialized before test execution.
- Test flags (race detection, coverage, verbosity) are uniformly applied
across all packages via unified `TEST_CMD_WITH_ARGS`, preserving test
coverage and quality.
- `set -e` alone is sufficient for strict failure detection without the
`-x` flag's verbose output.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
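
The external-etcd pattern described in the summary above could look roughly like the sketch below. It is only an illustration, not code from the PR: `parseEndpoints` and `endpointsFromEnv` are hypothetical helper names, and the comma-separated value format is an assumption. Only the `ETCD_ENDPOINTS` variable name comes from the summary.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEndpoints splits a comma-separated endpoint list and drops empty
// entries. The comma-separated format is an assumption of this sketch.
func parseEndpoints(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if ep := strings.TrimSpace(part); ep != "" {
			out = append(out, ep)
		}
	}
	return out
}

// endpointsFromEnv returns the endpoints configured through ETCD_ENDPOINTS.
// ok is false when the variable is unset or empty; a test would then
// typically call t.Skip instead of starting an embedded server.
func endpointsFromEnv() (endpoints []string, ok bool) {
	endpoints = parseEndpoints(os.Getenv("ETCD_ENDPOINTS"))
	return endpoints, len(endpoints) > 0
}

func main() {
	if eps, ok := endpointsFromEnv(); ok {
		fmt.Println("using external etcd:", eps)
	} else {
		fmt.Println("ETCD_ENDPOINTS not set; etcd-dependent tests would be skipped")
	}
}
```

With this shape, a test that needs etcd stays runnable on a developer machine: it either talks to whatever etcd the environment provides or skips itself.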

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-31 16:07:22 +08:00
aoiasd
d261034af6
enhance: fix unstable config util unit test (#46702)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: config refresh events must reliably propagate updated
values and evict cached entries within a bounded time window; tests must
observe this deterministically without relying on fixed sleeps.
- Logic simplified: brittle fixed time.Sleep delays and separate error
assertions were replaced by assert.Eventually polling blocks that
combine value checks and cache-eviction verification, and consolidated
checks to reduce redundant assertions.
- Why no data loss / no behavior regression: only test synchronization
and assertions were changed—production config manager code paths (value
propagation, KV puts, cache eviction) are untouched; tests now wait for
the same outcomes more robustly, so no mutation of runtime behavior or
storage occurs.
- Enhancement scope: this is a test-stability improvement (no new
runtime capability); it fixes flaky unit tests (root cause: timing
assumptions) by replacing fixed waits with bounded polling and by using
t.Context for KV puts to align test context usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
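
The idea of replacing a fixed sleep with bounded polling can be sketched with the standard library alone. The `eventually` helper below is a hypothetical stand-in that only mimics the spirit of testify's `assert.Eventually`; it is not the test code from this PR, and the durations are arbitrary.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// eventually polls cond every tick until it returns true or waitFor elapses.
// It returns true as soon as the condition holds, false on timeout.
func eventually(cond func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// demoRefresh simulates a config value that is refreshed asynchronously
// and waits for it with bounded polling rather than a fixed sleep.
func demoRefresh() bool {
	var refreshed atomic.Bool
	go func() {
		time.Sleep(20 * time.Millisecond)
		refreshed.Store(true)
	}()
	return eventually(func() bool { return refreshed.Load() }, 2*time.Second, 5*time.Millisecond)
}

// demoTimeout shows that a condition that never holds fails after the bound.
func demoTimeout() bool {
	return eventually(func() bool { return false }, 30*time.Millisecond, 5*time.Millisecond)
}

func main() {
	fmt.Println("refresh observed:", demoRefresh())
	fmt.Println("never-true condition:", demoTimeout())
}
```

The polling version returns as soon as the value is visible and only pays the full timeout when something is actually wrong, which is why it tends to be both faster and less flaky than a fixed delay.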

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-31 15:59:21 +08:00
yihao.dai
5b97cb70a0
enhance: Support delaying scanner startup (#46369)
Introduce a ScannerStartupDelay configuration to enable WAL write-only
recovery, allowing fence messages to be persisted during
primary–secondary switchover when the StreamingNode is trapped in crash
loops.

issue: https://github.com/milvus-io/milvus/issues/46368

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a configurable WAL scanner pause/resume and a consumer request
flag to optionally ignore pause signals.

* **Metrics**
* Added a scanner pause gauge and pause-duration tracking for WAL
scanning.

* **Tests**
* Added coverage for pause-consumption behavior and cleanup in stream
client tests.

* **Chores**
* Consolidated flush-all logging into a single field and added a helper
for bulk message conversion.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-12-24 11:53:19 +08:00
congqixia
80fff56364
enhance: Bump etcd in pkg go.mod (#46420)
Related to #44614
Previous PR: #44666

Bump etcd version in pkg/go.mod to 3.5.23 and update test code
accordingly

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-18 16:13:16 +08:00
Xiaofan
7210fc9780
feature: add a prefix on environment config (#40623)
fix #40622

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-03-13 16:44:07 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
jaime
8a4ac8cccd
enhance: expose more metrics data (#39456)
issue: #36621 #39417
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-02-07 11:50:50 +08:00
congqixia
f076898761
fix: Return io error other than NotExist refreshing config (#38924)
Related to #38923

This PR:

- Check whether the `os.Stat` error for the config file is `os.ErrNotExist`
- Panic when getting the config returns an error during Milvus initialization
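
A minimal sketch of the distinction this commit describes: a missing file is tolerated, any other `os.Stat` failure is reported. `configFileExists` is a hypothetical helper name, and the path in `main` is made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// configFileExists reports whether path exists. A missing file is not an
// error here; any other Stat failure (permissions, I/O) is returned.
func configFileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, os.ErrNotExist) {
		return false, nil
	}
	return false, fmt.Errorf("stat config file %q: %w", path, err)
}

func main() {
	exists, err := configFileExists("/nonexistent-dir-for-sketch/milvus.yaml")
	fmt.Println("exists:", exists, "err:", err)
}
```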

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-08 12:00:56 +08:00
tinswzy
27229f7907
enhance: refine exists log print with ctx (#38080)
issue: #35917 
Refines existing log printing with ctx

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
congqixia
5afcee6bfa
fix: Store default value if ErrKeyNotFound is returned (#37691)
Related to #37690

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-15 10:24:32 +08:00
foxspy
81141bd18d
fix: use yaml.v2 as yaml parser (#37423)
issue: #34298 
Viper uses yaml.v2 as the parser. This PR will adopt the parsing logic
from Viper to handle YAML files, ensuring maximum consistency in
parsing.

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-11 21:26:27 +08:00
congqixia
0645d46ec6
fix: Skip EOF error when default empty yaml file (#37445)
Related to #37404

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 19:26:24 +08:00
foxspy
1b98bb423a
fix: process null value in yaml (#37418)
issue: #34298 
Fix the handling of `key: null` defined in the YAML file.
Viper parses it as "", while yaml v3 parses it as "null".

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-04 21:46:23 +08:00
foxspy
3224e58c5b
enhance: add unify vector index config management (#36846)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-01 06:18:21 +08:00
wei liu
3cd0b26285
enhance: Enable dynamic update loaded collection's replica (#35822)
issue: #35821
After a collection is loaded, increasing or decreasing its replica
number requires releasing the collection and loading it again.

Milvus offers 4 solutions to update a loaded collection's replicas. This
PR aims to change the replica number dynamically without a release. After
the replica number changes, Milvus loads or releases replicas
asynchronously, and the replica load status can be checked with the
getReplicas API.

Note that if more replicas are set than the QueryNodes can afford, the
new replicas won't be loaded successfully until enough QueryNodes join.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-25 10:13:18 +08:00
wei liu
cf242f9e09
fix: fix dynamic update config doesn't works for some param (#35572)
issue: #35570
Milvus supports a config cache to speed up config access, but it only
evicts a param's cache when that param has been updated. A param may rely
on another param's value, though. Say paramA relies on paramB: when
paramB is updated, paramB's cache is evicted, but paramA's cache still
keeps the old value.

This PR evicts the whole config cache to solve the above issue, because
dynamic config updates are infrequent.
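
The stale-derived-value problem and the evict-everything fix can be illustrated with a toy store. All names here (`store`, `get`, `update`) are hypothetical and much simpler than the real config manager. The sketch only shows why dropping the whole cache on update keeps derived values consistent.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy config store with a read-through cache.
type store struct {
	mu     sync.Mutex
	values map[string]string // raw config values
	cache  map[string]string // computed values, possibly derived from other keys
}

func newStore() *store {
	return &store{values: map[string]string{}, cache: map[string]string{}}
}

// get returns the cached value for key, computing it from the raw values on a miss.
func (s *store) get(key string, compute func(raw map[string]string) string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := compute(s.values)
	s.cache[key] = v
	return v
}

// update sets a raw value and drops the whole cache, so that values
// derived from key are recomputed on their next access.
func (s *store) update(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[key] = value
	s.cache = map[string]string{}
}

func main() {
	s := newStore()
	s.update("b", "1")
	// "a" is derived from "b"; evicting only "b" on update would leave "a" stale.
	deriveA := func(raw map[string]string) string { return "a-from-" + raw["b"] }
	fmt.Println(s.get("a", deriveA))
	s.update("b", "2")
	fmt.Println(s.get("a", deriveA))
}
```

Evicting everything trades a few extra recomputations for not having to track dependencies between params, which is reasonable when updates are rare.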

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-21 11:02:56 +08:00
aoiasd
3655ab10b2
fix: evict paramtable cache miss (#34771)
relate: https://github.com/milvus-io/milvus/issues/33461

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-08-02 11:52:14 +08:00
SimFG
b58a5617ef
enhance: add the seal segment when dispatch delete msgs (#34565)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-07-10 19:14:51 +08:00
congqixia
80b620ebcf
fix: Check err is ErrKeyNotFound when CASCachedValue (#34488)
See also #33785

When a config item is not present in paramtable, CAS fails because
GetConfig returns an error.

This PR makes the returned error an instance of ErrKeyNotFound and
checks the error type in the `CASCachedValue` methods.
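
The sentinel-error check described above can be sketched as follows. `ErrKeyNotFound` is the name used in the commit message; `getConfig` and `rawOrDefault` are hypothetical simplifications and do not reproduce the real paramtable signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyNotFound is a sentinel error for absent config keys.
var ErrKeyNotFound = errors.New("key not found")

// getConfig looks up key and wraps ErrKeyNotFound when it is absent,
// so callers can detect the case with errors.Is.
func getConfig(values map[string]string, key string) (string, error) {
	v, ok := values[key]
	if !ok {
		return "", fmt.Errorf("%w: %s", ErrKeyNotFound, key)
	}
	return v, nil
}

// rawOrDefault treats a missing key as "use the default" and propagates
// every other error unchanged.
func rawOrDefault(values map[string]string, key, def string) (string, error) {
	v, err := getConfig(values, key)
	if errors.Is(err, ErrKeyNotFound) {
		return def, nil
	}
	if err != nil {
		return "", err
	}
	return v, nil
}

func main() {
	values := map[string]string{"proxy.port": "19530"}
	fmt.Println(rawOrDefault(values, "proxy.port", "0"))
	fmt.Println(rawOrDefault(values, "missing.key", "fallback"))
}
```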

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-08 22:00:16 +08:00
congqixia
c689ef4822
fix: Remove eviction operations out of lock (#33834)
See also #33823

`EvictCacheValueByFormat` may be blocked by an ongoing `CASCacheValue`
and cause a possible deadlock

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-13 21:15:55 +08:00
congqixia
b90999b741
fix: Compare config value then swap when caching param value (#33785)
See also #33784

This PR changes the behavior of `SetCacheValue` in the config manager:

- Use mutex and map instead of concurrent map for `configCache`
- Compare config raw value before set cache value

With this implementation, concurrent caching & eviction shall always
produce correct output:

|time|caching |eviction|config   |cached   |
|----|--------|------- |---------|---------|
|t0  |get     |        |old value|null     |
|t1  |CAS OK  |        |old value|old value|
|t2  |        |update  |new value|old value|
|t3  |        |eviction|new value|null     |

|time|caching |eviction|config   |cached   |
|----|--------|------- |---------|---------|
|t0  |get     |        |old value|null     |
|t1  |        |update  |new value|null     |
|t2  |CAS fail|        |old value|null     |
|t3  |        |eviction|new value|null     |

|time|caching |eviction|config   |cached   |
|----|--------|------- |---------|---------|
|t0  |        |update  |new value|null     |
|t1  |get     |        |new value|null     |
|t2  |CAS OK  |        |new value|new value|
|t3  |        |eviction|new value|null     |

|time|caching |eviction|config   |cached   |
|----|--------|------- |---------|---------|
|t0  |        |update  |new value|null     |
|t1  |get     |        |new value|null     |
|t2  |        |eviction|new value|null     |
|t3  |CAS OK  |        |new value|new value|

|time|caching |eviction|config   |cached   |
|----|--------|------- |---------|---------|
|t0  |        |update  |new value|null     |
|t1  |        |eviction|new value|null     |
|t2  |get     |        |new value|null     |
|t3  |CAS OK  |        |new value|new value|
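
The interleavings in the tables above can be reproduced with a small compare-and-swap cache. This is a sketch under assumptions: the `manager` type and its method names are hypothetical, and values are plain strings, whereas the real config manager is more involved.

```go
package main

import (
	"fmt"
	"sync"
)

// manager holds raw config values and a cache guarded by one mutex.
type manager struct {
	mu     sync.Mutex
	config map[string]string // raw config values
	cache  map[string]string // cached (parsed) values
}

func newManager() *manager {
	return &manager{config: map[string]string{}, cache: map[string]string{}}
}

// rawValue returns the current raw config value ("get" in the tables).
func (m *manager) rawValue(key string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.config[key]
}

// update changes the raw config value without touching the cache.
func (m *manager) update(key, raw string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.config[key] = raw
}

// evict removes the cached entry for key.
func (m *manager) evict(key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.cache, key)
}

// casCachedValue caches value only if the raw config still equals origin,
// the raw value the caller read before computing value.
func (m *manager) casCachedValue(key, origin, value string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.config[key] != origin {
		return false
	}
	m.cache[key] = value
	return true
}

// cached returns the cached entry for key, if any.
func (m *manager) cached(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.cache[key]
	return v, ok
}

func main() {
	m := newManager()
	m.update("k", "old")
	origin := m.rawValue("k") // caching side reads "old"
	m.update("k", "new")      // eviction side updates first
	fmt.Println("CAS with stale origin succeeded:", m.casCachedValue("k", origin, "parsed-old"))
	m.evict("k")
	_, ok := m.cached("k")
	fmt.Println("stale value cached:", ok)
}
```

Because the comparison and the cache write happen under the same lock, a value computed from an outdated raw value can never be written after the update it missed.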

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-12 18:51:56 +08:00
aoiasd
2422084a29
fix: paramtable cache cause dynamic config non-dynamic (#33473)
relate: https://github.com/milvus-io/milvus/issues/33461

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-06-04 11:39:46 +08:00
Xiaofan
b6fefee0cf
fix: etcd not connectable when auth enabled (#31633)
Fix the etcd config source not respecting the auth-enabled setting.
Also removed the Pulsar recoverable error when Pulsar returns
ConsumerBusy. This can happen when Pulsar has not yet detected that the
original consumer is dead, and recovery takes some time.
fix #31631

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-04-01 15:23:19 +08:00
congqixia
74b7de3814
enhance: Cache formatted key for param item (#31388)
See also #30806

`formatKey` may cost a lot of CPU on string processing under high-QPS
scenarios. This PR adds a formattedKeys cache to avoid string operations
on every param value lookup.
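
A formatted-key cache of the kind described above might look like the following sketch. The normalization rule in `formatKey` (strip `/`, `_`, `.` and lowercase) is an assumption made for the example, and `keyCache` is a hypothetical type, not the one added by the PR.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// keyReplacer strips separators from a key. The exact rule is assumed here.
var keyReplacer = strings.NewReplacer("/", "", "_", "", ".", "")

// formatKey normalizes a key. This is the string work the cache avoids repeating.
func formatKey(key string) string {
	return strings.ToLower(keyReplacer.Replace(key))
}

// keyCache memoizes formatKey results. sync.Map suits this read-mostly
// workload, where each key is written once and read many times.
type keyCache struct {
	m sync.Map // original key (string) -> formatted key (string)
}

// formatted returns the normalized form of key, computing it at most
// once per key in the common case.
func (c *keyCache) formatted(key string) string {
	if v, ok := c.m.Load(key); ok {
		return v.(string)
	}
	f := formatKey(key)
	c.m.Store(key, f)
	return f
}

func main() {
	var c keyCache
	fmt.Println(c.formatted("Proxy.Max_Name/Length"))
	fmt.Println(c.formatted("Proxy.Max_Name/Length")) // served from the cache
}
```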

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 14:05:05 +08:00
aoiasd
b724753137
enhance: Add runtime config to paramtable (#31006)
relate: https://github.com/milvus-io/milvus/issues/30806
Avoid using string conversion or formatting functions when getting some
runtime parameters

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-15 11:07:06 +08:00
Chun Han
3298e64bd3
enhance: cache config values for saving cpu cycles to parse config item (#30947)
related: #30958

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-03-12 11:09:04 +08:00
aoiasd
bbff9193d9
enhance: support clean paramtable config event in test (#30534)
relate: https://github.com/milvus-io/milvus/issues/30441

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-20 14:16:51 +08:00
aoiasd
8385157717
enhance: adjust config source for support config event use paramtable (#29995)
Adjust the config source to support config events, so that handlers for
dynamic config can use paramtable without deadlocking.
relate: https://github.com/milvus-io/milvus/issues/29807

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-26 19:07:00 +08:00
congqixia
d73b534f1e
fix: use atomic.Pointer to store EventHandler in case of data race (#30205)
Resolves #30204
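
Storing a handler behind `atomic.Pointer` (available since Go 1.19) avoids a data race between a goroutine that installs the handler and one that fires events. The sketch below uses hypothetical names (`EventHandler`, `source`, `recorder`) and is not the actual Milvus types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// EventHandler is a hypothetical handler interface for this sketch.
type EventHandler interface {
	OnEvent(key string)
}

// source keeps its handler in an atomic.Pointer so that SetEventHandler
// and notify can run concurrently without a mutex.
type source struct {
	handler atomic.Pointer[EventHandler]
}

// SetEventHandler atomically publishes the handler.
func (s *source) SetEventHandler(h EventHandler) {
	s.handler.Store(&h)
}

// notify delivers key to the current handler. It reports false when no
// handler has been installed yet.
func (s *source) notify(key string) bool {
	h := s.handler.Load()
	if h == nil {
		return false
	}
	(*h).OnEvent(key)
	return true
}

// recorder is a trivial handler that remembers the keys it has seen.
type recorder struct {
	mu   sync.Mutex
	keys []string
}

func (r *recorder) OnEvent(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.keys = append(r.keys, key)
}

func main() {
	s := &source{}
	fmt.Println("delivered before handler is set:", s.notify("a"))
	r := &recorder{}
	s.SetEventHandler(r)
	fmt.Println("delivered after handler is set:", s.notify("b"), r.keys)
}
```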

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 19:09:04 +08:00
wei liu
797847904c
enhance: Change some frequency log to rated level (#29720)
This PR changes some frequent logs to the rated level

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-11 16:30:50 +08:00
wei liu
1f759837c4
fix: remove Unnecessary lock in config manager (#29836)
issue: #29709 #291712
To avoid a deadlock caused by concurrent recursive RLock and Lock
calls, this PR removes the unnecessary lock in the config manager

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-11 13:48:49 +08:00
MrPresent-Han
ed644983e2
enhance: add param for bloomfilter(#29388) (#29490)
related: #29388

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-28 18:10:46 +08:00
aoiasd
89d8ce2f73
enhance: refine access log to support format access log by yaml and print name info. (#28319)
relate: https://github.com/milvus-io/milvus/issues/28086

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-11-28 15:32:31 +08:00
congqixia
f1fc19e8a9
enhance: Add unittest for config.EventDispatcher (#28552)
There are not enough unit test cases for EventDispatcher, see also #28540.
This PR adds unit test cases for all methods of EventDispatcher
Related to #28538

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-18 19:12:20 +08:00
congqixia
c948a437a9
fix: protect EventDispatcher map with mutex (#28540)
Add mutex protection for `EventDispatcher.registry` map 
Fix #28538
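
A mutex-protected registry of the kind this fix describes could be sketched as below. The `dispatcher` type, its fields, and the handler signature are hypothetical simplifications of `EventDispatcher`.

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher guards its registry map with a RWMutex, since plain Go maps
// are not safe for concurrent reads and writes.
type dispatcher struct {
	mu       sync.RWMutex
	registry map[string][]func(key string)
}

func newDispatcher() *dispatcher {
	return &dispatcher{registry: map[string][]func(key string){}}
}

// register adds a handler for key under the write lock.
func (d *dispatcher) register(key string, h func(key string)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.registry[key] = append(d.registry[key], h)
}

// dispatch copies the handlers under the read lock and calls them after
// releasing it, so a handler may call register without deadlocking.
func (d *dispatcher) dispatch(key string) int {
	d.mu.RLock()
	handlers := append([]func(key string){}, d.registry[key]...)
	d.mu.RUnlock()
	for _, h := range handlers {
		h(key)
	}
	return len(handlers)
}

func main() {
	d := newDispatcher()
	d.register("proxy.port", func(key string) { fmt.Println("changed:", key) })
	fmt.Println("handlers called:", d.dispatch("proxy.port"))
}
```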

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-17 20:32:20 +08:00
SimFG
7dda2e8814
Change some log level in the pkg package (#28181)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-11-08 23:34:22 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Enwei Jiao
fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
yah01
c3f5856fbc
Fix data race for config with FileSource (#26518)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-24 09:18:24 +08:00
congqixia
0bc03ede0d
Add eventlog pkg and support grpc streaming event observation (#25812)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-25 17:23:01 +08:00
Gao
de6e4817a2
Support dynamic tuning config (#25152)
Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-07-03 15:18:24 +08:00
yah01
ebd0279d3f
Check error by Error() and NoError() for better report message (#24736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
wei liu
8e3ba74648
fix qc service unstable ut (#24340)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-24 18:49:25 +08:00
congqixia
7e9ef36de4
Fix TestConfigFromRemote/close_manager ut is not stable (#24279)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-22 09:59:25 +08:00
congqixia
3a66e1de65
Use suite for integration test (#24253)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-19 15:57:24 +08:00
congqixia
de4b4dafef
Refresh etcd source config with Serializable option (#23954)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-09 11:52:40 +08:00
Enwei Jiao
086f3bd748
Add it for refresh config (#23773)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-05-06 17:34:39 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00