issue: #46500
- simplify `run_go_codecov.sh` and rely on `set -e` so that any
subcommand failure fails the script.
- remove all embedded etcd from tests so the full test suite can be run
locally.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## PR Summary: Simplify Go Unit Tests by Removing Embedded etcd and
Async Startup Scaffolding
**Core Invariant:**
This PR assumes that unit tests can be simplified by running without
embedded etcd servers (delegating to environment-based or external etcd
instances via `kvfactory.GetEtcdAndPath()` or `ETCD_ENDPOINTS`) and by
removing goroutine-based async startup scaffolding in favor of
synchronous component initialization. Tests remain functionally
equivalent while becoming simpler to run and debug locally.
**What is Removed or Simplified:**
1. **Embedded etcd test infrastructure deleted**: Removes
`EmbedEtcdUtil` type and its public methods (SetupEtcd,
TearDownEmbedEtcd) from `pkg/util/testutils/embed_etcd.go`, removes the
`StartTestEmbedEtcdServer()` helper from `pkg/util/etcd/etcd_util.go`,
and removes etcd embedding from test suites (e.g., `TaskSuite`,
`EtcdSourceSuite`, `mixcoord/client_test.go`). Tests now either skip
etcd-dependent tests (via `MILVUS_UT_WITHOUT_KAFKA=1` environment flag
in `kafka_test.go`) or source etcd from external configuration (via
`kvfactory.GetEtcdAndPath()` in `task_test.go`, or `ETCD_ENDPOINTS`
environment variable in `etcd_source_test.go`). This eliminates the
overhead of spinning up temporary etcd servers for unit tests.
2. **Async startup scaffolding replaced with synchronous
initialization**: In `internal/proxy/proxy_test.go` and
`proxy_rpc_test.go`, the `startGrpc()` method signature removes the
`sync.WaitGroup` parameter; components are now created, prepared, and
run synchronously in-place rather than in goroutines (e.g., `go
testServer.startGrpc(ctx, &p)` becomes `testServer.startGrpc(ctx, &p)`
running synchronously). Readiness checks (e.g., `waitForGrpcReady()`)
remain in place to ensure startup safety without concurrency constructs.
This simplifies control flow and reduces debugging complexity.
3. **Shell script orchestration unified with proper error handling**: In
`scripts/run_go_codecov.sh` and `scripts/run_intergration_test.sh`,
per-package inline test invocations are consolidated into a single
`test_cmd()` function with unified `TEST_CMD_WITH_ARGS` array containing
race, coverage, verbose, and other flags. The problematic `set -ex` is
replaced with `set -e` alone (removing debug output noise while
preserving strict error semantics), ensuring the scripts fail fast on
any command failure.
**Why No Regression:**
- Test assertions and code paths remain unchanged; only the source of
etcd (embedded → external) and the startup orchestration (async → sync)
change.
- Readiness verification (e.g., `waitForGrpcReady()`) is retained,
ensuring components are initialized before test execution.
- Test flags (race detection, coverage, verbosity) are uniformly applied
across all packages via unified `TEST_CMD_WITH_ARGS`, preserving test
coverage and quality.
- `set -e` alone is sufficient for strict failure detection without the
`-x` flag's verbose output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #41609
- add env `MILVUS_NODE_ID_FOR_TESTING` to set the node ID of a Milvus
process.
- add env `MILVUS_CONFIG_REFRESH_INTERVAL` to set the refresh interval
of paramtable.
- initialize paramtable when `paramtable.Get()` is called.
- add a new multi-process framework for integration tests.
- convert all integration tests to multi-process.
- merge some test cases into one suite to speed them up.
- modify some tests, which need to wait for issues #42966 and #42685.
- remove the waittssync for delete collection to fix issue: #42989
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #40532
- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only WAL if some nodes in the cluster have not been
upgraded to 2.6
- allow opening a read-write WAL once all nodes in the cluster have been
upgraded to 2.6
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #29507
Note that api_testonly.go files should be guarded by the build tag
`test`, so that production build rules don't compile them and these APIs
don't get misused.
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
This PR mainly improves two things:
1. Target observer should refresh loading status during init time. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for
no reason. That's also the reason why we constantly see false log
"collection unloaded" upon mixcoord restarts.
2. Delete the session when the service is stopped, so that the new
service doesn't need to wait for the previous session to expire (~10s).
Item 1 is the major improvement of this PR, which should speed up init
time by 10s.
Item 2 is not a big concern in most cases, as coordinators usually shut
down after stop(). In those cases, a coordinator restart triggers a
serverID change, which in turn triggers existing logic that deletes the
expired session. This PR only fixes rare cases where serverID doesn't
change.
integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic
tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.
issue: #29409
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
Currently an integration test may time out if any case runs for more
than 3 minutes. This duration was hard-coded.
This PR turns the duration into a configurable parameter that can be
passed via the test command.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>