issue: #46500
- simplify the run_go_codecov.sh to make sure the set -e to protect any
sub command failure.
- remove all embed etcd in test to make full test can be run at local.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## PR Summary: Simplify Go Unit Tests by Removing Embedded etcd and
Async Startup Scaffolding
**Core Invariant:**
This PR assumes that unit tests can be simplified by running without
embedded etcd servers (delegating to environment-based or external etcd
instances via `kvfactory.GetEtcdAndPath()` or `ETCD_ENDPOINTS`) and by
removing goroutine-based async startup scaffolding in favor of
synchronous component initialization. Tests remain functionally
equivalent while becoming simpler to run and debug locally.
**What is Removed or Simplified:**
1. **Embedded etcd test infrastructure deleted**: Removes
`EmbedEtcdUtil` type and its public methods (SetupEtcd,
TearDownEmbedEtcd) from `pkg/util/testutils/embed_etcd.go`, removes the
`StartTestEmbedEtcdServer()` helper from `pkg/util/etcd/etcd_util.go`,
and removes etcd embedding from test suites (e.g., `TaskSuite`,
`EtcdSourceSuite`, `mixcoord/client_test.go`). Tests now either skip
etcd-dependent tests (via `MILVUS_UT_WITHOUT_KAFKA=1` environment flag
in `kafka_test.go`) or source etcd from external configuration (via
`kvfactory.GetEtcdAndPath()` in `task_test.go`, or `ETCD_ENDPOINTS`
environment variable in `etcd_source_test.go`). This eliminates the
overhead of spinning up temporary etcd servers for unit tests.
2. **Async startup scaffolding replaced with synchronous
initialization**: In `internal/proxy/proxy_test.go` and
`proxy_rpc_test.go`, the `startGrpc()` method signature removes the
`sync.WaitGroup` parameter; components are now created, prepared, and
run synchronously in-place rather than in goroutines (e.g., `go
testServer.startGrpc(ctx, &p)` becomes `testServer.startGrpc(ctx, &p)`
running synchronously). Readiness checks (e.g., `waitForGrpcReady()`)
remain in place to ensure startup safety without concurrency constructs.
This simplifies control flow and reduces debugging complexity.
3. **Shell script orchestration unified with proper error handling**: In
`scripts/run_go_codecov.sh` and `scripts/run_intergration_test.sh`,
per-package inline test invocations are consolidated into a single
`test_cmd()` function with unified `TEST_CMD_WITH_ARGS` array containing
race, coverage, verbose, and other flags. The problematic `set -ex` is
replaced with `set -e` alone (removing debug output noise while
preserving strict error semantics), ensuring the scripts fail fast on
any command failure.
**Why No Regression:**
- Test assertions and code paths remain unchanged; only deployment
source of etcd (embedded → external) and startup orchestration (async →
sync) change.
- Readiness verification (e.g., `waitForGrpcReady()`) is retained,
ensuring components are initialized before test execution.
- Test flags (race detection, coverage, verbosity) are uniformly applied
across all packages via unified `TEST_CMD_WITH_ARGS`, preserving test
coverage and quality.
- `set -e` alone is sufficient for strict failure detection without the
`-x` flag's verbose output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
### **User description**
issue: #46507
we use the assign/unassign api to manage the consumer manually, the
commit operation will generate a new consumer group which is not what we
want. so we disable the auto commit to avoid it, also see:
https://github.com/confluentinc/confluent-kafka-python/issues/250#issuecomment-331377925
___
### **PR Type**
Bug fix
___
### **Description**
- Disable auto-commit in Kafka consumer configuration
- Prevents unwanted consumer group creation from manual offset
management
- Clarifies offset reset behavior with explanatory comments
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Kafka Consumer Config"] --> B["Set enable.auto.commit to false"]
B --> C["Prevent auto consumer group creation"]
A --> D["Set auto.offset.reset to earliest"]
D --> E["Handle deleted offsets gracefully"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>builder.go</strong><dd><code>Disable auto-commit and
add configuration comments</code>
</dd></summary>
<hr>
pkg/streaming/walimpls/impls/kafka/builder.go
<ul><li>Added <code>enable.auto.commit</code> configuration set to
<code>false</code> to prevent <br>automatic consumer group creation<br>
<li> Added explanatory comments for both <code>auto.offset.reset</code>
and <br><code>enable.auto.commit</code> settings<br> <li> Clarifies that
manual assign/unassign API is used for consumer <br>management</ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46508/files#diff-4b5635821fdc8b585d16c02d8a3b59079d8e667b2be43a073265112d72701add">+7/-0</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Bug Fixes
* Kafka consumer now reads from the earliest available messages and
auto-commit has been disabled to support manual offset management.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44369
woodpecker related[ issue:
#59](https://github.com/zilliztech/woodpecker/issues/59)
Refactor the WAL retention logic in Milvus StreamingNode:
- Remove the simple sampling-based truncation mechanism.
- After flush, WAL data is directly truncated.
- The retention control is now delegated to the underlying message queue
(MQ) implementation.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #44172, #45210, #44851
kafka will auto reset the offset to "latest" if the offset is
Out-of-range. the recovery of milvus wal cannot read any message from
that. So once the offset is out-of-range, kafka should read from eariest
to read the latest uncleared data.
https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43785
- pulsar client will print log into milvus logger now.
- pulsar client open the metric by default.
- upgrade the pulsar client to v0.15.1, and use offical repo.
- the fixing of milvus-io/pulsar-client-go is already covered by
official v0.15.1.
Signed-off-by: chyezh <chyezh@outlook.com>
#43810
Fixed the issue where the result err returned by append timeout was
empty when objectstorage was unavailable, causing the client to
mistakenly believe that the write was successful.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
#43638#43810
add internal writer without session lock;
refactor and unify read state and log entry
refactor data reading related methods;
fix bug where a closed writer is reused for finalize;
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
#43574#43604#43431#43603
Fix wp metrics not registered bug;
Update the version dependent on wp to v0.1.2-rc1;
improve advanced reader with concurrent prefetch blks;
add the segment rolling policy based on the number of blocks;
improve concurrent compaction
release lock failed bug
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #42649
- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42162
- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stucked
- fix: growing segment from old arch is not flushed after upgrading
---------
Signed-off-by: chyezh <chyezh@outlook.com>
#41846#41894
Resolve SN OOM issue during small file loading in Woodpecker;
Correct WP fence/close execution order;
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #41544
- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #41465
- implement truncate api for pulsar based on durable subscription.
- truncate api can only be called if wal is read-write.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- Make a timetick-commit-based write ahead buffer at write side.
- Add a switchable scanner at read side to transfer the state between
catchup and tailing read
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tst when enabling streaming service
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33285
- support transaction on single wal.
- last confirmed message id can still be used when enable transaction.
- add fence operation for segment allocation interceptor.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- add specialized mutable and immutable message, make type safe.
- add version based constructor and type.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- make message builder and message conversion type safe
- add adaptor type and function to adapt old msgstream msgpack and
message interface
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- add idAlloc interface
- fix binary unsafe bug for message
- fix service discovery lost when repeated address with different server
id
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- implement producing and consuming server of message
- implement management operation for streaming node server
---------
Signed-off-by: chyezh <chyezh@outlook.com>