congqixia 33060d9cf2
fix: avoid concurrent Reset/Add operations on DataCoord metrics (#44789)
This commit addresses issue #44788 where the
`datacoord_stored_binlog_size` metric could become inaccurate when
multiple concurrent `GetMetrics` calls arrived at DataCoord.

### Problem

The original implementation called `Reset()` followed by `Add()`
operations on Prometheus metrics within the `GetQuotaInfo()` method.
When multiple goroutines invoked this method concurrently, their
`Reset()` and `Add()` calls could interleave:
- Goroutine 1: `Reset()` → `Add(value1)`
- Goroutine 2: `Reset()` → `Add(value2)`
- Result: the metric could be reset several times and the values added
in an interleaved order, leaving inaccurate and inflated metric values
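
The interleaving above can be replayed deterministically. The following is a minimal sketch, not the DataCoord code: `gauge` is a hypothetical stand-in for a single labeled gauge (the real Prometheus vector type behaves differently in detail), and `interleavedResetAdd` is a name invented for this illustration. It assumes both callers computed the same total and that both resets happen before either add.

```go
package main

import "fmt"

// gauge is a hypothetical stand-in for one labeled gauge value.
// It is NOT the Prometheus client type; it only models Reset and Add.
type gauge struct{ v float64 }

func (g *gauge) Reset()        { g.v = 0 }
func (g *gauge) Add(d float64) { g.v += d }

// interleavedResetAdd replays one possible schedule in which two callers
// each try to publish the same total via Reset() followed by Add().
func interleavedResetAdd(size float64) float64 {
	g := &gauge{}
	g.Reset()   // caller 1 resets
	g.Reset()   // caller 2 resets before caller 1 has added
	g.Add(size) // caller 1 adds its total
	g.Add(size) // caller 2 adds the same total again
	return g.v
}

func main() {
	// The true total is 100, but the gauge ends up at twice that.
	fmt.Println(interleavedResetAdd(100)) // prints 200
}
```

With this schedule the published value is double the real size, which matches the inflated readings described in the bullets.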

### Solution

Changed the approach from `Reset() + Add()` to aggregating metric values
in local maps first, then using `Set()` to update metrics atomically:

1. Collect segment size data into local maps:
   - `storedBinlogSize`: tracks size per collection per segment state
   - `binlogFileSize`: tracks total file count per collection
   - `coll2DbName`: maps collection IDs to database names

2. After aggregation is complete, use `Set()` (instead of `Add()`) to
update metrics in a single operation per label combination

This ensures that concurrent `GetMetrics` calls don't interfere with
each other, as each invocation works with its own local state and only
updates the final metric value atomically.
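
The aggregate-then-`Set()` pattern can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `segment`, `labelKey`, `gaugeVec`, and `publishStoredSize` are hypothetical names, and `gaugeVec` is a small mutex-guarded stand-in so the example runs with the standard library alone. Only the local map name `storedBinlogSize` is taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a hypothetical, trimmed-down view of segment metadata.
type segment struct {
	collectionID int64
	state        string
	size         int64
}

// labelKey is a hypothetical label combination (collection, segment state).
type labelKey struct {
	collectionID int64
	state        string
}

// gaugeVec is a hypothetical stand-in for a labeled gauge vector.
type gaugeVec struct {
	mu     sync.Mutex
	values map[labelKey]float64
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{values: make(map[labelKey]float64)}
}

// Set overwrites the value for one label combination.
func (g *gaugeVec) Set(k labelKey, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[k] = v
}

// Get returns the current value for one label combination.
func (g *gaugeVec) Get(k labelKey) float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.values[k]
}

// publishStoredSize sums sizes into a call-local map, then writes each
// total with a single Set per label combination.
func publishStoredSize(g *gaugeVec, segments []segment) {
	storedBinlogSize := make(map[labelKey]int64) // local to this call
	for _, s := range segments {
		storedBinlogSize[labelKey{s.collectionID, s.state}] += s.size
	}
	for k, total := range storedBinlogSize {
		g.Set(k, float64(total))
	}
}

func main() {
	segs := []segment{{1, "Flushed", 10}, {1, "Flushed", 20}, {2, "Growing", 5}}
	g := newGaugeVec()

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			publishStoredSize(g, segs)
		}()
	}
	wg.Wait()

	fmt.Println(g.Get(labelKey{1, "Flushed"})) // prints 30
}
```

Because every caller overwrites the gauge with a complete total, repeated or concurrent calls over the same segments converge on the same value instead of accumulating. One trade-off of this sketch is that `Set()` alone never clears a label combination that no longer appears in the input; how the actual change handles that case is not stated here.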

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-13 18:39:59 +08:00

Data Coordinator

Data coordinator (datacoord for short) is the component that organizes DataNodes and segment allocations.

Dependency

  • KV store: a key-value store that holds all the meta info datacoord needs to operate. (etcd)
  • Message stream: a message stream used to exchange statistics information with data nodes. (Pulsar)
  • Root Coordinator: the source of timestamps, IDs, and meta.
  • Data Node(s): a single instance or a cluster; the worker group that actually handles data modification operations.