Metric aggregation

The percentile trap

Most homegrown distributed setups report per-node percentiles and average them. That number is wrong — often wildly. If agent A's p99 is 100 ms and agent B's p99 is 1000 ms, the fleet's true p99 is not 550 ms; it depends on the full shape of both distributions.

loadr never averages percentiles:

Every agent records trend metrics into HDR histograms (3 significant figures, auto-resizing).
Each second, the agent serializes a delta histogram (HDR V2 encoding) and streams it to the controller.
The controller merges histograms — a lossless operation — into a central aggregator per (metric, tag set).
Percentiles, thresholds, the live UI and the final summary are computed from the merged histograms only.

Counters and rates merge as exact sums (passes/total); gauges keep the most recent value plus min/max envelopes.

This is verified by tests: two in-process agents record disjoint latency ranges (1–1000 ms and 1001–2000 ms); the merged p99 must equal the true p99 of the union (~1980 ms), where naive averaging would claim ~1485 ms.

Tags & per-agent visibility

Every sample an agent emits carries an instance: <agent-name> tag, so the fleet view can show per-agent breakdowns and you can threshold per instance:

thresholds:
  "http_req_duration{instance:agent-1}": [ "p(95)<500" ]

Threshold evaluation

Thresholds run centrally against the merged data — abort_on_fail decisions consider fleet-wide reality, then fan stop commands out to every agent. Local evaluation on agents is disabled in distributed runs to avoid split-brain aborts.

loadr — the load testing platform

Metric aggregation

The percentile trap

Tags & per-agent visibility

Threshold evaluation