Phi_accrual - source-agnostic φ-accrual failure detector for Elixir/OTP

Two libraries. φ-accrual failure detector (Hayashibara 2004) split into core + transport.

phi_accrual 1.0 (on Hex: phi_accrual)

Source-agnostic. Feed in heartbeat arrivals, read out φ. No transport, no membership, no thresholding baked in.

  • Dual-α EWMA (separate smoothing for mean and variance)
  • Four-state result: :steady / :recovering / :insufficient_data / :stale
  • :erlang.system_monitor integration → local_pause? + confidence flags on φ output (:long_gc, :long_schedule, :busy_dist_port)
  • Per-node estimator GenServer under DynamicSupervisor + Registry
  • Overload shedding with telemetry
  • Optional PhiAccrual.Threshold module with hysteresis — multiple instances coexist (φ=4 for dashboards, φ=8 for routing)
  • Telemetry schema committed under SemVer, breaking changes only in v2

PhiAccrual.observe(:"peer@host")
PhiAccrual.phi(:"peer@host")
#=> {:ok, 0.42, :steady}
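
For readers who have not met φ-accrual before, here is a minimal sketch of how arrivals could turn into a φ value with a dual-α EWMA and a Gaussian tail. It is not the phi_accrual implementation: the module name PhiSketch, the α values, the minimum standard deviation and the variance update rule are all assumptions chosen for illustration.

```elixir
defmodule PhiSketch do
  # Hypothetical illustration only; names and constants are assumptions.
  defstruct mean: nil, var: 0.0, last: nil

  @alpha_mean 0.1   # smoothing factor for the mean (assumed value)
  @alpha_var 0.05   # separate smoothing factor for the variance (assumed value)
  @min_std 1.0      # floor in ms so a perfectly regular stream does not divide by zero

  def new, do: %__MODULE__{}

  # First arrival: nothing to measure yet, just remember the time.
  def observe(%__MODULE__{last: nil} = s, now), do: %{s | last: now}

  # Second arrival: seed the mean with the first inter-arrival interval.
  def observe(%__MODULE__{mean: nil, last: last} = s, now),
    do: %{s | mean: (now - last) * 1.0, last: now}

  # Later arrivals: update mean and variance with their own alphas.
  def observe(%__MODULE__{mean: mean, var: var, last: last} = s, now) do
    delta = now - last - mean

    %{
      s
      | mean: mean + @alpha_mean * delta,
        var: (1 - @alpha_var) * var + @alpha_var * delta * delta,
        last: now
    }
  end

  # phi = -log10(P(next heartbeat arrives later than the time already elapsed)).
  def phi(%__MODULE__{mean: nil}, _now), do: {:error, :insufficient_data}

  def phi(%__MODULE__{mean: mean, var: var, last: last}, now) do
    std = max(:math.sqrt(var), @min_std)
    # Clamp z so erfc does not underflow to 0.0; this caps phi at roughly 200.
    z = min((now - last - mean) / std, 30.0)
    p_later = 0.5 * :math.erfc(z / :math.sqrt(2))
    {:ok, -1.0 * :math.log10(p_later)}
  end
end
```

With perfectly regular 1000 ms heartbeats, φ read exactly one interval after the last arrival is about 0.30 (the Gaussian tail is 0.5 at the mean), and it climbs steeply once the elapsed time exceeds the mean by several standard deviations.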

Bundled DistributionPing reference source for apps with no existing cross-node chatter. Inherits BEAM distribution head-of-line blocking — observable via confidence: false on φ events, but not fixable at this layer.
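
The hysteresis idea behind the optional PhiAccrual.Threshold module can be sketched as follows. This is not that module's API: HysteresisSketch, its fields and the separate clear level are assumptions, shown only to illustrate why two instances with different levels can watch the same φ stream without flapping.

```elixir
defmodule HysteresisSketch do
  # Hypothetical illustration; not the PhiAccrual.Threshold API.
  defstruct suspect_at: 8.0, clear_at: 6.0, state: :trusted

  # clear_at must sit below suspect_at, otherwise there is no hysteresis band.
  def new(suspect_at, clear_at) when clear_at < suspect_at,
    do: %__MODULE__{suspect_at: suspect_at, clear_at: clear_at}

  # Returns {transition, threshold}; transition is :suspected, :cleared or :none.
  def update(%__MODULE__{state: :trusted, suspect_at: hi} = t, phi) when phi >= hi,
    do: {:suspected, %{t | state: :suspected}}

  def update(%__MODULE__{state: :suspected, clear_at: lo} = t, phi) when phi <= lo,
    do: {:cleared, %{t | state: :trusted}}

  # Inside the band, or no level crossed: keep the current state.
  def update(%__MODULE__{} = t, _phi), do: {:none, t}
end
```

A dashboard instance created with `HysteresisSketch.new(4.0, 3.0)` and a routing instance created with `HysteresisSketch.new(8.0, 6.0)` would react differently to φ = 5.0: the first reports :suspected, the second reports :none.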

phi_accrual_udp 0.1.x (alpha, on Hex: phi_accrual_udp)

Dedicated UDP socket source. Escapes BEAM distribution HoL. Public API and wire format may change before 1.0 based on real-deployment feedback.

Wire format is 12 bytes fixed:

<<magic::16, version::8, flags::8, timestamp::64-unsigned>>
magic     = 0xCEA6
version   = 0x01
flags     = 0x00   (reserved)
timestamp = u64 ms (sender's clock; diagnostic only)

Receiver-driven clock discipline — the EWMA uses local monotonic receipt time, never the packet timestamp. Packet timestamp is for one-way delay diagnostics when NTP-synced, nothing more.
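
A minimal sketch of a codec for the 12-byte layout above, with the receipt time stamped from the local monotonic clock. WireSketch and its function names are hypothetical, and ignoring the reserved flags byte on decode is an assumption; the actual phi_accrual_udp code may treat it differently.

```elixir
defmodule WireSketch do
  # Hypothetical illustration of the 12-byte layout; not the phi_accrual_udp code.
  @magic 0xCEA6
  @version 0x01

  # Build a packet carrying the sender's clock in ms (diagnostic only).
  def encode(timestamp_ms) when is_integer(timestamp_ms) and timestamp_ms >= 0,
    do: <<@magic::16, @version::8, 0x00::8, timestamp_ms::64-unsigned>>

  # Accept only exactly 12 bytes with the expected magic and version.
  # Assumption: the reserved flags byte is ignored rather than required to be 0.
  def decode(<<@magic::16, @version::8, _flags::8, timestamp_ms::64-unsigned>>),
    do: {:ok, timestamp_ms}

  def decode(_other), do: {:error, :bad_packet}

  # The arrival time comes from the local monotonic clock, never from the packet.
  def on_receive(packet) do
    arrived_ms = System.monotonic_time(:millisecond)

    with {:ok, sender_ts_ms} <- decode(packet) do
      {:ok, arrived_ms, sender_ts_ms}
    end
  end
end
```

Only `arrived_ms` would feed the estimator; `sender_ts_ms` is kept alongside it for one-way delay diagnostics.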

UDP is unauthenticated. The default node_resolver returns {ip, port}, which is fine for dev/demos but proliferates estimator state: every Sender restart or NAT timeout can produce a new source port, and with it a new estimator. Production deployments should supply a stable resolver mapping {ip, port} → application identifier.
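
One possible shape for a stable resolver, as a sketch only. The real node_resolver contract is not shown in this post, so the module name, the `{:ok, id} | :error` return shape and the static peer table are all assumptions.

```elixir
defmodule StableResolverSketch do
  # Hypothetical: the actual node_resolver contract in phi_accrual_udp may differ.
  # Static table from source IP to an application-level identifier (made-up values).
  @peers %{
    {10, 0, 0, 11} => :billing_a,
    {10, 0, 0, 12} => :billing_b
  }

  # The source port is ignored, so a Sender restart or a NAT rebinding that only
  # changes the port still maps to the same identifier.
  # Returns {:ok, id} for known peers and :error for everything else.
  def resolve({ip, _port}), do: Map.fetch(@peers, ip)
end
```

Keying on IP alone does not authenticate the sender, and it breaks down when several peers share one NAT address; it only stops the estimator count from growing with every new source port.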

Why split. φ math has zero variety — closed. Transport variety is high (UDP, BEAM dist, gRPC, MQTT, EDI cadence). Separate packages, separate repos, independent versioning. Anyone can write another transport against the public PhiAccrual.observe/2 contract.

Limitations worth knowing before wiring φ to anything irreversible:

  • Gaussian assumption misbehaves under bimodal BEAM-GC inter-arrival distributions. Correlate with :erlang.statistics(:garbage_collection) before acting on high φ. Non-parametric / mixture estimator is a v2 consideration once real traces exist.
  • One :erlang.system_monitor per node: if another library also subscribes, the most recent call silently replaces the earlier subscription.
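
For the GC correlation mentioned in the first limitation, a small sketch of how the counters could be sampled. :erlang.statistics(:garbage_collection) returns node-wide totals for the local node as {number_of_gcs, words_reclaimed, 0}, so it is the difference between two snapshots that matters. GcDeltaSketch is a hypothetical helper, not part of either library.

```elixir
defmodule GcDeltaSketch do
  # Hypothetical helper; not part of phi_accrual.
  # Snapshot of the local node's cumulative GC counters.
  def snapshot do
    {gcs, words_reclaimed, _} = :erlang.statistics(:garbage_collection)
    {gcs, words_reclaimed}
  end

  # GC activity between two snapshots taken on the same node.
  def delta({gcs0, words0}, {gcs1, words1}),
    do: %{gcs: gcs1 - gcs0, words_reclaimed: words1 - words0}
end
```

Taking a snapshot on a timer and computing the delta when φ spikes gives a rough signal of whether the local node was busy collecting garbage during the gap. It says nothing about GC on the sending node.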

Apache-2.0 both. Feedback welcome, especially on UDP from anyone running it in anger.