SRE Tooling – the Clever Hans fallacy

Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? The debate is endless. While exploring the landscape of these tools is impossible in a single blog post, it is worth thinking about why there are so many options in the SRE toolchain. At times the tools are inadequate, and at other times our usage of old tools is inadequate for modern times. Why does that…

Topics: SRE tools

Root Cause Analysis For Reliability: A Case Study

Wikipedia defines Root Cause Analysis (RCA) as “a method of problem-solving used for identifying the root causes of faults or problems.” Essentially, root cause analysis means diving deeper into an issue to find what caused a nonconformance. What’s important to understand here is that Root Cause Analysis does not mean just looking at the superficial causes of a problem. Rather, it means finding the highest-level cause: the thing that started…

Topics: Failures