Piyush Verma

Co-Founder, CTO at Last9

Software Monitoring — Stuck in the 00s

A short history of software monitoring, from the 00s. What has changed? Why are things so arcane?

Piyush Verma

How we tame High Cardinality by Sharding a stream

Using 'Sharding' to tame High Cardinality data for Levitate - Our Time Series Data Warehouse

Piyush Verma

How we tame high cardinality in time series databases

Engineering innovation to solve high cardinality with Levitate - a multi-part series

Piyush Verma, Swati Modi

Who should define Reliability — Engineering, or Product?

Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?

Piyush Verma

High Cardinality? No Problem! Stream Aggregation FTW

High cardinality in time series data is challenging to manage. But it is necessary to unlock meaningful answers. Learn how streaming aggregations can rein in high cardinality using Levitate.

Piyush Verma

When should I start thinking of observability?

How does one scale metrics maturity in a cloud-native world — A guide on observability tooling as your engineering org scales.

Piyush Verma

Sample vs Metrics vs Cardinality

When dealing with time series databases, I always got confused by Sample vs Metrics vs Cardinality. Here’s an explanation as I have understood it.

Piyush Verma

Why Service Level Objectives?

Understanding how to measure the health of your service, the benefits of using SLOs, how to set compliance targets, and much more...

Piyush Verma

The origin of Service Level Objectives

An obscure term - Service Level Objectives - rules the Software industry. But where does it come from? Strap on your seat belts, this is going to be a bumpy one (pun intended :p)

Akshay Chugh, Piyush Verma

Doing SRE the Right Way!

A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!

Piyush Verma

SLOs eased

You can either love running or hate running, but you will definitely love this analogy - take a fresh look at SLOs!

Piyush Verma, Saurabh Hirani

Latency SLO

How do you set latency-based alerts? The most common measurement is a percentile-based expression like: 95% of the requests must complete within 350ms. But is it that simple?

Piyush Verma

Services; not Server

Gone are the days of yore when we named our servers Etsy, Betsy, and Momo, fed them fish, and cleaned their poop.

Nishant Modak, Piyush Verma

Systems Observability

Observability is not just about being able to ask questions to your systems. It's also about getting those answers in minutes and not hours.

Nishant Modak, Piyush Verma

Much That We Have Gotten Wrong About SRE

An illustrated summary of Developers ➡ DevOps ➡ SRE

Piyush Verma

Infrastructure-As-Code-As-Software

We ran a poll on Twitter and on Reddit: “Do you care about the quality of your infrastructure code?” The result was an approximate and staggering 60–30–10 split. What do you think the response would be if the poll were “Do you care about the quality of your product code?” Reasons: we asked a follow-up question to understand why ~30% fall in the “Somewhat but mostly no” category, and gleaned these reasons from Twitter and Reddit: 1. Someone manually created the legacy infrastructure. No one questioned t

Piyush Verma

SLOs That Lie

What are SLOs, and how do you define them? We often set SLOs that do not accurately reflect the actual requirements. Here's a look at SLOs That Lie! SLO is an acronym for Service Level Objective. But before I explain SLOs, you need one more acronym: SLI (Service Level Indicator). An SLI is a quantitative measurement of a (and not the) quality of a service. It may be unique to each use case, but there are certain standard qualities of services that practitioners tend to follow. * Availabili

Piyush Verma

Latency Percentiles are Incorrect P99 of the Times

What are P90, P95, and P99 latency? Why are they incorrect P99 of the times? Latency is for a unit of time and the preferred aggregate is percentile.

Piyush Verma

SRE Tooling – the Clever Hans fallacy

Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? What if we told you that the mindset of building new tools has an age-old link to the story of a horse who could do arithmetic?

Piyush Verma

Root Cause Analysis For Reliability: A Case Study

Let's explore the importance of RCAs in Site Reliability Engineering, why use RCAs, and our take on what constitutes a “good” RCA.

Piyush Verma