We’ve raised an $11M Series A led by Sequoia Capital India!

Change is the only constant in a cloud environment. The number of microservices is constantly growing and each of these is being deployed several times a day or week, all hosted on ephemeral servers. A typical customer request depends on at least 3 internal and 1 external service. It’s

Read ->
Why Service Level Objectives?

Understanding how to measure the health of your service, the benefits of using SLOs, how to set compliance targets, and much more...

Read ->
How to Improve On-Call Experience!

Better practices and tools for managing on-call.

Read ->
Best Practices for Postmortems: A guide

The ins and outs of conducting an effective postmortem. Ready-made templates and examples from leading organizations around the world!

Read ->
Choosing Effective SLIs

Practical advice to choose an effective SLI.

Read ->
Running a Database on EC2 is Slowing It Down

Learn everything about the advantages of EC2, its use cases, and how to optimize EC2 further.

Read ->
Deployment Readiness Checklists

A ready-made, comprehensive checklist of the steps and activities involved in deploying your application.

Read ->
The most interesting talks from SREcon21!

Learn about some of the most interesting talks from SREcon21.

Read ->
Doing SRE the Right Way!

A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!

Read ->
Getting the big picture with Log Analysis

How to get the most out of your logs!

Read ->
Microservices - Tracking Dependencies

A quick primer on microservices architecture and the importance of tracking dependencies.

Read ->
Latency SLO

How do you set latency-based alerts? The most common measurement is a percentile-based expression like: 95% of the requests must complete within 350ms. But is it that simple?

Read ->
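The latency SLO expression in the summary above ("95% of the requests must complete within 350ms") reduces to a ratio check over a window of request latencies. The sketch below assumes raw per-request latencies are available in memory; `latency_slo_met` is a hypothetical helper, and counting a request that takes exactly 350ms as "good" is an assumption.

```python
from typing import Sequence


def latency_slo_met(
    latencies_ms: Sequence[float],
    threshold_ms: float = 350.0,
    target: float = 0.95,
) -> bool:
    """Return True if at least `target` of requests finished within `threshold_ms`.

    Hypothetical helper: the 350 ms threshold and 95% target mirror the
    example expression; a request taking exactly threshold_ms counts as
    good (an assumption).
    """
    if not latencies_ms:
        raise ValueError("need at least one request to evaluate the SLO")
    # Count "good" requests, i.e. those at or under the latency threshold.
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms) >= target


# 19 of 20 requests are fast, so exactly 95% meet the threshold.
sample = [120.0] * 19 + [900.0]
print(latency_slo_met(sample))  # True
```

Real monitoring systems rarely keep every raw latency; they usually approximate this ratio from histogram buckets, which is one reason the question "is it that simple?" is worth asking.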
Deeper Dive into SLO: Effects on Development, Culture and Performance

Thanks to Service Level Objectives (SLOs), your teams have a numerical threshold for system availability, so everyone has a clear vision of what keeps the users and the business happy.

Read ->
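A numerical availability threshold also implies an error budget: the amount of unavailability the SLO tolerates over a window. As a sketch, assuming a time-based SLO over a 30-day window, a 99.9% target leaves about 43.2 minutes of downtime. `error_budget_minutes` is a hypothetical helper, not part of any particular tool.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(availability_slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window.

    Assumes a time-based SLO (fraction of minutes the service is "up");
    a request-based SLO would budget failed requests instead.
    """
    if not 0 < availability_slo < 1:
        raise ValueError("availability_slo must be between 0 and 1")
    # The budget is the complement of the target, spread over the window.
    return (1 - availability_slo) * window_days * MINUTES_PER_DAY


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} over 30 days -> {error_budget_minutes(slo):.1f} min")
```

Each extra "nine" shrinks the budget tenfold, which is why the target chosen shapes how much risk teams can take with releases.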
Monorepos - The Good, Bad, and Ugly

A monorepo is a single version control repository that holds all the code, configuration files, and components required for your project (including services like search) and it’s how most projects start. However, as a project grows, there is debate as to whether the project's code should be split into

Read ->
Components in Designing Effective SLOs

Service Level Objectives or SLOs serve as an objective measure of your system's performance. And when designed well, SLOs can help you direct engineering efforts effectively. It does not matter whether you're working in a startup or a tech giant; there is always a natural tension between the speed of

Read ->
Strace – A Hidden Superpower

As with any operating system, it’s not uncommon to encounter issues while running Linux and associated applications. This is especially true while using closed-source programs since granular code inspection isn’t possible.

Read ->
A Primer on Saturation SLO: What Is It and Do You Need to Consider It?

What is Saturation and why should you think about it as an SLO? Saturation can be understood as the load on your network and server resources.

Read ->
Sleep Friendly Alerting

We've all been woken up with that dreaded Slack notification at ungodly hours only to realise that the alert was all smoke and no fire. The perfect recipe for dread and alert fatigue.

Read ->
Services; not Server

Gone are the days of yore when we named our servers Etsy, Betsy, and Momo, fed them fish, and cleaned their poop.

Read ->
Systems Observability

Observability is not just about being able to ask questions to your systems. It's also about getting those answers in minutes and not hours.

Read ->
AWS security groups: canned answers and exploratory questions

While using a Terraform lifecycle rule, what do you do when you get a canned response from a security group?

Read ->
If it ain't broke...

A Terraform lifecycle rule in the right place can help prevent a deadlock. But the same lifecycle rule in the wrong place?

Read ->
mv aws-security-group shoot-foot

How can you run into unplanned downtime while making a seemingly harmless change, such as renaming an AWS security group through Terraform?

Read ->
Rescuing a SPAghetti React project

I gave a talk at react.geekle.us today about improving the reliability of our React app. Here are the slides from that talk, and here is the transcript. Hello all, my name is Prathamesh Sonpatki. I work at Last9 building a world class operational intelligence platform for SREs. The Last9

Read ->
One year at Last9

I completed one year at Last9 today. When I joined Last9 on April 20th, 2020, I was unsure how it would pan out. The only people I knew were Nishant and Piyush, the founders of Last9, from the Pune tech community, but I had never worked with them before. Some of the

Read ->
Much That We Have Gotten Wrong About SRE

An illustrated summary of Developers ➡ DevOps ➡ SRE

Read ->
Infrastructure-As-Code-As-Software

We ran a poll on Twitter and on Reddit: “Do you care about the quality of your infrastructure code?” The responses show an approximate and staggering 60–30–10 split. What do you think the response would be if the poll were: “Do you care about the quality of your product code?”

Read ->
SLOs That Lie

What are SLOs and how do you define them? We often set SLOs that don't accurately reflect the actual requirements. Here's a look at SLOs That Lie! SLO is an acronym for Service Level Objective. But before I explain SLO, you need one more acronym: SLI (Service Level

Read ->
Latency Percentiles are Incorrect P99 of the Times

What are P90, P95, and P99 latency? Why are they incorrect P99 of the time? Latency is measured in units of time, and the preferred aggregate is a percentile.

Read ->
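To make "P90, P95, and P99 latency" concrete, the sketch below computes percentiles with the nearest-rank method. This is only one of several percentile definitions; many monitoring tools interpolate between samples or estimate from histogram buckets instead, so their numbers can differ. `percentile_nearest_rank` is a hypothetical helper.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile of `values` using the nearest-rank method.

    Nearest-rank is one of several percentile definitions; other tools may
    interpolate or estimate from histogram buckets and return different values.
    """
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Smallest rank r such that at least p% of samples are <= ordered[r - 1].
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = [80, 95, 120, 300, 1000]
for p in (50, 90, 95, 99):
    print(f"P{p}: {percentile_nearest_rank(latencies_ms, p)} ms")
```

With only five samples, P90, P95, and P99 all land on the single slowest request, a reminder that high percentiles say little unless the sample is large enough.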
SRE Tooling – the Clever Hans fallacy

Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? What if we told you that the mindset of building new tools has an age old link to the story of a horse who could do arithmetic?

Read ->
Root Cause Analysis For Reliability: A Case Study

Let's explore the importance of RCAs in Site Reliability Engineering, why to use them, and our take on what constitutes a “good” RCA.

Read ->

SRE with Last9 is incredibly easy. But don’t just take our word for it.