
Change is the only constant in a cloud environment. The number of microservices is constantly growing and each of these is being deployed several times a day or week, all hosted on ephemeral servers. A typical customer request depends on at least 3 internal and 1 external service. It’s
Read ->Understanding how to measure the health of your service, benefits of using SLOs, how to set compliances and much more...
Read ->Better practices and tools for managing on-call
Read ->The ins and outs of conducting an effective postmortem. Ready templates and examples from leading organizations around the world!
Read ->Learn everything about the advantages of EC2, its use cases, and how to optimize EC2 further.
Read ->A ready, comprehensive checklist of the steps and activities involved in deploying your application.
Read ->Learn about some of the most interesting talks from SREcon21.
Read ->A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!
Read ->Quick primer into microservices architecture and the importance of tracking dependencies
Read ->How do you set latency-based alerts? The most common measurement is a percentile-based expression like: 95% of the requests must complete within 350ms. But is it that simple?
Read ->Thanks to Service Level Objectives (SLOs), your teams have a numerical threshold for system availability, so everyone has a clear vision of what keeps the users and the business happy.
Read ->A monorepo is a single version control repository that holds all the code, configuration files, and components required for your project (including services like search) and it’s how most projects start. However, as a project grows, there is debate as to whether the project's code should be split into
Read ->Service Level Objectives or SLOs serve as an objective measure of your system's performance. And when designed well, SLOs can help you direct engineering efforts effectively. It does not matter whether you're working in a startup or a tech giant; there is always a natural tension between the speed of
Read ->
As with any operating system, it’s not uncommon to encounter issues while running Linux and associated applications. This is especially true while using closed-source programs since granular code inspection isn’t possible.
Read ->What is Saturation and why should you think about it as an SLO? Saturation can be understood as the load on your network and server resources.
Read ->
We've all been woken up with that dreaded Slack notification at ungodly hours only to realise that the alert was all smoke and no fire. The perfect recipe for dread and alert fatigue.
Read ->Gone are the days of yore when we named our servers Etsy, Betsy, and Momo, fed them fish, and cleaned their poop.
Read ->Observability is not just about being able to ask questions of your systems. It's also about getting those answers in minutes and not hours.
Read ->
While using a Terraform lifecycle rule, what do you do when you get a canned response from a security group?
Read ->
A Terraform lifecycle rule in the right place can help prevent a deadlock. But the same lifecycle rule in the wrong place?
Read ->
How can you run into unplanned downtime while making a seemingly harmless change like renaming an AWS security group through Terraform?
Read ->I gave a talk at react.geekle.us today about improving the reliability of our React app. Here are the slides from that talk, and here is the transcript. Hello all, my name is Prathamesh Sonpatki. I work at Last9 building a world-class operational intelligence platform for SREs. The Last9
Read ->I completed one year at Last9 today. When I joined Last9 on April 20th, 2020, I was unsure how it would pan out. I only knew Nishant and Piyush, the founders of Last9, from the Pune tech community. But I had never worked with them before. Some of the
Read ->
An illustrated summary of Developers ➡ DevOps ➡ SRE
Read ->
We ran a poll on Twitter: “Do you care about the quality of your infrastructure code?” And on Reddit. That’s an approximate and staggering 60–30–10 split. What do you think the response would be if the poll was: “Do you care about the quality of your product code?”
Read ->
What are SLOs and how do you define them? We often set SLOs that do not accurately capture the requirements. Here's a look at SLOs That Lie! SLO is an acronym for Service Level Objective. But before I explain SLO, you need one more acronym: SLI (Service Level
Read ->
What are P90, P95, and P99 latency? Why are they incorrect P99 of the time? Latency is measured over a unit of time, and the preferred aggregate is a percentile.
Read ->
Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? What if we told you that the mindset of building new tools has an age-old link to the story of a horse that could do arithmetic?
Read ->
Let's explore the importance of RCAs in Site Reliability Engineering, why to use them, and our take on what constitutes a “good” RCA.
Read ->SRE with Last9 is incredibly easy. But don’t just take our word for it.