Subhash: “If I bring down the Kubernetes Deployment for a service, will it send me an alert?”
“Yeah, it will”
Subhash: “I’ll do it now, then?”
Subhash: “Yeah, why not?”
That was my second meeting with Subhash.
For those who are unaware, Subhash is the Co-Founder and CTO of Dukaan. Dukaan enables merchants with zero programming skills to set up their own e-commerce stores.
I work with Last9 — a reliability platform. In my current role, I help our customers integrate with Last9’s products. As an engineer, focused only on writing code, this was relatively new territory for me. Working with CTOs, Leadership of multiple orgs has been an interesting journey. More on that later.
Nobody dabbles with the production cluster, let alone be so forthright in the second-ever meeting. I was stunned. It then struck me - I was a developer talking to another developer. That’s when I realized the kind of engineering Subhash is building at Dukaan. And what a pleasure it’s been knowing him since that day.
Speak the language — Discovery
This was my first ever call with the Dukaan team. It happened in mid-Feb. (Fun fact: It happened on Feb 14th. Yup, a match made in heaven indeed. 😜) Within the first 2 minutes, I knew I was talking to a solid engineer. Someone who understands tech deeply, and wants to move fast. Subhash is a problem solver, and we quickly bonded on tech.
He had the impatience of an engineer who wanted to ship fast, and I was conscious to remove all sales lexicon within 2 minutes of my conversation. 😂 At this point, Subhash clearly articulated how he had hit limits with Grafana Cloud, and wanted an alternative. He was clear, direct, and terse.
🟢 Talk to me as a developer. Tell me how you can help me.
He walked me through some of Dukaan’s observability blind spots. On the very same day, we worked with their super eager engineering team to iron out a few configuration issues in their reliability stack. Once he got an idea of how we worked, he asked, “Chalo, when can I get a demo?” Before I could respond, he said, “Let’s do it tomorrow?”
Measuring Service Level Objectives, and holding engineers accountable for incident responses is usually an afterthought in orgs that are still scaling. But orgs who care to build resilient systems for customers, care deeply about latency in their systems.
And clearly, Subhash cares about customer experiences to the last T. This tweet thread demonstrates his passion for product and engineering excellence in equal measure. 👇
Shopify has a SERIOUS performance problem.— Subhash Choudhary (@subhashchy) June 14, 2022
Because of which you are losing thousands of dollars (or maybe more) and you don't even know this yet.
So let me demonstrate how Shopify's poor performance is impacting your D2C brand's revenues (and many other things).
Talk is cheap — Show me what you have
My second meeting with Subhash lasted about 10 minutes. We were not prepared. Subhash wanted an idea of how our alerting systems would work, and little did we envision he wanted alerts on Dukaan’s entire K8s cluster. In hindsight, this seemed obvious.
Even though this meeting didn’t go as planned, it taught me a valuable lesson. The larger picture on observability, monitoring, reliability tooling et al… is a gradual slow sell. From a First Principles immediate standpoint, Subhash was super clear on his ask: 'My system has hit a limit, can you give me an alert.' Simple. First principles. Let’s get down to the basics.
I was determined that the next time we meet, we’d nail all requirements. Specifically, there was an ask on an alerting UI we had not built yet. And things were super clear from Subhash - “If this works, let’s talk business. If not, let’s move on.”
Our engineering team took this up as a challenge. We took 2 weeks to build a reimagined alerting UI for all our customers. The new alerting system was highly configurable and had advanced capabilities to detect rate changes, loss of signal, and identify patterns based on seasonality. (Fantastic story for another day on our execution rigor, and the outcomes)
We meet at Dukaan’s office and get straight to it.
On production. 🤯
Now, we had to wait for the alert to come in. 😢
This was the most agonizing few minutes I had to wait. 🙈
Subhash started a timer on his phone.⏳
A few minutes in, this happened... 👇
Subhash: “Yup, this works. Where’s the contract, let’s close.”
And… That’s how Dukaan happened. 🟢
Want to know more about Last9 and our products? Check out last9.io; we're building reliability tools to make running systems at scale, fun, and embarrassingly easy.
Oh, and always up for a chat on this space. Hit me up on Twitter: @aniket_rao.
You might also like...
A simple guide to crunch numbers for understanding overall HTTP content length metrics.Read ->
Stories from the world of SRE. Delivered.
SRE with Last9 is incredibly easy. But don’t just take our word for it.