How we won Dukaan over

Subhash: “If I bring down the Kubernetes Deployment for a service, will it send me an alert?”
“Yeah, it will”
Subhash: “I’ll do it now, then?”
“On production?”
Subhash: “Yeah, why not?”

That was my second meeting with Subhash.

For those who are unaware, Subhash is the Co-Founder and CTO of Dukaan. Dukaan enables merchants with zero programming skills to set up their own e-commerce stores.

I work with Last9 — a reliability platform. In my current role, I help our customers integrate with Last9’s products. As an engineer, focused only on writing code, this was relatively new territory for me. Working with CTOs, Leadership of multiple orgs has been an interesting journey. More on that later.

Nobody dabbles with the production cluster, let alone be so forthright in the second-ever meeting. I was stunned. It then struck me - I was a developer talking to another developer. That’s when I realized the kind of engineering Subhash is building at Dukaan. And what a pleasure it’s been knowing him since that day.

Speak the language — Discovery

This was my first ever call with the Dukaan team. It happened in mid-Feb. (Fun fact: It happened on Feb 14th. Yup, a match made in heaven indeed. 😜) Within the first 2 minutes, I knew I was talking to a solid engineer. Someone who understands tech deeply, and wants to move fast. Subhash is a problem solver, and we quickly bonded on tech.

He had the impatience of an engineer who wanted to ship fast, and I was conscious to remove all sales lexicon within 2 minutes of my conversation. 😂 At this point, Subhash clearly articulated how he had hit limits with Grafana Cloud, and wanted an alternative. He was clear, direct, and terse.

🟢 Talk to me as a developer. Tell me how you can help me.

He walked me through some of Dukaan’s observability blind spots. On the very same day, we worked with their super eager engineering team to iron out a few configuration issues in their reliability stack. Once he got an idea of how we worked, he asked, “Chalo, when can I get a demo?” Before I could respond, he said, “Let’s do it tomorrow?”

Measuring Service Level Objectives, and holding engineers accountable for incident responses is usually an afterthought in orgs that are still scaling. But orgs who care to build resilient systems for customers, care deeply about latency in their systems.

💡

🟢 Great engineering leaders think of Mean Time To Recovery (MTTR) as a cultural epithet and measure it to gauge the efficacy of engineering.

And clearly, Subhash cares about customer experiences to the last T. This tweet thread demonstrates his passion for product and engineering excellence in equal measure. 👇

Shopify has a SERIOUS performance problem.

Because of which you are losing thousands of dollars (or maybe more) and you don't even know this yet.

So let me demonstrate how Shopify's poor performance is impacting your D2C brand's revenues (and many other things).
— Subhash Choudhary (@subhashchy) June 14, 2022

Talk is cheap — Show me what you have

My second meeting with Subhash lasted about 10 minutes. We were not prepared. Subhash wanted an idea of how our alerting systems would work, and little did we envision he wanted alerts on Dukaan’s entire K8s cluster. In hindsight, this seemed obvious.

Even though this meeting didn’t go as planned, it taught me a valuable lesson. The larger picture on observability, monitoring, reliability tooling et al… is a gradual slow sell. From a First Principles immediate standpoint, Subhash was super clear on his ask: 'My system has hit a limit, can you give me an alert.' Simple. First principles. Let’s get down to the basics.

I was determined that the next time we meet, we’d nail all requirements. Specifically, there was an ask on an alerting UI we had not built yet. And things were super clear from Subhash - “If this works, let’s talk business. If not, let’s move on.”

Our engineering team took this up as a challenge. We took 2 weeks to build a reimagined alerting UI for all our customers. The new alerting system was highly configurable and had advanced capabilities to detect rate changes, loss of signal, and identify patterns based on seasonality. (Fantastic story for another day on our execution rigor, and the outcomes)

D-Day arrives.

We meet at Dukaan’s office and get straight to it.

💡

Except this time, Subhash actually legit killed a k8s deployment for 10 seconds. 🤯
On production. 🤯
Now, we had to wait for the alert to come in. 😢
This was the most agonizing few minutes I had to wait. 🙈
Subhash started a timer on his phone.⏳

A few minutes in, this happened... 👇

Subhash: “Yup, this works. Where’s the contract, let’s close.”

And… That’s how Dukaan happened. 🟢

Want to know more about Last9 and our products? Check out last9.io; we're building reliability tools to make running systems at scale, fun, and embarrassingly easy.

Oh, and always up for a chat on this space. Hit me up on Twitter: @aniket_rao.

How we won Dukaan over

Speak the language — Discovery

Talk is cheap — Show me what you have

D-Day arrives.

Contents

Newsletter

Handcrafted Related Posts

SRE vs DevOps

What's the difference between SREs and DevOps professionals? How do they differ in their daily tasks?

Understanding the Rasmussen model for failures

What does the Rasmussen model teach us about Site Reliability Engineering?

The importance of structured communication in the world of SRE

How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?