Stop Building Distributed Systems You Don’t Need

After 20+ years building systems that have scaled to millions of users, I’ve made one observation that consistently annoys conference speakers and tech bloggers alike: most engineering teams don’t need distributed systems. They need better SQL.

That’s a provocative statement in 2026, when “distributed-first” architecture has become the default flex for any team that considers itself serious. But I’ve seen this story play out too many times to stay quiet about it. Teams adopt Kafka, Kubernetes, microservices, and service meshes before they’ve hit 1,000 daily active users — and they pay for it in engineering hours, incident pages, and quiet desperation.

This is not a post against distributed systems. It’s a post against using them as a default.

The Distributed Systems Cargo Cult

There’s a pattern I’ve watched repeat itself since the early 2000s, just with different technology names attached. In 2010 it was NoSQL. In 2015 it was microservices. In 2020 it was “cloud-native.” In 2026, it’s whatever the hot orchestration layer is this quarter.

The pattern looks like this: a major tech company (Netflix, Uber, Google, pick your era) publishes a blog post about how they solved a massive-scale problem with a new architectural approach. Engineers read it, get excited, and start applying that solution to their 50,000 MAU SaaS product. The architecture gets more complex. Incidents multiply. The team spends 40% of its time on infrastructure instead of features. Eventually someone rewrites it as a monolith and everything gets faster.

This is cargo cult engineering. We copy the artifacts — the Kubernetes YAML, the event-driven architecture, the 17 microservices — without asking the foundational question: do we have the problem these tools were designed to solve?

What “Scale” Actually Means for Your System

Here’s a brutal truth from production: Postgres, tuned correctly, with proper indexing, handles tens of thousands of transactions per second on modest hardware. A single well-written Rails or Django or Spring application can serve hundreds of requests per second on a $50/month VM. Most startups and mid-size companies will never stress those limits.
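The gap between an unindexed and an indexed query is easy to demonstrate. Here's a minimal sketch using an in-memory SQLite database as a stand-in for Postgres; the table and column names (`orders`, `customer_id`) are invented for illustration, but the same `CREATE INDEX` fix applies verbatim in Postgres:

```python
import sqlite3
import time

# In-memory SQLite standing in for Postgres; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10_000, i * 0.01) for i in range(200_000)],
)

def lookup() -> int:
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (4242,)
    ).fetchone()[0]

start = time.perf_counter()
lookup()  # no index on customer_id: full table scan over 200k rows
scan_time = time.perf_counter() - start

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

start = time.perf_counter()
lookup()  # index lookup: touches ~20 matching rows instead of 200k
indexed_time = time.perf_counter() - start

print(f"scan: {scan_time:.4f}s, indexed: {indexed_time:.6f}s")
```

That one-line index is the kind of fix that routinely beats an architectural rewrite. Checking `EXPLAIN` output for sequential scans on hot queries is where most "we need to scale" conversations should start.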

I once consulted for a fintech company that had built a microservices architecture with 14 separate services, a Kafka event bus, and two Kubernetes clusters. Their peak load was 800 concurrent users. Their P99 latency was 3 seconds. Their team of 6 engineers spent most of their time debugging distributed tracing and service-to-service authentication failures.

We consolidated to a three-service architecture (API, background worker, scheduler), rewrote the database queries, added Redis for caching, and deployed to two plain EC2 instances behind a load balancer. P99 latency dropped to 180ms. The team started shipping features again.
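The "added Redis for caching" step was plain cache-aside: check the cache, fall back to the database on a miss, store the result with a TTL. Here's a sketch of that pattern; a dict with expiry stands in for Redis so the example is self-contained, and the query being cached (`slow_db_query`) is a made-up placeholder. With redis-py you'd use `get`/`setex` in the same shape:

```python
import time

# Dict-with-expiry standing in for Redis; swap for redis-py in production.
_cache: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 60
CALLS = {"db": 0}  # instrumentation so you can see misses vs. hits

def slow_db_query(user_id: int) -> dict:
    # Placeholder for the expensive query being cached.
    CALLS["db"] += 1
    return {"user_id": user_id, "plan": "pro"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    hit = _cache.get(key)
    if hit is not None:
        expires_at, value = hit
        if time.monotonic() < expires_at:
            return value            # cache hit: no database round trip
        del _cache[key]             # entry expired; fall through to the DB
    value = slow_db_query(user_id)  # cache miss: one database query
    _cache[key] = (time.monotonic() + TTL_SECONDS, value)
    return value
```

One process, one cache, one obvious place to look when a value is stale. That simplicity is the point.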

“Scale” is not a property of your architecture. It’s a property of your load. And if you don’t measure your actual load, you’re making architecture decisions based on imagination.

The Monolith That Could (and Usually Should)

Let me describe a system architecture that can handle serious production load with minimal operational complexity:

  • One application process (or a small cluster of identical processes)
  • One relational database with proper indexes, connection pooling, and read replicas
  • One cache layer (Redis)
  • One background job system (Sidekiq, Celery, BullMQ — pick your language)
  • A CDN in front of static assets

This architecture can handle millions of monthly active users. It’s boring. It’s also been battle-tested for two decades. You can reason about it. You can debug it. When something goes wrong at 2am, you have one application log, one database, one place to look.

The “monolith is bad” narrative got supercharged by the microservices hype cycle, but what people actually experienced were poorly structured monoliths — big balls of mud where everything depended on everything else. The solution to a poorly structured monolith is better internal architecture: clear module boundaries, separation of concerns, well-defined interfaces between domains. That’s domain-driven design applied to a single codebase. You get the organizational clarity of microservices without the distributed systems tax.
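What "clear module boundaries" looks like in a single codebase can be sketched in a few lines. The domains here (billing, notifications) and every function name are invented for illustration; the discipline is that each domain exposes a small public interface, and other code never reaches into its internals:

```python
# --- billing domain (would live in, e.g., app/billing/) ---
def charge(customer_id: int, cents: int) -> str:
    """Public entry point for the billing domain."""
    return _record_ledger_entry(customer_id, cents)

def _record_ledger_entry(customer_id: int, cents: int) -> str:
    # Underscore-prefixed: private to this module by convention.
    return f"ledger:{customer_id}:{cents}"

# --- notifications domain (would live in, e.g., app/notifications/) ---
def send_receipt(customer_id: int, ledger_ref: str) -> str:
    """Public entry point for the notifications domain."""
    return f"emailed customer {customer_id} receipt for {ledger_ref}"

# --- orchestration: crosses domains only through public functions ---
def checkout(customer_id: int, cents: int) -> str:
    ref = charge(customer_id, cents)       # billing's public interface
    return send_receipt(customer_id, ref)  # notifications' public interface
```

If you can enforce those seams with code review (or an import-linter), you already have the microservices org chart without the network hops. And if a boundary later genuinely needs independent deployment, it's already carved out.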

When Distributed Actually Makes Sense

I’m not a monolith absolutist. There are real reasons to build distributed systems:

Independent deployment velocity. If you have 15 engineering teams that need to deploy independently without stepping on each other, service boundaries make organizational sense. This is Conway’s Law in practice — your architecture will mirror your team structure. But this is a people problem, not a scale problem.

Genuinely different scaling requirements. If your image processing pipeline needs 10,000 CPU cores to burst during peak hours, and your web application needs 20 servers, those should absolutely be separate. Scaling them together would be wasteful and fragile.

Regulatory isolation. Sometimes compliance requires hard data boundaries — HIPAA, PCI-DSS, GDPR. That’s a legitimate architectural driver.

Polyglot persistence needs. If part of your system is genuinely better served by a graph database, and another part by a document store, and another by a relational database, those parts probably deserve independence.

Notice what’s not on this list: “we might need to scale someday,” “it’s more modern,” or “the new hire wants to put it on their resume.”

The Real Cost Nobody Talks About

Distributed systems have a hidden tax that compounds over time: operational cognitive load.

Every service boundary is a potential failure point. Every network call is a latency budget you’re spending. Every event queue is an ordering and consistency problem waiting to happen. Every container is a deployment artifact someone has to manage. These costs don’t disappear — they move from the application layer into your infrastructure, your monitoring, your incident response runbooks, and your engineering team’s mental model.

I’ve interviewed engineers who could explain Raft consensus in detail but couldn’t tell me what their application’s P99 database query time was. That’s backwards. The engineers who are genuinely dangerous (in the best sense) are the ones who understand their system end-to-end, can read a query plan, can profile memory usage under load, and can trace a single request from the browser through to the database and back.

That end-to-end understanding gets exponentially harder as system complexity grows. Every service boundary is a seam where that understanding breaks.

The Framework I Use Before Adding Complexity

Before I add any significant architectural complexity, I ask these questions:

  1. What specific problem does this solve? Not “it’s more scalable” — what specific bottleneck, failure mode, or organizational issue does this address?
  2. What’s my measured evidence? Load tests, APM traces, database slow query logs — actual data, not guesses.
  3. What’s the operational cost? Who maintains this in production? What does a 2am incident look like with this added?
  4. What’s the simpler alternative? Have I actually tried throwing better hardware at it? Optimizing the slow query? Adding a cache?
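For step 2, you don't need an APM vendor to get a first answer — a percentile over latency samples pulled from your access logs will do. A dependency-free sketch (the latency numbers are invented; in practice you'd parse them from logs):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: small and good enough for a gut check."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Invented request latencies in milliseconds, e.g. parsed from an access log.
latencies = [12, 14, 15, 16, 18, 22, 25, 31, 40, 950]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50={p50}ms p99={p99}ms")
```

Note how the median looks healthy while the tail is awful — which is exactly why "average latency" arguments in architecture meetings are worthless. Argue from the tail.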

You’d be surprised how often step 4 is enough. Vertical scaling is deeply underrated. A $400/month server can handle a lot more than engineers assume, and it’s a lot simpler to operate than a distributed cluster.

Conclusion: Earn Your Complexity

The most valuable engineering lesson I’ve internalized over 20+ years is this: complexity is a debt, not an asset. Every clever abstraction, every distributed component, every architectural pattern is borrowed time that you’ll eventually pay back in debugging, on-call pages, and onboarding pain.

Senior engineers earn complexity. They reach for it only when simpler options have been exhausted or ruled out with evidence. They start simple, measure obsessively, and scale to the problem they actually have — not the problem they imagine they might have someday.

The next time you’re in an architecture discussion and someone says “but what if we need to scale to 10x?”, ask them what the current load actually is. Ask them what’s actually slow. Ask them what the simplest thing that could possibly work looks like.

Build that first. You can always add complexity later. You almost never get to remove it.


What’s the most unnecessary architectural complexity you’ve seen in a production system? I’m collecting war stories. Drop them in the comments or find me on LinkedIn.
