Summary of "Microservices are Technical Debt"
High-level summary
Central claim: microservices are a form of technical debt. They can speed a team up initially but create long-term socio-technical costs (coordination overhead, operational complexity, and dependencies that are hard to clean up).
Context: the discussion is grounded in DoorDash’s migration from a Python monolith to hundreds of services during pandemic-driven scale and velocity needs. Comparisons and lessons are drawn from experiences at Google and Uber.
Key technological concepts and trade-offs
Why teams split a monolith
- Reduce deployment contention when many engineers are stepping on each other.
- Enable independent deployment so individual teams can move faster.
- Support special technical needs (e.g., GPU inference, very large in‑memory data structures).
- Sometimes modularizing the deployable unit (for example, splitting a large SPA into independently deployable parts) can provide similar benefits without a full microservice decomposition.
The “distributed monolith” problem
- Many organizations end up with distributed monoliths: services that must all be up, with network RPCs replacing function calls while coupling and cascading failures remain.
- High fan-out and RPC-heavy call graphs increase latency, on-call paging load, and operational complexity. The speaker's example: DoorDash’s front end averages roughly 1,000 RPCs per request.
- Operational coupling and complicated failure modes often negate the intended benefits of independent services.
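To make the fan-out point concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every RPC in a request is required, that failures are independent, and that each call succeeds with the same probability. The 99.99% figure and the function name are assumptions, not numbers from the talk; only the ~1,000 RPCs per request comes from the summary above.

```python
def request_success_probability(per_call_availability: float, required_calls: int) -> float:
    """Probability that every required RPC succeeds.

    Assumes independent failures and identical availability per call,
    which real systems only approximate.
    """
    if not 0.0 <= per_call_availability <= 1.0:
        raise ValueError("per_call_availability must be between 0 and 1")
    if required_calls < 0:
        raise ValueError("required_calls must be non-negative")
    return per_call_availability ** required_calls


# Hypothetical per-call availability of 99.99%, at increasing fan-out.
for calls in (1, 10, 100, 1000):
    probability = request_success_probability(0.9999, calls)
    print(f"{calls:>5} required calls -> {probability:.4f} chance the request succeeds")
```

Under these assumptions, 1,000 required calls at 99.99% each leave roughly a 90% chance that a request succeeds without any failure, which is why retries, fallbacks, and trimming the call graph matter at high fan-out.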
Socio-technical nature
- Core difficulties are as much organizational and incentive-related as they are technical.
- Teams often avoid cleanup; dependencies and upgrades are hard to enforce; short-term shipping pressure commonly drives decisions that create long-term cost.
Pragmatic compromises
- Sharing a datastore between services is sometimes chosen for speed; strict rules like “never share a DB” may be unrealistic during rapid migration.
- Tests are valuable but can ossify APIs and make refactors costly. Coverage percentage is a poor proxy for test quality — assertion quality matters far more than raw coverage numbers.
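The coverage-versus-assertions point can be illustrated with a small sketch. All names here (`order_total`, `weak_test`, `strong_test`) are hypothetical and chosen only for the example. Both tests execute every line of the function under test, so a coverage tool would score them the same, but only one of them notices a deliberately introduced bug.

```python
def order_total(subtotal_cents: int, tip_cents: int, fee_cents: int) -> int:
    """Correct implementation: sums all three components."""
    return subtotal_cents + tip_cents + fee_cents


def order_total_buggy(subtotal_cents: int, tip_cents: int, fee_cents: int) -> int:
    """Deliberately broken variant: silently drops the fee."""
    return subtotal_cents + tip_cents


def weak_test(total_fn) -> bool:
    """Runs the function (full line coverage) but only checks the return type."""
    result = total_fn(1000, 200, 99)
    return isinstance(result, int)


def strong_test(total_fn) -> bool:
    """Same coverage, but asserts on the actual values."""
    return total_fn(1000, 200, 99) == 1299 and total_fn(0, 0, 0) == 0


for name, fn in (("correct", order_total), ("buggy", order_total_buggy)):
    print(f"{name}: weak_test passes={weak_test(fn)}, strong_test passes={strong_test(fn)}")
```

The weak test passes for both variants, while the strong test fails for the buggy one. This is the same idea that mutation testing automates: a test suite is judged by whether it detects changed behavior, not by how many lines it touches.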
Tooling and partial solutions
- Tooling can mitigate some problems. Examples: tooling built at Uber to identify the true faulting service in an RPC graph, Google’s internal tooling, and the Service Weaver project.
- Company-level investment in tooling and SRE/infrastructure shifts where the trade-offs lie and can make higher service counts manageable.
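As a rough illustration of what fault attribution in an RPC graph can mean, the sketch below walks a call tree and reports the deepest failing services: those that failed while none of their own downstream calls failed. This is not Uber's tooling, whose internals the summary does not describe. The `Span` type, the service names, and the attribution rule are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One RPC in a trace: which service handled it and whether it failed."""
    service: str
    failed: bool
    children: list[Span] = field(default_factory=list)


def faulting_services(root: Span) -> list[str]:
    """Follow the chain of failed calls and return the services where it ends.

    A service is blamed only if it failed and none of its downstream calls
    failed, i.e. it is a likely origin rather than a victim of the failure.
    """
    culprits: list[str] = []

    def visit(span: Span) -> None:
        if not span.failed:
            return
        failed_children = [child for child in span.children if child.failed]
        if not failed_children:
            culprits.append(span.service)
        for child in failed_children:
            visit(child)

    visit(root)
    return culprits


# Hypothetical trace: the frontend and checkout fail only because payments failed.
trace = Span("frontend", True, [
    Span("menu", False),
    Span("checkout", True, [
        Span("payments", True),
        Span("pricing", False),
    ]),
])
print(faulting_services(trace))
```

For this trace the result is `['payments']`, so the alert can be routed to the payments owners rather than to every team whose service sits upstream of the failure.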
Practical recommendations / guidance
- Only split services when team/process pain or a technical necessity justifies the long-term debt; prefer as few services as possible.
- Consider alternatives to full microservices: modular deployables, libraries, or shared components, chosen based on context and cost.
- Be pragmatic about database sharing and migrations; don’t let dogma block shipping and velocity when time is limited.
- When using tests, focus on meaningful assertions rather than chasing high coverage numbers.
- Invest in tooling that reduces operational burden (tracing, fault attribution, dependency management).
- Understand your dependencies: read a library's internals before depending on it, and be explicit about the trade-offs you accept.
- Language/tooling preference: the speaker recommends Go for large-team backend work, citing its simplicity and suitability for collaboration.
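One way to read the "modular deployables, libraries, or shared components" recommendation above is to draw the boundary in code before drawing it on the network. The sketch below is a minimal, assumed example and not something shown in the talk. `Pricing`, `InProcessPricing`, and `Checkout` are hypothetical names. The caller depends only on an interface, so the implementation can stay in-process today and be swapped for an RPC client later if a split is ever justified.

```python
from typing import Protocol


class Pricing(Protocol):
    """The module boundary: callers see only this interface."""

    def quote_cents(self, item_id: str, quantity: int) -> int: ...


class InProcessPricing:
    """Plain in-process implementation; no network hop, no separate deploy."""

    def __init__(self, price_table: dict[str, int]) -> None:
        self._price_table = price_table

    def quote_cents(self, item_id: str, quantity: int) -> int:
        return self._price_table[item_id] * quantity


class Checkout:
    """Depends on the Pricing interface, not on how it is deployed."""

    def __init__(self, pricing: Pricing) -> None:
        self._pricing = pricing

    def total_cents(self, cart: dict[str, int]) -> int:
        return sum(self._pricing.quote_cents(item, qty) for item, qty in cart.items())


checkout = Checkout(InProcessPricing({"burrito": 950, "soda": 200}))
print(checkout.total_cents({"burrito": 2, "soda": 1}))  # 2 * 950 + 1 * 200 = 2100
```

If pricing later needs its own deployment, a client class with the same `quote_cents` method could replace `InProcessPricing` without touching `Checkout`. Until then the call remains an ordinary function call with none of the RPC failure modes.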
Resources, guides, and references mentioned
- DoorDash blog post describing their monolith → microservices transition.
- Chris Meiklejohn’s PhD thesis on microservices (research on socio-technical aspects done at DoorDash).
- A 2016 talk by the interviewee (includes screenshots of Uber’s tooling for RPC fault attribution).
- Google examples (modularizing a large single-page application) and the Service Weaver project.
- Uber tooling example for determining ownership in RPC call graphs.
Concrete takeaways
- Microservices are a deliberate trade-off: initial developer velocity versus long-term coordination and maintenance costs.
- Many real-world deployments are hybrid or “distributed monoliths”; plan for operational coupling and team incentives, not just technical purity.
- The industry lacks widely adopted abstractions that sit between a monolith and a cloud of microservices — this is an area for innovation.
Main speakers / sources
- Matt — engineering leader at DoorDash (author of the referenced DoorDash blog post and the main interviewee).
- A YouTube host/interviewer (unnamed in the transcript).
- Secondary sources: Chris Meiklejohn (researcher/colleague), examples from Google and Uber, and Google’s Service Weaver project.