Summary of "How to make technical decisions? - Oresztesz Margaritisz, EPAM | Craft Conference 2024"

Executive summary

The talk provides a practical framework for making technology and tooling decisions (e.g., databases, orchestrators, libraries, build systems) in a way that avoids costly “wrong choice” outcomes (performance issues, hidden complexity, refactoring dead-ends). It also helps secure management buy-in by translating decisions into measurable quality attributes and operational constraints.

Business impact of wrong technology choices (why it hurts)

Wrong choices tend to create long-lived problems and organizational friction:

Stickiness / inertia: managers don’t want “months of refactoring” that delays shipping features.
Cognitive noise: daily disruption from debugging/troubleshooting instead of feature delivery.
Technical debt + complexity: solving one visible issue reveals broader “iceberg” complexity (config-heavy systems, unknown IDE/settings, lack of performance testing).
“Technical debt” is hard to sell: “adaptation/tech debt” framing is often rejected by leadership.

Core “playbook” for a good technology choice

A good choice behaves like a puzzle piece:

Solves the intended problem without extra baggage
Fits interfaces and surrounding architecture
Is not bloated, not overly complex
Is new enough / maintainable enough
Works in practice (operationally and development-wise)

Decision metrics & KPI categories (what to measure)

Instead of relying on Microsoft’s full “quality attributes” list, the speaker proposes 4 practical categories.

1) Design (fit + ecosystem)

Community maturity / time invested (meaningful ecosystems need ~5 years)
API feature coverage (what you need, without unnecessary complexity)
Replaceability (cost/effort to remove if wrong)
Simplicity (explicitly: simplicity is king)

2) Learning curve (developer productivity)

Documentation quality (existence + usability)
Fast feedback loop: startup/build/test cycle time. Example threshold mentioned: 30 seconds is not fast (prefer milliseconds/seconds).
Team expertise availability (internal capability matters)
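
A minimal sketch of how the feedback-loop metric could be measured, assuming the cycle can be triggered by a single command. The function names and the placeholder command are illustrative and not from the talk; only the 30-second figure comes from the example above.

```python
import subprocess
import sys
import time

# Threshold taken from the talk's example: a ~30 second cycle is "not fast".
SLOW_THRESHOLD_SECONDS = 30.0


def measure_cycle_seconds(command):
    """Run a command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, capture_output=True)
    return time.perf_counter() - start


def is_fast_feedback(seconds, threshold=SLOW_THRESHOLD_SECONDS):
    """True when the measured cycle is strictly below the threshold."""
    return seconds < threshold


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real build, test,
    # or startup command of the technology under evaluation.
    elapsed = measure_cycle_seconds([sys.executable, "-c", "pass"])
    print(f"cycle time: {elapsed:.2f}s, fast: {is_fast_feedback(elapsed)}")
```

Running the same measurement for each candidate gives a comparable number for the "fast feedback loop" criterion; a single run is noisy, so repeating it a few times is advisable.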

3) Runtime (operational and technical constraints)

Hard constraints (examples: HTTP/S limitations, single sign-on integration, OpenTelemetry compatibility, required interfaces)
Performance needs (latency/throughput and responsiveness)
Stability expectations
Security constraints
Single points of failure risk

4) Support (failure handling & operations)

Debugging / logging / monitoring / exception visibility
Testability
Avoid “non-pluggable complexity” (e.g., “snowflake” systems like plugin-heavy CI/CD that can crash or take minutes due to plugin ordering/startup)

How to measure when you don’t have an internal toolkit (data sources)

The talk suggests lightweight external proxies and checks.

Maturity / popularity
- Google Trends
- GitHub language stats
- GitHub activity/open issues
- Stack Overflow tag trends
Security
- Check open vulnerabilities; avoid high/critical vulnerabilities that are not fixed
Performance (without expensive measurement)
- Use existing benchmarks (but understand that benchmark validity can be misleading)
- Use third-party performance aggregation (example: the TechEmpower benchmarks site) for rough baselines
Stability / roadmap risk
- Review the issue tracker: closure time and unresolved issues impacting upcoming versions
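
As a rough illustration of the issue-tracker check, the sketch below computes a median time-to-close and the share of still-open issues. The sample dates are hypothetical; in practice they would be exported from the project's tracker, and the function names are assumptions made for this example.

```python
from datetime import date
from statistics import median

# Hypothetical sample: (opened, closed) pairs; closed is None for open issues.
issues = [
    (date(2024, 1, 2), date(2024, 1, 5)),
    (date(2024, 1, 10), date(2024, 1, 24)),
    (date(2024, 2, 1), None),
    (date(2024, 2, 3), date(2024, 2, 10)),
]


def median_days_to_close(items):
    """Median number of days between opening and closing, closed issues only."""
    durations = [(closed - opened).days for opened, closed in items if closed is not None]
    return median(durations) if durations else None


def open_ratio(items):
    """Fraction of issues that are still open."""
    if not items:
        return 0.0
    return sum(1 for _, closed in items if closed is None) / len(items)


print(median_days_to_close(issues))  # 7
print(open_ratio(issues))            # 0.25
```

A long median or a high open ratio is only a proxy for stability risk; it is worth reading the oldest open issues before drawing conclusions.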

Concrete example: choosing a container orchestrator

Decision question: Kubernetes vs simpler container options (e.g., Docker Compose)

If you’re a small team and need speed:
- start from the simplest option that meets requirements
- validate with a subset of metrics (e.g., simplicity, usability, fast feedback)
Choose Kubernetes only if it matches the needs better; otherwise, avoid unnecessary complexity.
Include migration effort implicitly via “replaceability/cost” logic.
- Example reasoning: containers can run across runtimes, so migration might be less risky than expected.

Advanced techniques (when deeper analysis is warranted)

Used as optional “depth,” not the default:

Capacity planning (not precise; can become an endless tunnel)
- model-based estimates from existing performance metrics
- check SLA / latency targets
- consider cloud RTT (round-trip time) across infra layers
Cost estimation
- use cloud calculators (example: AWS Pricing Calculator)
- compare service SLAs (example: DynamoDB availability vs other components)
Queueing models (queue modeling)
- tools/libraries can estimate throughput/latency without expensive full performance runs
Architecture Decision Records (ADRs / decision logs)
- snapshot decisions for future engineers
Technology Radar
- lightweight internal tracking of:
  - popularity trend (rising/falling)
  - alternatives
  - why selected
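
To make the queueing and capacity-planning items concrete, here is a minimal sketch using the textbook M/M/1 model (one server, random arrivals, exponentially distributed service times). The talk only mentions queueing tools in general, so the choice of M/M/1 and all numbers below are assumptions for illustration; real systems usually need a richer model.

```python
def mm1_estimate(arrival_rate, service_rate, network_rtt_s=0.0):
    """Estimate utilization and mean latency for a single-server M/M/1 queue.

    arrival_rate:  average requests per second arriving (lambda)
    service_rate:  average requests per second the server can complete (mu)
    network_rtt_s: extra round-trip time added on top of the queueing latency
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    utilization = arrival_rate / service_rate
    # Mean time a request spends waiting plus being served: W = 1 / (mu - lambda).
    time_in_system_s = 1.0 / (service_rate - arrival_rate)
    return {
        "utilization": utilization,
        "mean_latency_s": time_in_system_s + network_rtt_s,
        # Little's law: mean requests in the system L = lambda * W.
        "mean_in_system": arrival_rate * time_in_system_s,
    }


# Hypothetical load: 80 req/s against a server that handles 100 req/s,
# with 20 ms of cloud round-trip time.
estimate = mm1_estimate(80.0, 100.0, network_rtt_s=0.02)
print(estimate)
```

With these inputs the model gives 80% utilization and roughly 70 ms mean latency (50 ms in the queue and server plus 20 ms RTT), which can be compared against an SLA target before running any full performance test.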

How to run the decision process (lean approach)

The emphasis is that the approach matters more than the specific tool choice.

Set-based design + Last Responsible Moment (lean techniques)

Investigate multiple alternatives in parallel
- each sub-team/engineer focuses on a different metric or alternative
- then consolidate findings
Defer the final choice until you have enough evidence
- decide late when possible; earlier only when constraints force it

Concrete tactics to defer/contain risk

Parallel investigation with a structured summary
Feature toggles
- hide the tech choice behind a flag
- allow turning implementations on/off even in production
A/B testing
- validate behavior/requirements with real usage before finalizing
Deprecation planning
- for public-facing technology, deprecation can take weeks/months and be expensive
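
The toggle and A/B tactics can be sketched as follows, assuming the competing technologies can sit behind one shared interface. Every name here (`KeyValueStore`, `IncumbentStore`, `CandidateStore`, the `USE_CANDIDATE_STORE` environment variable) is hypothetical, and both stores are in-memory stand-ins rather than real products.

```python
import hashlib
import os
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Interface the rest of the code depends on, not a concrete technology."""

    name: str

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class _DictBackedStore:
    """Shared in-memory behaviour for the two stand-in implementations."""

    name = "base"

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class IncumbentStore(_DictBackedStore):
    name = "incumbent"


class CandidateStore(_DictBackedStore):
    name = "candidate"


def rollout_bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in [0, 100) for A/B assignment."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_store(user_id: str, enabled: Optional[bool] = None, percent: int = 10) -> KeyValueStore:
    """Return the candidate only when the toggle is on and the user is in the rollout."""
    if enabled is None:
        # Hypothetical flag name; a real system might use a flag service instead.
        enabled = os.environ.get("USE_CANDIDATE_STORE") == "1"
    if enabled and rollout_bucket(user_id) < percent:
        return CandidateStore()
    return IncumbentStore()
```

Because callers only see `KeyValueStore`, the candidate can be switched off in production by flipping the flag, and the hash-based bucket keeps each user on the same variant for the duration of the experiment.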

Organizational / leadership alignment guidance

Management buy-in improves when you present decisions as numbers (not “technical debt” rhetoric).
- Example: instead of qualitative “adaptation,” show a trend line of increasing technical adaptation tickets.
Cost framing matters:
- leadership often rejects “unbounded” estimates (e.g., “$500k/month cloud” type reactions)
Translation rule:
- convert technical risk into measurable business impact (time, cost, operational stability)
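
One way to turn the "trend line" advice into a number is a least-squares slope over ticket counts per period. This is a sketch under assumptions: the monthly counts are made up, and the talk does not prescribe a particular fitting method.

```python
def trend_slope(counts):
    """Least-squares slope of counts against their index (tickets per period)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Hypothetical number of technical adaptation tickets opened per month.
monthly_tickets = [4, 6, 5, 9, 11, 13]
slope = trend_slope(monthly_tickets)
print(f"about {slope:.1f} additional tickets per month")  # about 1.8
```

A positive slope stated as "about two more adaptation tickets every month" is the kind of measurable framing the talk recommends over a qualitative appeal to technical debt.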

“Awesome decision-making recipe” (step-by-step playbook)

Recommended high-level recipe:

Don’t rely on one person’s opinion or deep dive.
Don’t depend on Googling or ChatGPT for the best choice (it is context dependent).
Limit metrics based on time:
- if you have ~1–1.5 months: consider up to 4 metrics
- if you have ~1–2 weeks: consider ~1–3 metrics
Run measurement collection across the team:
- score each technology per category (e.g., 1–10 or 1–5 stars)
- share results via an Excel/Confluence-like sheet
- aggregate and pick a winner
If disagreements remain:
- revisit using lean tactics: feature toggles, A/B testing, and structured elimination
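
The scoring and aggregation steps of the recipe can be sketched as a small weighted matrix. The option names, the individual scores, and the equal weights are all hypothetical; the four categories are the ones listed earlier in the summary.

```python
from statistics import mean

# Hypothetical 1-5 scores, one entry per team member, for each category.
scores = {
    "Option A": {"design": [4, 4], "learning_curve": [2, 4], "runtime": [5, 5], "support": [3, 3]},
    "Option B": {"design": [3, 3], "learning_curve": [5, 5], "runtime": [3, 3], "support": [3, 3]},
}

# Equal weights are an assumption; adjust them to reflect what matters most.
weights = {"design": 1.0, "learning_curve": 1.0, "runtime": 1.0, "support": 1.0}


def aggregate(all_scores, category_weights):
    """Average each category across the team, then take the weighted mean per option."""
    totals = {}
    for option, by_category in all_scores.items():
        weighted_sum = sum(category_weights[c] * mean(v) for c, v in by_category.items())
        weight_total = sum(category_weights[c] for c in by_category)
        totals[option] = weighted_sum / weight_total
    return totals


def pick_winner(totals):
    """Return the option with the highest aggregated score."""
    return max(totals, key=totals.get)


totals = aggregate(scores, weights)
print(totals)               # {'Option A': 3.75, 'Option B': 3.5}
print(pick_winner(totals))  # Option A
```

A narrow margin like this one is a signal to apply the disagreement step of the recipe rather than to treat the aggregate as final.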

Examples from Q&A (additional actionable insights)

Brainstorming that led to real adoption
- The speaker used “throw crazy ideas” in meetings.
- Outcome example: repeated suggestion of monorepo for a microservices context; the team discussed merging because it was harder to find code across repos.
Best brainstorming structure
- Use a mind map:
  - central core idea
  - branching into options/alternatives
- Rules:
  - “no stupid ideas” in phase 1
  - judging/elimination in phase 2
How to involve management
- show statistically backed trends
- provide cost estimates within constraints
When you’ve dug too deep
- use a time-box (limit duration; avoid long single-alternative research)
- if you’re building without automated tests, you’re likely over-investing
How to involve management/engineering without conflict
- avoid arguing/pushing personal preference (“I’ve used this for 10 years”)
- use collaboration + experimentation

Key concrete thresholds/targets mentioned

While no explicit business KPIs like CAC/LTV/churn appear, several operational thresholds were suggested:

Maturity proxy: ecosystems should have meaningful usage over ~5 years
Learning speed target: avoid environments needing ~30 seconds to start; prefer much faster feedback
Time-box guidance for decision research:
- typical cap for “single alternative deep dive”: ~1–1.5 months maximum (and ideally shorter)
Practical metric scope based on time:
- ~4 metrics for ~1–1.5 months
- 1–3 metrics for ~1–2 weeks
Deprecation risk: public-facing tech deprecation can take weeks/months

Presenters / sources (as mentioned)

Oresz (Oresztesz) Margaritisz — Chief Software Engineer, EPAM

External references mentioned:

Microsoft (quality attributes/metrics approach referenced)
InfoQ (technology radar / templates mentioned)
Thoughtworks (technology radar referenced)
EPAM (technology radar mentioned as available internally/free)
AWS (pricing calculator, SLA examples mentioned)
TechEmpower (performance benchmarking site mentioned)

Books mentioned in Q&A/closing:

“How We Decide” (title as spoken)
“Emotional Intelligence” (title as spoken; exact subtitle not provided)
