Summary of "How to Scale Smarter with AI Agents and Automation – Wharton Scale School"
Business-focused takeaways: “Scaling Smarter with AI Agents and Automation (AI 2.0)”
The core message: companies are moving from AI experimentation to AI execution—but ROI and productivity only arrive when organizations reinvent workflows, reskill/upskill, and implement tight feedback + evaluation loops.
Core frameworks / playbooks mentioned or implied
IT Productivity Paradox (historical parallel)
- AI/tech spend doesn’t instantly translate into productivity.
- Results lag because teams must change workflows, upskill, and reinvent organizational processes.
Agent workflow partitioning (human/AI division of labor)
- Keep human-led work where nuance/judgment matters, e.g.:
- negotiations
- taste
- strategic hiring
- Automate repeatable “grunt work”, e.g.:
- coding/data tasks with fast verification
- Use hybrid collaboration where AI handles volume and humans set direction and accept output, e.g.:
- product specification workflows
Automated evaluation + feedback loops (“compounding” mechanism)
- In software, coding scales because there’s a natural feedback loop (e.g., unit tests give objective, immediate signals).
- For agents to scale, organizations need automated verification so the system learns and improves without slow human-only cycles, such as:
- tests
- audits
- QA checks
- guardrails
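A minimal sketch of such a verification loop, assuming agent output can be checked by simple pass/fail functions. The names `evaluate`, `run_with_feedback`, and `toy_agent` are hypothetical and do not come from the talk; a real setup would plug in unit tests, audits, or QA scoring as the checks.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical check type: takes an agent's output, returns True if it passes.
Check = Callable[[str], bool]


@dataclass
class EvalResult:
    passed: list[str]
    failed: list[str]

    @property
    def score(self) -> float:
        """Fraction of checks that passed (0.0 when there are no checks)."""
        total = len(self.passed) + len(self.failed)
        return len(self.passed) / total if total else 0.0


def evaluate(output: str, checks: dict[str, Check]) -> EvalResult:
    """Run every named check against one agent output."""
    passed, failed = [], []
    for name, check in checks.items():
        (passed if check(output) else failed).append(name)
    return EvalResult(passed, failed)


def run_with_feedback(agent, task, checks, max_attempts=3):
    """Call the agent repeatedly, feeding the names of failed checks back in."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)
        result = evaluate(output, checks)
        if not result.failed:
            break
        feedback = result.failed
    return output, result, attempt


# Toy agent (assumption): fixes its draft only when told which check failed.
def toy_agent(task, feedback):
    draft = f"Invoice for {task}"
    if "has_total" in feedback:
        draft += " | total: 10"
    return draft


checks = {
    "non_empty": lambda s: bool(s.strip()),
    "has_total": lambda s: "total:" in s,
}
output, result, attempts = run_with_feedback(toy_agent, "ACME", checks)
```

In this toy run the first draft fails `has_total`, the failure is fed back, and the second attempt passes every check without a human in the loop.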
Simulation-first experimentation (agentic test harnesses)
- Instead of slow real-world A/B tests, use LLM/human simulators to run thousands of “virtual worlds.”
- Goal: predict impact on messaging/UX before rollout.
GTM scaling via “agentic growth” workflow
- Move from:
- quarterly limited experiments + manual analysis
- to:
- full-stack growth teams deploying many more experiments and analyzing quickly using AI automation
What changes inside organizations (operations + management)
1) Engineering org restructuring: from role-based teams to “agentic full-stack”
Concrete example (CRED / John):
- Engineering scaled into an agentic model and became effectively roleless:
- no longer strict “front-end vs back-end”
- engineers must be full-stack and “accredited” to touch data pipelines + UI + backend
- PRs can be created by anyone in the organization (including sales/ops/non-engineers) as long as they contribute production-level output.
Automation intensity
- Share of engineering work done by agents: increased to ~96%
- Earlier baseline: ~12%
Guardrails and multi-agent engineering roles
- After an agent “went rogue,” they introduced an agent lineup with distinct roles, e.g.:
- architecture expert
- QA expert
- front-end expert
- Merge process includes role-based review:
- architecture checks plan fit before building
2) Cost control via “output value per engineer”
Concrete example (CRED / John):
- They track output value per engineer using a leaderboard.
- Compensation/bonuses are tied to output value while accounting for:
- token usage (cost factor)
- whether output remains high quality/value even if tokens increase
- Model usage optimization:
- not every task uses the newest/most expensive model
- choose the right model per task
- compete models via the leaderboard (optimize token efficiency vs output quality)
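One way to picture the token-efficiency versus quality trade-off is a small ranking over per-model results. This is only a sketch: the model names, quality scores, token counts, and prices below are invented for illustration, and `pick_model` and `leaderboard` are hypothetical helpers, not CRED's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    model: str
    quality: float            # 0..1 score from automated evaluation (assumed metric)
    tokens: int               # tokens consumed on the task
    usd_per_1k_tokens: float  # assumed price, illustrative only

    @property
    def cost(self) -> float:
        """Dollar cost of this run."""
        return self.tokens / 1000 * self.usd_per_1k_tokens


def pick_model(runs, min_quality):
    """Return the cheapest run that clears the quality bar, or None."""
    eligible = [r for r in runs if r.quality >= min_quality]
    return min(eligible, key=lambda r: r.cost, default=None)


def leaderboard(runs):
    """Rank runs by quality per dollar, best first (zero-cost runs rank first)."""
    def value(r):
        return r.quality / r.cost if r.cost > 0 else float("inf")
    return sorted(runs, key=value, reverse=True)


# Made-up numbers for three hypothetical model tiers on the same task.
runs = [
    ModelRun("large", quality=0.95, tokens=4000, usd_per_1k_tokens=0.03),
    ModelRun("medium", quality=0.90, tokens=3000, usd_per_1k_tokens=0.01),
    ModelRun("small", quality=0.70, tokens=2000, usd_per_1k_tokens=0.002),
]

choice = pick_model(runs, min_quality=0.85)
ranking = [r.model for r in leaderboard(runs)]
```

With these made-up figures, the medium model is the cheapest one above a 0.85 quality bar, while the small model tops a pure quality-per-dollar ranking. That gap is why a quality floor is applied before optimizing cost.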
3) Operational guardrails: security + continuous learning
Concrete example (CRED / John):
- Enterprise-grade safeguards, including:
- a “security agent” that checks appropriate parts
- similar guardrails across functions (e.g., sales pipeline: transcription → proposal/deck creation → “sales manager agent” review before sending)
Main risk theme
- A major barrier to adoption is enterprise fear around security/data loss.
- Mitigation requires better dependency management and risk controls.
What changes in business workflows beyond engineering
1) Healthcare operations: agentic workflow redesign (not job replacement first)
Concrete examples (AgentMan / Prasad):
- Independent medical practices often rely on heavy admin staffing:
- ~4.5 staff per doctor
- Insurance reimbursement context:
- for every $1 billed, practices may receive ~60 cents
- complexity due to ~2,500 insurance companies
Agent deployments
- Insurance eligibility check agent
- combines APIs + screen agents (portal scraping/data entry when APIs aren’t available)
- saves ~90 minutes/day of staff time
- humans remain supervisors/decision makers; agents assist and verify rather than fully replace
- Fax + voicemail triage agent
- fax is still used for doctor approvals/refills
- staff receive 50+ faxes/day
- saves ~2 hours/day by automating inbox triage:
- routing
- prioritization
- reducing touchpoints
Change management insight (Prasad)
- Staff turnover is high; automation reduces burnout and stabilizes operations.
- Resistance can occur if agents remove too much work too quickly—so rollout is positioned as “help” rather than “replacement.”
2) Legal / private debt workflows: reduce unbillable associate hours
Concrete example (AgentMan / Prasad):
- Private debt restructuring requires reviewing thousands of pages and covenants.
- Typical effort: ~200 associate hours
- Cost framing:
- top law firms charge ~$1,000/hour
- pressure increases due to unbillable hours
Agentic outcome
- reduces ~200 hours to ~2 hours
- senior human validation remains required before final use
Go-to-market (GTM) and marketing execution changes
1) Agentic growth: more experiments per quarter with leaner teams
Framework (Kartik):
- Traditional growth model:
- a few analysts/marketers (and possibly a GTM engineer)
- limited experiments per quarter (e.g., 4–5)
- manual measurement → slower learning cycles
- New model:
- full-stack growth people (strategy + data + execution)
- enabled by AI agents → deploy and learn from many more experiments per quarter
2) Customer decision journeys: AI outputs reduce clicks and shift marketing power
Metrics cited (Kartik):
- 60% of Google searches end with no click (AI overview answers directly)
- 34% reduction in clickthrough rate for top organic links
- 40% of Black Friday purchases influenced by AI recommendations
- B2B: 90% of buying journeys are influenced by AI at some point
Implication
- Marketing shifts from “create content and hope AI finds it” to:
- understanding how LLM systems interpret/match brand information
- In agentic commerce, AI may effectively become the “customer” that vendors must satisfy.
3) Agentic purchasing and “AI choosing your stack”
Concrete example (Kartik):
- In coding with tools like Claude Code:
- prompting “add checkout” selects Stripe 91% of the time
- database choice: Postgres ~60%
- Marketing implication:
- traditional messaging/content can matter less than:
- LLM/toolchain fit
- retrieval/search behavior
- integration compatibility
- presence of the right training/page content
4) Simulation-first optimization for marketing and UX
Concrete approach (Kartik):
- build and validate an LLM simulator that reproduces system answers
- run thousands of website/messaging variations in “virtual universes”
- detect which variants cause the AI system to recommend/act differently
- extend to human/persona simulators to test conversion paths before rollout
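The variant-testing step can be sketched as follows, assuming a simulator that maps page text to the AI system's decision. Here `toy_simulator` is a purely illustrative keyword rule standing in for a validated LLM simulator, and `find_shifting_variants` is a hypothetical helper; neither reflects any tool named in the talk.

```python
from typing import Callable

# Hypothetical simulator type: page text in, simulated AI decision out.
Simulator = Callable[[str], str]


def toy_simulator(page_text: str) -> str:
    """Stand-in for a validated LLM simulator (illustrative keyword rule only)."""
    text = page_text.lower()
    if "api docs" in text and "pricing" in text:
        return "recommend"
    return "skip"


def find_shifting_variants(baseline: str, variants: dict[str, str],
                           simulate: Simulator) -> dict[str, str]:
    """Return the variants whose simulated decision differs from the baseline's."""
    base_decision = simulate(baseline)
    shifted = {}
    for name, text in variants.items():
        decision = simulate(text)
        if decision != base_decision:
            shifted[name] = decision
    return shifted


baseline = "Welcome to our product."
variants = {
    "add_pricing": "Welcome. Pricing from $10.",
    "add_docs_and_pricing": "Welcome. Pricing from $10. Full API docs.",
}
shifted = find_shifting_variants(baseline, variants, toy_simulator)
```

In practice the loop would run over thousands of generated variants, and the simulator would first be validated against the real system's answers, as the approach above describes.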
Startup mentioned
- Bliss Labs: building applied AI simulations for testing marketing/product changes in virtual worlds (per the talk, launching “sometime in the summer”).
Key metrics / KPIs explicitly mentioned
Time savings / operational productivity
- Healthcare eligibility checks: ~90 minutes/day saved
- Fax/voicemail triage: ~2 hours/day saved
- Legal doc workflow: ~200 hours → ~2 hours (associate time)
Automation intensity
- CRED engineering: agents doing work ~96%
- Earlier baseline: ~12%
Online marketing / customer journey impact
- Google searches with no click: 60%
- CTR reduction: 34%
- Black Friday AI-influenced purchases: 40%
- B2B buying journey AI influence: 90%
Agentic choice ratios (developer/tool recommendations)
- Checkout provider choice: 91% Stripe
- Database choice: ~60% Postgres
Experimentation capacity
- Traditional growth: ~4–5 experiments per quarter
- Agentic growth: “deploy a lot more” (no exact number stated beyond the contrast)
Actionable recommendations (supported by examples)
- Don’t chase ROI by deploying AI “as-is.” ROI lags unless you reinvent workflows and reskill teams (IT productivity paradox lesson).
- Design agent systems with role separation + guardrails. Use multi-agent “review before merge,” such as:
- architecture planning checks
- QA validation agents
- security agents
- Add continuous learning and audit trails.
- Make agent performance measurable with objective feedback loops. Borrow the coding model:
- tests/unit verification (or analogs like audits, evaluation harnesses, QA scoring)
- automate evaluation to enable compounding improvement
- Use simulation to reduce marketing/UX guesswork. Validate LLM simulator behavior first, then run high-volume virtual testing to find variants that shift AI recommendations.
- Roll out automation as “assistive” before full replacement (especially in regulated, people-facing contexts).
- Healthcare: prioritize patient safety; keep humans supervising.
- Target admin bottlenecks first (eligibility checks, document routing).
- Control experimentation noise through workflow design. Even with cheap experiments, leaders must prevent “start lots, finish few.” One proposed tactic: agents that detect duplicative work and eliminate low-value experiments.
Mentioned presenters / sources
- Lori Rosenkopf: Vice Dean of Entrepreneurship; Simon and Midge Palley Professor of Management, Wharton
- Kartik Hosanagar: John C. Hower Professor of Operations, Information and Decisions; Faculty Director of AI, Wharton
- John Carr Harris: Founder and CEO, CRED
- Prasad Thammineni: Founder and CEO, AgentMan
Referenced historical/source work:
- Robert Solow (MIT economist; “computer age” productivity paradox quote)
- MIT / Robert Solow productivity paradox literature (general reference)
Referenced organizations/papers/articles (as cited in the talk context):
- Forrester (B2B AI influence claim)
- Inc. magazine (Black Friday AI recommendations claim)
- Visa office visit (example reference)
- Meta digital twins approach (employee tracking)