Summary of "Charting a Course for the Next Decade of Gaming With AI"
Overview
A panel discussion on how AI is reshaping game development across three broad axes:
- Game creation — building content and workflows faster and better with AI.
- Rendering and runtime improvements — for example, DLSS5 and related runtime optimizations.
- Player-facing experiences — new kinds of immersion and gameplay enabled by agents, world models, and smarter NPCs.
Key themes and takeaways
- AI is becoming central across the game pipeline: design, tooling, asset creation, runtime behavior, and QA.
- No single paradigm dominates: traditional engines and world models will coexist. World models enable novel experiences and lower the barrier to experimentation, while classical systems remain valuable.
- Pragmatism matters: start with approaches that reduce risk and deliver player value. Use classical systems where they make sense and hybridize with ML/LLMs where they add unique value.
- Hands-on experimentation across teams is critical — letting developers play with LLMs/agents helps discover productive workflows.
- Upskilling game teams and handing AI tools/tech back to them (rather than centralizing all AI work) is important for adoption and long-term success.
Use cases and examples mentioned
Rendering
- DLSS5 (NVIDIA demoed recent advances).
NPCs and in-game companions
- Krafton’s PUBG Ally — an on-PC stack (ASR → LLM → TTS) running fully locally for immersive companions.
- ReLU Games’ “Uncover the Smoking Gun” — early cloud-LLM-driven detective NPCs (hallucination was acceptable given the gameplay).
- inZOI — live simulation with LLM-driven NPC agents; players can mod character prompts.
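The on-PC companion stack mentioned above chains three models: speech recognition, a language model, and speech synthesis. A minimal sketch of that wiring, with every stage stubbed out as a placeholder (a real stack would plug in local ASR/LLM/TTS engines; all class names here are hypothetical):

```python
from dataclasses import dataclass

class StubASR:
    """Speech-to-text stage (placeholder: returns a canned transcript)."""
    def transcribe(self, audio: bytes) -> str:
        return "enemy spotted on the ridge"

class StubLLM:
    """Dialogue stage (placeholder: template reply with rolling history)."""
    def __init__(self):
        self.history: list[str] = []
    def respond(self, text: str) -> str:
        self.history.append(text)
        return f"Copy that: {text}. Moving to cover."

class StubTTS:
    """Text-to-speech stage (placeholder: returns fake audio bytes)."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

@dataclass
class CompanionPipeline:
    """Chains the three stages: ASR -> LLM -> TTS, all on-device."""
    asr: StubASR
    llm: StubLLM
    tts: StubTTS

    def handle_voice_input(self, audio: bytes) -> bytes:
        transcript = self.asr.transcribe(audio)  # speech -> text
        reply = self.llm.respond(transcript)     # text -> dialogue
        return self.tts.synthesize(reply)        # dialogue -> speech
```

Keeping every stage behind a small interface like this is what makes it possible to swap cloud models for local ones as SLMs improve.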
World models and generative interactive simulations
- Decart — interactive real-time world models (Oasis One demo: 10M users in two weeks).
- World models used for rapid prototyping and democratizing development.
Game AI & advisors
- Creative Assembly (Total War series) exploring in-game AI advisors to improve onboarding and retention.
Bots and ML-driven gameplay
- Riot deploying ML bots in early PVE experiments; exploring imitation learning + RL for better tutorial/PVE bots.
Infrastructure & tooling
- Krafton building large training clusters (~1,000 GPUs) and training specialized speech-to-speech models for lower latency, with a focus on on-device inference.
- Decart running expensive training (scaling toward hundreds of millions per year in compute) while working to drive down inference costs.
Evaluation
- Creative Assembly built an evaluation system combining designer/QA “golden” questions with LM-as-judge metrics (usefulness, correctness, tone/personality).
Strategies, practical tips and recommendations
Start small and practical
- Put off-the-shelf LLMs and models in the hands of teams to discover useful workflows before investing in custom models.
- Use Retrieval-Augmented Generation (RAG) and off-the-shelf inference where appropriate before training from scratch.
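The RAG approach recommended above can be surprisingly lightweight before any custom training: retrieve the most relevant design document and prepend it to the prompt. A toy sketch using bag-of-words cosine similarity (a real setup would use an embedding model and a vector store; the example documents are invented):

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words vector; punctuation stripped, case-folded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the single most similar document to the query."""
    qv = vectorize(query)
    return max(docs, key=lambda d: cosine(qv, vectorize(d)))

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the LLM call in retrieved context instead of fine-tuning."""
    context = retrieve(query, docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Siege units deal bonus damage to walls and towers.",
    "Cavalry is strong against archers but weak to spearmen.",
]
prompt = build_prompt("what counters cavalry?", docs)
```

The prompt then goes to an off-the-shelf model; the retrieval layer is the only piece the team maintains.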
Hybrid design is often the best near-term approach
- Combine classical AI (behavior trees, utility systems) as “system 1” with LLM/agent “system 2” orchestrating or selecting scripted behaviors.
- Execute LLM-decided high-level plans by triggering existing robust game scripts.
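The hybrid pattern above can be sketched as a thin whitelist layer: the LLM (stubbed here) only ever names a high-level intent, and robust scripted behaviors do the execution, with a classical default as fallback. All behavior names are illustrative:

```python
# "System 1": existing, designer-authored scripted behaviors.
SCRIPTED_BEHAVIORS = {
    "patrol": lambda npc: f"{npc} follows its patrol route",
    "take_cover": lambda npc: f"{npc} moves to the nearest cover point",
    "flank": lambda npc: f"{npc} runs the scripted flanking maneuver",
}

def stub_llm_plan(situation: str) -> str:
    """Placeholder for the "system 2" LLM call returning an intent name."""
    return "take_cover" if "under fire" in situation else "patrol"

def act(npc: str, situation: str) -> str:
    intent = stub_llm_plan(situation)
    # Whitelist check: malformed or unknown LLM output falls back to the
    # classical default, so gameplay never depends on free-form generation.
    behavior = SCRIPTED_BEHAVIORS.get(intent, SCRIPTED_BEHAVIORS["patrol"])
    return behavior(npc)
```

Because only whitelisted behaviors can fire, designers keep predictability and explainability while still getting LLM-driven variety at the planning level.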
Prioritize player value
- AI features should improve the game for players or enable new experiences — avoid adding AI purely for novelty.
Evaluation & control
- Build evaluation suites with designer-curated test cases and metrics (usefulness, correctness, tone). Treat an LM as a judge in early testing.
- Ensure explainability and designer control so players perceive fairness and predictable outcomes.
Cost & infrastructure pragmatics
- Start with cloud or rented GPU capacity for experimentation; benchmark costs and identify near-term productization paths (for example, internal dev tools) to justify compute spend.
- On-device inference is valuable for latency and cost but may require smaller or specialized SLMs.
Upskilling & adoption
- Create internal training, safe spaces, and curated toolkits to lower friction for non-AI specialists.
- Let central AI teams hand off prototypes and tooling to dev teams for long-term ownership and learning.
User testing & iteration
- Ship experimental features to learn; teams that wait for “perfect” will fall behind competitors.
- Collect player feedback and iterate — responses can vary widely by audience.
Technical & infrastructure notes
Model trends
- Small language models (SLMs) are improving quickly; the gap to large cloud LMs is shrinking.
- Emergent capabilities can appear with scale — certain tasks can suddenly become tractable above specific model sizes.
On-device vs cloud
- Local multi-model stacks (ASR → LLM → TTS) can run well on powerful PCs and are desirable for privacy and latency.
- Cloud models speed prototyping but increase ongoing cost and ops complexity.
Cost realities
- Training world-scale models is expensive (tens–hundreds of millions per year in compute).
- Inferencing, optimization, and engineering for efficiency are central technical challenges.
Tooling and resources
- Agents and automated summarization agents are already used internally to speed R&D.
- GPU/accelerator supply (e.g., NVIDIA Blackwell family) can be a bottleneck; teams should plan purchases and scaling accordingly.
Risks and challenges
Hallucinations
- Acceptable in some game contexts (e.g., detective games) but dangerous in others; require design-aware mitigation.
Difficulty tuning and fairness
- Strong models may exploit unintended mechanics or produce unfun results; tuning for fair, “just-barely-beatable” difficulty is hard.
Explainability and designer control
- Designers need predictable, controllable outcomes — pure generative systems can frustrate that need.
Organizational friction
- Central AI teams can be ineffective unless tightly integrated with game teams. Ownership and capabilities should sit with dev teams where possible.
Economic constraints
- High training costs demand near-term value extraction (internal tools, dev productivity gains) to sustain investment.
Concrete example best practices
Evaluating LMs for in-game characters
- Create designer/QA golden question sets and use automated LM-based judging for usefulness, correctness, and tone.
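The golden-question workflow described above can be sketched end-to-end: designers curate question/expectation pairs, and a judge scores each NPC reply on the three metrics from the talk. The judge is stubbed with keyword checks here; a real one would be an LLM prompted with a scoring rubric, and all field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class GoldenCase:
    """One designer/QA-curated test case."""
    question: str
    must_mention: str  # fact the reply must contain to count as correct
    banned_tone: str   # phrase that would break character if it appears

def stub_judge(reply: str, case: GoldenCase) -> dict:
    """Placeholder judge scoring usefulness, correctness, and tone."""
    low = reply.lower()
    return {
        "correct": case.must_mention.lower() in low,
        "on_tone": case.banned_tone.lower() not in low,
        "useful": len(reply.split()) >= 5,
    }

def evaluate(npc_reply_fn, cases: list[GoldenCase]) -> float:
    """Fraction of golden cases where every metric passes."""
    passed = 0
    for case in cases:
        scores = stub_judge(npc_reply_fn(case.question), case)
        passed += all(scores.values())
    return passed / len(cases)
```

Running this suite on every model or prompt change turns "does the advisor still feel right?" into a number a team can track.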
Handling NPC hallucination
- Design gameplay so hallucination is tolerable or enhances experience (for example, NPCs that lie during interrogation).
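One way to make that design concrete is to separate case-critical claims (validated against ground truth) from free dialogue, letting the model lie only where lying is part of the game. A hypothetical sketch; the fact table and mode flag are invented for illustration:

```python
# Ground-truth facts the game state depends on; flavor dialogue is free.
CASE_FACTS = {
    "murder_weapon": "letter opener",
    "time_of_death": "around midnight",
}

def deliver_line(line: str, claims: dict[str, str], interrogation: bool) -> str:
    """Return the NPC line, correcting claims that contradict case facts.

    During interrogation the NPC is allowed to lie, so contradictions pass
    through untouched; elsewhere they are replaced with the true value so
    hallucination can never corrupt the solvable mystery.
    """
    if interrogation:
        return line
    for key, value in claims.items():
        truth = CASE_FACTS.get(key)
        if truth is not None and value != truth:
            line = line.replace(value, truth)
    return line
```

The same model output is thus tolerable or dangerous depending on context, which is exactly the design-aware mitigation the panel describes.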
Hybrid scripting + agents
- Use agent orchestration to select high-level actions and trigger scripted behaviors for execution and control.
Public experimentation
- Test features publicly or in betas to learn player responses (example: Krafton’s PUBG LA beta testing).
People, companies, projects and sources mentioned
Panel hosts / speakers
- John Spitzer — VP of Developer Performance & Technology, NVIDIA (host/panelist)
- Dean — founder/CEO of Decart (world-model work)
- Konguk — Chief AI Officer at Krafton; CTO at Ludo Robotics
- CH Kim — CEO of Krafton
- D (aka Dugu) — leads AI and R&D at Creative Assembly (SEGA)
- Troder Cashion (Troder) — Lead AI Product, Riot Games
Other contributors mentioned
- Wesker — leads Riot’s central AI research
- Yanni/Yiani — aided Creative Assembly with evaluation tooling
- Ike — initial introducer/moderator (brief mention)
- Jensen Huang — referenced (NVIDIA CEO)
Companies, projects and tech referenced
- NVIDIA — DLSS5, Blackwell GPUs
- Decart — world models, Oasis One demo
- Krafton — PUBG Ally, PUBG LA, inZOI, ReLU Games projects
- Ludo Robotics — related robotics group
- Creative Assembly — Total War series, AI advisors
- Riot Games — ML bots, League of Legends, Valorant
- ReLU Games — LLM NPC experiments
- Anthropic, OpenAI — model vendors referenced
- Gemini, Gemini CLI — referenced as used tools
- GitHub Copilot — example of developer productivity change
- Oasis One — Decart’s public demo
- DLSS5 — NVIDIA rendering advancement
- Blackwell 300 — GPU family referenced
- RAG (retrieval-augmented generation) — architecture approach discussed