Summary of "Charting a Course for the Next Decade of Gaming With AI"
Overview
A panel discussion on how AI is reshaping game development across three broad axes:
- Game creation — building content and workflows faster and better with AI.
- Rendering and runtime improvements — for example, DLSS5 and related runtime optimizations.
- Player-facing experiences — new kinds of immersion and gameplay enabled by agents, world models, and smarter NPCs.
Key themes and takeaways
- AI is becoming central across the game pipeline: design, tooling, asset creation, runtime behavior, and QA.
- No single paradigm dominates: traditional engines and world models will coexist. World models enable novel experiences and lower the barrier to experimentation, while classical systems remain valuable.
- Pragmatism matters: start with approaches that reduce risk and deliver player value. Use classical systems where they make sense and hybridize with ML/LLMs where they add unique value.
- Hands-on experimentation across teams is critical — letting developers play with LLMs/agents helps discover productive workflows.
- Upskilling game teams and handing AI tools/tech back to them (rather than centralizing all AI work) is important for adoption and long-term success.
Use cases and examples mentioned
Rendering
- DLSS5 (NVIDIA demoed recent advances).
NPCs and in-game companions
- Krafton’s PUBG Ally — an on-PC stack (ASR → LLM → TTS) running fully locally for immersive companions.
- ReLU Games’ “Uncover the Smoking Gun” — early cloud-LLM-driven detective NPCs (hallucination was acceptable given the gameplay).
- inZOI — live simulation with LLM-driven NPC agents; players can mod character prompts.
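The on-PC companion stack mentioned above chains three models: speech recognition, a language model, and speech synthesis. A minimal sketch of that wiring, with every stage stubbed out as a placeholder (a real stack would plug in local ASR/LLM/TTS engines; all class names here are hypothetical):

```python
from dataclasses import dataclass

class StubASR:
    """Speech-to-text stage (placeholder: returns a canned transcript)."""
    def transcribe(self, audio: bytes) -> str:
        return "enemy spotted on the ridge"

class StubLLM:
    """Dialogue stage (placeholder: template reply with rolling history)."""
    def __init__(self):
        self.history: list[str] = []
    def respond(self, text: str) -> str:
        self.history.append(text)
        return f"Copy that: {text}. Moving to cover."

class StubTTS:
    """Text-to-speech stage (placeholder: returns fake audio bytes)."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

@dataclass
class CompanionPipeline:
    """Chains the three stages: ASR -> LLM -> TTS, all on-device."""
    asr: StubASR
    llm: StubLLM
    tts: StubTTS

    def handle_voice_input(self, audio: bytes) -> bytes:
        transcript = self.asr.transcribe(audio)  # speech -> text
        reply = self.llm.respond(transcript)     # text -> dialogue
        return self.tts.synthesize(reply)        # dialogue -> speech
```

Keeping every stage behind a small interface like this is what makes it possible to swap cloud models for local ones as SLMs improve.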
World models and generative interactive simulations
- Decart — interactive real-time world models (Oasis One demo: 10M users in two weeks).
- World models used for rapid prototyping and democratizing development.
Game AI & advisors
- Creative Assembly (Total War series) exploring in-game AI advisors to improve onboarding and retention.
Bots and ML-driven gameplay
- Riot deploying ML bots in early PVE experiments; exploring imitation learning + RL for better tutorial/PVE bots.
Infrastructure & tooling
- Krafton building large training clusters (~1,000 GPUs) and training specialized speech-to-speech models for lower latency, with a focus on on-device inference.
- Decart running expensive training (scaling toward hundreds of millions per year in compute) while working to drive down inference costs.
Evaluation
- Creative Assembly built an evaluation system combining designer/QA “golden” questions with LM-as-judge metrics (usefulness, correctness, tone/personality).
Strategies, practical tips and recommendations
Start small and practical
- Put off-the-shelf LLMs and models in the hands of teams to discover useful workflows before investing in custom models.
- Use Retrieval-Augmented Generation (RAG) and off-the-shelf inference where appropriate before training from scratch.
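The RAG approach recommended above can be surprisingly lightweight before any custom training: retrieve the most relevant design document and prepend it to the prompt. A toy sketch using bag-of-words cosine similarity (a real setup would use an embedding model and a vector store; the example documents are invented):

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words vector; punctuation stripped, case-folded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the single most similar document to the query."""
    qv = vectorize(query)
    return max(docs, key=lambda d: cosine(qv, vectorize(d)))

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the LLM call in retrieved context instead of fine-tuning."""
    context = retrieve(query, docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Siege units deal bonus damage to walls and towers.",
    "Cavalry is strong against archers but weak to spearmen.",
]
prompt = build_prompt("what counters cavalry?", docs)
```

The prompt then goes to an off-the-shelf model; the retrieval layer is the only piece the team maintains.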
Hybrid design is often the best near-term approach
- Combine classical AI (behavior trees, utility systems) as “system 1” with LLM/agent “system 2” orchestrating or selecting scripted behaviors.
- Execute LLM-decided high-level plans by triggering existing robust game scripts.
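The hybrid pattern above can be sketched as a thin whitelist layer: the LLM (stubbed here) only ever names a high-level intent, and robust scripted behaviors do the execution, with a classical default as fallback. All behavior names are illustrative:

```python
# "System 1": existing, designer-authored scripted behaviors.
SCRIPTED_BEHAVIORS = {
    "patrol": lambda npc: f"{npc} follows its patrol route",
    "take_cover": lambda npc: f"{npc} moves to the nearest cover point",
    "flank": lambda npc: f"{npc} runs the scripted flanking maneuver",
}

def stub_llm_plan(situation: str) -> str:
    """Placeholder for the "system 2" LLM call returning an intent name."""
    return "take_cover" if "under fire" in situation else "patrol"

def act(npc: str, situation: str) -> str:
    intent = stub_llm_plan(situation)
    # Whitelist check: malformed or unknown LLM output falls back to the
    # classical default, so gameplay never depends on free-form generation.
    behavior = SCRIPTED_BEHAVIORS.get(intent, SCRIPTED_BEHAVIORS["patrol"])
    return behavior(npc)
```

Because only whitelisted behaviors can fire, designers keep predictability and explainability while still getting LLM-driven variety at the planning level.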
Prioritize player value
- AI features should improve the game for players or enable new experiences — avoid adding AI purely for novelty.
Evaluation & control
- Build evaluation suites with designer-curated test cases and metrics (usefulness, correctness, tone). Treat an LM as a judge in early testing.
- Ensure explainability and designer control so players perceive fairness and predictable outcomes.
Cost & infrastructure pragmatics
- Start with cloud or rented GPU capacity for experimentation; benchmark costs and identify near-term productization paths (for example, internal dev tools) to justify compute spend.
- On-device inference is valuable for latency and cost but may require smaller or specialized SLMs.
Upskilling & adoption
- Create internal training, safe spaces, and curated toolkits to lower friction for non-AI specialists.
- Let central AI teams hand off prototypes and tooling to dev teams for long-term ownership and learning.
User testing & iteration
- Ship experimental features to learn; teams that wait for “perfect” will fall behind competitors.
- Collect player feedback and iterate — responses can vary widely by audience.
Technical & infrastructure notes
Model trends
- Small language models (SLMs) are improving quickly; the gap to large cloud LMs is shrinking.
- Emergent capabilities can appear with scale — certain tasks can suddenly become tractable above specific model sizes.
On-device vs cloud
- Local multi-model stacks (ASR → LLM → TTS) can run well on powerful PCs and are desirable for privacy and latency.
- Cloud models speed prototyping but increase ongoing cost and ops complexity.
Cost realities
- Training world-scale models is expensive (tens–hundreds of millions per year in compute).
- Inferencing, optimization, and engineering for efficiency are central technical challenges.
Tooling and resources
- Agents and automated summarization agents are already used internally to speed R&D.
- GPU/accelerator supply (e.g., NVIDIA Blackwell family) can be a bottleneck; teams should plan purchases and scaling accordingly.
Risks and challenges
Hallucinations
- Acceptable in some game contexts (e.g., detective games) but dangerous in others; require design-aware mitigation.
Difficulty tuning and fairness
- Strong models may exploit unintended mechanics or produce unfun results; tuning for fair, “just-barely-beatable” difficulty is hard.
Explainability and designer control
- Designers need predictable, controllable outcomes — pure generative systems can frustrate that need.
Organizational friction
- Central AI teams can be ineffective unless tightly integrated with game teams. Ownership and capabilities should sit with dev teams where possible.
Economic constraints
- High training costs demand near-term value extraction (internal tools, dev productivity gains) to sustain investment.
Concrete example best practices
Evaluating LMs for in-game characters
- Create designer/QA golden question sets and use automated LM-based judging for usefulness, correctness, and tone.
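The golden-question workflow described above can be sketched end-to-end: designers curate question/expectation pairs, and a judge scores each NPC reply on the three metrics from the talk. The judge is stubbed with keyword checks here; a real one would be an LLM prompted with a scoring rubric, and all field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class GoldenCase:
    """One designer/QA-curated test case."""
    question: str
    must_mention: str  # fact the reply must contain to count as correct
    banned_tone: str   # phrase that would break character if it appears

def stub_judge(reply: str, case: GoldenCase) -> dict:
    """Placeholder judge scoring usefulness, correctness, and tone."""
    low = reply.lower()
    return {
        "correct": case.must_mention.lower() in low,
        "on_tone": case.banned_tone.lower() not in low,
        "useful": len(reply.split()) >= 5,
    }

def evaluate(npc_reply_fn, cases: list[GoldenCase]) -> float:
    """Fraction of golden cases where every metric passes."""
    passed = 0
    for case in cases:
        scores = stub_judge(npc_reply_fn(case.question), case)
        passed += all(scores.values())
    return passed / len(cases)
```

Running this suite on every model or prompt change turns "does the advisor still feel right?" into a number a team can track.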
Handling NPC hallucination
- Design gameplay so hallucination is tolerable or enhances experience (for example, NPCs that lie during interrogation).
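One way to make that design concrete is to separate case-critical claims (validated against ground truth) from free dialogue, letting the model lie only where lying is part of the game. A hypothetical sketch; the fact table and mode flag are invented for illustration:

```python
# Ground-truth facts the game state depends on; flavor dialogue is free.
CASE_FACTS = {
    "murder_weapon": "letter opener",
    "time_of_death": "around midnight",
}

def deliver_line(line: str, claims: dict[str, str], interrogation: bool) -> str:
    """Return the NPC line, correcting claims that contradict case facts.

    During interrogation the NPC is allowed to lie, so contradictions pass
    through untouched; elsewhere they are replaced with the true value so
    hallucination can never corrupt the solvable mystery.
    """
    if interrogation:
        return line
    for key, value in claims.items():
        truth = CASE_FACTS.get(key)
        if truth is not None and value != truth:
            line = line.replace(value, truth)
    return line
```

The same model output is thus tolerable or dangerous depending on context, which is exactly the design-aware mitigation the panel describes.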
Hybrid scripting + agents
- Use agent orchestration to select high-level actions and trigger scripted behaviors for execution and control.
Public experimentation
- Test features publicly or in betas to learn player responses (example: Krafton’s PUBG LA beta testing).
People, companies, projects and sources mentioned
Panel hosts / speakers
- John Spitzer — VP of Developer Performance & Technology, NVIDIA (host/panelist)
- Dean — founder/CEO of Decart (world-model work)
- Konguk — Chief AI Officer at Krafton; CTO at Ludo Robotics
- CH Kim — CEO of Krafton
- D (aka Dugu) — leads AI and R&D at Creative Assembly (SEGA)
- Troder Cashion (Troder) — Lead AI Product, Riot Games
Other contributors mentioned
- Wesker — leads Riot’s central AI research
- Yanni/Yiani — aided Creative Assembly with evaluation tooling
- Ike — initial introducer/moderator (brief mention)
- Jensen Huang — referenced (NVIDIA CEO)
Companies, projects and tech referenced
- NVIDIA — DLSS5, Blackwell GPUs
- Decart — world models, Oasis One demo
- Krafton — PUBG Ally, PUBG LA, inZOI, ReLU Games projects
- Ludo Robotics — related robotics group
- Creative Assembly — Total War series, AI advisors
- Riot Games — ML bots, League of Legends, Valorant
- ReLU Games — LLM NPC experiments
- Anthropic, OpenAI — model vendors referenced
- Gemini, Gemini CLI — referenced as used tools
- GitHub Copilot — example of developer productivity change
- Oasis One — Decart’s public demo
- DLSS5 — NVIDIA rendering advancement
- Blackwell 300 — GPU family referenced
- RAG (retrieval-augmented generation) — architecture approach discussed