Summary of "Reinforcement Learning in Action: Creating Arena Battle AI for 'Blade & Soul'"
Overview:
This video presents a detailed case study on developing AI agents for the arena battle mode in the MMORPG Blade & Soul using Reinforcement Learning (RL). The project focuses on creating programmable, competitive AI agents capable of playing one-on-one battles with diverse fighting styles (aggressive, balanced, defensive) and performing at a professional gamer level.
Key Technological Concepts and Challenges:
- Problem Setting:
  - Blade & Soul arena battle is a real-time, one-on-one fighting game in which players compete to reduce the opponent's HP to zero within three minutes.
  - The game has 11 character classes; the project focused on the "Destroyer" class due to its popularity and fixed skill settings for fairness.
  - The agent observes the environment (HP, skill points, distance, cooldowns, arena boundaries) and chooses an action every 100 milliseconds.
  - Actions combine skill use, movement, and targeting, with complex strategic trade-offs (e.g., using crowd-control skills before damage skills, timing resistance or escape skills).
- Challenges:
  - High Complexity: Extremely large state and action spaces (roughly 10^1800 for the Destroyer), combining skill, movement, and targeting decisions.
  - Real-Time Response Constraint: Decisions must be made within tight time limits, ruling out computationally expensive search methods; neural-network policies are used for efficient inference (see the sketch after this list).
  - Generalization: The AI must perform well against a wide variety of opponents, including human players with unpredictable styles.
  - Guiding Fighting Styles: Without hard-coded rules, the AI must exhibit diverse styles (aggressive, balanced, defensive) through reward shaping.
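As a rough illustration of the decision loop described above, here is a minimal sketch assuming a flat observation vector and a small actor-critic policy network. The feature count, action count, layer sizes, and the select_action helper are hypothetical, not details from the talk.

```python
import torch
import torch.nn as nn

# Hypothetical observation: a flat feature vector the agent sees every 100 ms
# (HP, skill points, distance, cooldowns, arena boundary). The exact encoding
# and sizes below are assumptions for illustration.
OBS_DIM = 64          # illustrative observation size
NUM_ACTIONS = 50      # illustrative number of skill/move/target choices

class PolicyNet(nn.Module):
    """Small MLP actor-critic policy: cheap enough to act within a 100 ms budget."""
    def __init__(self, obs_dim: int = OBS_DIM, num_actions: int = NUM_ACTIONS):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.action_head = nn.Linear(256, num_actions)  # action logits
        self.value_head = nn.Linear(256, 1)             # state value for the critic

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.action_head(h), self.value_head(h)

def select_action(policy: PolicyNet, obs: torch.Tensor) -> int:
    """One decision step: sample an action from the policy distribution."""
    with torch.no_grad():
        logits, _ = policy(obs)
        dist = torch.distributions.Categorical(logits=logits)
        return int(dist.sample())
```

A forward pass through a network of this size takes far less than 100 ms, which is why a learned policy fits the real-time constraint where search-based methods would not.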
Reinforcement Learning Approach:
- Basic RL Concept: Agents learn through trial and error, modifying their policy to maximize cumulative rewards.
- Rewards:
  - Primary: Winning the match and dealing damage (reducing the opponent's HP).
  - Secondary: Shaped rewards to encourage different aggressiveness levels (damage dealt vs. HP preserved, distance maintained or closed); see the reward-shaping sketch after this list.
- Features: The AI receives the same information as human players, such as HP, skill points, distance, position, and skill cooldowns.
- Training Setup:
  - 100 parallel simulations running combat games.
  - Used the ACER (Actor-Critic with Experience Replay) RL algorithm to update policies.
  - Employed self-play with a growing pool of past agents to ensure diverse and progressively stronger opponents.
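To make the reward-shaping and self-play ideas concrete, here is a hedged sketch. The weight values, style terms, win bonus, and the sampling split are assumptions for illustration, not numbers from the talk.

```python
import random

# Illustrative per-style weights: reward damage dealt, penalize damage taken,
# and add a term on closing or keeping distance. The actual values and exact
# terms used by the team are not given in the talk.
STYLE_WEIGHTS = {
    "aggressive": {"dmg_dealt": 1.0, "dmg_taken": -0.2, "closer": +0.1},
    "balanced":   {"dmg_dealt": 1.0, "dmg_taken": -1.0, "closer":  0.0},
    "defensive":  {"dmg_dealt": 0.5, "dmg_taken": -1.0, "closer": -0.1},
}

def shaped_reward(style: str, dmg_dealt: float, dmg_taken: float,
                  dist_delta: float, won: bool) -> float:
    """Per-step shaped reward: win bonus plus style-weighted damage/distance terms.

    dist_delta < 0 means the agent moved closer to the opponent this step.
    """
    w = STYLE_WEIGHTS[style]
    r = 10.0 if won else 0.0             # terminal win bonus (illustrative value)
    r += w["dmg_dealt"] * dmg_dealt      # reward damage dealt to the opponent
    r += w["dmg_taken"] * dmg_taken      # penalize damage received
    r += w["closer"] * (-dist_delta)     # reward closing in (or keeping distance)
    return r

def sample_opponent(opponent_pool: list, latest_bias: float = 0.5):
    """Self-play: usually fight the newest snapshot, sometimes an older one."""
    if random.random() < latest_bias or len(opponent_pool) == 1:
        return opponent_pool[-1]             # latest policy snapshot
    return random.choice(opponent_pool[:-1]) # older snapshot for diversity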
Engineering Techniques:
- Movement Policy Learning: Because a single move action has little effect on its own, the agent keeps the same move decision for 10 consecutive steps, which makes meaningful movement strategies learnable.
- Skill Usage Constraints: Skills with limited range are blocked when the opponent is out of range, preventing wasteful exploration and speeding up learning.
- Feature Discretization: Continuous distance features were discretized so the agent could better learn exact skill ranges, improving skill-usage precision (e.g., for the "drag" skill). A sketch of these three techniques follows below.
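Below is a hedged sketch of the three techniques above, assuming a discrete action space with an out-of-range mask, a 10-step move-repeat counter, and illustrative distance bins. The skill ranges, bin edges, and helper names are hypothetical.

```python
import numpy as np

# Hypothetical skill ranges (meters) and distance bin edges; the real values
# used by the team are not given in the talk.
SKILL_RANGE = {0: 3.0, 1: 5.0, 2: 16.0}      # action id -> max usable range
DISTANCE_BINS = [1.0, 3.0, 5.0, 8.0, 16.0]   # edges for discretizing distance
MOVE_REPEAT = 10                              # hold the same move decision for 10 steps

def skill_mask(distance: float, num_actions: int) -> np.ndarray:
    """Return 1 for usable actions and 0 for skills whose target is out of range."""
    mask = np.ones(num_actions, dtype=np.float32)
    for action_id, max_range in SKILL_RANGE.items():
        if distance > max_range:
            mask[action_id] = 0.0             # block the out-of-range skill
    return mask

def discretize_distance(distance: float) -> int:
    """Map a continuous distance to a bin index so exact skill ranges are easier to learn."""
    return int(np.digitize(distance, DISTANCE_BINS))

class MoveRepeater:
    """Hold a movement decision for MOVE_REPEAT consecutive steps."""
    def __init__(self):
        self.current_move = None
        self.steps_left = 0

    def step(self, proposed_move: int) -> int:
        if self.steps_left == 0:              # time to commit to a new move direction
            self.current_move = proposed_move
            self.steps_left = MOVE_REPEAT
        self.steps_left -= 1
        return self.current_move
```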
Experimental Results:
- Initial training against a fixed built-in AI showed rapid improvement, but the agent remained exploitable by human players.
- Self-play training against a diverse pool of agents led to stable performance increases and better generalization.
- The trained AI defeated both built-in AI and skilled human players it had never encountered before, demonstrating robustness.
- Demo videos showed progressive improvement from random actions to executing complex combos and strategic skill use.
- Different fighting styles were visually and behaviorally distinct: aggressive agents attack relentlessly, while defensive agents maintain distance and counterattack.
Professional Evaluation:
- Prior to a live blind-match event, professional gamers tested the AI, which had an imposed reaction delay (~230 ms) to ensure fairness (a sketch of imposing such a delay follows this list).
- Aggressive AI agents dominated most matches, overwhelming human players by not allowing breaks in combat.
- Balanced and defensive agents produced more competitive, human-like matches.
- Pro gamers acknowledged the aggressive AI's flawless and unique playstyle, which differed significantly from human strategies.
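The talk does not describe how the ~230 ms reaction delay was implemented; one plausible sketch is to buffer each decision and release it only after the delay has elapsed.

```python
from collections import deque
import time

REACTION_DELAY = 0.23  # seconds; approximate delay mentioned in the talk

class DelayedActionQueue:
    """Buffer agent decisions and release them only after the reaction delay."""
    def __init__(self, delay: float = REACTION_DELAY):
        self.delay = delay
        self.queue = deque()              # (release_time, action) pairs

    def push(self, action: int) -> None:
        """Record a decision together with the time at which it may be executed."""
        self.queue.append((time.monotonic() + self.delay, action))

    def pop_ready(self):
        """Return the next action whose delay has elapsed, or None if none is ready."""
        if self.queue and self.queue[0][0] <= time.monotonic():
            return self.queue.popleft()[1]
        return None
```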
Conclusions:
- Successfully created pro-level AI for a complex real-time fighting game using Reinforcement Learning.
- Demonstrated that diverse fighting styles can be guided via reward shaping without hard-coded rules.
- Developed robust AI through self-play with a diverse opponent pool.
- Employed engineering techniques to reduce problem complexity and accelerate learning.
Speakers / Sources:
- Jinyoung Zhang – Team Leader, introduced the problem, challenges, and overall project goals.
- Sunro (Sun Noor) – AI Research Engineer, explained Reinforcement Learning fundamentals, reward design, training process, engineering techniques, and experimental results.
- Additional insights from professional gamers who tested the AI in blind matches.
Additional Notes:
- The team announced a related session on gym learning-based universe kinematics.
- Q&A touched on fairness of AI’s aggressive style and tuning of neural network weights through trial and error.