Summary of "Reinforcement Learning in Action: Creating Arena Battle AI for 'Blade & Soul'"
Overview:
This video presents a detailed case study on developing AI agents for the arena battle mode in the MMORPG Blade & Soul using Reinforcement Learning (RL). The project focuses on creating programmable, competitive AI agents capable of playing one-on-one battles with diverse fighting styles (aggressive, balanced, defensive) and performing at a professional gamer level.
Key Technological Concepts and Challenges:
- Problem Setting:
  - Blade & Soul arena battle is a real-time, one-on-one fighting game in which players compete to reduce the opponent's HP to zero within three minutes.
  - The game has 11 character classes; the project focused on the "Destroyer" class due to its popularity and fixed skill settings for fairness.
  - The agent observes the environment (HP, skill points, distance, cooldowns, arena boundaries) and chooses an action every 100 milliseconds.
  - Actions combine skill use, movement, and targeting, with complex strategic trade-offs (e.g., using crowd-control skills before damage skills, timing resistance or escape skills).
- Challenges:
  - High Complexity: Extremely large state and action spaces (roughly 10^1800 for the Destroyer), combining skill, movement, and targeting decisions.
  - Real-Time Response Constraint: Decisions must be made within tight time limits, ruling out computationally expensive search methods; neural-network policies are used for efficient inference (see the sketch after this list).
  - Generalization: The AI must perform well against a wide variety of opponents, including human players with unpredictable styles.
  - Guiding Fighting Styles: Without hard-coded rules, the AI must exhibit diverse styles (aggressive, balanced, defensive) through reward shaping.
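As a rough illustration of the decision loop described above, here is a minimal sketch assuming a flat observation vector and a small actor-critic policy network. The feature count, action count, layer sizes, and the select_action helper are hypothetical, not details from the talk.

```python
import torch
import torch.nn as nn

# Hypothetical observation: a flat feature vector the agent sees every 100 ms
# (HP, skill points, distance, cooldowns, arena boundary). The exact encoding
# and sizes below are assumptions for illustration.
OBS_DIM = 64          # illustrative observation size
NUM_ACTIONS = 50      # illustrative number of skill/move/target choices

class PolicyNet(nn.Module):
    """Small MLP actor-critic policy: cheap enough to act within a 100 ms budget."""
    def __init__(self, obs_dim: int = OBS_DIM, num_actions: int = NUM_ACTIONS):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.action_head = nn.Linear(256, num_actions)  # action logits
        self.value_head = nn.Linear(256, 1)             # state value for the critic

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.action_head(h), self.value_head(h)

def select_action(policy: PolicyNet, obs: torch.Tensor) -> int:
    """One decision step: sample an action from the policy distribution."""
    with torch.no_grad():
        logits, _ = policy(obs)
        dist = torch.distributions.Categorical(logits=logits)
        return int(dist.sample())
```

A forward pass through a network of this size takes far less than 100 ms, which is why a learned policy fits the real-time constraint where search-based methods would not.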
Reinforcement Learning Approach:
- Basic RL Concept: Agents learn through trial and error, modifying their policy to maximize cumulative rewards.
- Rewards:
  - Primary: Winning the match and dealing damage (reducing the opponent's HP).
  - Secondary: Shaped rewards to encourage different aggressiveness levels (damage dealt vs. HP preserved, distance maintained or closed); see the reward-shaping sketch after this list.
- Features: The AI receives the same information as human players, such as HP, skill points, distance, position, and skill cooldowns.
- Training Setup:
  - 100 parallel simulations running combat games.
  - Used the ACER (Actor-Critic with Experience Replay) RL algorithm to update policies.
  - Employed self-play with a growing pool of past agents to ensure diverse and progressively stronger opponents.
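To make the reward-shaping and self-play ideas concrete, here is a hedged sketch. The weight values, style terms, win bonus, and the sampling split are assumptions for illustration, not numbers from the talk.

```python
import random

# Illustrative per-style weights: reward damage dealt, penalize damage taken,
# and add a term on closing or keeping distance. The actual values and exact
# terms used by the team are not given in the talk.
STYLE_WEIGHTS = {
    "aggressive": {"dmg_dealt": 1.0, "dmg_taken": -0.2, "closer": +0.1},
    "balanced":   {"dmg_dealt": 1.0, "dmg_taken": -1.0, "closer":  0.0},
    "defensive":  {"dmg_dealt": 0.5, "dmg_taken": -1.0, "closer": -0.1},
}

def shaped_reward(style: str, dmg_dealt: float, dmg_taken: float,
                  dist_delta: float, won: bool) -> float:
    """Per-step shaped reward: win bonus plus style-weighted damage/distance terms.

    dist_delta < 0 means the agent moved closer to the opponent this step.
    """
    w = STYLE_WEIGHTS[style]
    r = 10.0 if won else 0.0             # terminal win bonus (illustrative value)
    r += w["dmg_dealt"] * dmg_dealt      # reward damage dealt to the opponent
    r += w["dmg_taken"] * dmg_taken      # penalize damage received
    r += w["closer"] * (-dist_delta)     # reward closing in (or keeping distance)
    return r

def sample_opponent(opponent_pool: list, latest_bias: float = 0.5):
    """Self-play: usually fight the newest snapshot, sometimes an older one."""
    if random.random() < latest_bias or len(opponent_pool) == 1:
        return opponent_pool[-1]             # latest policy snapshot
    return random.choice(opponent_pool[:-1]) # older snapshot for diversity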
Engineering Techniques:
- Movement Policy Learning: Because a single move action has little effect on its own, the agent keeps the same move decision for 10 consecutive steps, which makes meaningful movement strategies learnable.
- Skill Usage Constraints: Skills with limited range are blocked when the opponent is out of range, preventing wasteful exploration and speeding up learning.
- Feature Discretization: Continuous distance features were discretized so the agent could better learn exact skill ranges, improving skill-usage precision (e.g., for the "drag" skill). A sketch of these three techniques follows below.
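Below is a hedged sketch of the three techniques above, assuming a discrete action space with an out-of-range mask, a 10-step move-repeat counter, and illustrative distance bins. The skill ranges, bin edges, and helper names are hypothetical.

```python
import numpy as np

# Hypothetical skill ranges (meters) and distance bin edges; the real values
# used by the team are not given in the talk.
SKILL_RANGE = {0: 3.0, 1: 5.0, 2: 16.0}      # action id -> max usable range
DISTANCE_BINS = [1.0, 3.0, 5.0, 8.0, 16.0]   # edges for discretizing distance
MOVE_REPEAT = 10                              # hold the same move decision for 10 steps

def skill_mask(distance: float, num_actions: int) -> np.ndarray:
    """Return 1 for usable actions and 0 for skills whose target is out of range."""
    mask = np.ones(num_actions, dtype=np.float32)
    for action_id, max_range in SKILL_RANGE.items():
        if distance > max_range:
            mask[action_id] = 0.0             # block the out-of-range skill
    return mask

def discretize_distance(distance: float) -> int:
    """Map a continuous distance to a bin index so exact skill ranges are easier to learn."""
    return int(np.digitize(distance, DISTANCE_BINS))

class MoveRepeater:
    """Hold a movement decision for MOVE_REPEAT consecutive steps."""
    def __init__(self):
        self.current_move = None
        self.steps_left = 0

    def step(self, proposed_move: int) -> int:
        if self.steps_left == 0:              # time to commit to a new move direction
            self.current_move = proposed_move
            self.steps_left = MOVE_REPEAT
        self.steps_left -= 1
        return self.current_move
```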
Experimental Results:
- Initial training against a fixed built-in AI showed rapid improvement, but the agent remained exploitable by human players.
- Self-play training against a diverse pool of agents led to stable performance increases and better generalization.
- The trained AI defeated both built-in AI and skilled human players it had never encountered before, demonstrating robustness.
- Demo videos showed progressive improvement from random actions to executing complex combos and strategic skill use.
- Different fighting styles were visually and behaviorally distinct: aggressive agents attack relentlessly, while defensive agents maintain distance and counterattack.
Professional Evaluation:
- Prior to a live blind-match event, professional gamers tested the AI, which had an imposed reaction delay (~230 ms) to ensure fairness (a sketch of imposing such a delay follows this list).
- Aggressive AI agents dominated most matches, overwhelming human players by not allowing breaks in combat.
- Balanced and defensive agents produced more competitive, human-like matches.
- Pro gamers acknowledged the aggressive AI's flawless and unique playstyle, which differed significantly from human strategies.
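The talk does not describe how the ~230 ms reaction delay was implemented; one plausible sketch is to buffer each decision and release it only after the delay has elapsed.

```python
from collections import deque
import time

REACTION_DELAY = 0.23  # seconds; approximate delay mentioned in the talk

class DelayedActionQueue:
    """Buffer agent decisions and release them only after the reaction delay."""
    def __init__(self, delay: float = REACTION_DELAY):
        self.delay = delay
        self.queue = deque()              # (release_time, action) pairs

    def push(self, action: int) -> None:
        """Record a decision together with the time at which it may be executed."""
        self.queue.append((time.monotonic() + self.delay, action))

    def pop_ready(self):
        """Return the next action whose delay has elapsed, or None if none is ready."""
        if self.queue and self.queue[0][0] <= time.monotonic():
            return self.queue.popleft()[1]
        return None
```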
Conclusions:
- Successfully created pro-level AI for a complex real-time fighting game using Reinforcement Learning.
- Demonstrated that diverse fighting styles can be guided via reward shaping without hard-coded rules.
- Developed robust AI through self-play with a diverse opponent pool.
- Employed engineering techniques to reduce problem complexity and accelerate learning.
Speakers / Sources:
- Jinyoung Zhang – Team Leader, introduced the problem, challenges, and overall project goals.
- Sunro (Sun Noor) – AI Research Engineer, explained Reinforcement Learning fundamentals, reward design, training process, engineering techniques, and experimental results.
- Additional insights from professional gamers who tested the AI in blind matches.
Additional Notes:
- The team announced a related session on gym learning-based universe kinematics.
- Q&A touched on fairness of AI’s aggressive style and tuning of neural network weights through trial and error.