Summary of "Lecture 1: RL 수업소개 (Introduction)"
Reinforcement Learning — Introductory Lecture (Kim Seong-hun, HKUST)
Topic and purpose
This is an introductory lecture that explains what reinforcement learning (RL) is, why it matters, and what kinds of problems it can solve. The lecture covers conceptual explanations (with intuitive examples), historical context and recent breakthroughs, a survey of applications, and an outline of the course content and intended audience.
What reinforcement learning (RL) is — core concepts
Reinforcement learning: an agent interacts with an environment by taking actions, receives observations (states) and rewards, and aims to learn a policy that maximizes cumulative reward.
Key elements and ideas:
- Agent, environment, actions, states/observations, and rewards.
- Learning from interaction (trial-and-error) rather than from supervised labels.
- The objective is to learn a behavior (policy) that maximizes expected cumulative reward (made concrete in the loop sketch after this list).
- Sparse-reward example: a mouse searching for cheese — many actions with no immediate feedback and only occasional reward when cheese is found.
- RL mirrors real-life learning such as animal training with positive reinforcement or childhood praise/scolding.
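To make the loop concrete, here is a minimal Python sketch of one episode of agent–environment interaction, written against the classic OpenAI Gym API used in the course's labs (the newer `gymnasium` fork changes the `reset`/`step` signatures slightly); the random "policy" and the `CartPole-v1` environment are illustrative choices, not the lecture's code:

```python
import gym  # classic OpenAI Gym API (pre-0.26); "gymnasium" differs slightly

env = gym.make("CartPole-v1")  # illustrative environment choice
state = env.reset()            # environment emits the initial observation

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()            # placeholder policy: act at random
    state, reward, done, info = env.step(action)  # environment returns next state and reward
    total_reward += reward                        # the quantity a learned policy should maximize

print("episode return:", total_reward)
```

A learning agent differs from this loop only in how `action` is chosen and in how it updates its policy from the observed stream of states, actions, and rewards.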
Historical context and revival
- RL has a long history and appears in older machine-learning textbooks (it is not new).
- A major revival followed DeepMind's deep RL work (DQN): a single, general algorithm learned to play many different Atari games directly from raw pixel input, often outperforming humans.
- The results were widely publicized, including a Nature paper (Mnih et al., 2015).
- AlphaGo (2016), which defeated Lee Sedol 4–1, combined RL with supervised learning and Monte Carlo tree search, and further increased public interest.
Example applications (broader impact)
- Data-center energy management: DeepMind applied RL to reduce cooling energy consumption by roughly 40% in Google data centers.
- Robotics: learning low-level control such as joint/actuation control.
- Operations and business: inventory management and related decision problems.
- Finance: algorithmic trading and portfolio control.
- Media / recommendation / advertising: deciding which content or ads to show to users to optimize longer-term engagement or revenue.
Course audience, prerequisites, and tooling
- Target audience: anyone interested; those with math or CS background will find it easier, but material is intended to be accessible to beginners.
- Minimum useful math familiarity: basic notation and simple equations common in RL (for example expressions like R + γQ; the formulas after this list unpack that pattern).
- Programming: practical labs use Python, TensorFlow, and OpenAI Gym. Prior programming experience helps but is not strictly required.
- Course goals / upcoming content:
- Start with OpenAI Gym environments and tabular methods.
- Replace tabular representations with neural networks (deep RL).
- Implement policy-gradient methods and value-based methods (Q-learning / deep Q).
- Cover other modern RL techniques and practical labs.
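For reference, the two formulas behind this notation are the discounted return (the quantity the agent maximizes) and the one-step Q-learning update, where the "R + γQ" pattern appears; these are standard textbook forms, not transcribed from the slides:

```latex
% Discounted return from time t
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1

% One-step Q-learning update (the "R + \gamma Q" pattern)
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```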
Study resources and recommendations
- The lecturer provides slides and recorded videos for reference.
- Many additional online resources (mostly in English) are recommended; studying them in parallel is helpful.
- The lecture builds on concepts from an earlier introductory course taught by the instructor; students who haven’t taken that prior course are encouraged to review it.
Course/practical methodology (roadmap)
- Gain conceptual understanding: agent, environment, state/observation, action, reward, episodic vs. continuing tasks.
- Start with simple tabular RL methods (e.g., tabular Q-learning, policy evaluation) in small discrete environments; a tabular Q-learning sketch follows this list.
- Use OpenAI Gym to run simple environments and observe agent–environment interactions.
- Replace tabular representations with function approximators (neural networks) to handle high-dimensional inputs (e.g., pixels).
- Implement and study (minimal sketches of both follow this list):
- Deep Q-learning (DQN) — value-based deep RL.
- Policy-gradient methods — directly parameterize and optimize policies.
- Experiment with TensorFlow + OpenAI Gym examples and follow the practical labs.
- Consult provided slides/videos and additional online materials for deeper study.
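For the tabular step, a compact Q-learning sketch in the course's toolchain might look like the following; the environment (`FrozenLake-v0`), hyperparameters, and ε-greedy exploration are illustrative assumptions, not the lecture's lab code:

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v0")  # small discrete environment ("FrozenLake-v1" in newer Gym)
Q = np.zeros((env.observation_space.n, env.action_space.n))  # one entry per (state, action)

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

for episode in range(2000):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy: mostly exploit the table, occasionally explore
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # move Q[s, a] toward the one-step target r + gamma * max_a' Q[s', a']
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```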
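When the table is replaced by a function approximator, a neural network maps states to one Q-value per action. A minimal TensorFlow/Keras sketch of such a Q-network follows; the layer sizes and the 4-feature input (e.g., CartPole) are illustrative, and a full DQN additionally needs experience replay and a target network:

```python
import tensorflow as tf

num_state_features, num_actions = 4, 2  # illustrative sizes (e.g., CartPole)

# Q-network: state vector in, one estimated Q(s, a) per action out
q_network = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(num_state_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions),  # linear output layer
])

# Training regresses q_network(s)[a] toward r + gamma * max_a' q_network(s', a')
q_network.compile(optimizer="adam", loss="mse")
```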
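For the policy-gradient family, the policy itself is parameterized and its parameters are pushed in the direction that increases expected return. A sketch of the basic REINFORCE loss under the same assumptions (discrete actions, a network producing action logits) is:

```python
import tensorflow as tf

def reinforce_loss(logits, actions, returns):
    """REINFORCE loss: -log pi(a|s) weighted by the observed return.

    logits:  [batch, num_actions] policy-network outputs
    actions: [batch] integer actions actually taken
    returns: [batch] (discounted) return observed after each state
    """
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)             # equals -log pi(a_t | s_t)
    return tf.reduce_mean(neg_log_prob * returns)  # minimizing this ascends expected return
```

Minimizing this loss with any gradient-based optimizer implements the basic policy-gradient update; subtracting a baseline from `returns` is the usual variance-reduction refinement.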
Speakers and sources referenced
- Kim Seong-hun — lecture presenter (HKUST)
- DeepMind team — authors of the DQN Atari work and follow-up deep RL research
- AlphaGo — DeepMind’s Go-playing system; Lee Sedol — professional Go player (defeated in 2016)
- Google / DeepMind data-center application (energy/cooling optimization)
- Positive reinforcement dog-training program — intuitive example used in lecture
- OpenAI Gym — RL environment toolkit used in the course
- TensorFlow — deep learning framework used in practicals
- Older machine-learning textbooks and the Nature paper reporting DeepMind’s Atari results
(End of summary.)