Summary of "CityLearn Tutorial"
This tutorial, presented by Kingsley (a graduate student at UT Austin), introduces CityLearn, a simulation environment for implementing and benchmarking control algorithms for distributed energy resources (DERs) in Grid-Interactive Buildings (GIBs). The tutorial is part of the Climate Change AI Summer School 2024 and focuses on controlling batteries and thermal energy storage in a multi-building environment using rule-based and Reinforcement Learning (RL) algorithms.
Main Ideas and Concepts
- CityLearn Environment:
- Open-source Python platform for multi-agent control of DERs in building districts.
- Models thermal and electric loads, on-site solar generation, and battery storage.
- Supports simulation of grid outages and interaction with grid carbon intensity and electricity pricing data.
- Provides datasets from real residential communities (e.g., Fontana, California).
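For readers who want to try this, a minimal setup sketch is shown below. It assumes the `CityLearnEnv` class from the `citylearn` package and uses one of the package's named datasets as a placeholder; the exact dataset name and keyword arguments used in the tutorial may differ.

```python
# Minimal environment setup sketch. The dataset name and the central_agent
# keyword are assumptions based on the CityLearn documentation, not code
# copied from the tutorial.
from citylearn.citylearn import CityLearnEnv

env = CityLearnEnv(
    'citylearn_challenge_2022_phase_all',  # named dataset shipped with CityLearn (assumed)
    central_agent=True,                    # one centralized agent controls all batteries
)

print(len(env.buildings))       # number of buildings in the district
print(env.observation_names)    # available observations (hour, temperature, pricing, ...)
print(env.action_space)         # continuous battery charge/discharge fractions
```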
- Grid-Interactive Buildings (GIBs):
- Buildings equipped with smart controls, DERs (PV, batteries, EVs), and energy flexibility capabilities.
- Aim to reduce grid load peaks, emissions, and costs while maintaining occupant comfort.
- Use load shifting, load shedding, and demand flexibility strategies.
- Control Approaches for GIBs:
- Rule-Based Control (RBC):
- Simple if-then-else logic based on observed states (e.g., hour of day).
- Easy to implement but poor generalizability and adaptability.
- Model Predictive Control (MPC):
- Optimization-based, forecasts future states and plans actions.
- Requires accurate building models and domain knowledge; computationally expensive.
- Reinforcement Learning (RL):
- Learns control policies via trial and error using reward feedback.
- Adaptive and scalable but dependent on data quality and quantity.
- Includes tabular Q-learning and deep RL (e.g., Soft Actor-Critic, SAC).
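To make the MPC idea above concrete, the sketch below schedules a single battery against an hour-ahead forecast by solving a small convex program with `cvxpy`. All numbers (loads, PV, prices, battery limits) are invented placeholders, and this illustrates the general approach rather than anything shown in the tutorial.

```python
# Illustrative MPC-style battery schedule over a 24-hour forecast horizon.
# All inputs (loads, PV, prices, battery limits) are invented placeholders.
import numpy as np
import cvxpy as cp

H = 24                                                         # forecast horizon (hours)
load = np.full(H, 2.0)                                         # forecast electric load (kWh)
pv = np.clip(np.sin(np.linspace(0, np.pi, H)), 0, None) * 3.0  # forecast PV generation (kWh)
price = np.where(np.arange(H) >= 16, 0.4, 0.2)                 # $/kWh, evening peak pricing

capacity, power = 6.4, 5.0                # battery energy (kWh) and power (kW) limits
charge = cp.Variable(H)                   # battery charge (+) / discharge (-) per hour
soc = cp.cumsum(charge) + 0.5 * capacity  # state of charge, starting half full

grid = load - pv + charge                        # net electricity drawn from the grid
cost = cp.sum(cp.multiply(price, cp.pos(grid)))  # pay only for imported electricity

problem = cp.Problem(
    cp.Minimize(cost),
    [cp.abs(charge) <= power, soc >= 0, soc <= capacity],
)
problem.solve()
print(charge.value.round(2))              # planned (dis)charge schedule
```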
- Reinforcement Learning (RL) Basics:
- Unlike supervised learning, RL learns from rewards without explicit labels.
- Balances exploration (trying new actions) and exploitation (using known good actions).
- Q-learning updates a Q-table of state-action values using the Bellman equation.
- SAC uses neural networks to approximate Q-values and policies, better suited for continuous state/action spaces.
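For reference, the tabular update the tutorial builds on is the standard Q-learning rule, with learning rate $\alpha$ and discount factor $\gamma$:

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
$$

An epsilon-greedy policy then picks a random action with probability $\varepsilon$ (exploration) and the highest-valued action otherwise (exploitation).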
- Challenges in Building Control with RL:
- High dimensionality of continuous sensor data leads to large state-action spaces (curse of dimensionality).
- Tabular Q-learning struggles due to discrete state/action requirements and large tables.
- SAC provides better generalization via function approximation but requires careful tuning and reward design.
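As a concrete illustration of the discretization problem described above, the snippet below bins a continuous battery action and a continuous observation; the 12-action count follows the example later in this summary, while the state-of-charge binning is an added illustration.

```python
# Discretizing continuous quantities into bins for a Q-table.
# The 12-action count matches the tutorial summary; the state-of-charge
# binning is an added illustration with arbitrary bin counts.
import numpy as np

action_values = np.linspace(-1.0, 1.0, 12)   # 12 discrete battery actions in [-1, 1]
soc_edges = np.linspace(0.0, 1.0, 11)        # 10 bins for a continuous state of charge

def soc_bin(soc: float) -> int:
    """Index of the bin a continuous state of charge falls into."""
    return int(np.clip(np.digitize(soc, soc_edges) - 1, 0, len(soc_edges) - 2))

print(action_values[6])   # a representative discrete action
print(soc_bin(0.37))      # -> 3
```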
- Datasets and Simulation Setup:
- Data from a 17-building net-zero energy community with PV and batteries.
- Time series cover one year with hourly resolution, including weather, electricity prices, and grid carbon intensity.
- Tutorial uses a simplified setup with 2 buildings, 1-week simulation, and hour-of-day as the sole observation to illustrate concepts.
- Key Performance Indicators (KPIs) for Evaluation:
- Electricity cost (imported from grid)
- Carbon emissions
- Average daily peak demand
- Load ramping (smoothness of load changes)
- Load factor (efficiency of electricity consumption)
- KPIs are normalized against a baseline case with no battery control.
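The normalization amounts to dividing each KPI obtained under a controller by the same KPI under the no-control baseline, so values below 1.0 indicate an improvement. A minimal sketch with invented numbers:

```python
# Normalizing controller KPIs against the no-battery-control baseline.
# The dictionaries below contain invented illustrative numbers, not tutorial results.
baseline_kpis = {'cost': 120.0, 'carbon_emissions': 300.0, 'daily_peak': 9.5}
control_kpis = {'cost': 104.0, 'carbon_emissions': 281.0, 'daily_peak': 8.3}

normalized = {k: control_kpis[k] / baseline_kpis[k] for k in baseline_kpis}
for name, value in normalized.items():
    print(f'{name}: {value:.2f}  ({"better" if value < 1 else "worse"} than baseline)')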
Methodology and Instructions (Experiments & Exercises)
- Setup and Environment Initialization:
- Confirm Python version (≥3.7).
- Install CityLearn and dependencies.
- Load dataset and select buildings and simulation period randomly but reproducibly (fixed random seed).
- Limit observations (hour of day) and use single-agent centralized control.
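A sketch of the reproducible random selection step is shown below; the building names and the one-year, hourly horizon follow the dataset description above, while the selection logic itself is a simplified stand-in for however the tutorial manipulates the CityLearn schema.

```python
# Reproducible random selection of buildings and a one-week simulation window.
# Building names and the selection logic are simplified assumptions.
import random

random.seed(0)                                             # fixed seed for reproducibility

all_buildings = [f'Building_{i}' for i in range(1, 18)]    # 17-building dataset
selected = random.sample(all_buildings, 2)                 # pick 2 buildings

hours_per_week = 24 * 7
start = random.randrange(0, 8760 - hours_per_week)         # random week within the year
end = start + hours_per_week - 1

print(selected, start, end)
```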
- Baseline and Random Control Agents:
- Baseline: No battery control, only PV self-generation.
- Random agent: Controls batteries with random actions.
- Run inference and visualize KPIs and load profiles.
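A sketch of running a random-action agent and pulling the KPIs afterwards, assuming the environment built earlier, a Gym-style 4-tuple `step()` return (newer CityLearn releases may return a 5-tuple), and an `evaluate()` method that reports the KPIs; all three are assumptions to check against the installed version.

```python
# Random-action rollout followed by KPI evaluation.
# Assumes `env` is the CityLearnEnv sketched earlier; the 4-tuple step return
# and env.evaluate() are assumptions about the installed CityLearn version.
observations = env.reset()
done = False

while not done:
    # with central_agent=True, the action space is a list with one Box entry
    actions = [space.sample() for space in env.action_space]
    observations, reward, done, info = env.step(actions)

kpis = env.evaluate()   # cost, emissions, daily peak, ramping, load factor, ...
print(kpis)
```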
- Rule-Based Control (RBC) Agent:
- Implement simple if-then rules based on hour of day to charge/discharge batteries.
- Example: Charge battery in first half of day, discharge in second half.
- Compare performance against baseline and random agents.
- Exercise: Tune RBC logic to reduce cost, emissions, peak, ramping, and improve load factor by at least 5%.
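One way to write the charge-in-the-morning, discharge-in-the-evening rule from the example above (the ±0.1 action magnitude and the plain-function interface are simplifications, not the tutorial's agent class):

```python
# Hour-of-day rule: charge during the first half of the day, discharge during
# the second half. The 0.1 magnitude (10% of capacity per hour) is arbitrary.
def rbc_policy(hour: int) -> float:
    """Return a battery action in [-1, 1] for an hour of day in 1-24."""
    return 0.1 if 1 <= hour <= 12 else -0.1

daily_actions = [rbc_policy(h) for h in range(1, 25)]   # one day of actions
```

For the tuning exercise, the same structure can be kept while shifting charging toward low-price, high-solar hours and discharging toward the evening peak.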
- Tabular Q-learning Agent:
- Discretize continuous observations and actions into bins (e.g., 24 for hour, 12 for battery actions).
- Initialize Q-table and train using episodes with epsilon-greedy exploration.
- Visualize the learned Q-table after training.
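A compact sketch of the training loop described above, using the same 24-hour / 12-action discretization; the environment interface (hour as the only observation, list-wrapped rewards, a 4-tuple `step()` return) and all hyperparameters are assumptions for illustration.

```python
# Tabular Q-learning with epsilon-greedy exploration over (hour, action) bins.
# `env` is the CityLearnEnv sketched earlier; interface details and
# hyperparameters are illustrative assumptions.
import numpy as np

n_hours, n_actions = 24, 12
action_values = np.linspace(-1.0, 1.0, n_actions)
q_table = np.zeros((n_hours, n_actions))
alpha, gamma, epsilon, episodes = 0.1, 0.99, 0.1, 50

for _ in range(episodes):
    observations = env.reset()
    hour = int(observations[0][0]) - 1       # hour of day assumed to be the only observation
    done = False

    while not done:
        # epsilon-greedy: explore with probability epsilon, otherwise exploit
        if np.random.random() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(q_table[hour]))

        # apply the same discrete action to every building's battery
        actions = [[action_values[a]] * len(env.buildings)]
        observations, reward, done, info = env.step(actions)
        next_hour = int(observations[0][0]) - 1
        r = float(reward[0])

        # Q-learning (Bellman) update
        q_table[hour, a] += alpha * (r + gamma * np.max(q_table[next_hour]) - q_table[hour, a])
        hour = next_hour
```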
Category: Educational