Summary of "W3L8: GANs as classifier-guided generative sampler"
This video lecture is part of a generative AI course and focuses on an alternative interpretation of Generative Adversarial Networks (GANs) as classifier-guided generative samplers. It builds upon the previous session's discussion of Variational Divergence Minimization (VDM) and GANs as a special case of VDM.
Main Ideas and Concepts
Recap of GANs and Variational Divergence Minimization (VDM)
- GANs are a special case of VDM algorithms.
- Two neural networks: a generator (produces samples) and a discriminator/classifier (distinguishes real from generated samples).
- The goal is to minimize a divergence (specifically an f-divergence) between the true data distribution \( p_x \) and the generator’s distribution \( p_\theta \).
- Direct minimization of f-divergence is intractable, so a lower bound is constructed and optimized alternately over the generator and discriminator.
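As a reminder of the bound being referred to here, written in the standard f-GAN form (this exact notation is an assumption; the lecture's own notation may differ):
\[ D_f(p_x \,\|\, p_\theta) \;\geq\; \sup_{T} \; \mathbb{E}_{x \sim p_x}\big[ T(x) \big] - \mathbb{E}_{\hat{x} \sim p_\theta}\big[ f^{*}(T(\hat{x})) \big] \]
where \( f^{*} \) is the convex conjugate of \( f \) and \( T \) is a variational function realized by the discriminator. Maximizing over \( T \) tightens the bound; the generator then minimizes the bound over \( \theta \).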
Classifier-Guided Interpretation of GANs
- The discriminator can be viewed as a binary classifier \( D_w \) that outputs the probability that a sample \( x \) comes from the real data distribution \( p_x \) (label 1) rather than from the generator distribution \( p_\theta \) (label 0).
- The generator aims to fool the classifier by generating samples that the classifier cannot distinguish from real data.
- Training involves an adversarial game:
- The classifier maximizes its ability to correctly distinguish real vs. generated data.
- The generator minimizes this ability (i.e., tries to make the classifier fail).
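A minimal sketch of such a discriminator in PyTorch (the architecture, input dimension, and hidden size are illustrative assumptions, not details from the lecture):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Binary classifier D_w(x): probability that x comes from the real data p_x."""
    def __init__(self, x_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),          # raw logit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))  # D_w(x) in (0, 1)
```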
Key Insight: Classifier Failure and Distribution Matching
- If even the best possible classifier cannot distinguish \( p_x \) from \( p_\theta \), the two distributions must be (nearly) identical.
- However, failure of a particular, fixed classifier does not imply \( p_\theta = p_x \): the generator can "trick" a fixed classifier without truly matching the data distribution.
- This is demonstrated via a counterexample with 2D data clusters, where the generator moves its samples to a different region that the fixed classifier still labels as real, yet the distributions clearly differ (a toy version of this is sketched below).
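A toy numerical illustration of this failure mode (hypothetical numbers and a hand-picked frozen classifier, not the lecture's exact example): with the discriminator frozen, the generator only has to put its samples wherever the frozen classifier already says "real", which need not overlap the data at all.

```python
import torch

# Real data: a cluster around (0, 0).
real = torch.randn(1000, 2) * 0.1

# A *fixed* linear classifier D(x) = sigmoid(w . x + b), trained once and then frozen.
w, b = torch.tensor([1.0, 1.0]), torch.tensor(0.0)
D = lambda x: torch.sigmoid(x @ w + b)

# The generator collapses all its mass onto a single far-away point where D says "real".
fake = torch.full((1000, 2), 5.0)   # all samples at (5, 5)

print(D(real).mean())   # ~0.5: the frozen classifier is indifferent on real data
print(D(fake).mean())   # ~1.0: the frozen classifier is fully fooled
# Yet p_theta (a point mass at (5, 5)) is nothing like p_x (a cluster at the origin).
```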
Simultaneous Optimization of Generator and Classifier
- To avoid the problem above, the classifier is not fixed but updated simultaneously with the generator.
- The process alternates between:
- Updating the classifier to better distinguish real and generated data.
- Updating the generator to fool the updated classifier.
- This corresponds to alternately tightening the lower bound on the f-divergence and then minimizing it.
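Writing the shared objective as \( F(w, \theta) \), the alternation can be sketched as two interleaved gradient steps (plain gradient ascent/descent is an illustrative assumption; the lecture does not prescribe a specific optimizer):
\[ w \leftarrow w + \eta_w \, \nabla_w F(w, \theta) \quad \text{(improve the classifier, i.e., tighten the bound)} \]
\[ \theta \leftarrow \theta - \eta_\theta \, \nabla_\theta F(w, \theta) \quad \text{(fool the updated classifier, i.e., minimize the bound)} \]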
Mode Collapse and Failure Modes
- The alternating updates can lead to mode collapse, where the generator oscillates between a few modes without fully capturing the real data distribution.
- The generator and classifier can get stuck in a loop, fooling each other without convergence.
- This explains why GAN training is notoriously unstable and challenging.
Mathematical Formulation of Classifier and Generator Objectives
- The classifier \( D_w(x) \) outputs the probability that \( x \) is from \( p_x \).
- Classifier objective (maximize w.r.t. \( w \)): \[ \max_w \mathbb{E}_{x \sim p_x} [\log D_w(x)] + \mathbb{E}_{\hat{x} \sim p_\theta} [\log (1 - D_w(\hat{x}))] \]
- Generator objective (minimize w.r.t. \( \theta \)): \[ \min_\theta \mathbb{E}_{\hat{x} \sim p_\theta} [\log (1 - D_w(\hat{x}))] \]
- This adversarial min-max optimization corresponds to the saddle point problem defining GAN training.
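A minimal sketch of these two objectives as PyTorch losses (assuming a discriminator that outputs probabilities, as in the sketch above; the sign flip turns the classifier's maximization into a loss to be minimized):

```python
import torch

def discriminator_loss(D, real_x, fake_x):
    # Negative of the classifier objective: maximizing it = minimizing this loss.
    eps = 1e-7  # numerical safety for log
    return -(torch.log(D(real_x) + eps).mean()
             + torch.log(1.0 - D(fake_x) + eps).mean())

def generator_loss(D, fake_x):
    # The generator objective: E_{x_hat ~ p_theta}[log(1 - D(x_hat))], minimized over theta.
    eps = 1e-7
    return torch.log(1.0 - D(fake_x) + eps).mean()
```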
Relation to Variational Divergence Minimization
- The classifier objective corresponds to the lower bound on the f-divergence between \( p_x \) and \( p_\theta \).
- Improving the classifier tightens this bound, so that minimizing the bound over the generator becomes a better proxy for minimizing the true divergence.
- The alternating optimization between classifier and generator is equivalent to alternating between constructing and minimizing this lower bound.
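A standard supporting fact (from the original GAN analysis, not stated verbatim in the summary above): for a fixed generator, the classifier objective is maximized by
\[ D^{*}(x) = \frac{p_x(x)}{p_x(x) + p_\theta(x)} \]
and substituting this optimal classifier back into the objective leaves the generator minimizing \( 2\,\mathrm{JSD}(p_x \,\|\, p_\theta) - \log 4 \), i.e., the Jensen–Shannon divergence between the data and generator distributions.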
Summary of the Classifier-Guided GAN Training Procedure
- Start with generator parameters \( \theta \).
- Construct the classifier \( D_w \) to distinguish \( p_x \) and \( p_\theta \).
- Update \( w \) to maximize classifier accuracy.
- Update \( \theta \) to minimize classifier accuracy (make classifier fail).
- Repeat until convergence or until mode collapse occurs.
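Putting these steps together, a minimal alternating training loop might look like the following (the Generator architecture, optimizers, and stand-in data are illustrative assumptions; `Discriminator`, `discriminator_loss`, and `generator_loss` are the sketches given earlier):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps noise z to samples x_hat ~ p_theta."""
    def __init__(self, z_dim: int = 8, x_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, z):
        return self.net(z)

G, D = Generator(), Discriminator()          # Discriminator as sketched earlier
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

for step in range(10_000):
    real_x = torch.randn(128, 2) * 0.1       # stand-in for a batch of real data

    # 1) Update w: improve the classifier (tighten the bound).
    fake_x = G(torch.randn(128, 8)).detach() # detach: no generator gradients here
    opt_d.zero_grad()
    discriminator_loss(D, real_x, fake_x).backward()
    opt_d.step()

    # 2) Update theta: fool the *current* classifier (minimize the bound).
    fake_x = G(torch.randn(128, 8))
    opt_g.zero_grad()
    generator_loss(D, fake_x).backward()
    opt_g.step()
```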
Category: Educational