Summary of "[ИАД, весна 2025] Рекомендательные системы, 7" (English: "[IAD, Spring 2025] Recommender Systems, 7")

Lecture summary — Recommender Systems: Introduction to Bandits (Spring 2025)

High-level takeaways


Formal stochastic MAB problem

Setup

Common assumptions


Algorithms (stochastic MAB)

1. Naive greedy (sample mean)

2. ε-greedy

3. UCB (Upper Confidence Bound) family

4. Thompson Sampling (Bayesian / posterior sampling)


Contextual bandits — linear model (LinUCB-style)

Motivation

Linearity assumption

Two modeling approaches

Implementation tips


Empirical behavior and comparisons


Practical considerations and extensions

Tip (practical)

Randomize tie-breaking and ensure sufficient exploration (via ε, UCB constants, or priors) to avoid getting stuck on suboptimal actions.
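
The tie-breaking part of this tip can be sketched as follows. This is a minimal illustration, not code from the lecture; the helper name `argmax_random_tiebreak` and its `rng` parameter are assumptions introduced here. A plain `max`-by-index always returns the lowest tied index, which can bias early rounds toward the first arms when all estimates start equal.

```python
import random

def argmax_random_tiebreak(values, rng=random):
    """Return the index of a maximal entry, chosen uniformly among ties.

    `rng` is any object with a `choice` method (the `random` module or a
    `random.Random` instance); this parameter is an illustrative assumption.
    """
    best = max(values)
    # Collect every index that attains the maximum, then pick one at random.
    candidates = [i for i, v in enumerate(values) if v == best]
    return rng.choice(candidates)
```

With all estimates initialised to the same value, every arm is then equally likely to be tried first, instead of arm 0 being favoured.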


Concise algorithmic checklists (implementation)

ε-greedy
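
A minimal sketch of the standard ε-greedy rule, assuming scalar rewards and incremental sample-mean estimates. The class name `EpsilonGreedy`, its method names, and the default `epsilon=0.1` are illustrative assumptions, not values taken from the lecture.

```python
import random

class EpsilonGreedy:
    """Standard epsilon-greedy over n_arms with sample-mean estimates (sketch)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # exploration probability (assumed default)
        self.counts = [0] * n_arms        # number of pulls per arm
        self.means = [0.0] * n_arms       # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore: with probability epsilon pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: pick an arm with the highest mean, breaking ties at random.
        best = max(self.means)
        return self.rng.choice([a for a, m in enumerate(self.means) if m == best])

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

A typical loop calls `select()`, observes a reward from the environment, and passes both to `update()`.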

UCB
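
A minimal sketch of the classic UCB1 variant, assuming rewards in [0, 1] and the exploration bonus sqrt(c · ln t / n_a) with c = 2. The lecture may use a different constant or a different member of the UCB family; the class name `UCB1` and the parameter `c` are assumptions made for this illustration.

```python
import math

class UCB1:
    """UCB1-style index policy with sample-mean estimates (sketch)."""

    def __init__(self, n_arms, c=2.0):
        self.c = c                        # exploration constant (assumed default)
        self.counts = [0] * n_arms        # number of pulls per arm
        self.means = [0.0] * n_arms       # running mean reward per arm
        self.t = 0                        # total number of pulls so far

    def select(self):
        # Pull every arm once before the index is well defined.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # Index = estimated mean + confidence bonus that shrinks with pulls.
        scores = [
            m + math.sqrt(self.c * math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

Larger values of `c` widen the bonus and lead to more exploration; smaller values make the policy greedier.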

Thompson Sampling (Bernoulli case)
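
A minimal sketch of Beta-Bernoulli Thompson Sampling, assuming binary rewards in {0, 1} and a uniform Beta(1, 1) prior on each arm. The prior choice and the names `BernoulliThompson`, `alpha`, and `beta` are assumptions for illustration and may differ from the lecture's notation.

```python
import random

class BernoulliThompson:
    """Thompson Sampling with independent Beta posteriors per arm (sketch)."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms       # prior pseudo-count + observed successes
        self.beta = [1.0] * n_arms        # prior pseudo-count + observed failures
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to the sampled rates.
        samples = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update; reward is assumed to be 0 or 1.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

Exploration here comes from posterior variance: arms with few observations have wide posteriors and therefore occasionally produce the largest sample.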

LinUCB / linear contextual UCB
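
A minimal sketch of a LinUCB-style rule, assuming a separate ridge-regression model per arm (one of several possible modeling choices, and not necessarily the one used in the lecture). To stay dependency-free it keeps the inverse matrix directly and updates it with the Sherman-Morrison formula rather than re-inverting each round. The names `LinUCBArm`, `select_arm`, `alpha`, and `ridge` are assumptions introduced for this illustration.

```python
import math

class LinUCBArm:
    """Per-arm ridge regression with an upper confidence bound (sketch)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                # width multiplier (assumed default)
        # A_inv is the inverse of A = ridge * I + sum of x x^T over updates.
        self.A_inv = [
            [(1.0 / ridge if i == j else 0.0) for j in range(dim)]
            for i in range(dim)
        ]
        self.b = [0.0] * dim              # sum of reward * x over updates

    def _matvec(self, v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in self.A_inv]

    def ucb(self, x):
        theta = self._matvec(self.b)      # ridge estimate: A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        quad = sum(xi * v for xi, v in zip(x, self._matvec(x)))
        return mean + self.alpha * math.sqrt(max(quad, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^{-1}
        #   = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        u = self._matvec(x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= u[i] * u[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def select_arm(arms, x):
    """Pick the arm with the highest upper confidence bound for context x."""
    scores = [arm.ucb(x) for arm in arms]
    return max(range(len(scores)), key=scores.__getitem__)
```

In practice the same logic is usually written with a linear-algebra library; the pure-Python form is used here only to keep the sketch self-contained.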


References and notes

