Summary of "[IAD, Spring 2025] Recommender Systems, 1"
Overview
This document summarizes Lesson 1 of the course “Recommender Systems”. The lecture covers historical context and motivation, formal problem statements, common recommendation approaches, data/feedback types, evaluation metrics, validation and splitting practices, common pitfalls in evaluation, and course logistics. The primary course focus is Top-K recommendation (typical K ≈ 10).
Historical context and motivation
- Early recommendation systems included human curation (tags, editor picks). Modern recommender systems emphasize algorithmic and mathematical models.
- Business motivation: recommenders drive revenue and engagement for many companies (examples: Amazon, Netflix).
- User motivation: help users filter growing content, save time, and discover relevant or novel items (example: Yandex.Music).
Types of recommender approaches
- Non-personalized
  - Global selections such as “most popular” lists.
- Personalized
  - Content-based: use item features or metadata to recommend similar items.
  - Collaborative filtering: use interaction history across users (user-based, item-based, matrix factorization; more detail in later lectures).
  - Hybrid: combine content and collaborative techniques.
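As a rough illustration of the non-personalized case, a "most popular" baseline can be sketched in a few lines of Python. The function name `most_popular`, the `(user, item)` pair format, and the toy data are assumptions made for this example, not something specified in the lecture.

```python
from collections import Counter


def most_popular(interactions, k=10):
    """Return the k item ids with the most interactions (same list for every user)."""
    counts = Counter(item for _user, item in interactions)
    # Sort by count descending, then by item id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:k]]


# Toy (user, item) pairs, invented for illustration.
toy_interactions = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u3", "c"),
]
top2 = most_popular(toy_interactions, k=2)
print(top2)
```

Every user receives the same list here, which is why such a baseline is usually treated as a reference point for personalized models rather than as a final system.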
Data and feedback types
- Explicit feedback: user-provided ratings (e.g., 1–5 stars).
- Implicit feedback: interactions such as views, clicks, add-to-cart, listen duration. Often binarized (e.g., listened > 50% → positive).
- Domain-specific behavior affects modeling:
  - Repeat purchases are common in groceries.
  - Movies, books, and music are mostly consumed once, so already-seen items may need to be filtered from recommendations.
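The binarization of implicit feedback mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the event format `(user, item, listened_fraction)`, the function name `binarize_listens`, and the strict `>` comparison at the 50% boundary are all illustrative choices.

```python
def binarize_listens(events, threshold=0.5):
    """Map (user, item, listened_fraction) events to a set of positive (user, item) pairs.

    A pair counts as positive if at least one of its events exceeds the threshold.
    """
    positives = set()
    for user, item, fraction in events:
        if fraction > threshold:  # assumption: strictly more than the threshold
            positives.add((user, item))
    return positives


# Toy listening events, invented for illustration.
toy_events = [
    ("u1", "track_a", 0.9),
    ("u1", "track_b", 0.2),
    ("u2", "track_a", 0.5),
    ("u2", "track_c", 1.0),
]
positives = binarize_listens(toy_events)
print(sorted(positives))
```

Whether an event exactly at the threshold counts as positive is one of the small protocol details worth writing down, since it changes the resulting dataset.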
Formal problem statement
Given finite sets U (users) and I (items) and a sparse user×item interaction matrix R, the common tasks are:
- Rating prediction (matrix completion): predict the missing value of R for a given (user, item) pair (classic example: Netflix Prize).
- Top-K recommendation: for each user, select K items to recommend (primary focus of the course). Typical K ≈ 10; sometimes K ≈ 20 or K = 1 in specific settings.
- Item-to-item recommendation: e.g., “people who bought X also bought Y” (useful for item-level suggestions).
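To make the Top-K task concrete, here is a small sketch that stores the sparse matrix R as a dictionary of sets and picks the K best-scoring items for one user. The helper names `build_interactions` and `top_k`, and the score dictionary standing in for a model's output, are hypothetical; any model that produces per-item scores could be plugged in.

```python
import heapq


def build_interactions(pairs):
    """Store the sparse matrix R as {user: set of items the user interacted with}."""
    seen = {}
    for user, item in pairs:
        seen.setdefault(user, set()).add(item)
    return seen


def top_k(scores, k=10, exclude=frozenset()):
    """Return the k highest-scoring item ids, skipping items in `exclude`."""
    candidates = ((score, item) for item, score in scores.items() if item not in exclude)
    best = heapq.nlargest(k, candidates)  # sorted from highest to lowest score
    return [item for _score, item in best]


R = build_interactions([("u1", "a"), ("u1", "b"), ("u2", "c")])
# Hypothetical model scores for user u1 over the whole catalog.
scores_u1 = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.4, "e": 0.1}

recs_unfiltered = top_k(scores_u1, k=2)
recs_filtered = top_k(scores_u1, k=2, exclude=R["u1"])
print(recs_unfiltered, recs_filtered)
```

The `exclude` argument covers the mostly-once domains, where already-seen items are filtered out; for repeat-purchase domains it can simply be left empty.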
Evaluation metrics
- Ranking-aware metrics
  - DCG (Discounted Cumulative Gain): sum the relevance of the recommended items, discounting each by a logarithm of its position. Dividing by the ideal DCG (IDCG, the DCG of the best possible ordering for that user) gives NDCG ∈ [0,1], which is then averaged across users.
- Precision and ordering-sensitive metrics
  - Precision@K: fraction of the top-K items that are relevant (the order within the top K does not matter).
  - MAP@K (Mean Average Precision at K): for each user, take precision@m at every position m ≤ K that holds a relevant item and average these values to get AP@K; then average AP@K across users.
- Other quality aspects (not covered in depth)
  - Novelty, serendipity/unexpectedness, coverage (fraction of items recommended), and business metrics (engagement, profit, CTR).
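A minimal sketch of Precision@K and NDCG@K follows, assuming one common convention: positions start at 1 and the discount is log2(position + 1). The function names and the choice to divide Precision@K by K even when fewer than K items are returned are assumptions for this example; frameworks differ on such details.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with relevant items (denominator is k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def dcg_at_k(gains, k=10):
    """Sum of gains discounted by log2(position + 1), with positions starting at 1."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(recommended, relevance, k=10):
    """DCG of the recommended order divided by the DCG of the ideal order."""
    gains = [relevance.get(item, 0.0) for item in recommended]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(recs_by_user, relevance_by_user, k=10):
    """Average per-user NDCG@K over users that have ground-truth relevance."""
    users = list(relevance_by_user)
    if not users:
        return 0.0
    total = sum(ndcg_at_k(recs_by_user.get(u, []), relevance_by_user[u], k) for u in users)
    return total / len(users)


# Toy example: items "a" and "c" are relevant (binary relevance).
recs = ["a", "b", "c"]
truth = {"a": 1.0, "c": 1.0}
print(precision_at_k(recs, set(truth), k=3), ndcg_at_k(recs, truth, k=3))
```

In the toy example the relevant item at position 3 is discounted by log2(4) = 2, so the list scores below the ideal ordering in which both relevant items come first.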
Practical validation and data splitting
- Common split strategies
  - Leave-last-one per user: hold out each user’s last interaction for test and the one before it for validation; train on the rest.
  - Time-based splits: train on an earlier time window and validate/test on later windows (avoids temporal leakage).
  - Per-user holdouts vs. global time splits: choose based on the evaluation goal; the choice affects which users and items appear in each split.
- Cold-start / cold users
  - Users with no interactions in train/validation but with interactions in test are “cold”. Handling depends on the model family; some experiments remove cold users, others keep them (e.g., Transformer-based recommenders).
- Reproducibility and implementation pitfalls
  - Metrics with the same name may be implemented differently across frameworks (different denominators, clipping, smoothing).
  - Small protocol changes for ranking metrics yield different numeric results. Document the exact protocol and ensure the evaluation is reproducible.
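The two split strategies and the notion of cold users can be sketched as below. This is an illustration under assumptions: events are `(user, item, timestamp)` tuples, the function names are invented, and users with fewer than three interactions are sent entirely to train (one possible policy, not something prescribed in the lecture).

```python
from collections import defaultdict


def leave_last_one_split(events):
    """Per user: last interaction -> test, the one before -> validation, rest -> train."""
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))
    train, valid, test = [], [], []
    for user, history in by_user.items():
        history.sort()  # oldest first
        if len(history) < 3:
            # Assumption: too little history to hold anything out.
            train += [(user, item, ts) for ts, item in history]
            continue
        *rest, (v_ts, v_item), (t_ts, t_item) = history
        train += [(user, item, ts) for ts, item in rest]
        valid.append((user, v_item, v_ts))
        test.append((user, t_item, t_ts))
    return train, valid, test


def time_split(events, valid_start, test_start):
    """Global time split of a list of events: earlier window trains, later windows evaluate."""
    train = [e for e in events if e[2] < valid_start]
    valid = [e for e in events if valid_start <= e[2] < test_start]
    test = [e for e in events if e[2] >= test_start]
    return train, valid, test


def cold_users(train, valid, test):
    """Users that appear in test but have no interactions in train or validation."""
    known = {u for u, _i, _t in train} | {u for u, _i, _t in valid}
    return {u for u, _i, _t in test} - known


# Toy events with integer timestamps, invented for illustration.
toy = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3), ("u1", "d", 4),
    ("u2", "a", 2), ("u2", "c", 5), ("u2", "e", 6),
    ("u3", "b", 7),
]
llo_train, llo_valid, llo_test = leave_last_one_split(toy)
ts_train, ts_valid, ts_test = time_split(toy, valid_start=5, test_start=7)
print(cold_users(ts_train, ts_valid, ts_test))
```

On this toy data the global time split leaves user `u3` only in the test window, so `u3` is cold, while the per-user split never produces cold users by construction. This is one way the choice of split changes which users are evaluated.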
Types of experiments
- Offline evaluation: compute metrics on historical data (main approach in the course).
- Online evaluation / A/B testing (policy evaluation): real-user experiments with control and treatment groups. These raise different measurement concerns and are necessary for confirming business impact.
Methodology — actionable instructions
- Problem setup
  - Define user set U and item set I and build the interaction matrix R.
  - Decide the task: rating prediction or Top-K recommendation. Choose K to match the UI/business (commonly 10).
- Prepare data
  - Determine whether feedback is explicit or implicit.
  - For implicit signals, set binarization thresholds (e.g., listened_duration ≥ 50% → positive).
  - Consider domain characteristics and whether to filter previously consumed items.
- Choose approach
  - If rich item metadata and sparse collaborative signals → content-based or hybrid.
  - If rich cross-user interaction data → collaborative filtering (user/item-based, matrix factorization, etc.).
  - For production/advanced modeling → consider hybrid and modern deep-learning methods (Transformers mentioned for later).
- Split data for validation
  - Use per-user holdouts (leave-last-one) for per-user evaluation or time-based splits to avoid temporal leakage.
  - Decide treatment of cold users to match the intended deployment scenario.
- Evaluate
  - For ranking sensitivity, use NDCG@K: compute DCG per user, compute IDCG per user, normalize, then average across users.
  - For ordering-sensitive precision, compute MAP@K:
    - For each user: at each rank m ≤ K that holds a relevant item, compute precision@m; average these values → AP@K for the user.
    - Average AP@K across users → MAP@K.
  - Report additional measures where relevant: novelty, serendipity, coverage, and business metrics (CTR, engagement, revenue).
- Reproducibility and rigor
  - Document exact metric definitions (denominators, clipping, smoothing).
  - Ensure evaluation code matches the protocol; be aware of multiple valid variants for ranking evaluations.
  - When comparing with external results, verify matching metric and split logic.
- Production considerations
  - Filter already viewed/purchased items when appropriate.
  - Choose K to match UI and business needs.
  - Account for domain-specific behaviors (repeat purchases, session-based vs long-term).
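The MAP@K steps from the Evaluate item, together with the advice to document denominators, can be sketched as follows. The default follows the description above (average precision@m over the ranks that hold a relevant item). The `denominator` argument and its two option names are hypothetical and exist only to show that another commonly seen normalization, min(K, number of relevant items), gives a different number for the same list.

```python
def average_precision_at_k(recommended, relevant, k=10, denominator="hits"):
    """AP@K: sum of precision@m over ranks m <= k holding a relevant item, normalized."""
    hits = 0
    total = 0.0
    for m, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / m  # precision@m at a relevant position
    if denominator == "hits":
        norm = hits  # average over the relevant positions found
    elif denominator == "min_k_relevant":
        norm = min(k, len(relevant))  # also penalizes relevant items that were missed
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return total / norm if norm else 0.0


def map_at_k(recs_by_user, relevant_by_user, k=10, denominator="hits"):
    """Mean of AP@K over users that have at least a ground-truth entry."""
    users = list(relevant_by_user)
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recs_by_user.get(u, []), relevant_by_user[u], k, denominator)
        for u in users
    )
    return total / len(users)


# Toy data, invented for illustration.
toy_recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
toy_relevant = {"u1": {"a", "c"}, "u2": {"z"}}
print(map_at_k(toy_recs, toy_relevant, k=10))

# Same list, same metric name, two different values.
same_list = ["a", "x", "y"]
many_relevant = {"a", "b", "c"}
print(average_precision_at_k(same_list, many_relevant, k=3, denominator="hits"))
print(average_precision_at_k(same_list, many_relevant, k=3, denominator="min_k_relevant"))
```

For `u1` the relevant items sit at ranks 1 and 3, so AP is the mean of 1/1 and 2/3; for `u2` the only hit is at rank 3, giving 1/3. In the last two calls the list finds one of three relevant items at rank 1, which scores as perfect under the first normalization and as one third under the second.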
Common pitfalls and nuances
- Domain differences require different handling (repeat purchases, whether to filter seen items).
- Including or excluding cold users in test impacts realism and reported metrics.
- Metric name collisions: identical names can conceal different implementations.
- Offline metric improvements do not always translate to online gains; final assessment requires A/B testing.
References and materials
- Netflix Prize (early high-profile recommender competition).
- Company examples: Amazon, Netflix (business motivation).
- Product example: Yandex.Music.
- The instructor referenced classical recommender-system books, review articles, and a specific article covering evaluation pitfalls and a taxonomy of evaluation choices (exact titles not provided in the transcript).
- Course schedule and materials are published on GitHub.
Course logistics
- Schedule: published on GitHub.
- Assessment: grading is based on two homework assignments, with additional optional tasks available. No formal exam is planned.
Speakers and sources
- Main speaker: course lecturer (unnamed).
- Audience: students asking clarifying questions during the lecture.
- Referenced external sources: Netflix Prize, Amazon, Netflix, Yandex.Music, and unspecified textbooks/articles on recommender systems.
Note: The course will cover more advanced models, evaluation protocols, and online policy evaluation in later lectures.