Summary of "[IAD, Spring 2025] Recommender Systems, 2"
Summary — main ideas and lessons
1) Topic and context
- The lecture/seminar covers early (historical and practical) recommender system models:
- Emphasis on collaborative filtering (CF), especially memory-based (neighborhood) approaches.
- Introduction to a matrix-based item-item learning approach (SLIM-like model).
- A practical seminar runs a reproducible notebook (course repository) on a MovieLens dataset (here the 1M version) to demonstrate training, validation and metrics.
2) Historical motivation and intuition
- Collaborative filtering grew from people annotating/filtering content (moderation, tagging) so others could find relevant items. Modern CF automates this with machine learning.
- The catalog-size / filtering problem: explicit filters (size, color, etc.) reduce choice; CF aims to automatically surface relevant items from a large catalog.
3) Types of collaborative filtering
- Memory-based (neighborhood) methods — focus of this lecture:
- Use observed ratings directly (no large parameterized model).
- User-based CF: compute similarity between users; predict a target user’s rating by aggregating ratings from similar users.
- Item-based CF: compute similarity between items; predict a user’s rating for an item by aggregating the user’s ratings of similar items.
- Model-based methods (covered later): learn parameterized models (matrix factorization, neural methods, etc.).
4) User-based vs. item-based — pros and cons
- Item-based:
- Similarities between items are usually more stable over time (item relationships change slowly).
- Often gives higher practical quality.
- User-based:
- Can find serendipitous or unexpected matches because user tastes may change.
- Conceptually both approaches follow the same pattern: compute similarity, then aggregate neighbors’ signals.
5) Formal setup and goals
- Given a sparse user-item rating matrix R:
- Predict missing ratings (rating prediction).
- Produce for each user a ranked list of top-N recommendations (ranking). Predicted ratings can be sorted to obtain recommendations.
- Notation:
- I_u = items with known ratings by user u.
- U_i = users who rated item i.
Memory-based (neighborhood) methods
Methodology / steps
Basic aggregation options to predict r_{u,i}:
- Simple average of the ratings user u has given (non-personalized baseline).
- Weighted average of ratings of neighbor items/users (K-nearest neighbors): weights = similarities between items (or users).
- Centering/normalization: subtract user mean (or other normalization) before aggregation; add mean back after aggregation to handle different rating scales.
Similarity measures:
- Cosine similarity between item (or user) vectors (consider only co-rated entries).
- Pearson correlation (accounts for mean-centering).
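A minimal sketch of item-item cosine similarity on a small dense ratings matrix, with zeros standing in for missing entries (the usual shortcut: only co-rated users contribute to each dot product). The toy matrix and function name are illustrative, not from the lecture:

```python
import numpy as np

def cosine_item_sim(R):
    """Cosine similarity between the item columns of a dense ratings
    matrix R where 0 means 'missing'."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unrated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)         # an item is not its own neighbor
    return S

# Toy matrix: 3 users x 3 items, 0 = unknown rating.
R = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [0., 1., 5.]])
S = cosine_item_sim(R)
```

Items 0 and 1 share two enthusiastic raters, so they come out far more similar than items 0 and 2.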
Practical improvements:
- IDF-like weighting (analogous to NLP): downweight very popular items to reduce popularity bias.
- Shrinkage / significance weighting: reduce similarity when two items/users have few co-ratings (e.g., multiply by factor based on common-rating count). Typical shrinkage constant β ≈ 100.
- Normalize by number of ratings to reduce noise from rarely-rated items.
Neighborhood selection:
- Choose K nearest neighbors (K small) rather than aggregating over all items — typical KNN model.
- Candidate selection: items to consider are usually all items minus those already seen by the user; can also omit cold items.
Conceptual item-based KNN prediction formula:
r̂_{u,i} = mean_u + (Σ_{j ∈ N(i)} sim(i,j) · (r_{u,j} − mean_u)) / (Σ_{j ∈ N(i)} |sim(i,j)|), where N(i) is the chosen neighbor set of item i among the items user u has rated.
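The prediction formula can be sketched as follows; the toy ratings and the similarity matrix are hypothetical, and the code assumes a dense 0-filled matrix:

```python
import numpy as np

def predict_item_knn(R, S, u, i, k=2):
    """Mean-centered item-based KNN prediction:
    r_hat = mean_u + sum_j sim(i,j)*(r_{u,j} - mean_u) / sum_j |sim(i,j)|,
    where j ranges over the k items most similar to i that u has rated."""
    rated = np.flatnonzero(R[u])                 # items user u has rated
    rated = rated[rated != i]
    mean_u = R[u, rated].mean() if rated.size else 0.0
    # k most similar rated items = the neighbor set N(i)
    nbrs = rated[np.argsort(-np.abs(S[i, rated]))[:k]]
    denom = np.abs(S[i, nbrs]).sum()
    if denom == 0:
        return mean_u                            # fall back to the user mean
    num = (S[i, nbrs] * (R[u, nbrs] - mean_u)).sum()
    return mean_u + num / denom

R = np.array([[5., 4., 0.],
              [4., 5., 1.]])
S = np.array([[0., .9, .1],                      # hypothetical item-item similarities
              [.9, 0., .2],
              [.1, .2, 0.]])
r_hat = predict_item_knn(R, S, u=0, i=2, k=2)
```

For user 0 and item 2 this aggregates the user's two known ratings (centered around their mean of 4.5) weighted by the similarities 0.2 and 0.1.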
Implementation tips
- Replace missing R entries with zeros only when needed for matrix ops; be careful about sparse vs dense semantics.
- Map original user/item IDs to compact integer indices and preserve mappings to restore outputs.
- Filter out test users with no train history (cold users) for many classical CF evaluations, or treat them separately.
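The ID-mapping tip needs no library support; plain dictionaries suffice. The raw IDs below are made up:

```python
# Map arbitrary user/item IDs to compact 0..n-1 indices and keep the
# inverse mapping so recommendations can be reported with original IDs.
raw_users = ["u42", "u7", "u42", "u99"]

user_to_idx = {}
for u in raw_users:
    user_to_idx.setdefault(u, len(user_to_idx))  # assign next free index
idx_to_user = {v: k for k, v in user_to_idx.items()}

rows = [user_to_idx[u] for u in raw_users]       # compact row indices
```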
Matrix-based item-item learning (SLIM-like)
Idea and objective
- Learn an item-item weight matrix W so that R ≈ R W. Each item’s predicted ratings are linear combinations of the user’s ratings on other items using W.
Optimization objective (conceptual):
minimize ||R − R W||^2 + λ ||W||_2^2 subject to diag(W) = 0 and optionally W ≥ 0 and/or L1 sparsity.
Notes on constraints and regularization:
- diag(W) = 0 prevents trivial identity solution.
- Non-negativity (W ≥ 0) is sometimes enforced for interpretability but is optional.
- L2 regularization (λ) prevents overfitting; adding L1 induces sparsity but removes closed-form solutions.
Solution approaches and practicalities
- With only L2 (and relaxed constraints) a closed-form solution may exist. With L1 or non-linear constraints, use iterative solvers.
- In practice, solve per-column regression problems (learn each item’s weights independently).
- When using matrix algebra, set missing entries to zero but track their meaning.
- Enforce zero diagonal and choose solvers/scaling appropriate for many items (sparse computations, memory efficiency).
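Assuming the L2-only relaxation with the zero-diagonal constraint, the per-column ridge problems admit the EASE-style closed form sketched below (an assumption on my part: the summary only says a closed form "may exist"; the toy matrix and λ are illustrative):

```python
import numpy as np

def ease_weights(R, lam=10.0):
    """Closed-form item-item weights for
    min ||R - RW||^2 + lam * ||W||^2  s.t.  diag(W) = 0
    via the regularized Gram inverse (EASE-style estimator)."""
    n = R.shape[1]
    P = np.linalg.inv(R.T @ R + lam * np.eye(n))  # regularized Gram inverse
    W = -P / np.diag(P)          # scale column j by 1 / P[j, j]
    np.fill_diagonal(W, 0.0)     # enforce the zero-diagonal constraint
    return W

R = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [0., 1., 5.]])
W = ease_weights(R, lam=10.0)
R_hat = R @ W                    # predicted scores
```

Adding `lam * np.eye(n)` before inverting is also the "adding identity for numerical stability" trick mentioned in the seminar notes.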
Similarity computation practicalities
- Item columns and user rows are sparse vectors; exploit sparsity for efficiency.
- Adjust cosine similarity by IDF or shrinkage to handle popularity imbalance and sparse co-occurrences.
- Zipf’s law: a few “head” items attract most interactions while most items sit in a long tail; this popularity skew heavily influences metrics.
Data, splits and evaluation (seminar practicals)
Data and preprocessing
- Dataset used: MovieLens 1M (≈1M interactions). Other MovieLens versions: 100k, 20M.
- Preprocessing steps:
- Map and sanitize user/item IDs.
- Sort by timestamp; if no timestamp, decide splitting order carefully.
- Build a sparse user-item rating matrix R.
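Building the sparse matrix R from (user, item, rating) triples can be done with `scipy.sparse`; the triples below are toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# One (user, item, rating) triple per interaction, already mapped to
# compact integer indices.
users   = np.array([0, 0, 1, 2])
items   = np.array([0, 1, 1, 2])
ratings = np.array([5., 4., 3., 2.])

# CSR format stores only the 4 nonzeros, not the full 3x3 grid.
R = csr_matrix((ratings, (users, items)), shape=(3, 3))
```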
Train/test split approaches
- Time-based split: train = interactions before time T, test = after T (recommended when timestamps exist).
- Per-user holdout: for each user, keep some recent interactions for test.
- Remove test users with no train history (unless deliberately evaluating cold-start).
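A time-based split plus the cold-user filter might look like this sketch (toy timestamps; the threshold T is chosen arbitrarily):

```python
import numpy as np

timestamps = np.array([10, 20, 30, 40, 50])
user_ids   = np.array([0, 1, 0, 2, 1])
T = 35

# Train = interactions strictly before T, test = the rest.
train_mask  = timestamps < T
train_users = user_ids[train_mask]
test_users  = user_ids[~train_mask]

# Drop test interactions whose user has no train history (cold users),
# unless cold-start is being evaluated deliberately.
warm = np.isin(test_users, train_users)
test_users_warm = test_users[warm]
```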
Candidates and cold items/users
- Decide candidate pool for recommendations (all items minus seen items, or a restricted set).
- Cold users (no train history) are problematic for classical memory-based CF; handle with special methods (popularity / content-based), or exclude.
Metrics covered
- Hit Rate: proportion of test users whose top-N recommendation list contains at least one held-out item.
- Coverage: fraction of unique items that appear in recommendations across all users (measures breadth).
- Other auxiliary metrics: precision@N, recall@N, ranking metrics (functions in the notebook compute these).
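Hit rate and coverage as defined above can be computed roughly like this (toy recommendation lists; the function names are mine, not the notebook's):

```python
def hit_rate(recs, heldout):
    """Fraction of users with at least one held-out item in their top-N list."""
    hits = sum(1 for u in heldout if set(recs[u]) & heldout[u])
    return hits / len(heldout)

def coverage(recs, n_items):
    """Fraction of catalog items that appear in at least one user's list."""
    shown = {i for lst in recs.values() for i in lst}
    return len(shown) / n_items

recs    = {0: [1, 2, 3], 1: [2, 3, 4]}   # top-N lists per user
heldout = {0: {3}, 1: {9}}               # held-out test items per user
```

User 0 is a hit (item 3 was held out and recommended), user 1 is not, so the hit rate is 0.5; four of the catalog's items appear across the two lists.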
Baselines
- Popularity baseline: recommend top-N most popular items (often a strong baseline due to popularity bias).
- Compare neighborhood methods and learned item-item models against popularity and other baselines.
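A popularity baseline that skips already-seen items, as a rough sketch over a toy interaction list:

```python
from collections import Counter

def popularity_recs(train_item_ids, seen, n=3):
    """Recommend the globally most popular items, skipping ones
    the user has already interacted with."""
    pop = [i for i, _ in Counter(train_item_ids).most_common()]
    return [i for i in pop if i not in seen][:n]

train = [1, 1, 1, 2, 2, 3]               # item IDs from train interactions
recs = popularity_recs(train, seen={1}, n=2)
```

Despite its simplicity, this is the baseline the lecture warns is often hard to beat because of popularity bias.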
Practical coding notes from the seminar
- The course provides a runnable notebook (CPU is sufficient); it is suggested to run it in VLab/Colab/Binder and save a copy.
- Notebook includes: data loading, splitting, similarity computation, baseline implementations, and metric computations.
- Instructor/debugging notes discussed live (examples: vector vs matrix division, setting diagonal to zero, adding identity for numerical stability). Instructor planned notebook updates.
Final course plan pointers
- Next lectures: matrix factorization (MF) and further models; next session led by Anya.
- Homework deadlines and submission details are in the repo README.
Detailed recipes — building and evaluating a memory-based item-item recommender
Preprocessing
- Map original user/item IDs to compact indices and keep mappings.
- Sort interactions by timestamp (if available).
- Build sparse user-item rating matrix R.
Create train/test split
- Time-based: train = interactions before T, test = after T; or per-user holdout (keep recent interactions per user for test).
- Remove test users with no train history (unless evaluating cold-start).
Compute item-item similarities
- For each item pair (i, j) compute cosine or Pearson similarity using only co-rated users.
- Apply shrinkage:
- sim_shrunk = (n_co_ratings / (n_co_ratings + β)) * sim
- Optionally apply IDF-like weighting to reduce popularity bias.
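The shrinkage formula above as a one-liner, with β = 100 per the typical value mentioned:

```python
def shrink(sim, n_co, beta=100):
    """Shrink a raw similarity toward 0 when few co-ratings support it:
    sim_shrunk = n_co / (n_co + beta) * sim."""
    return n_co / (n_co + beta) * sim

# With beta = 100, 100 co-ratings keep half of the raw similarity;
# rarely co-rated pairs are damped much harder.
```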
Make predictions (KNN item-based)
- For target (u, i): select K most similar items to i among items that u has rated.
- Compute weighted aggregate of u’s ratings on those neighbors; apply normalization (subtract user mean if using centered ratings).
- Exclude items already seen by u when producing top-N recommendations.
Evaluate
- For each test user, produce top-N recommendations from candidate pool.
- Compute hit rate, coverage, precision@N, recall@N, etc.
- Compare to simple baselines (popularity).
(Alternative) Learn item-item weight matrix W
- Solve optimization: minimize ||R − R W||^2 + λ ||W||_2^2 with diag(W)=0 (optionally W≥0, L1 for sparsity).
- Use closed-form when available (no L1, relaxed constraints) or iterative solvers otherwise.
- Compute predictions as R̂ = R W.
Parameters and typical values
- Shrinkage β for similarity: ≈ 100 (typical starting value).
- Regularization λ for W learning: tune by validation.
- K in KNN: choose by validation (not specified in transcript).
Speakers / sources referenced
- Main lecturer / course instructor (primary speaker).
- Senya — participant (student or TA).
- Anya — will lead the next lecture.
- Paper authors for SLIM/item-item learning (referenced but not named).
- A “classic” recommender-systems book referenced (title/author not specified).
- Datasets and external concepts:
- MovieLens (100k, 1M, 20M), Netflix (historical reference).
- Zipf’s law, and NLP IDF analogy for weighting.