Summary of "[IAD, Spring 2025] Recommender Systems, 2"
Summary — main ideas and lessons
1) Topic and context
- The lecture/seminar covers early (historical and practical) recommender system models:
- Emphasis on collaborative filtering (CF), especially memory-based (neighborhood) approaches.
- Introduction to a matrix-based item-item learning approach (SLIM-like model).
- A practical seminar runs a reproducible notebook (course repository) on a MovieLens dataset (here the 1M version) to demonstrate training, validation and metrics.
2) Historical motivation and intuition
- Collaborative filtering grew from people annotating/filtering content (moderation, tagging) so others could find relevant items. Modern CF automates this with machine learning.
- The catalog-size / filtering problem: explicit filters (size, color, etc.) reduce choice; CF aims to automatically surface relevant items from a large catalog.
3) Types of collaborative filtering
- Memory-based (neighborhood) methods — focus of this lecture:
- Use observed ratings directly (no large parameterized model).
- User-based CF: compute similarity between users; predict a target user’s rating by aggregating ratings from similar users.
- Item-based CF: compute similarity between items; predict a user’s rating for an item by aggregating the user’s ratings of similar items.
- Model-based methods (covered later): learn parameterized models (matrix factorization, neural methods, etc.).
4) User-based vs. item-based — pros and cons
- Item-based:
- Similarities between items are usually more stable over time (item relationships change slowly).
- Often gives higher practical quality.
- User-based:
- Can find serendipitous or unexpected matches because user tastes may change.
- Conceptually both approaches follow the same pattern: compute similarity, then aggregate neighbors’ signals.
5) Formal setup and goals
- Given a sparse user-item rating matrix R:
- Predict missing ratings (rating prediction).
- Produce for each user a ranked list of top-N recommendations (ranking). Predicted ratings can be sorted to obtain recommendations.
- Notation:
- I_u = items with known ratings by user u.
- U_i = users who rated item i.
Memory-based (neighborhood) methods
Methodology / steps
Basic aggregation options to predict r_{u,i}:
- Simple average of the ratings user u has given (non-personalized baseline).
- Weighted average of ratings of neighbor items/users (K-nearest neighbors): weights = similarities between items (or users).
- Centering/normalization: subtract user mean (or other normalization) before aggregation; add mean back after aggregation to handle different rating scales.
Similarity measures:
- Cosine similarity between item (or user) vectors (consider only co-rated entries).
- Pearson correlation (accounts for mean-centering).
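A minimal sketch of item-item cosine similarity on a small dense ratings matrix, with zeros standing in for missing entries (the usual shortcut: only co-rated users contribute to each dot product). The toy matrix and function name are illustrative, not from the lecture:

```python
import numpy as np

def cosine_item_sim(R):
    """Cosine similarity between the item columns of a dense ratings
    matrix R where 0 means 'missing'."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unrated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)         # an item is not its own neighbor
    return S

# Toy matrix: 3 users x 3 items, 0 = unknown rating.
R = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [0., 1., 5.]])
S = cosine_item_sim(R)
```

Items 0 and 1 share two enthusiastic raters, so they come out far more similar than items 0 and 2.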
Practical improvements:
- IDF-like weighting (analogous to NLP): downweight very popular items to reduce popularity bias.
- Shrinkage / significance weighting: reduce similarity when two items/users have few co-ratings (e.g., multiply by factor based on common-rating count). Typical shrinkage constant β ≈ 100.
- Normalize by number of ratings to reduce noise from rarely-rated items.
Neighborhood selection:
- Choose K nearest neighbors (K small) rather than aggregating over all items — typical KNN model.
- Candidate selection: items to consider are usually all items minus those already seen by the user; can also omit cold items.
Conceptual item-based KNN prediction formula:
r̂_{u,i} = mean_u + (Σ_{j ∈ N(i)} sim(i,j) · (r_{u,j} − mean_u)) / (Σ_{j ∈ N(i)} |sim(i,j)|), where N(i) is the chosen neighbor set of item i among the items user u has rated.
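The prediction formula can be sketched as follows; the toy ratings and the similarity matrix are hypothetical, and the code assumes a dense 0-filled matrix:

```python
import numpy as np

def predict_item_knn(R, S, u, i, k=2):
    """Mean-centered item-based KNN prediction:
    r_hat = mean_u + sum_j sim(i,j)*(r_{u,j} - mean_u) / sum_j |sim(i,j)|,
    where j ranges over the k items most similar to i that u has rated."""
    rated = np.flatnonzero(R[u])                 # items user u has rated
    rated = rated[rated != i]
    mean_u = R[u, rated].mean() if rated.size else 0.0
    # k most similar rated items = the neighbor set N(i)
    nbrs = rated[np.argsort(-np.abs(S[i, rated]))[:k]]
    denom = np.abs(S[i, nbrs]).sum()
    if denom == 0:
        return mean_u                            # fall back to the user mean
    num = (S[i, nbrs] * (R[u, nbrs] - mean_u)).sum()
    return mean_u + num / denom

R = np.array([[5., 4., 0.],
              [4., 5., 1.]])
S = np.array([[0., .9, .1],                      # hypothetical item-item similarities
              [.9, 0., .2],
              [.1, .2, 0.]])
r_hat = predict_item_knn(R, S, u=0, i=2, k=2)
```

For user 0 and item 2 this aggregates the user's two known ratings (centered around their mean of 4.5) weighted by the similarities 0.2 and 0.1.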
Implementation tips
- Replace missing R entries with zeros only when needed for matrix ops; be careful about sparse vs dense semantics.
- Map original user/item IDs to compact integer indices and preserve mappings to restore outputs.
- Filter out test users with no train history (cold users) for many classical CF evaluations, or treat them separately.
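The ID-mapping tip needs no library support; plain dictionaries suffice. The raw IDs below are made up:

```python
# Map arbitrary user/item IDs to compact 0..n-1 indices and keep the
# inverse mapping so recommendations can be reported with original IDs.
raw_users = ["u42", "u7", "u42", "u99"]

user_to_idx = {}
for u in raw_users:
    user_to_idx.setdefault(u, len(user_to_idx))  # assign next free index
idx_to_user = {v: k for k, v in user_to_idx.items()}

rows = [user_to_idx[u] for u in raw_users]       # compact row indices
```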
Matrix-based item-item learning (SLIM-like)
Idea and objective
- Learn an item-item weight matrix W so that R ≈ R W. Each item’s predicted ratings are linear combinations of the user’s ratings on other items using W.
Optimization objective (conceptual):
minimize ||R − R W||^2 + λ ||W||_2^2 subject to diag(W) = 0 and optionally W ≥ 0 and/or L1 sparsity.
Notes on constraints and regularization:
- diag(W) = 0 prevents trivial identity solution.
- Non-negativity (W ≥ 0) is sometimes enforced for interpretability but is optional.
- L2 regularization (λ) prevents overfitting; adding L1 induces sparsity but removes closed-form solutions.
Solution approaches and practicalities
- With only L2 (and relaxed constraints) a closed-form solution may exist. With L1 or non-linear constraints, use iterative solvers.
- In practice, solve per-column regression problems (learn each item’s weights independently).
- When using matrix algebra, set missing entries to zero but track their meaning.
- Enforce zero diagonal and choose solvers/scaling appropriate for many items (sparse computations, memory efficiency).
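Assuming the L2-only relaxation with the zero-diagonal constraint, the per-column ridge problems admit the EASE-style closed form sketched below (an assumption on my part: the summary only says a closed form "may exist"; the toy matrix and λ are illustrative):

```python
import numpy as np

def ease_weights(R, lam=10.0):
    """Closed-form item-item weights for
    min ||R - RW||^2 + lam * ||W||^2  s.t.  diag(W) = 0
    via the regularized Gram inverse (EASE-style estimator)."""
    n = R.shape[1]
    P = np.linalg.inv(R.T @ R + lam * np.eye(n))  # regularized Gram inverse
    W = -P / np.diag(P)          # scale column j by 1 / P[j, j]
    np.fill_diagonal(W, 0.0)     # enforce the zero-diagonal constraint
    return W

R = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [0., 1., 5.]])
W = ease_weights(R, lam=10.0)
R_hat = R @ W                    # predicted scores
```

Adding `lam * np.eye(n)` before inverting is also the "adding identity for numerical stability" trick mentioned in the seminar notes.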
Similarity computation practicalities
- Item columns and user rows are sparse vectors; exploit sparsity for efficiency.
- Adjust cosine similarity by IDF or shrinkage to handle popularity imbalance and sparse co-occurrences.
- Zipf’s law: a few “head” items attract most interactions while most items sit in a long tail; this popularity skew heavily influences metrics.
Data, splits and evaluation (seminar practicals)
Data and preprocessing
- Dataset used: MovieLens 1M (≈1M interactions). Other MovieLens versions: 100k, 20M.
- Preprocessing steps:
- Map and sanitize user/item IDs.
- Sort by timestamp; if no timestamp, decide splitting order carefully.
- Build a sparse user-item rating matrix R.
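Building the sparse matrix R from (user, item, rating) triples can be done with `scipy.sparse`; the triples below are toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# One (user, item, rating) triple per interaction, already mapped to
# compact integer indices.
users   = np.array([0, 0, 1, 2])
items   = np.array([0, 1, 1, 2])
ratings = np.array([5., 4., 3., 2.])

# CSR format stores only the 4 nonzeros, not the full 3x3 grid.
R = csr_matrix((ratings, (users, items)), shape=(3, 3))
```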
Train/test split approaches
- Time-based split: train = interactions before time T, test = after T (recommended when timestamps exist).
- Per-user holdout: for each user, keep some recent interactions for test.
- Remove test users with no train history (unless deliberately evaluating cold-start).
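A time-based split plus the cold-user filter might look like this sketch (toy timestamps; the threshold T is chosen arbitrarily):

```python
import numpy as np

timestamps = np.array([10, 20, 30, 40, 50])
user_ids   = np.array([0, 1, 0, 2, 1])
T = 35

# Train = interactions strictly before T, test = the rest.
train_mask  = timestamps < T
train_users = user_ids[train_mask]
test_users  = user_ids[~train_mask]

# Drop test interactions whose user has no train history (cold users),
# unless cold-start is being evaluated deliberately.
warm = np.isin(test_users, train_users)
test_users_warm = test_users[warm]
```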
Candidates and cold items/users
- Decide candidate pool for recommendations (all items minus seen items, or a restricted set).
- Cold users (no train history) are problematic for classical memory-based CF; handle with special methods (popularity / content-based), or exclude.
Metrics covered
- Hit Rate: proportion of test users whose top-N recommendation list contains at least one held-out item.
- Coverage: fraction of unique items that appear in recommendations across all users (measures breadth).
- Other auxiliary metrics: precision@N, recall@N, ranking metrics (functions in the notebook compute these).
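Hit rate and coverage as defined above can be computed roughly like this (toy recommendation lists; the function names are mine, not the notebook's):

```python
def hit_rate(recs, heldout):
    """Fraction of users with at least one held-out item in their top-N list."""
    hits = sum(1 for u in heldout if set(recs[u]) & heldout[u])
    return hits / len(heldout)

def coverage(recs, n_items):
    """Fraction of catalog items that appear in at least one user's list."""
    shown = {i for lst in recs.values() for i in lst}
    return len(shown) / n_items

recs    = {0: [1, 2, 3], 1: [2, 3, 4]}   # top-N lists per user
heldout = {0: {3}, 1: {9}}               # held-out test items per user
```

User 0 is a hit (item 3 was held out and recommended), user 1 is not, so the hit rate is 0.5; four of the catalog's items appear across the two lists.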
Baselines
- Popularity baseline: recommend top-N most popular items (often a strong baseline due to popularity bias).
- Compare neighborhood methods and learned item-item models against popularity and other baselines.
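A popularity baseline that skips already-seen items, as a rough sketch over a toy interaction list:

```python
from collections import Counter

def popularity_recs(train_item_ids, seen, n=3):
    """Recommend the globally most popular items, skipping ones
    the user has already interacted with."""
    pop = [i for i, _ in Counter(train_item_ids).most_common()]
    return [i for i in pop if i not in seen][:n]

train = [1, 1, 1, 2, 2, 3]               # item IDs from train interactions
recs = popularity_recs(train, seen={1}, n=2)
```

Despite its simplicity, this is the baseline the lecture warns is often hard to beat because of popularity bias.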
Practical coding notes from the seminar
- The course provides a runnable notebook (CPU is sufficient); it is suggested to run it in VLab/Colab/Binder and save a copy.
- Notebook includes: data loading, splitting, similarity computation, baseline implementations, and metric computations.
- Instructor/debugging notes discussed live (examples: vector vs matrix division, setting diagonal to zero, adding identity for numerical stability). Instructor planned notebook updates.
Final course plan pointers
- Next lectures: matrix factorization (MF) and further models; next session led by Anya.
- Homework deadlines and submission details are in the repo README.
Detailed recipes — building and evaluating a memory-based item-item recommender
Preprocessing
- Map original user/item IDs to compact indices and keep mappings.
- Sort interactions by timestamp (if available).
- Build sparse user-item rating matrix R.
Create train/test split
- Time-based: train = interactions before T, test = after T; or per-user holdout (keep recent interactions per user for test).
- Remove test users with no train history (unless evaluating cold-start).
Compute item-item similarities
- For each item pair (i, j) compute cosine or Pearson similarity using only co-rated users.
- Apply shrinkage:
- sim_shrunk = (n_co_ratings / (n_co_ratings + β)) * sim
- Optionally apply IDF-like weighting to reduce popularity bias.
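The shrinkage formula above as a one-liner, with β = 100 per the typical value mentioned:

```python
def shrink(sim, n_co, beta=100):
    """Shrink a raw similarity toward 0 when few co-ratings support it:
    sim_shrunk = n_co / (n_co + beta) * sim."""
    return n_co / (n_co + beta) * sim

# With beta = 100, 100 co-ratings keep half of the raw similarity;
# rarely co-rated pairs are damped much harder.
```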
Make predictions (KNN item-based)
- For target (u, i): select K most similar items to i among items that u has rated.
- Compute weighted aggregate of u’s ratings on those neighbors; apply normalization (subtract user mean if using centered ratings).
- Exclude items already seen by u when producing top-N recommendations.
Evaluate
- For each test user, produce top-N recommendations from candidate pool.
- Compute hit rate, coverage, precision@N, recall@N, etc.
- Compare to simple baselines (popularity).
(Alternative) Learn item-item weight matrix W
- Solve optimization: minimize ||R − R W||^2 + λ ||W||_2^2 with diag(W)=0 (optionally W≥0, L1 for sparsity).
- Use closed-form when available (no L1, relaxed constraints) or iterative solvers otherwise.
- Compute predictions as R̂ = R W.
Parameters and typical values
- Shrinkage β for similarity: ≈ 100 (typical starting value).
- Regularization λ for W learning: tune by validation.
- K in KNN: choose by validation (not specified in transcript).
Speakers / sources referenced
- Main lecturer / course instructor (primary speaker).
- Senya — participant (student or TA).
- Anya — will lead the next lecture.
- Paper authors for SLIM/item-item learning (referenced but not named).
- A “classic” recommender-systems book referenced (title/author not specified).
- Datasets and external concepts:
- MovieLens (100k, 1M, 20M), Netflix (historical reference).
- Zipf’s law, and NLP IDF analogy for weighting.