Summary of "Жив ли ML в 2026? Статистическое исследование" ("Is ML Alive in 2026? A Statistical Study")

What this video is about

A single presenter (the author of the free learning platform nareka.ru) conducted a large statistical study of the Russian machine‑learning job market (snapshot: March 2026). The goal was to measure several things: who the typical ML candidate is, how employers behave, and how visible candidates are on the main job platform. It also looked at how domains change over time (who moves where) and what predicts the number of responses a vacancy receives.

Data, scale and core numbers

Methodology

  1. Data collection

    • Queried multiple endpoints, since no single source was sufficient.
    • Searched by domain keywords (ML Research, Data Scientist, ML Engineer, etc.), then cleaned the results.
  2. Cleaning / filtering pipeline

    • Five manual and automated passes removed irrelevant roles (backend, full‑stack, pure analysts, unrelated engineers), spam accounts and “zombie” vacancies.
    • The platform's front‑end score served as the first filter. A threshold of 0.25 was adopted, which dropped ~90% of the junk while keeping plausible candidates.
  3. Multi‑scoring capture

    • Collected three platform scoring outputs per candidate–vacancy pair:
      • recommendation neural network
      • textual relevance / search logit
      • front‑end / “suable” / packaging score
    • Evaluated ~27,000 candidate–vacancy pairs.
    • The intersection of the top‑100 lists from all three systems contained 168 candidates (the “core elite”).
  4. Text extraction & structuring

    • Parsed job titles, free‑text descriptions and resume duty descriptions into structured JSON.
    • Used an LLM (Gemini 3 Fast is mentioned) with prompt engineering and structured‑output validation, retrying automatically whenever the output did not match the schema.
  5. Statistical validation

    • Applied sample‑size formulas (Cochran's formula is referenced) to decide how many LLM outputs to check manually.
    • Manual checks of random samples (~200 items at 95% confidence) showed acceptable extraction accuracy.
    • Bootstrapping was used to build confidence intervals for medians and other statistics.
  6. Data engineering / reproducibility

    • Strict contracts and field validation at each pipeline stage.
    • Produced a single master candidate file and a master vacancy file.
    • The analysis was automated in Python (a large codebase of ~13k lines) using statsmodels, scikit‑learn, XGBoost and other libraries.
  7. Analytical models

    • Regression models were used to explain “responses per vacancy” (the dependent variable); the key predictors are reported in the results.
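Step 1 pulls data from several endpoints and then cleans it, so overlapping result sets have to be merged. The following is a minimal sketch, not the study's code. It makes three assumptions:
    • each record carries a unique `id` and a `title` field;
    • only the three named keywords are used;
    • the function names are hypothetical.

```python
from typing import Iterable

# Hypothetical keyword list: the study names these three "etc.", so the real list is longer
DOMAIN_KEYWORDS = ("ml research", "data scientist", "ml engineer")


def matches_domain(title: str) -> bool:
    """Case-insensitive check that a job title mentions one of the domain keywords."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in DOMAIN_KEYWORDS)


def merge_results(batches: Iterable[list[dict]]) -> list[dict]:
    """Merge records from several endpoints, keeping the first copy of each `id`
    and skipping titles that match none of the domain keywords."""
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:  # one batch per endpoint
        for record in batch:
            if record["id"] in seen or not matches_domain(record["title"]):
                continue
            seen.add(record["id"])
            merged.append(record)
    return merged
```

This covers only deduplication and a keyword pass. The study's five cleaning passes also included manual review, which a sketch like this does not replace.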
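The first‑pass filter in step 2 is a simple cut on the platform's front‑end score. This sketch makes two assumptions:
    • each candidate is a dict with a hypothetical `frontend_score` field;
    • a score of exactly 0.25 is kept. The source does not say which side of the threshold is inclusive.

```python
THRESHOLD = 0.25  # cut-off reported in the study


def first_pass_filter(
    candidates: list[dict], threshold: float = THRESHOLD
) -> tuple[list[dict], float]:
    """Return the candidates scoring at or above the threshold, plus the share dropped."""
    kept = [c for c in candidates if c["frontend_score"] >= threshold]
    dropped_share = 1 - len(kept) / len(candidates) if candidates else 0.0
    return kept, dropped_share
```

Returning the dropped share makes it easy to check a chosen threshold against the ~90% figure quoted above.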
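Step 3 keeps the candidates that all three scoring systems rank highly. This sketch assumes that each system's output is a dict mapping a candidate ID to a score, with higher meaning better. The helper names are hypothetical.

```python
def top_k_ids(scores: dict[str, float], k: int = 100) -> set[str]:
    """IDs of the k highest-scoring candidates under one scoring system."""
    ranked = sorted(scores, key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def core_elite(score_tables: list[dict[str, float]], k: int = 100) -> set[str]:
    """Candidates that land in the top k of every scoring system."""
    return set.intersection(*(top_k_ids(table, k) for table in score_tables))
```

Intersecting three single top‑100 lists can return at most 100 IDs. The reported 168 therefore presumably comes from ranking per vacancy, or from aggregating lists in some other way the summary does not describe. The sketch shows only the single‑list case.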
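Step 4 validates the LLM's structured output and retries when it does not match the schema. The sketch below makes three assumptions:
    • Gemini is not called. `call_llm` is any function that takes a prompt and returns text, which lets the retry logic be tested with a stub.
    • The schema and its three fields are hypothetical.
    • The limit of three attempts is an arbitrary choice.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> expected Python type
SCHEMA = {"title": str, "seniority": str, "skills": list}


class SchemaError(ValueError):
    """Raised when parsed JSON does not match SCHEMA."""


def validate(payload: str) -> dict:
    """Parse the model's text as JSON and check every required field and its type."""
    data = json.loads(payload)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise SchemaError("top-level JSON value must be an object")
    for field, expected in SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise SchemaError(f"field {field!r} missing or not {expected.__name__}")
    return data


def extract(text: str, call_llm: Callable[[str], str], max_attempts: int = 3) -> dict:
    """Ask the model for structured JSON, retrying on malformed or off-schema output."""
    prompt = f"Return JSON with keys {sorted(SCHEMA)} for:\n{text}"
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_llm(prompt))
        except ValueError as err:  # covers both JSONDecodeError and SchemaError
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error
```

If the model keeps failing validation, the function raises an error instead of returning partial data. Malformed output therefore never reaches the master files.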
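Step 5 also mentions bootstrap confidence intervals for medians. The following is a percentile‑bootstrap sketch using only the standard library. The resample count, the seed and the percentile method are all assumptions, because the source does not say which bootstrap variant was used.

```python
import random
import statistics


def bootstrap_median_ci(
    values: list[float], n_resamples: int = 5000, confidence: float = 0.95, seed: int = 0
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = medians[int(alpha * n_resamples)]
    upper = medians[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return lower, upper
```

The same function works for any per‑candidate or per‑vacancy numeric column. Swapping `statistics.median` for another statistic gives intervals for that statistic instead.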

Main findings — demographics and careers

Job market structure and domains

Skills, resume writing and scoring

Dynamics, mobility and career lessons

Technical / tool notes

Practical takeaways / actionable lessons

Limitations and cautions

Extras and follow‑ups

Speakers / sources mentioned

Category: Educational
