Summary of "REAL INTERVIEW / Middle ML Developer, "Точка Банк" (Tochka Bank) - from 350,000"

Context

Product / project work — medical predictive analytics

Problem

Extract structured features from unstructured medical texts and support clinicians with a decision‑support dashboard showing current medications, risky diseases, and recommendations.

Initial approach

Embedding‑based improvements

Domain adaptation & deployment

Team / role

NLP model & engineering notes

Tokenizers / tokenization

Pretraining choices

Positional encodings

Attention internals

Transformer blocks

Practical model limitations

Common production approaches / pipelines (example: classification / sentiment)

  1. Data collection and labeling.
  2. Preprocessing: normalization, lowercasing, lemmatization (important for morphologically rich languages; tools include Natasha and spaCy).
  3. Vectorization / baseline modeling:
    • Fast baselines: TF–IDF + Naive Bayes / Logistic Regression.
    • CatBoost (with built‑in text handling) as a strong baseline.
    • Transformer fine‑tuning (BERT) when context and semantics matter.
  4. Evaluation and business validation:
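    • Classification metrics: F1 (harmonic mean of precision and recall, so it reflects the trade‑off between them), ROC AUC (how well scores rank positives above negatives, independent of a threshold).
    • Regression metrics: MAE, MSE, MAPE/SMAPE (MSE squares errors and is the most sensitive to outliers; MAPE is undefined or unstable when true values are zero or near zero).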
    • Clinical/product validation with domain experts for medical cases.
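The metrics named in step 4 can be sketched in plain Python to make their definitions concrete. This is a minimal illustration, not code from the interview: the function names are hypothetical, labels are assumed to be binary (0/1), and SMAPE uses one common variant of its formula (definitions differ between sources). In practice a library such as scikit-learn would normally be used instead.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1; the positive class is assumed to be 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error; squaring makes large errors dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined when a true value is 0."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE, in percent (variant with |t| + |p| in the denominator)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # A pair where both values are 0 contributes no error.
        total += 2.0 * abs(t - p) / denom if denom else 0.0
    return 100.0 * total / len(y_true)
```

Comparing `mae` and `mse` on the same predictions with one large error shows the outlier sensitivity mentioned above: the squared term grows much faster than the absolute one.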

ML fundamentals covered

Python

Concurrency

SQL

Metrics

Model optimization & inference speedups

Classic ML model trade‑offs

Production & team / task landscape at the bank

Team focus: “Data in Communication with Client”

Key product capabilities and tasks:

Infrastructure / organization

Practical recommendations / takeaways

References / typical guides and tutorials mentioned or implied

Main speakers / sources

Category: Technology

