Summary of "Real Interview / Middle ML Developer at Tochka Bank - from 350 000"
Context
- Real interview for a Middle ML/NLP developer role at Tochka Bank, run by the Criminal IT community.
- Candidate described past projects (medical predictive analytics, NLP), technical choices, end-to-end responsibilities, and discussed ML/NLP fundamentals with interviewers (Arthur, Andrey).
Product / project work — medical predictive analytics
Problem
Extract structured features from unstructured medical texts and support clinicians with a decision‑support dashboard showing current medications, risky diseases, and recommendations.
Initial approach
- Template-based extraction combined with spaCy.
- Many extraction tasks converted to binary classification (feature present / absent).
Embedding‑based improvements
- FastText sliding-window matching:
  - Increased F1 from ~0.85 to ~0.95.
  - Inference became ~10× faster.
- Transformer‑based models (BERT family) used to extract more complex clinical features and disease risks.
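As an illustration of the sliding-window idea (not the project's actual code), here is a toy matcher: it builds FastText-style subword vectors with a hashing trick, embeds each fixed-size word window of the text, and flags windows whose cosine similarity to a query phrase clears a threshold. The dimension, window size, and threshold are arbitrary choices for this sketch; a real system would use trained FastText vectors (e.g. via gensim).

```python
import math
import re

DIM = 32  # toy embedding dimension (real FastText typically uses 100-300)

def subword_vector(word, n=3):
    """Hash character n-grams into a fixed-size vector (FastText-style subwords)."""
    vec = [0.0] * DIM
    padded = f"<{word}>"
    for i in range(len(padded) - n + 1):
        h = hash(padded[i:i + n])
        vec[h % DIM] += 1.0 if (h >> 1) & 1 else -1.0
    return vec

def text_vector(text):
    """Average subword vectors over all tokens in a span."""
    words = re.findall(r"\w+", text.lower())
    vec = [0.0] * DIM
    for w in words:
        vec = [a + b for a, b in zip(vec, subword_vector(w))]
    n = max(len(words), 1)
    return [v / n for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def sliding_window_match(text, pattern, window=3, threshold=0.8):
    """Slide a fixed-size word window over the text, score each window
    against the pattern embedding, and return windows above the threshold."""
    words = re.findall(r"\w+", text.lower())
    pvec = text_vector(pattern)
    hits = []
    for i in range(max(len(words) - window + 1, 1)):
        span = " ".join(words[i:i + window])
        score = cosine(text_vector(span), pvec)
        if score >= threshold:
            hits.append((span, score))
    return hits
```

Because windows are embedded rather than string-matched, paraphrases and typo variants of a pattern can still score highly, which is the property that template matching alone lacks.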
Domain adaptation & deployment
- Collected ~1 GB of medical text (EHRs) and trained an in‑domain BERT.
- Deployed as part of a medical decision‑support system used by public and private clinics across ~46 regions (~200 clinics).
- Business KPI tracked: number of connected organizations.
- Clinical validation performed with a team of doctors.
Team / role
- Candidate acted as acting team lead.
- Responsibilities covered: data collection, preprocessing, model training, wrapping the model into a service, and handing off to backend/integration teams.
NLP model & engineering notes
Tokenizers / tokenization
- Train your own tokenizer when domain-specific tokenization is required (e.g., Chinese, specialized medical notation).
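To make "train your own tokenizer" concrete, here is a minimal byte-pair-encoding (BPE) trainer in pure Python, the merge-learning loop behind most subword tokenizers. In practice one would use the Hugging Face `tokenizers` library on a large in-domain corpus; the corpus here is a toy.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges from a whitespace-split corpus: words start as
    character sequences, and we repeatedly merge the most frequent
    adjacent symbol pair into a new symbol."""
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

On domain text (medical abbreviations, drug names), the learned merges differ sharply from general-purpose vocabularies, which is why in-domain tokenizers reduce over-fragmentation.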
Pretraining choices
- BERT‑style (encoder‑only) models typically use MLM (masked language modeling) and sometimes NSP (next sentence prediction).
- RoBERTa‑style variants train on MLM alone (no NSP).
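The MLM corruption step can be sketched in a few lines; this follows the 80/10/10 split described in the BERT paper (80% of selected positions become `[MASK]`, 10% a random token, 10% unchanged). The function name and signature are illustrative, not a library API.

```python
import random

def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """BERT-style masking: select ~mask_prob of positions as prediction
    targets; of those, 80% become mask_token, 10% a random vocab token,
    10% stay unchanged. Returns the corrupted sequence and the labels
    (None means 'not a target')."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels
```

The 10% "unchanged" case matters: it forces the model to build representations for every token, since it cannot assume unmasked tokens are always correct.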
Positional encodings
- Options include sinusoidal, absolute indices, relative positions (T5), and rotary positional encodings (RoPE).
- Trade-offs: extrapolation to longer sequences, determinism, and implementation complexity.
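Of these options, the sinusoidal scheme is the simplest to write down; this is a direct transcription of the formula from "Attention Is All You Need" (PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos of the same angle):

```python
import math

def sinusoidal_pe(max_len, d_model):
    """Fixed (non-learned) sinusoidal positional encodings: each position
    gets a vector of sines and cosines at geometrically spaced frequencies."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Because the encoding is a fixed function of position rather than a learned table, it is defined for positions beyond the training length, which is the extrapolation argument in its favor.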
Attention internals
- Core components: Q, K, V matrices.
- Scaled dot‑product attention divides Q·K^T by sqrt(d) to stabilize variance (temperature effect).
- Softmax sharpness determines how many tokens receive significant attention mass.
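The mechanism above fits in a short function; a single-head, pure-Python sketch of attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with Q, K, V as lists of row vectors:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """For each query row: score against every key, scale by sqrt(d_k)
    to keep the dot products' variance near 1 (the 'temperature' effect),
    softmax the scores, and take the weighted sum of value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Without the sqrt(d_k) divisor, dot products grow with dimension, the softmax saturates onto one token, and gradients through the attention weights vanish.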
Transformer blocks
- Common elements: multi‑head attention, residual connections, layer normalization (pre‑norm vs post‑norm), and feed‑forward MLPs.
Practical model limitations
- Context window limits (e.g., 512 tokens for many BERTs); long contexts require chunking/combining strategies.
- For highly repetitive/template text, simpler approaches (TF–IDF + classical models) can outperform large contextual models on cost/accuracy.
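The chunking strategy for long contexts can be sketched as an overlapping-window splitter (window and overlap sizes here are illustrative defaults); per-chunk predictions are then aggregated by max, mean, or voting:

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a long token sequence into overlapping windows that fit a
    fixed context (e.g. BERT's 512 tokens). Overlap preserves context
    across chunk boundaries."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + max_len])
    return chunks
```
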
Common production approaches / pipelines (example: classification / sentiment)
- Data collection and labeling.
- Preprocessing: normalization, lowercasing, lemmatization (important for rich‑morphology languages — tools: Natasha, spaCy).
- Vectorization / baseline modeling:
  - Fast baselines: TF–IDF + Naive Bayes / Logistic Regression.
  - CatBoost (with built‑in text handling) as a strong baseline.
- Transformer fine‑tuning (BERT) when context and semantics matter.
- Evaluation and business validation:
  - Classification metrics: F1 (precision/recall trade‑off), ROC AUC (ranking).
  - Regression metrics: MAE, MSE, MAPE/SMAPE (note sensitivity to outliers).
  - Clinical/product validation with domain experts for medical cases.
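The TF–IDF step in the baseline above reduces to a few lines of arithmetic. This sketch uses sklearn-style smoothed IDF (an assumption about the exact formula); a real pipeline would call `TfidfVectorizer`, but this is the computation it performs:

```python
import math
import re
from collections import Counter

def tfidf_vectorize(docs):
    """Compute smoothed TF-IDF weights for a small corpus; each document
    becomes a dict {term: weight}. Rare terms get boosted, terms present
    in every document get weight tf * 1.0."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(docs)
    df = Counter()  # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            vec[term] = (count / len(toks)) * idf
        vectors.append(vec)
    return vectors
```

Feeding these sparse vectors to logistic regression or Naive Bayes gives the fast baseline the summary recommends before reaching for transformers.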
ML fundamentals covered
Python
- Mutable vs immutable types, hashtable behavior for dict keys, hash collisions and bucket handling.
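A short demonstration of the dict-key rules discussed: keys must be hashable (which in practice means immutable), collisions are resolved by comparing keys with `==`, and equal objects that hash equally share one slot.

```python
# Only hashable (typically immutable) objects can be dict keys: the dict
# stores entries in buckets indexed by hash(key) and resolves collisions
# by probing and comparing keys with ==.
key_tuple = (1, 2)          # immutable -> hashable -> valid key
d = {key_tuple: "ok"}

try:
    d[[1, 2]] = "fails"     # list is mutable -> unhashable -> TypeError
except TypeError as e:
    error = str(e)

# Equal objects must hash equally: 1, 1.0 and True collide into one slot,
# so later assignments overwrite the value while the first key survives.
d2 = {1: "int", 1.0: "float", True: "bool"}
assert len(d2) == 1 and d2[1] == "bool"
```
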
Concurrency
- Multithreading vs multiprocessing in Python:
  - The GIL limits CPU‑bound parallelism for threads; threads help IO‑bound tasks, while multiprocessing sidesteps the GIL for CPU‑bound work at the cost of inter‑process communication.
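The IO-bound case can be demonstrated directly: sleeping (standing in for a network call) releases the GIL, so eight "calls" overlap instead of running sequentially. A CPU-bound loop in place of the sleep would show no such speedup in threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    """Simulated IO-bound work (e.g. an HTTP request): time.sleep releases
    the GIL, so other threads run while this one waits."""
    time.sleep(0.1)
    return 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(io_task, range(8)))
elapsed = time.perf_counter() - start
# 8 x 0.1 s of "IO" completes in roughly 0.1 s of wall time, not 0.8 s
```
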
SQL
- Joins: inner, left, right, outer, cross (Cartesian).
- Join conditions and row‑count implications.
- Window functions: OVER with PARTITION BY, ORDER BY, and frame clauses like ROWS BETWEEN.
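The window-function syntax can be exercised with Python's built-in sqlite3 (SQLite >= 3.25, bundled with CPython 3.8+, supports `OVER`); the table and data here are invented for the example, a per-client running total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (client TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [("a", 10), ("a", 20), ("b", 5), ("b", 15), ("b", 30)])

# Running total per client: PARTITION BY restarts the sum per client,
# ORDER BY + the ROWS frame accumulate from the first row to the current one.
rows = conn.execute("""
    SELECT client, amount,
           SUM(amount) OVER (
               PARTITION BY client
               ORDER BY amount
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM payments
    ORDER BY client, amount
""").fetchall()
# rows -> [('a', 10, 10), ('a', 20, 30), ('b', 5, 5), ('b', 15, 20), ('b', 30, 50)]
```

Unlike GROUP BY, the window version keeps every input row and adds the aggregate alongside it, which is the row-count implication interviewers usually probe.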
Metrics
- ROC AUC: ranking metric invariant to monotonic transforms.
- F1: harmonic mean of precision and recall (penalizes imbalance).
- MSE vs MAE trade‑offs; MAPE vs SMAPE distinctions.
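The metrics above are small enough to write out in full. These pure-Python versions follow the standard definitions; the SMAPE normalization shown is one of several conventions in use, and the rank-based ROC AUC makes its invariance to order-preserving score transforms visible.

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative;
    depends only on the ordering of scores (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(y_true, y_pred):
    """Linear penalty: robust to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Quadratic penalty: outliers dominate the average."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Percentage error: undefined at y_true == 0 and asymmetric
    (over- and under-prediction are penalized differently)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """Symmetric variant: bounded to [0, 2] under this normalization."""
    return sum(2 * abs(t - p) / (abs(t) + abs(p))
               for t, p in zip(y_true, y_pred)) / len(y_true)
```
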
Model optimization & inference speedups
- Quantization (FP32 → FP16 or INT8) for reduced memory and hardware acceleration.
- Distillation: train a smaller model to mimic a larger one.
- KV caching (reusing per-token key/value tensors during autoregressive decoding) and speculative decoding, plus other efficient decoding strategies.
- Efficient attention variants and architecture adjustments.
- Note: hardware‑aware validation is required to check accuracy loss from optimizations.
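A minimal sketch of the quantization idea, assuming the simplest scheme (symmetric, per-tensor INT8 with a single scale); production frameworks use per-channel scales, calibration data, or quantization-aware training on top of this:

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale:
    the basic FP32 -> INT8 step that shrinks memory 4x and enables
    integer matmul on supporting hardware."""
    m = max(abs(w) for w in weights)
    scale = m / 127 if m else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the rounding error is at most ~scale/2
    per weight, which is what accuracy validation must check."""
    return [qi * scale for qi in q]
```
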
Classic ML model trade‑offs
- Tree ensembles (GBM, CatBoost, XGBoost, RandomForest) excel on tabular data; boosting often wins predictive quality but has interpretability and extrapolation caveats.
- Linear models are preferred for interpretability and when true relationships are linear or require monotonic extrapolation.
- For strong monotonic trends (time series, inflation‑like behavior), trees/boosting may underperform at extrapolation.
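The extrapolation caveat can be shown with a toy comparison. The "tree-like" predictor below is a deliberate simplification (a tree predicts a constant per leaf, so outside the training range it returns the nearest leaf's value), while ordinary least squares follows the trend:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def tree_like_predict(xs, ys, x):
    """Stand-in for a tree ensemble: constant prediction per region, so
    beyond the training range it returns the nearest training target."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                       # perfectly linear trend, y = 2x
a, b = fit_linear(xs, ys)
linear_pred = a * 10 + b                    # extrapolates the trend to 20
tree_pred = tree_like_predict(xs, ys, 10)   # stuck at 10, the last leaf value
```

This is why inflation-like or trending targets are often detrended (or handled with a linear component) before boosting is applied.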
Production & team / task landscape at the bank
Team focus: “Data in Communication with Client”
Key product capabilities and tasks:
- Auto‑replies: LLM‑driven generation and fast hint suggestions (short canned replies).
- Validator models: check reply quality, toxicity, and question‑answer match.
- Session management: detect session end, maintain conversation continuity and operator assignment.
- Routing: classify first message to route to the correct operator skill group.
- Summarization: chat and call summarization (drafts for operator editing or direct sending).
- Analytics tools for product managers and analysts.
Infrastructure / organization
- Separate team maintains core LLM deployment and retrieval (tracks and adapts models like LLaMA 13B, Mistral).
- Data scientists build models and small services; product team developers usually handle integration.
- For high‑load needs, industrial engineering support is involved.
- Typical team size described: ~10 analysts and ~2 data scientists (hiring more mid/senior ML engineers).
Practical recommendations / takeaways
- Start with strong baselines: TF–IDF + linear model or CatBoost text features before transformers.
- Train domain‑specific tokenizers and/or pretrain/fine‑tune language models for specialized text.
- Validate outputs with domain experts (essential in healthcare).
- Measure business KPIs (e.g., number of integrated clinics, traffic handled) alongside ML metrics.
- For deployment: consider quantization, distillation, caching, and choose architectures suited to context length and production constraints.
References / typical guides and tutorials mentioned or implied
- Hugging Face community models and pre‑trained checkpoints.
- Standard sentiment‑analysis pattern: label → preprocess → vectorize (TF–IDF) → baseline classifier → evaluate → fine‑tune transformer if needed.
- CatBoost docs for built‑in text features and target encoding.
- Transformer literature: scaled dot‑product attention, positional encodings (sinusoidal, relative, RoPE), BERT vs RoBERTa pretraining differences.
Main speakers / sources
- Candidate: Middle ML / NLP developer (unnamed) — described medical NLP project and technical experience.
- Interviewers / participants: Arthur (technical questions), Andrey (follow‑ups/product questions), with brief mentions of Fedor.
- Organizational references: Criminal IT community (organizer), Tochka Bank (employer/team), recruiter Yana.