Summary of "Real Interview / Middle ML Developer at Tochka Bank - from 350 000"
Context
- Real interview for a Middle ML/NLP developer role at Tochka Bank, run by the Criminal IT community.
- Candidate described past projects (medical predictive analytics, NLP), technical choices, end-to-end responsibilities, and discussed ML/NLP fundamentals with interviewers (Arthur, Andrey).
Product / project work — medical predictive analytics
Problem
Extract structured features from unstructured medical texts and support clinicians with a decision‑support dashboard showing current medications, risky diseases, and recommendations.
Initial approach
- Template-based extraction combined with spaCy.
- Many extraction tasks converted to binary classification (feature present / absent).
Embedding‑based improvements
- FastText sliding-window matching:
  - Increased F1 from ~0.85 to ~0.95.
  - Inference became ~10× faster.
- Transformer‑based models (BERT family) used to extract more complex clinical features and disease risks.
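As an illustration of the sliding-window idea (not the project's actual code), here is a toy matcher: it builds FastText-style subword vectors with a hashing trick, embeds each fixed-size word window of the text, and flags windows whose cosine similarity to a query phrase clears a threshold. The dimension, window size, and threshold are arbitrary choices for this sketch; a real system would use trained FastText vectors (e.g. via gensim).

```python
import math
import re

DIM = 32  # toy embedding dimension (real FastText typically uses 100-300)

def subword_vector(word, n=3):
    """Hash character n-grams into a fixed-size vector (FastText-style subwords)."""
    vec = [0.0] * DIM
    padded = f"<{word}>"
    for i in range(len(padded) - n + 1):
        h = hash(padded[i:i + n])
        vec[h % DIM] += 1.0 if (h >> 1) & 1 else -1.0
    return vec

def text_vector(text):
    """Average subword vectors over all tokens in a span."""
    words = re.findall(r"\w+", text.lower())
    vec = [0.0] * DIM
    for w in words:
        vec = [a + b for a, b in zip(vec, subword_vector(w))]
    n = max(len(words), 1)
    return [v / n for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def sliding_window_match(text, pattern, window=3, threshold=0.8):
    """Slide a fixed-size word window over the text, score each window
    against the pattern embedding, and return windows above the threshold."""
    words = re.findall(r"\w+", text.lower())
    pvec = text_vector(pattern)
    hits = []
    for i in range(max(len(words) - window + 1, 1)):
        span = " ".join(words[i:i + window])
        score = cosine(text_vector(span), pvec)
        if score >= threshold:
            hits.append((span, score))
    return hits
```

Because windows are embedded rather than string-matched, paraphrases and typo variants of a pattern can still score highly, which is the property that template matching alone lacks.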
Domain adaptation & deployment
- Collected ~1 GB of medical text (EHRs) and trained an in‑domain BERT.
- Deployed as part of a medical decision‑support system used by public and private clinics across ~46 regions (~200 clinics).
- Business KPI tracked: number of connected organizations.
- Clinical validation performed with a team of doctors.
Team / role
- Candidate acted as acting team lead.
- Responsibilities covered: data collection, preprocessing, model training, wrapping the model into a service, and handing off to backend/integration teams.
NLP model & engineering notes
Tokenizers / tokenization
- Train your own tokenizer when domain-specific tokenization is required (e.g., Chinese, specialized medical notation).
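To make "train your own tokenizer" concrete, here is a minimal byte-pair-encoding (BPE) trainer in pure Python, the merge-learning loop behind most subword tokenizers. In practice one would use the Hugging Face `tokenizers` library on a large in-domain corpus; the corpus here is a toy.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges from a whitespace-split corpus: words start as
    character sequences, and we repeatedly merge the most frequent
    adjacent symbol pair into a new symbol."""
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

On domain text (medical abbreviations, drug names), the learned merges differ sharply from general-purpose vocabularies, which is why in-domain tokenizers reduce over-fragmentation.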
Pretraining choices
- BERT‑style (encoder‑only) models typically use MLM (masked language modeling) and sometimes NSP (next sentence prediction).
- RoBERTa‑style variants train on MLM alone (no NSP).
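The MLM corruption step can be sketched in a few lines; this follows the 80/10/10 split described in the BERT paper (80% of selected positions become `[MASK]`, 10% a random token, 10% unchanged). The function name and signature are illustrative, not a library API.

```python
import random

def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """BERT-style masking: select ~mask_prob of positions as prediction
    targets; of those, 80% become mask_token, 10% a random vocab token,
    10% stay unchanged. Returns the corrupted sequence and the labels
    (None means 'not a target')."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels
```

The 10% "unchanged" case matters: it forces the model to build representations for every token, since it cannot assume unmasked tokens are always correct.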
Positional encodings
- Options include sinusoidal, absolute indices, relative positions (T5), and rotary positional encodings (RoPE).
- Trade-offs: extrapolation to longer sequences, determinism, and implementation complexity.
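Of these options, the sinusoidal scheme is the simplest to write down; this is a direct transcription of the formula from "Attention Is All You Need" (PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos of the same angle):

```python
import math

def sinusoidal_pe(max_len, d_model):
    """Fixed (non-learned) sinusoidal positional encodings: each position
    gets a vector of sines and cosines at geometrically spaced frequencies."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Because the encoding is a fixed function of position rather than a learned table, it is defined for positions beyond the training length, which is the extrapolation argument in its favor.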
Attention internals
- Core components: Q, K, V matrices.
- Scaled dot‑product attention divides Q·K^T by sqrt(d) to stabilize variance (temperature effect).
- Softmax sharpness determines how many tokens receive significant attention mass.
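The mechanism above fits in a short function; a single-head, pure-Python sketch of attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with Q, K, V as lists of row vectors:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """For each query row: score against every key, scale by sqrt(d_k)
    to keep the dot products' variance near 1 (the 'temperature' effect),
    softmax the scores, and take the weighted sum of value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Without the sqrt(d_k) divisor, dot products grow with dimension, the softmax saturates onto one token, and gradients through the attention weights vanish.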
Transformer blocks
- Common elements: multi‑head attention, residual connections, layer normalization (pre‑norm vs post‑norm), and feed‑forward MLPs.
Practical model limitations
- Context window limits (e.g., 512 tokens for many BERTs); long contexts require chunking/combining strategies.
- For highly repetitive/template text, simpler approaches (TF–IDF + classical models) can outperform large contextual models on cost/accuracy.
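The chunking strategy for long contexts can be sketched as an overlapping-window splitter (window and overlap sizes here are illustrative defaults); per-chunk predictions are then aggregated by max, mean, or voting:

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a long token sequence into overlapping windows that fit a
    fixed context (e.g. BERT's 512 tokens). Overlap preserves context
    across chunk boundaries."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + max_len])
    return chunks
```
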
Common production approaches / pipelines (example: classification / sentiment)
- Data collection and labeling.
- Preprocessing: normalization, lowercasing, lemmatization (important for rich‑morphology languages — tools: Natasha, spaCy).
- Vectorization / baseline modeling:
  - Fast baselines: TF–IDF + Naive Bayes / Logistic Regression.
  - CatBoost (with built‑in text handling) as a strong baseline.
- Transformer fine‑tuning (BERT) when context and semantics matter.
- Evaluation and business validation:
  - Classification metrics: F1 (precision/recall trade‑off), ROC AUC (ranking).
  - Regression metrics: MAE, MSE, MAPE/SMAPE (note sensitivity to outliers).
  - Clinical/product validation with domain experts for medical cases.
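The TF–IDF step in the baseline above reduces to a few lines of arithmetic. This sketch uses sklearn-style smoothed IDF (an assumption about the exact formula); a real pipeline would call `TfidfVectorizer`, but this is the computation it performs:

```python
import math
import re
from collections import Counter

def tfidf_vectorize(docs):
    """Compute smoothed TF-IDF weights for a small corpus; each document
    becomes a dict {term: weight}. Rare terms get boosted, terms present
    in every document get weight tf * 1.0."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(docs)
    df = Counter()  # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            vec[term] = (count / len(toks)) * idf
        vectors.append(vec)
    return vectors
```

Feeding these sparse vectors to logistic regression or Naive Bayes gives the fast baseline the summary recommends before reaching for transformers.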
ML fundamentals covered
Python
- Mutable vs immutable types, hashtable behavior for dict keys, hash collisions and bucket handling.
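A short demonstration of the dict-key rules discussed: keys must be hashable (which in practice means immutable), collisions are resolved by comparing keys with `==`, and equal objects that hash equally share one slot.

```python
# Only hashable (typically immutable) objects can be dict keys: the dict
# stores entries in buckets indexed by hash(key) and resolves collisions
# by probing and comparing keys with ==.
key_tuple = (1, 2)          # immutable -> hashable -> valid key
d = {key_tuple: "ok"}

try:
    d[[1, 2]] = "fails"     # list is mutable -> unhashable -> TypeError
except TypeError as e:
    error = str(e)

# Equal objects must hash equally: 1, 1.0 and True collide into one slot,
# so later assignments overwrite the value while the first key survives.
d2 = {1: "int", 1.0: "float", True: "bool"}
assert len(d2) == 1 and d2[1] == "bool"
```
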
Concurrency
- Multithreading vs multiprocessing in Python:
  - The GIL limits CPU‑bound parallelism for threads; threads help IO‑bound tasks, while multiprocessing sidesteps the GIL for CPU‑bound work at the cost of inter‑process communication.
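The IO-bound case can be demonstrated directly: sleeping (standing in for a network call) releases the GIL, so eight "calls" overlap instead of running sequentially. A CPU-bound loop in place of the sleep would show no such speedup in threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    """Simulated IO-bound work (e.g. an HTTP request): time.sleep releases
    the GIL, so other threads run while this one waits."""
    time.sleep(0.1)
    return 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(io_task, range(8)))
elapsed = time.perf_counter() - start
# 8 x 0.1 s of "IO" completes in roughly 0.1 s of wall time, not 0.8 s
```
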
SQL
- Joins: inner, left, right, outer, cross (Cartesian).
- Join conditions and row‑count implications.
- Window functions: OVER with PARTITION BY, ORDER BY, and frame clauses like ROWS BETWEEN.
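The window-function syntax can be exercised with Python's built-in sqlite3 (SQLite >= 3.25, bundled with CPython 3.8+, supports `OVER`); the table and data here are invented for the example, a per-client running total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (client TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [("a", 10), ("a", 20), ("b", 5), ("b", 15), ("b", 30)])

# Running total per client: PARTITION BY restarts the sum per client,
# ORDER BY + the ROWS frame accumulate from the first row to the current one.
rows = conn.execute("""
    SELECT client, amount,
           SUM(amount) OVER (
               PARTITION BY client
               ORDER BY amount
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM payments
    ORDER BY client, amount
""").fetchall()
# rows -> [('a', 10, 10), ('a', 20, 30), ('b', 5, 5), ('b', 15, 20), ('b', 30, 50)]
```

Unlike GROUP BY, the window version keeps every input row and adds the aggregate alongside it, which is the row-count implication interviewers usually probe.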
Metrics
- ROC AUC: ranking metric invariant to monotonic transforms.
- F1: harmonic mean of precision and recall (penalizes imbalance).
- MSE vs MAE trade‑offs; MAPE vs SMAPE distinctions.
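The metrics above are small enough to write out in full. These pure-Python versions follow the standard definitions; the SMAPE normalization shown is one of several conventions in use, and the rank-based ROC AUC makes its invariance to order-preserving score transforms visible.

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative;
    depends only on the ordering of scores (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(y_true, y_pred):
    """Linear penalty: robust to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Quadratic penalty: outliers dominate the average."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Percentage error: undefined at y_true == 0 and asymmetric
    (over- and under-prediction are penalized differently)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """Symmetric variant: bounded to [0, 2] under this normalization."""
    return sum(2 * abs(t - p) / (abs(t) + abs(p))
               for t, p in zip(y_true, y_pred)) / len(y_true)
```
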
Model optimization & inference speedups
- Quantization (FP32 → FP16 or INT8) for reduced memory and hardware acceleration.
- Distillation: train a smaller model to mimic a larger one.
- KV caching (reusing per-token key/value tensors during autoregressive decoding) and speculative decoding, plus other efficient decoding strategies.
- Efficient attention variants and architecture adjustments.
- Note: hardware‑aware validation is required to check accuracy loss from optimizations.
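A minimal sketch of the quantization idea, assuming the simplest scheme (symmetric, per-tensor INT8 with a single scale); production frameworks use per-channel scales, calibration data, or quantization-aware training on top of this:

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale:
    the basic FP32 -> INT8 step that shrinks memory 4x and enables
    integer matmul on supporting hardware."""
    m = max(abs(w) for w in weights)
    scale = m / 127 if m else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the rounding error is at most ~scale/2
    per weight, which is what accuracy validation must check."""
    return [qi * scale for qi in q]
```
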
Classic ML model trade‑offs
- Tree ensembles (GBM, CatBoost, XGBoost, RandomForest) excel on tabular data; boosting often wins predictive quality but has interpretability and extrapolation caveats.
- Linear models are preferred for interpretability and when true relationships are linear or require monotonic extrapolation.
- For strong monotonic trends (time series, inflation‑like behavior), trees/boosting may underperform at extrapolation.
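The extrapolation caveat can be shown with a toy comparison. The "tree-like" predictor below is a deliberate simplification (a tree predicts a constant per leaf, so outside the training range it returns the nearest leaf's value), while ordinary least squares follows the trend:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def tree_like_predict(xs, ys, x):
    """Stand-in for a tree ensemble: constant prediction per region, so
    beyond the training range it returns the nearest training target."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                       # perfectly linear trend, y = 2x
a, b = fit_linear(xs, ys)
linear_pred = a * 10 + b                    # extrapolates the trend to 20
tree_pred = tree_like_predict(xs, ys, 10)   # stuck at 10, the last leaf value
```

This is why inflation-like or trending targets are often detrended (or handled with a linear component) before boosting is applied.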
Production & team / task landscape at the bank
Team focus: “Data in Communication with Client”
Key product capabilities and tasks:
- Auto‑replies: LLM‑driven generation and fast hint suggestions (short canned replies).
- Validator models: check reply quality, toxicity, and question‑answer match.
- Session management: detect session end, maintain conversation continuity and operator assignment.
- Routing: classify first message to route to the correct operator skill group.
- Summarization: chat and call summarization (drafts for operator editing or direct sending).
- Analytics tools for product managers and analysts.
Infrastructure / organization
- Separate team maintains core LLM deployment and retrieval (tracks and adapts models like LLaMA 13B, Mistral).
- Data scientists build models and small services; product team developers usually handle integration.
- For high‑load needs, industrial engineering support is involved.
- Typical team size described: ~10 analysts and ~2 data scientists (hiring more mid/senior ML engineers).
Practical recommendations / takeaways
- Start with strong baselines: TF–IDF + linear model or CatBoost text features before transformers.
- Train domain‑specific tokenizers and/or pretrain/fine‑tune language models for specialized text.
- Validate outputs with domain experts (essential in healthcare).
- Measure business KPIs (e.g., number of integrated clinics, traffic handled) alongside ML metrics.
- For deployment: consider quantization, distillation, caching, and choose architectures suited to context length and production constraints.
References / typical guides and tutorials mentioned or implied
- Hugging Face community models and pre‑trained checkpoints.
- Standard sentiment‑analysis pattern: label → preprocess → vectorize (TF–IDF) → baseline classifier → evaluate → fine‑tune transformer if needed.
- CatBoost docs for built‑in text features and target encoding.
- Transformer literature: scaled dot‑product attention, positional encodings (sinusoidal, relative, RoPE), BERT vs RoBERTa pretraining differences.
Main speakers / sources
- Candidate: Middle ML / NLP developer (unnamed) — described medical NLP project and technical experience.
- Interviewers / participants: Arthur (technical questions), Andrey (follow‑ups/product questions), with brief mentions of Fedor.
- Organizational references: Criminal IT community (organizer), Tochka Bank (employer/team), recruiter Yana.