Summary of "What is Feature Engineering | Day 23 | 100 Days of Machine Learning"

What is feature engineering (Day 23) — Overview

Feature engineering is the process of using domain knowledge and data transformations to create, clean, or select input features so that machine learning algorithms perform better. It is part science and part art: there is no single universal approach, and techniques vary by problem and data.

Good features can make a weak model perform well; poor features can make a strong model perform poorly.

Feature engineering typically sits in the ML pipeline after initial preprocessing and before modeling. Key sub-tasks include transformation, construction, selection, and algorithmic extraction/dimensionality reduction. The presenter plans a series of follow-up videos (10–15) covering these techniques in depth.


Detailed breakdown — workflow and techniques

  1. Preprocessing / initial step

    • Identify and handle missing values (imputation or deletion). Common strategies:
      • Drop rows or columns with too many missing values (when acceptable)
      • Mean imputation (numeric)
      • Median imputation (numeric, robust to outliers)
      • Mode imputation (categorical)
      • Advanced imputation (KNN, model-based, iterative) — covered in later videos
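The basic strategies above can be sketched with pandas on toy passenger-style data (the values and column names here are invented for illustration, not taken from the video):

```python
import pandas as pd

# Toy data with gaps (hypothetical, not from the video)
df = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "fare": [7.25, 71.28, None, 8.05],
    "embarked": ["S", "C", None, "S"],
})

# Mean imputation for a roughly symmetric numeric column
df["age"] = df["age"].fillna(df["age"].mean())

# Median imputation for a skewed numeric column (robust to outliers)
df["fare"] = df["fare"].fillna(df["fare"].median())

# Mode imputation for a categorical column
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])
```

Dropping rows or columns is just `df.dropna(...)`; the advanced imputers (KNN, iterative) live in scikit-learn and are left to the later videos.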
  2. Feature transformation

    • Categorical handling (convert strings to numeric):
      • One-hot / dummy encoding (binary column per category)
      • Ordinal encoding (integer codes, when categories have a natural order); label encoding works similarly but is typically reserved for target labels
      • Frequency or target encoding (advanced)
    • Binning / discretization:
      • Convert continuous variables to categorical buckets (e.g., age → child/teen/adult)
    • Outlier detection and handling:
      • Detect outliers before training (they can distort models such as linear regression)
      • Remove or transform outliers depending on cause and algorithm sensitivity
    • Scaling and normalization:
      • Standardization (z-score) or Min–Max normalization to bring features to comparable scales
      • Important for distance-based algorithms (k-NN, clustering that uses Euclidean distance), where an unscaled feature would otherwise dominate the metric
    • Power transforms and other transforms:
      • Log transform, Box–Cox, Yeo–Johnson, etc., to reduce skewness or stabilize variance
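Several of these transforms can be combined in a short sketch on invented toy data (the column names and bin edges are hypothetical choices, not from the video):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [4.0, 16.0, 30.0, 70.0],
    "fare": [7.25, 71.28, 8.05, 512.33],
})

# One-hot / dummy encoding: one binary column per category
df = pd.get_dummies(df, columns=["sex"])

# Binning / discretization: continuous age -> ordered buckets
df["age_group"] = pd.cut(df["age"], bins=[0, 12, 19, 120],
                         labels=["child", "teen", "adult"])

# Standardization (z-score) so features share a comparable scale
df["fare_scaled"] = (df["fare"] - df["fare"].mean()) / df["fare"].std()

# Log transform to reduce right skew in fare
df["fare_log"] = np.log1p(df["fare"])
```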
  3. Feature construction (manual / domain-driven)

    • Create new features by combining or aggregating existing ones using domain knowledge or intuition.
    • Common techniques:
      • Arithmetic combinations, ratios, and interactions
      • Aggregations and group-based features
      • Date-time feature extraction
      • Boolean flags and derived categories
    • Example: Titanic — combine SibSp (siblings/spouses) and Parch (parents/children) into FamilySize; optionally bin FamilySize into Alone / Small / Large.
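The Titanic example above can be sketched as follows; adding 1 to count the passenger themself is a common convention assumed here, and the bin edges for Alone / Small / Large are illustrative choices:

```python
import pandas as pd

# Toy slice of Titanic-style data (values invented for illustration)
titanic = pd.DataFrame({
    "SibSp": [1, 0, 3, 0],   # siblings/spouses aboard
    "Parch": [0, 0, 2, 0],   # parents/children aboard
})

# Construct FamilySize; the +1 (the passenger themself) is a convention
titanic["FamilySize"] = titanic["SibSp"] + titanic["Parch"] + 1

# Optionally bin FamilySize into Alone / Small / Large
titanic["FamilyType"] = pd.cut(titanic["FamilySize"], bins=[0, 1, 4, 20],
                               labels=["Alone", "Small", "Large"])
```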
  4. Feature selection

    • Remove irrelevant, redundant, or noisy features to improve performance and speed.
    • Example: flattened pixel features in MNIST — many pixels are irrelevant; selecting informative pixels reduces dimensionality and speeds learning.
    • Approaches (covered later in depth):
      • Filter methods (correlation, statistical tests)
      • Wrapper methods (recursive feature elimination)
      • Embedded methods (regularization, tree-based feature importance)
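A minimal filter-method sketch, using toy data as a stand-in for flattened pixels (the zero-variance columns play the role of blank image borders in the MNIST example):

```python
import numpy as np

# 4 samples x 5 features; columns 0 and 4 are constant (uninformative)
X = np.array([
    [0, 3, 7, 1, 0],
    [0, 5, 2, 8, 0],
    [0, 1, 9, 4, 0],
    [0, 7, 3, 6, 0],
])

# Filter method: keep only columns with nonzero variance
keep = X.var(axis=0) > 0
X_selected = X[:, keep]
```

scikit-learn's `VarianceThreshold` does the same thing; correlation filters, RFE, and embedded methods follow the same "score features, keep the best" pattern.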
  5. Feature extraction (algorithmic dimensionality reduction)

    • Create new features (components) from originals using algorithms, then select a subset of those.
    • Example: PCA — rotate the feature space to orthogonal axes (principal components) and keep top components explaining most variance.
    • Other techniques: LDA, t-SNE (the presenter indicates PCA and similar methods will be covered).
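PCA can be sketched directly with NumPy's SVD on synthetic correlated data (the seed, shapes, and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features: the second is mostly a copy of the first
x = rng.normal(size=100)
X = np.column_stack([x, x + 0.1 * rng.normal(size=100)])

# PCA via SVD: center the data, decompose, project onto top component
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:1].T                 # keep the first principal component
explained = (S ** 2) / (S ** 2).sum() # variance explained per component
```

Because the two features are nearly collinear, the first component captures almost all the variance, which is exactly why a subset of components can replace the originals.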

Speakers and sources

Note: subtitles were auto-generated and contained errors; the summary interprets and corrects obvious transcription mistakes.

Category: Educational

