Summary of "What is Feature Engineering | Day 23 | 100 Days of Machine Learning"

What is feature engineering (Day 23) — Overview

Feature engineering is the process of using domain knowledge and data transformations to create, clean, or select input features so that machine learning algorithms perform better. It is part science and part art: there is no single universal approach, and techniques vary by problem and data.

Good features can make a weak model perform well; poor features can make a strong model perform poorly.

Feature engineering typically sits in the ML pipeline after initial preprocessing and before modeling. Key sub-tasks include transformation, construction, selection, and algorithmic extraction/dimensionality reduction. The presenter plans a series of follow-up videos (10–15) covering these techniques in depth.


Detailed breakdown — workflow and techniques

  1. Preprocessing / initial step

    • Identify and handle missing values (imputation or deletion). Common strategies:
      • Drop rows or columns with too many missing values (when acceptable)
      • Mean imputation (numeric)
      • Median imputation (numeric, robust to outliers)
      • Mode imputation (categorical)
      • Advanced imputation (KNN, model-based, iterative) — covered in later videos
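The basic strategies above can be sketched with pandas on toy passenger-style data (the values and column names here are invented for illustration, not taken from the video):

```python
import pandas as pd

# Toy data with gaps (hypothetical, not from the video)
df = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "fare": [7.25, 71.28, None, 8.05],
    "embarked": ["S", "C", None, "S"],
})

# Mean imputation for a roughly symmetric numeric column
df["age"] = df["age"].fillna(df["age"].mean())

# Median imputation for a skewed numeric column (robust to outliers)
df["fare"] = df["fare"].fillna(df["fare"].median())

# Mode imputation for a categorical column
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])
```

Dropping rows or columns is just `df.dropna(...)`; the advanced imputers (KNN, iterative) live in scikit-learn and are left to the later videos.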
  2. Feature transformation

    • Categorical handling (convert strings to numeric):
      • One-hot / dummy encoding (binary column per category)
      • Ordinal encoding (integer codes, when categories have a natural order); label encoding works similarly but is typically reserved for target labels
      • Frequency or target encoding (advanced)
    • Binning / discretization:
      • Convert continuous variables to categorical buckets (e.g., age → child/teen/adult)
    • Outlier detection and handling:
      • Detect outliers before training (they can distort models such as linear regression)
      • Remove or transform outliers depending on cause and algorithm sensitivity
    • Scaling and normalization:
      • Standardization (z-score) or Min–Max normalization to bring features to comparable scales
      • Important for distance-based algorithms (k-NN, clustering that uses Euclidean distance), where an unscaled feature would otherwise dominate the metric
    • Power transforms and other transforms:
      • Log transform, Box–Cox, Yeo–Johnson, etc., to reduce skewness or stabilize variance
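Several of these transforms can be combined in a short sketch on invented toy data (the column names and bin edges are hypothetical choices, not from the video):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [4.0, 16.0, 30.0, 70.0],
    "fare": [7.25, 71.28, 8.05, 512.33],
})

# One-hot / dummy encoding: one binary column per category
df = pd.get_dummies(df, columns=["sex"])

# Binning / discretization: continuous age -> ordered buckets
df["age_group"] = pd.cut(df["age"], bins=[0, 12, 19, 120],
                         labels=["child", "teen", "adult"])

# Standardization (z-score) so features share a comparable scale
df["fare_scaled"] = (df["fare"] - df["fare"].mean()) / df["fare"].std()

# Log transform to reduce right skew in fare
df["fare_log"] = np.log1p(df["fare"])
```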
  3. Feature construction (manual / domain-driven)

    • Create new features by combining or aggregating existing ones using domain knowledge or intuition.
    • Common techniques:
      • Arithmetic combinations, ratios, and interactions
      • Aggregations and group-based features
      • Date-time feature extraction
      • Boolean flags and derived categories
    • Example: Titanic — combine SibSp (siblings/spouses) and Parch (parents/children) into FamilySize; optionally bin FamilySize into Alone / Small / Large.
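The Titanic example above can be sketched as follows; adding 1 to count the passenger themself is a common convention assumed here, and the bin edges for Alone / Small / Large are illustrative choices:

```python
import pandas as pd

# Toy slice of Titanic-style data (values invented for illustration)
titanic = pd.DataFrame({
    "SibSp": [1, 0, 3, 0],   # siblings/spouses aboard
    "Parch": [0, 0, 2, 0],   # parents/children aboard
})

# Construct FamilySize; the +1 (the passenger themself) is a convention
titanic["FamilySize"] = titanic["SibSp"] + titanic["Parch"] + 1

# Optionally bin FamilySize into Alone / Small / Large
titanic["FamilyType"] = pd.cut(titanic["FamilySize"], bins=[0, 1, 4, 20],
                               labels=["Alone", "Small", "Large"])
```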
  4. Feature selection

    • Remove irrelevant, redundant, or noisy features to improve performance and speed.
    • Example: flattened pixel features in MNIST — many pixels are irrelevant; selecting informative pixels reduces dimensionality and speeds learning.
    • Approaches (covered later in depth):
      • Filter methods (correlation, statistical tests)
      • Wrapper methods (recursive feature elimination)
      • Embedded methods (regularization, tree-based feature importance)
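A minimal filter-method sketch, using toy data as a stand-in for flattened pixels (the zero-variance columns play the role of blank image borders in the MNIST example):

```python
import numpy as np

# 4 samples x 5 features; columns 0 and 4 are constant (uninformative)
X = np.array([
    [0, 3, 7, 1, 0],
    [0, 5, 2, 8, 0],
    [0, 1, 9, 4, 0],
    [0, 7, 3, 6, 0],
])

# Filter method: keep only columns with nonzero variance
keep = X.var(axis=0) > 0
X_selected = X[:, keep]
```

scikit-learn's `VarianceThreshold` does the same thing; correlation filters, RFE, and embedded methods follow the same "score features, keep the best" pattern.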
  5. Feature extraction (algorithmic dimensionality reduction)

    • Create new features (components) from originals using algorithms, then select a subset of those.
    • Example: PCA — rotate the feature space to orthogonal axes (principal components) and keep top components explaining most variance.
    • Other techniques: LDA, t-SNE (the presenter indicates PCA and similar methods will be covered).
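PCA can be sketched directly with NumPy's SVD on synthetic correlated data (the seed, shapes, and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features: the second is mostly a copy of the first
x = rng.normal(size=100)
X = np.column_stack([x, x + 0.1 * rng.normal(size=100)])

# PCA via SVD: center the data, decompose, project onto top component
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:1].T                 # keep the first principal component
explained = (S ** 2) / (S ** 2).sum() # variance explained per component
```

Because the two features are nearly collinear, the first component captures almost all the variance, which is exactly why a subset of components can replace the originals.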

Speakers and sources

Note: subtitles were auto-generated and contained errors; the summary interprets and corrects obvious transcription mistakes.

Category: Educational

