Summary of "Feature Scaling - Standardization | Day 24 | 100 Days of Machine Learning"
Overview
- Tutorial on feature scaling with a focus on standardization (z-score scaling).
- Part of Day 24 of the “100 Days of Machine Learning” series.
- Covers: what standardization is, why it’s needed, geometric intuition, a hands-on Python/sklearn demo, and guidance on when to apply it.
- A follow-up video will cover normalization (min–max and other techniques).
Definition & formula
- Standardization (z-score scaling) transforms each value x_i of a feature to:
z = (x_i - mean) / std, where the mean and standard deviation are computed per feature.
- After transformation each feature has mean ≈ 0 and standard deviation ≈ 1.
- Conceptually two steps: mean-centering, then scaling by the standard deviation.
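A minimal sketch of the transform using NumPy directly, with made-up values (illustrative only; the video's demo uses scikit-learn's StandardScaler):

```python
# Minimal sketch of the z-score transform using NumPy directly
# (illustrative values; the video's demo uses scikit-learn's StandardScaler).
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # one feature column

mean = x.mean()          # mean-centering step
std = x.std()            # population std (ddof=0), same default as StandardScaler
z = (x - mean) / std     # z-score scaling

print(z.mean())          # ~0
print(z.std())           # ~1
```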
Why scaling matters
- Many ML algorithms depend on distances or gradients; features with different scales can bias results or slow/stop convergence.
- Examples where scaling matters:
- Distance-based methods: KNN, K-means and other clustering, similarity measures based on Euclidean distance (a small distance example follows this list).
- Gradient-based optimization: logistic regression (with gradient descent), neural networks — scaling helps convergence and stabilizes learning.
- PCA: relies on variance, so standardization is important before applying PCA.
- Examples where scaling is usually not necessary:
- Tree-based models and many ensemble tree methods (Decision Trees, Random Forest, Gradient Boosting, XGBoost, LightGBM) — scaling typically has little or no effect.
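To make the distance point concrete, here is an illustrative sketch (made-up numbers, not from the video) of how a feature measured in large units dominates Euclidean distance until both features are standardized:

```python
# Illustrative only (made-up numbers, not from the video): an unscaled feature
# measured in large units dominates the Euclidean distance between two points.
import numpy as np

# feature 1: age in years, feature 2: income in dollars
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])
print(np.linalg.norm(a - b))      # ~2000.1, almost entirely the income gap

# Standardize each feature with assumed column statistics (hypothetical values).
age_mean, age_std = 35.0, 10.0
inc_mean, inc_std = 51_000.0, 5_000.0
a_z = np.array([(a[0] - age_mean) / age_std, (a[1] - inc_mean) / inc_std])
b_z = np.array([(b[0] - age_mean) / age_std, (b[1] - inc_mean) / inc_std])
print(np.linalg.norm(a_z - b_z))  # ~2.04, both features now contribute
```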
Geometric intuition
- In feature space, standardization shifts points to center around zero (mean-centering) and rescales axes so each has unit variance.
- The transformation preserves distribution shape but rescales the spread along each axis.
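A quick check of the shape-preservation claim (an illustration, not from the video): z-scoring a deliberately skewed sample leaves its skewness unchanged.

```python
# Quick illustration (not from the video) that z-scoring preserves the shape
# of a distribution: the skewness is identical before and after scaling.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # deliberately skewed feature

z = (x - x.mean()) / x.std()                  # standardize

print(round(skew(x), 3), round(skew(z), 3))   # same value: only center/spread changed
```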
Hands-on / Practical tips
- Always split data into train/test before scaling.
- Fit the scaler on the training set only: scaler.fit(X_train) learns the mean/std from the training data.
- Apply the same transform to both train and test: scaler.transform(X_train) and scaler.transform(X_test) (see the end-to-end sketch after this list).
- If using pandas, convert scaled NumPy arrays back to a DataFrame for easier inspection.
- Visualize distributions before and after (PDF plots, describe()) to verify mean ≈ 0 and std ≈ 1.
- Handle outliers carefully: extreme values affect the mean/std and therefore the scaling result.
- Use sklearn.preprocessing.StandardScaler as the standard tool; you can implement a custom scaler class if needed.
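An end-to-end sketch of the workflow above, assuming a pandas DataFrame df whose target column is named 'target' (hypothetical names; adapt to your dataset):

```python
# Sketch of the workflow above, assuming a pandas DataFrame `df` whose target
# column is named 'target' (hypothetical names; adapt to your dataset).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["target"])
y = df["target"]

# 1. Split first so the test set never influences the scaling parameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Fit the scaler on the training set only (learns per-feature mean/std).
scaler = StandardScaler()
scaler.fit(X_train)

# 3. Apply the same transform to both splits; wrap back into DataFrames.
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)

# 4. Verify mean ≈ 0 and std ≈ 1 on the training split.
print(X_train_scaled.describe().round(2))
```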
Demonstrated effects on model performance
- Example experiment (logistic regression):
- Unscaled features: ~65% accuracy.
- Standardized features: improved accuracy (~81% reported in the demo).
- The demo also shows injecting extreme values/outliers to illustrate how scaling behaves and to emphasize treating outliers explicitly when necessary.
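A hedged sketch of the kind of comparison the demo runs, reusing the splits from the sketch above: the same logistic regression trained on raw vs. standardized features. Exact accuracies depend on the dataset; the ~65% and ~81% figures are from the video's data, not guaranteed by this code.

```python
# Hedged sketch of the comparison the demo runs: the same logistic regression
# on raw vs. standardized features (reusing the splits from the sketch above).
# Exact accuracies depend on the dataset; ~65% vs ~81% are the video's figures.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

clf_raw = LogisticRegression().fit(X_train, y_train)
print("unscaled:", accuracy_score(y_test, clf_raw.predict(X_test)))

clf_scaled = LogisticRegression().fit(X_train_scaled, y_train)
print("scaled:  ", accuracy_score(y_test, clf_scaled.predict(X_test_scaled)))
```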
Caveats & recommendations
- Standardization changes only center and scale, not the shape of the distribution.
- Always fit the scaler on training data and apply the same transform to validation/test/new data (a Pipeline-based sketch follows this list).
- Standardization generally does not harm models and is a safe default when unsure, but it is unnecessary for many tree-based models.
- Normalization (min–max scaling and other techniques) will be covered in the next video; different methods are suited to different situations.
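One way to enforce the fit-on-training-only rule automatically (an assumption on my part, not shown in the video) is to bundle the scaler and model in a scikit-learn Pipeline, reusing X and y from the earlier sketch:

```python
# One way (an assumption, not shown in the video) to guarantee the scaler is
# always fit on training data only: bundle it with the model in a Pipeline.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),      # refit on each training fold
    ("clf", LogisticRegression()),
])

# cross_val_score refits the whole pipeline per fold, so scaling statistics
# never leak from the held-out fold into training.
print(cross_val_score(pipe, X, y, cv=5))
```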
Code / tooling mentioned
- Python, pandas, scikit-learn (StandardScaler, train_test_split).
- Plotting PDFs and using describe() to inspect distributions.
- Converting scaler output (NumPy arrays) back to a pandas DataFrame for inspection.
Next steps / resources
- Next video: normalization and a comparison of min–max scaling vs standardization, with guidance on when to use each.
- Related playlist topics on the channel: logistic regression, gradient descent, PCA, etc., for deeper study.
Main speaker / source
- The instructor is the unnamed presenter of the “100 Days of Machine Learning” series; demonstrations use Python and scikit-learn.