Summary of "Power Transformer | Box-Cox Transform | Yeo-Johnson Transform"
Overview — main ideas and lessons
The video introduces the “power transformer” family of data transformations (Box–Cox and Yeo–Johnson) used to make feature distributions more Gaussian-like and to improve performance of algorithms that assume or benefit from near-normal inputs (for example, linear and logistic regression).
Key points:
- Box–Cox and Yeo–Johnson are parametric, monotonic power transformations that include common transforms (log, square root, etc.) as special cases by adjusting a parameter λ (lambda). Lambda is estimated per feature to best approximate normality.
- Box–Cox requires strictly positive data; Yeo–Johnson supports zero and negative values. If you need to use Box–Cox on data with zeros or negatives, shift the data by adding a small positive constant first.
- Lambda values are typically chosen by maximum likelihood (or related estimation techniques).
- scikit-learn's `PowerTransformer` (with `method='box-cox'` or `method='yeo-johnson'`) can be used to apply these transforms and, by default, standardize the transformed features to zero mean and unit variance.
- Practical recommendation: check feature distributions; if skewed, try power transforms and compare downstream model performance. Pick the transform that yields the best validation metric.
If features are skewed and your model benefits from normal-ish inputs, try Box–Cox or Yeo–Johnson and validate which gives the best downstream results.
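The positivity requirement can be demonstrated directly. A minimal sketch (synthetic data and variable names are my own, not from the video): Box–Cox fits strictly positive data, raises on data containing zeros or negatives, and Yeo–Johnson handles the latter case without any shift.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_pos = rng.exponential(scale=2.0, size=(200, 1))  # strictly positive, right-skewed

# Box-Cox is valid here because every value is > 0
PowerTransformer(method='box-cox').fit(X_pos)

# Shifting the data below zero makes Box-Cox invalid...
X_mixed = X_pos - 1.0
try:
    PowerTransformer(method='box-cox').fit(X_mixed)
except ValueError:
    # ...but Yeo-Johnson handles zeros and negatives directly
    PowerTransformer(method='yeo-johnson').fit(X_mixed)
```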
Key concepts / definitions
- Power transformer: a family of parametric, monotonic transformations parameterized by λ that make data more Gaussian-like.
- Box–Cox transform: a power transform requiring strictly positive inputs. Special cases include log, square root, etc. λ is estimated per feature.
- Yeo–Johnson transform: variant that supports zero and negative values; otherwise serves the same purpose as Box–Cox.
- Lambda (λ): the exponent/power parameter governing the transformation for each feature; estimated to optimize normality (often via maximum likelihood).
- Standardization: `PowerTransformer` by default standardizes (zero mean, unit variance) after transforming (controlled by `standardize=True`).
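For reference, the standard definitions behind the two transforms (the λ = 0 and λ = 2 cases are the logarithmic limits of the general formulas):

```latex
\text{Box--Cox:}\quad
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\[4pt]
\ln x & \lambda = 0
\end{cases}
\qquad (x > 0)

\text{Yeo--Johnson:}\quad
x^{(\lambda)} =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda} & x \ge 0,\ \lambda \neq 0 \\[4pt]
\ln(x + 1) & x \ge 0,\ \lambda = 0 \\[4pt]
-\dfrac{(-x + 1)^{2 - \lambda} - 1}{2 - \lambda} & x < 0,\ \lambda \neq 2 \\[4pt]
-\ln(-x + 1) & x < 0,\ \lambda = 2
\end{cases}
```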
Step-by-step methodology
- Inspect your dataset
- Plot histograms or density plots for each feature to identify skewness/non-normality.
- Check for zeros or negative values (important for Box–Cox).
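The inspection step can be sketched numerically as well as visually. A minimal sketch (toy data stands in for a real feature matrix): report per-feature skewness and minimum, since high |skew| suggests a power transform may help and a minimum ≤ 0 rules out Box–Cox without shifting.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
# toy stand-in for a feature matrix: one right-skewed and one symmetric column
X = np.column_stack([rng.exponential(2.0, 500), rng.normal(0.0, 1.0, 500)])

for j in range(X.shape[1]):
    col = X[:, j]
    # high |skew| -> candidate for a power transform;
    # min <= 0 -> Box-Cox not applicable without a shift
    print(f"feature {j}: skew={skew(col):.2f}, min={col.min():.2f}")
```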
- Choose transformation strategy
- If all values in a feature are strictly positive → Box–Cox is possible.
- If features contain zeros or negatives → use Yeo–Johnson, or shift the data by a small positive constant before Box–Cox.
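This decision rule can be captured in a small helper (the function name is my own, not from the video):

```python
import numpy as np

def pick_method(col: np.ndarray) -> str:
    """Box-Cox needs strictly positive values; otherwise use Yeo-Johnson."""
    return 'box-cox' if col.min() > 0 else 'yeo-johnson'

pick_method(np.array([0.5, 1.2, 3.0]))   # 'box-cox'
pick_method(np.array([0.0, 1.2, 3.0]))   # 'yeo-johnson' (contains a zero)
```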
- Prepare the scikit-learn transformer
- Import and create transformer objects:

```python
from sklearn.preprocessing import PowerTransformer

pt_box = PowerTransformer(method='box-cox', standardize=True)
pt_yj = PowerTransformer(method='yeo-johnson', standardize=True)
```

- If using Box–Cox and a feature has zeros (min = 0), add a small epsilon (e.g., `1e-6` or a domain-appropriate constant) to that feature before fitting.
- Fit and transform
- Fit the transformer on the training features:

```python
pt.fit(X_train)
```

- Transform the training and test/validation sets:

```python
X_train_t = pt.transform(X_train)
X_test_t = pt.transform(X_test)
```

- `PowerTransformer` estimates a separate λ for each feature internally (accessible after fitting).
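The fitted per-feature λ values are exposed through the `lambdas_` attribute. A quick sketch (synthetic data and variable names are my own):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_train = rng.exponential(scale=2.0, size=(300, 3))  # three skewed features

pt = PowerTransformer(method='box-cox', standardize=True).fit(X_train)
print(pt.lambdas_)  # one estimated lambda per feature
```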
- Train and evaluate a downstream model
- Fit a model (e.g., `LinearRegression`) on the transformed training data.
- Evaluate via cross-validation or a hold-out test set (apply the same transform pipeline before scoring).
- Compare metrics (R², RMSE, etc.) against a model trained on untransformed features.
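The train-and-evaluate step might look like the following sketch, using a `Pipeline` so the transform is fit only on the training split (the data here is synthetic, not the video's concrete dataset):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(400, 3))            # skewed features
y = np.log1p(X).sum(axis=1) + rng.normal(0, 0.1, 400)    # nonlinear target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# the transform is fit on X_tr only, then reused on X_te at predict time
pipe = make_pipeline(PowerTransformer(method='yeo-johnson'), LinearRegression())
pipe.fit(X_tr, y_tr)
print(r2_score(y_te, pipe.predict(X_te)))
```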
- Compare transformations
- Try both Box–Cox and Yeo–Johnson (and possibly other transforms like log, sqrt).
- Compare model metrics and inspect post-transform feature distributions.
- Choose the transform that yields the best validation performance.
- Additional practical tips
- Use Pipelines to ensure the same transform is applied to train and test splits.
- When using Box–Cox, store/record any small shift applied so you can invert or apply the same preprocessing to new data.
- If `PowerTransformer` is created with `standardize=True` (the default), an additional `StandardScaler` after the transform is unnecessary.
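Relatedly, `PowerTransformer.inverse_transform` undoes both the power transform and the standardization, which is useful for mapping transformed values back to the original scale (sketch with synthetic data):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(200, 2))

pt = PowerTransformer(method='yeo-johnson', standardize=True).fit(X)
X_t = pt.transform(X)
X_back = pt.inverse_transform(X_t)

print(np.allclose(X, X_back))  # round-trips to the original values
```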
Implementation notes from the video
- Dataset: concrete strength dataset with features such as cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and age; target is concrete strength.
- Tools used: `sklearn.preprocessing.PowerTransformer`, `sklearn.linear_model.LinearRegression`, and `cross_val_score` for evaluation.
- The presenter printed per-feature λ values after fitting (e.g., cement ≈ 0.170; each feature has its own λ).
- Visualizations: histograms before and after transformation to show improved normality for some features (not all features change equally).
- Results: baseline regression score improved after applying Box–Cox and, in many cases, improved further with Yeo–Johnson. Exact numbers varied, but transforms often improved R²/performance.
- Handling zeros: when a feature had min = 0, the presenter added a very small constant to allow Box–Cox.
Observations and recommendations
- Power transforms are especially helpful in real-world tabular data where many features are skewed.
- If your algorithm benefits from normality (linear or logistic regression, some feature-sensitive models), include power transformations in preprocessing experiments.
- Use Yeo–Johnson when you need to handle zeros/negatives without shifting; use Box–Cox when all values are positive (it may perform slightly differently).
- Always validate with cross-validation and compare transforms — choose the transform that yields the best downstream validation metric.
- `PowerTransformer` often outperforms simple hand-picked transforms (log, sqrt), but try alternatives as well.
Speakers / sources featured
- Presenter / YouTube channel host (unnamed) — instructor who explains concepts and runs the notebook demo.
- scikit-learn (`sklearn.preprocessing.PowerTransformer`) — implementation used for Box–Cox and Yeo–Johnson transforms.
Category
Educational