Summary of "Factor Analysis | What is Factor Analysis? | Factor Analysis Explained | Machine Learning | Edureka"
Purpose
Factor analysis (FA) is a statistical dimension-reduction technique that groups many observed variables into a smaller number of underlying factors (latent variables) to simplify analysis, discover latent constructs, and assess dimensionality and homogeneity of data.
Latent variables
- Unobserved constructs inferred from observed variables.
- Examples: job satisfaction factors; “service” and “food quality” in a restaurant-review example; other examples include quality of life, business confidence, morale, happiness, conservatism.
Two broad uses / types
- Exploratory Factor Analysis (EFA): Discover underlying structure from the data (useful for insight).
- Confirmatory Factor Analysis (CFA): Test a hypothesized factor structure (model-based testing, often with equations).
Common extraction methods (especially in EFA)
- Principal Component Analysis (PCA)
- Common factor analysis
- Image factoring
- Maximum likelihood
- Alpha factoring
- Weighted least squares
Note: PCA and common factor analysis are the most commonly used.
Key outputs and interpretation
- Factor loadings: Correlation coefficients between observed variables and factors (like any correlation, they range from −1 to +1); the squared loading is the share of a variable’s variance explained by that factor.
- Communality: For each observed variable, the sum of squared loadings across retained factors (proportion of that variable’s variance explained by the factors).
- Eigenvalue: For each factor, the sum of squared loadings across variables — a measure of variance explained by that factor.
- Cross-loading: When a variable loads substantially on more than one factor; often handled by rotation.
- Rotation: A post-extraction step (orthogonal or oblique) to make factor structure easier to interpret.
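The outputs above can be illustrated with a minimal NumPy sketch, assuming synthetic data and principal-component extraction (one of the methods listed earlier); the variable names and seed are illustrative, not from the video:

```python
import numpy as np

# Hypothetical data: two latent factors driving six observed variables
rng = np.random.default_rng(0)
factors = rng.normal(size=(300, 2))
true_loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.75, 0.0],
                          [0.0, 0.8], [0.1, 0.7], [0.0, 0.75]])
X = factors @ true_loadings.T + 0.4 * rng.normal(size=(300, 6))

R = np.corrcoef(X, rowvar=False)                  # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                 # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                             # retain two factors
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # variable-factor correlations
communality = (loadings ** 2).sum(axis=1)         # per-variable variance explained
factor_variance = (loadings ** 2).sum(axis=0)     # per-factor eigenvalue

print(np.round(loadings, 2))
print(np.round(communality, 2))
print(np.round(factor_variance, 2))
```

Note how the three definitions connect: summing squared loadings along a row gives a communality, and summing them down a column recovers that factor’s eigenvalue.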
Difference between PCA and FA
- PCA: Finds linear combinations (components) that capture total variance (common + unique + error); useful to reduce variables by variance.
- FA: Models only the common/shared variance to recover latent constructs; use FA when the goal is to discover latent variables.
- Note: When the variable count is large (e.g., >30), PCA and FA results may converge.
Assumptions and data requirements
- Clean data: no or handled missing values and outliers.
- Adequate sample size: a general rule of thumb is ≥ 5 observations per variable (e.g., 10 variables → at least 50 observations); the sample size should also exceed the number of factors.
- Variables should be interrelated (sufficient correlations among variables); testable via the correlation matrix and tests such as Bartlett’s test of sphericity.
- Variables should be numeric, ideally measured on an interval scale.
- Data normalization/scaling is preferred (though multivariate normality is not strictly required).
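Bartlett’s test of sphericity, mentioned above, can be computed directly from the correlation matrix. A small sketch using its standard chi-square approximation (the dataset and function name are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 says the correlation matrix is the
    identity (variables are uncorrelated, so factor analysis would be pointless)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

# Hypothetical correlated data: four noisy measurements of one underlying trait
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 1))
X = base + 0.5 * rng.normal(size=(200, 4))

stat, p_value = bartlett_sphericity(X)
print(f"chi2 = {stat:.1f}, p = {p_value:.3g}")  # small p -> enough correlation for FA
```

A small p-value rejects the identity-matrix hypothesis, supporting the interrelatedness assumption.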
How to choose the number of factors
- Sample guidance: follow observations-per-variable rules of thumb.
- Scree plot: look for the “elbow” to pick factors (visual but sometimes ambiguous).
- Kaiser criterion: retain factors whose eigenvalue > 1.
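Both the scree plot and the Kaiser criterion start from the eigenvalues of the correlation matrix. A short sketch with synthetic two-factor data (the seed and loadings are illustrative):

```python
import numpy as np

# Hypothetical data generated from two latent factors
rng = np.random.default_rng(3)
factors = rng.normal(size=(400, 2))
true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.8, 0.0],
                          [0.0, 0.8], [0.0, 0.7], [0.0, 0.8]])
X = factors @ true_loadings.T + 0.5 * rng.normal(size=(400, 6))

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
print(np.round(eigvals, 2))           # plot against factor index for a scree plot

n_factors = int((eigvals > 1).sum())  # Kaiser criterion: keep eigenvalues > 1
print("factors to retain:", n_factors)
```

Here the first two eigenvalues stand well above 1 and the rest fall below it, so both criteria agree on two factors.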
Basic logic / algorithmic intuition (step-by-step)
- Start with the observed variables you want to reduce.
- Extract the first component/factor: a linear combination of variables that explains the maximum possible variance shared among variables (or total variance in PCA).
- Extract subsequent components/factors that explain the most remaining variance, subject to being orthogonal (or appropriately constrained) to earlier components.
- Continue extracting until all variance is accounted for, then select a smaller number of factors that explain most of the meaningful variance.
- Apply rotation to the retained factor solution to clarify interpretation, then use factor loadings and communalities to label and interpret factors (i.e., name latent constructs).
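The steps above can be sketched end to end: extract components from the correlation matrix, retain the strongest, then rotate. The varimax implementation below uses the standard SVD-based algorithm; the data and loadings are illustrative assumptions:

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-6):
    """Orthogonal varimax rotation: concentrates each variable's variance on
    one factor, making the retained factors easier to name."""
    L = loadings
    p, k = L.shape
    R = np.eye(k)
    total = 0.0
    for _ in range(n_iter):
        rotated = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < total * (1 + tol):
            break
        total = s.sum()
    return L @ R

# Hypothetical data with deliberate cross-loadings
rng = np.random.default_rng(4)
factors = rng.normal(size=(400, 2))
true_loadings = np.array([[0.7, 0.4], [0.6, 0.4], [0.4, 0.7], [0.4, 0.6]])
X = factors @ true_loadings.T + 0.4 * rng.normal(size=(400, 4))

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])  # retain 2 factors
rotated = varimax(loadings)
print(np.round(rotated, 2))  # typically one dominant loading per variable
```

Because the rotation is orthogonal, each variable’s communality is unchanged; only how the explained variance is split between factors changes.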
Practical issues to address (and how to handle them)
- Decide between PCA and FA based on goal (latent construct discovery → FA; variable reduction by variance → PCA).
- Interpret results using loadings, communalities, and eigenvalues.
- Determine factor count using scree plot, Kaiser criterion (eigenvalue > 1), and sample-size considerations.
- Handle cross-loadings by applying rotation and, if necessary, re-specifying variables or factor structure.
Examples used in the video
- Job satisfaction questionnaire: multiple observed items (role fit, supervisor, pay, co-workers, etc.) may reflect a smaller set of latent factors.
- Restaurant choice example: six observed review variables (waiting time, cleanliness, staff behavior, taste, freshness, temperature) can be reduced to two latent factors: service and food quality.
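The restaurant example can be reproduced on synthetic data (the generating weights and seed are made up for illustration; only the six variable names and the two-factor outcome come from the video):

```python
import numpy as np

# Hypothetical review scores: two latent drivers ("service", "food quality")
rng = np.random.default_rng(5)
service = rng.normal(size=(500, 1))
food = rng.normal(size=(500, 1))
X = np.hstack([
    0.9 * service + 0.3 * rng.normal(size=(500, 3)),  # waiting time, cleanliness, staff behavior
    0.6 * food + 0.6 * rng.normal(size=(500, 3)),     # taste, freshness, temperature
])
names = ["waiting time", "cleanliness", "staff behavior",
         "taste", "freshness", "temperature"]

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])

for name, row in zip(names, np.round(loadings, 2)):
    print(f"{name:15s} {row}")
```

The first three variables load on one factor and the last three on the other, which is what lets an analyst label the factors “service” and “food quality.”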
Speaker / Source
- Kavya (presenter), Edureka (YouTube channel / source)