Summary of "Correlation | Karl Pearson’s coefficient of correlation | Multiple correlation | Biostatistics"
Topic
Definition and types of correlation, methods to calculate Karl Pearson’s coefficient of correlation, and a brief note on multiple correlation.
Key lessons
- Correlation measures the degree (strength and direction) of association between two (or more) variables.
- Types of correlation: positive, negative, linear, curvilinear, and multiple correlation (among three or more variables).
- Two practical methods to compute Pearson’s r: the Actual‑Mean Method and the Assumed‑Mean Method.
- Use the Assumed‑Mean Method when the sample means are inconvenient decimals.
- The example worked in the lecture yields r ≈ 0.961 (very strong positive linear correlation).
Definitions and types
- Correlation (co‑relation): a measure of how closely two variables are related; it quantifies the degree of relationship/association.
- Positive correlation: both variables move in the same direction (one increases → the other tends to increase).
- Negative correlation: variables move in opposite directions (one increases → the other tends to decrease).
- Linear correlation: the ratio of change between two variables remains approximately constant (points lie roughly on a straight line).
- Curvilinear correlation: the ratio of change does not remain constant (relationship is non‑linear).
- Multiple correlation: association among three or more variables; typically one variable is treated as dependent and the others as independent.
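The distinction between positive, negative, and curvilinear relationships can be checked numerically. The sketch below is illustrative only: the data values and the helper name pearson_r are assumptions, not taken from the lecture. It computes Pearson's r for a rising line, a falling line, and a symmetric parabola.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the three cases.
x = [-3, -2, -1, 0, 1, 2, 3]
r_positive = pearson_r(x, [2 * v + 1 for v in x])   # rising straight line
r_negative = pearson_r(x, [-2 * v + 1 for v in x])  # falling straight line
r_curved = pearson_r(x, [v * v for v in x])         # symmetric parabola

print(r_positive, r_negative, r_curved)
```

Here r comes out at +1 for the rising line, −1 for the falling line, and 0 for the parabola, even though y is fully determined by x in the last case. Pearson's r captures only the linear part of a relationship, so a curvilinear association can produce an r near zero.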
Methods to compute Karl Pearson’s coefficient
1) Actual‑Mean Method
When to use:
- Use when sample means x̄ and ȳ are convenient (integers or simple fractions).
Formula:
r = Σ[(xi − x̄)(yi − ȳ)] / sqrt[ Σ(xi − x̄)² × Σ(yi − ȳ)² ]
Steps:
- List paired data (xi, yi).
- Compute x̄ = (Σxi)/n and ȳ = (Σyi)/n.
- For each pair compute deviations: (xi − x̄) and (yi − ȳ).
- Compute the product for each pair: (xi − x̄)(yi − ȳ).
- Sum those products → numerator = Σ[(xi − x̄)(yi − ȳ)].
- Compute squared deviations for x and y separately: Σ(xi − x̄)² and Σ(yi − ȳ)².
- Denominator = sqrt[Σ(xi − x̄)² × Σ(yi − ȳ)²].
- Divide numerator by denominator to get r.
Worked example (from lecture):
- Data: x = [2, 3, 4, 5, 6, 7, 8]; y = [4, 7, 8, 9, 10, 14, 18]
- x̄ = 35/7 = 5; ȳ = 70/7 = 10
- Σ(xi − x̄)(yi − ȳ) = 58
- Σ(xi − x̄)² = 28 ; Σ(yi − ȳ)² = 130
- Denominator = sqrt(28 × 130) = sqrt(3640) ≈ 60.332
- r ≈ 58 / 60.332 ≈ 0.961 → very strong positive linear correlation
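The arithmetic of the worked example can be reproduced with a short script. This is a minimal sketch that follows the Actual‑Mean steps using the lecture's data; the variable names are illustrative.

```python
import math

# Paired data from the lecture's worked example.
x = [2, 3, 4, 5, 6, 7, 8]
y = [4, 7, 8, 9, 10, 14, 18]
n = len(x)

x_bar = sum(x) / n  # 5.0
y_bar = sum(y) / n  # 10.0

# Deviations of each observation from its mean.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # numerator: 58.0
sum_dx2 = sum(a * a for a in dx)               # 28.0
sum_dy2 = sum(b * b for b in dy)               # 130.0

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
print(round(r, 3))  # 0.961
```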
2) Assumed‑Mean Method
When to use:
- Use when x̄ or ȳ would be awkward decimals or to simplify arithmetic by centering on convenient values.
Basic idea:
- Choose convenient constants a (for x) and b (for y) near central values; compute coded deviations dx = xi − a and dy = yi − b and work with their sums.
Formula (using coded deviations):
r = [ n·Σ(dx·dy) − (Σdx)(Σdy) ] / sqrt{ [ n·Σ(dx²) − (Σdx)² ] × [ n·Σ(dy²) − (Σdy)² ] }
Steps:
- Choose assumed means a and b (central values) so dx and dy are small integers.
- For each observation compute dx = xi − a and dy = yi − b.
- Compute Σdx, Σdy, Σ(dx·dy), Σ(dx²), Σ(dy²), and note n.
- Substitute into the coded‑deviation form of Pearson’s formula.
- Compute numerator and denominator, then r.
Tips:
- Selecting a and b as middle values keeps arithmetic simple.
- Track the signs of dx and dy carefully: the products dx·dy can be negative, while the squared terms dx² and dy² never are.
- This method avoids fractional arithmetic when actual means are inconvenient.
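The coded‑deviation formula can be sketched in code as follows. The function name pearson_r_assumed_mean and the assumed means passed to it (a = 4, b = 9 and a = 6, b = 12) are illustrative choices, not values from the lecture; the data are the lecture's worked example.

```python
import math

def pearson_r_assumed_mean(xs, ys, a, b):
    """Pearson's r via coded deviations dx = x - a, dy = y - b (hypothetical helper)."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = math.sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return numerator / denominator

x = [2, 3, 4, 5, 6, 7, 8]
y = [4, 7, 8, 9, 10, 14, 18]

# Two different (arbitrary) choices of assumed means.
r_a = pearson_r_assumed_mean(x, y, a=4, b=9)
r_b = pearson_r_assumed_mean(x, y, a=6, b=12)
print(round(r_a, 3), round(r_b, 3))  # 0.961 0.961
```

Both calls give the same r as the Actual‑Mean Method. The correction terms (Σdx)(Σdy), (Σdx)², and (Σdy)² cancel the shift introduced by a and b, so the choice of assumed means affects only how convenient the arithmetic is, not the result.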
3) Multiple correlation (brief)
- Multiple correlation refers to the association among three or more variables.
- In a three‑variable case, typically one variable (e.g., z) is treated as dependent and the others (x, y) as independent predictors.
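- For dependent z and predictors x and y, the coefficient can be written in terms of the pairwise Pearson coefficients: R(z·xy) = sqrt[ (r_zx² + r_zy² − 2·r_zx·r_zy·r_xy) / (1 − r_xy²) ]. Which variable is treated as dependent determines which pairwise coefficients go where.
- Practical approach with more predictors: regress the dependent variable on the set of independents (regression or matrix methods); the multiple correlation coefficient is the square root of that regression's R².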
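For the three‑variable case, one standard formula expresses the multiple correlation coefficient through the three pairwise Pearson coefficients. The sketch below assumes z is the dependent variable; the function name multiple_r and the input coefficients are hypothetical, since the lecture gives no numeric example for this part.

```python
import math

def multiple_r(r_zx, r_zy, r_xy):
    """Multiple correlation of z on (x, y) from pairwise Pearson coefficients.

    Uses R^2 = (r_zx^2 + r_zy^2 - 2*r_zx*r_zy*r_xy) / (1 - r_xy^2).
    The three inputs must be mutually consistent correlations.
    """
    if abs(r_xy) >= 1:
        raise ValueError("x and y are perfectly correlated; the formula is undefined")
    r_squared = (r_zx ** 2 + r_zy ** 2 - 2 * r_zx * r_zy * r_xy) / (1 - r_xy ** 2)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, r_squared))

# Hypothetical pairwise coefficients, for illustration only.
R = multiple_r(r_zx=0.6, r_zy=0.5, r_xy=0.3)
print(round(R, 3))  # 0.687
```

When the two predictors are uncorrelated (r_xy = 0), the formula reduces to R² = r_zx² + r_zy², which is a quick sanity check on any implementation.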
Additional practical tips and pedagogical points
- Always transcribe given data carefully; transcription errors cause wrong results.
- If one method is inconvenient, switch to the other; both give the same value of r.
- Memorize the formula and practice example problems for fluency.
- This topic commonly appears in courses such as BBA, BCA, B.Tech, MBA, and B.Pharmacy biostatistics (unit 1).
Resources mentioned
- “Depth of Biology” application (Play Store) and an associated website (as recommended by the lecturer).
- Lecturer encouraged watching unit‑wise videos, downloading notes from the app, and asking questions in comments.
Speakers / sources
- Unnamed lecturer (video presenter)
- Karl Pearson (originator of the Pearson coefficient)
- Depth of Biology (app/channel/website referenced)
Category
Educational