Summary of "Regression | least squares Method | Multiple regression | standard error of regression #biostatistic"
Overview and main concepts
- Regression is a statistical method for predicting the value of one variable from the value(s) of one or more other variables. It identifies relationships between a dependent variable and one or more independent variables.
- Two basic types are covered:
  - Simple (linear) regression of one variable on another:
    - “x on y”: x is dependent, y is independent.
    - “y on x”: y is dependent, x is independent.
  - Multiple regression: estimating an unknown variable from two or more predictor variables (e.g., y = a + b1*x1 + b2*x2 + ...).
- The method emphasized for estimating the regression parameters (intercept a and slope b) is least squares (normal equations).
- Standard error of regression (standard error of estimate) measures how much the regression line would vary if data were recollected many times; it quantifies how much predictions deviate from actual observations.
Simple linear regression — least squares (normal equations)
Step-by-step procedure for simple linear regression (regressing one variable on another):
- Prepare data
  - List the observed paired values (xi, yi) and determine the sample size n.
  - Compute the required sums:
    - Σx, Σy
    - Σ(xy) = sum of products xi*yi
    - Σ(x^2) and/or Σ(y^2) as needed
  - Compute the sample means x̄ = Σx / n and ȳ = Σy / n (useful for the intercept form).
- Form the normal equations
  - For regression of x on y (x dependent):
    - Σx = n*a + b*Σy
    - Σ(xy) = a*Σy + b*Σ(y^2)
    - (Here a = intercept, b = slope when regressing x on y.)
  - For regression of y on x (y dependent), swap x and y:
    - Σy = n*a + b*Σx
    - Σ(xy) = a*Σx + b*Σ(x^2)
- Solve the two linear normal equations simultaneously for a and b
  - Use algebraic elimination, or
  - Use the means/covariance formulas (below) to get b directly and then a.
- Alternative direct formula for the slope (covariance / sums form)
  - For x on y (x dependent):
    - b = [n*Σ(xy) − Σx*Σy] / [n*Σ(y^2) − (Σy)^2]
    - a = x̄ − b*ȳ
  - For y on x (y dependent):
    - b = [n*Σ(xy) − Σx*Σy] / [n*Σ(x^2) − (Σx)^2]
    - a = ȳ − b*x̄
- Interpret the fitted regression equation (e.g., x = a + b*y or y = a + b*x).
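The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the function name `fit_simple_regression` and the sample data are assumptions made here. It uses the direct slope formula and then the means for the intercept, which gives the same result as solving the two normal equations.

```python
def fit_simple_regression(dependent, independent):
    """Least-squares fit of dependent = a + b * independent; returns (a, b)."""
    n = len(dependent)
    if n != len(independent) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # Sums required by the normal equations.
    sum_d = sum(dependent)
    sum_i = sum(independent)
    sum_di = sum(d * i for d, i in zip(dependent, independent))
    sum_ii = sum(i * i for i in independent)

    denominator = n * sum_ii - sum_i ** 2
    if denominator == 0:
        raise ValueError("independent variable is constant; slope is undefined")

    # Slope from the sums form, intercept from the two means.
    b = (n * sum_di - sum_d * sum_i) / denominator
    a = sum_d / n - b * sum_i / n
    return a, b


# Hypothetical, perfectly linear data (dependent = 1 + 2 * independent),
# used only as a sanity check.
a, b = fit_simple_regression([3, 5, 7, 9], [1, 2, 3, 4])
print(a, b)  # prints 1.0 2.0
```

For "x on y", pass the x values as `dependent` and the y values as `independent`; for "y on x", swap the two arguments.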
Worked numerical example
Data:
- x = {1, 2, 3, 4, 5}
- y = {2, 5, 3, 8, 7}
Computed sums:
- n = 5
- Σx = 1 + 2 + 3 + 4 + 5 = 15 → x̄ = 15 / 5 = 3
- Σy = 2 + 5 + 3 + 8 + 7 = 25 → ȳ = 25 / 5 = 5
- Σ(xy) = 2 + 10 + 9 + 32 + 35 = 88
- Σ(y^2) = 4 + 25 + 9 + 64 + 49 = 151
Normal equations for x on y:
- 15 = 5*a + 25*b
- 88 = 25*a + 151*b
Solve for a and b:
- Compute b: b = (5*88 − 15*25) / (5*151 − 25^2) = (440 − 375) / (755 − 625) = 65 / 130 = 0.5
- Compute a: a = x̄ − b*ȳ = 3 − 0.5*5 = 0.5
Final regression (x on y):
- x = 0.5 + 0.5*y
Note: the original transcript contained arithmetic/transcription mistakes (it at one point claimed a and b were 5). The correct solution is a = 0.5, b = 0.5 as shown above.
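The corrected values can be checked by solving the two normal equations directly. The sketch below uses Cramer's rule for the 2x2 system, which is one convenient form of algebraic elimination chosen here for brevity; the helper name `solve_2x2` is an assumption, not something from the lecture.

```python
def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*p + a12*q = c1 and a21*p + a22*q = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the equations have no unique solution")
    p = (c1 * a22 - a12 * c2) / det
    q = (a11 * c2 - c1 * a21) / det
    return p, q


x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_yy = sum(yi * yi for yi in y)

# Normal equations for x on y:
#   sum_x  = n*a     + b*sum_y
#   sum_xy = a*sum_y + b*sum_yy
a, b = solve_2x2(n, sum_y, sum_x, sum_y, sum_yy, sum_xy)
print(a, b)  # prints 0.5 0.5
```

Both unknowns come out as 65 / 130, matching the fitted line x = 0.5 + 0.5*y.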
Regression coefficient / emphasized formulas
- General slope formula (covariance / sums form):
  - For x on y: b = [n*Σ(xy) − Σx*Σy] / [n*Σ(y^2) − (Σy)^2]
  - For y on x: replace x ↔ y in the formula.
- Regression line (slope-intercept form): dependent = a + b * independent.
Multiple regression (concept)
- Definition: estimating one dependent variable from two or more independent variables.
- General form: y = a + b1*x1 + b2*x2 + b3*x3 + ...
  - a is the intercept; b1, b2, ... are partial regression coefficients (slopes).
- Widely used in biological, social, and pharmaceutical sciences to study relationships involving multiple predictors.
- The transcript did not derive parameter-estimation formulas for multiple regression; typically these are solved via matrix methods or extended normal equations.
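As a rough illustration of the "extended normal equations" route, the sketch below builds (XᵀX)·coefficients = Xᵀy and solves it by Gaussian elimination in plain Python. None of this comes from the lecture: the function names and the data are hypothetical, and in practice a numerical library would usually be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, row)) + [float(value)] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (predictors may be collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - known) / aug[row][row]
    return solution


def fit_multiple_regression(predictor_rows, y):
    """Return [a, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ... by least squares."""
    # Prepend a constant 1 to each row so the intercept a is estimated too.
    design = [[1.0] + list(row) for row in predictor_rows]
    width = len(design[0])
    # Normal equations: (X^T X) * coefficients = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(width)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2,
# so the fit should recover roughly a = 1, b1 = 2, b2 = 3.
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

With a single predictor, the same normal equations reduce to the two equations used for simple regression.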
Standard error of regression (standard error of estimate)
- Definition: a measure of the typical distance between the observed values and the values predicted by the fitted line, i.e., the predictive error around the regression line. The lecture also describes it as how much the regression line would vary if data were recollected many times.
- Common formula for simple regression: s_{y|x} = s_y * sqrt(1 − r^2) when y depends on x.
- More generally, for a dependent variable Z on predictors: s_{Z|predictors} = s_Z * sqrt(1 − r^2), where s_Z is the standard deviation of the dependent variable and r is the correlation coefficient. For multiple regression, replace r^2 with R^2.
- Use:
  - Insert the known standard deviation and correlation (r) into the formula and compute.
  - For x on y, use s_x * sqrt(1 − r^2); for y on x, use s_y * sqrt(1 − r^2).
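To make the formula concrete, the sketch below applies s_x * sqrt(1 − r^2) to the x-on-y data from the worked example and cross-checks it against the residuals of the fitted line. One assumption is made here that the summary does not state: the standard deviation divides by n (population form, `statistics.pstdev`). Textbooks that divide the residual sum of squares by n − 2 report a somewhat larger value.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]   # dependent variable in the "x on y" example
y = [2, 5, 3, 8, 7]   # independent variable

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)         # Pearson correlation coefficient
s_x = statistics.pstdev(x)                # assumption: population SD (divides by n)
se_x_on_y = s_x * math.sqrt(1 - r ** 2)   # standard error of estimate for x on y

# Cross-check against the residuals of the fitted line x = 0.5 + 0.5*y.
residuals = [xi - (0.5 + 0.5 * yi) for xi, yi in zip(x, y)]
se_from_residuals = math.sqrt(sum(e ** 2 for e in residuals) / len(x))

print(round(r, 4), round(se_x_on_y, 4), round(se_from_residuals, 4))
```

For this data r^2 = 0.65, and both routes give sqrt(0.7), about 0.84, so the formula and the residual-based calculation agree under the stated assumption.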
Applications and resources
- Regression and multiple regression are common topics in syllabi for BBA, BCA, B.Pharmacy, and other biology/biostatistics courses.
- The lecturer mentioned supporting notes and lectures available via the “Depth of Biology” app (Play Store) and a website, egramswaraj.gov.in, that posts MCQs and other material.
Caution: The auto-generated transcript contains arithmetic and transcription errors (notably misreporting final parameter values in the worked example). The numerical results above use corrected arithmetic where necessary. The transcript also repeats and sometimes jumbles sentences; formulas and procedural steps have been extracted and clarified.
Speakers / sources (as identified in the subtitles)
- Lecturer / instructor (unnamed; primary speaker walking through definitions, formulas, and examples)
- Depth of Biology application (resource mentioned)
- Website: egramswaraj.gov.in (resource mentioned)
Category
Educational