Summary of "【科学実験リテラシー】Day4 相関係数" (Scientific Experiment Literacy, Day 4: Correlation Coefficients)
Scientific concepts / phenomena covered
Correlation coefficients (quantitative relationship)
- Correlation: a quantitative description of how two variables relate (e.g., x and y).
- Scatter plot: used to visualize the relationship between two variables (x, y).
Pearson’s product-moment correlation coefficient (r)
- The main focus is Pearson’s correlation coefficient r.
- The value of r is interpreted using variance and covariance.
- Key interpretation range:
  - r lies in [-1, 1]
  - r = 1: perfect positive linear relationship (points lie exactly on a line with positive slope)
  - r = -1: perfect negative linear relationship
  - r ≈ 0: little or no linear correlation
- The absolute value reflects strength; the sign gives direction:
  - roughly 0–0.2: almost no correlation
  - 0.2–0.4: weak
  - 0.4–0.7: strong
- Methodological note: interpret r together with the scatter plot, not alone.
Variance and covariance (mathematical basis)
- Variance: used to standardize/normalize the spread of each variable.
- Covariance:
  - computed from the deviations of x and y from their respective means
  - forms the numerator of the correlation coefficient
- The lesson emphasizes that correlation is a normalized covariance, bounded between -1 and 1, which makes values easy to compare across datasets (unlike raw covariance, whose magnitude depends on units and scale).
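The normalized-covariance definition above can be sketched in Python; the numbers here are hypothetical, not the lesson's data:

```python
import numpy as np

# Hypothetical data (e.g., heights and weights of five people)
x = np.array([150.0, 160.0, 165.0, 170.0, 180.0])
y = np.array([48.0, 55.0, 62.0, 64.0, 71.0])

# Deviations from each mean
dx = x - x.mean()
dy = y - y.mean()

# Covariance (population form: divide by n)
cov_xy = (dx * dy).mean()

# Pearson r = covariance normalized by both standard deviations,
# which bounds the result to [-1, 1]
r = cov_xy / (x.std() * y.std())

print(r)  # same value as np.corrcoef(x, y)[0, 1]
```

Because the same normalization appears in numerator and denominator, the factor of n cancels and the result matches NumPy's built-in `corrcoef`.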
Regression vs correlation
- Correlation analysis: treats x and y symmetrically; asks whether two variables vary together.
- Regression analysis (previewed for next week): assumes a directional, one-way influence, such as predicting or estimating y from x.
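A minimal Python sketch of the symmetric-vs-directional contrast, with made-up numbers:

```python
import numpy as np

# Hypothetical data (not from the lesson)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 7.0, 6.0, 10.0])

# Correlation is symmetric: swapping x and y gives the same r
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression is directional: the least-squares slope of y on x is not
# simply the reciprocal of the slope of x on y (unless |r| = 1)
slope_y_on_x = np.polyfit(x, y, 1)[0]
slope_x_on_y = np.polyfit(y, x, 1)[0]

print(r_xy, r_yx)                       # identical
print(slope_y_on_x, 1 / slope_x_on_y)   # not equal
```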
Causation vs correlation
- A correlation coefficient indicates association, not cause-and-effect.
- Examples:
- Height vs weight: taller people tend to weigh more, but correlation alone cannot determine which causes which.
- Math vs science grades: correlated, but causal direction is not implied.
- Spurious correlation (apparent correlation):
  - the numbers of restaurants and financial-institution branches across Tokyo appear strongly correlated (reported r ≈ 0.9),
  - but the lesson attributes both to a third variable: daytime population (more people flowing into business districts increases both restaurants and bank branches).
- Partial correlation:
  - introduced as a way to correct for such a third variable: after removing its effect, the correlation decreases (in the example, from about 0.9 to 0.66).
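The "remove the third variable's effect" idea can be sketched with the standard first-order partial-correlation formula. The data below are synthetic stand-ins for the Tokyo example (so the numbers will not reproduce the lesson's 0.9 → 0.66):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z
    (standard first-order partial-correlation formula)."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Synthetic stand-in: a "daytime population" z drives both counts
rng = np.random.default_rng(0)
z = rng.normal(size=200)                        # third variable
x = 2.0 * z + rng.normal(scale=0.5, size=200)   # restaurants
y = 1.5 * z + rng.normal(scale=0.5, size=200)   # bank branches

print(np.corrcoef(x, y)[0, 1])  # raw correlation: large
print(partial_corr(x, y, z))    # partial correlation: near 0
```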
Simpson’s paradox / grouping trap (hidden patterns)
- Aggregated data can hide or understate relationships.
- Example described: unemployment rate vs. election vote share:
  - the overall correlation appears weak,
  - but within subgroups/regions (e.g., England, Scotland, and Wales in the example), correlations can be strong or can differ.
- Takeaway: how you group data matters, and it requires domain knowledge.
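A synthetic two-region illustration of the grouping trap (not the lesson's election data): each region shows a strong positive trend, but pooling the regions hides or reverses it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Region A: positive trend around (0, 0)
x_a = rng.normal(0.0, 1.0, 100)
y_a = x_a + rng.normal(0.0, 0.5, 100)

# Region B: the same positive trend, but shifted right and down,
# so the two group baselines pull the pooled relationship the other way
x_b = rng.normal(4.0, 1.0, 100)
y_b = (x_b - 4.0) + rng.normal(0.0, 0.5, 100) - 4.0

x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])

print(np.corrcoef(x_a, y_a)[0, 1])  # strong positive within region A
print(np.corrcoef(x_b, y_b)[0, 1])  # strong positive within region B
print(np.corrcoef(x, y)[0, 1])      # weak or negative when pooled
```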
Other correlation coefficients (beyond Pearson)
- Rank correlation:
  - Spearman’s rank correlation coefficient
  - Kendall’s rank correlation coefficient
- Use case: when the data are ordinal/rank-based (e.g., students ranked by math and English).
- These coefficients assess monotonic, rank-based association rather than raw numeric spacing.
- Spearman and Kendall are presented as alternatives to Pearson derived from rank relationships.
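A small Python sketch of both rank coefficients on hypothetical student ranks (the data are invented for illustration):

```python
import numpy as np
from itertools import combinations

# Hypothetical ranks of six students in math and English
math_rank = np.array([1, 2, 3, 4, 5, 6])
eng_rank  = np.array([2, 1, 4, 3, 6, 5])

# Spearman's rho: Pearson correlation applied to the ranks themselves
# (equivalent to 1 - 6*sum(d^2)/(n*(n^2-1)) when there are no ties)
rho = np.corrcoef(math_rank, eng_rank)[0, 1]

# Kendall's tau: (concordant pairs - discordant pairs) / total pairs
n = len(math_rank)
pairs = list(combinations(range(n), 2))
concordant = sum((math_rank[i] - math_rank[j]) * (eng_rank[i] - eng_rank[j]) > 0
                 for i, j in pairs)
discordant = len(pairs) - concordant  # no ties in this example
tau = (concordant - discordant) / len(pairs)

print(rho)  # 29/35 ≈ 0.829
print(tau)  # 0.6
```

Both coefficients depend only on the ordering of the values, not on their numeric spacing.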
Autocorrelation for time series (correlation with a time lag)
- Autocorrelation: the correlation of a variable with itself at different time shifts (lag h).
- Time-series phenomena discussed:
  - trends (persistence or reversal, depending on the sign),
  - seasonality/periodicity (repeating patterns),
  - used for predicting upward/downward trends and identifying regular variation.
- Lesson explanation:
  - compute the correlation between x_i and x_{i+h};
  - as the lag h increases, the correlation typically approaches 0 if there is no periodicity.
- Example: monthly department store sales over multiple years:
  - periodicity is detected as peaks in the autocorrelation at specific lag values;
  - a peak at a lag of about 6 suggests a repeating cycle roughly every 6 months.
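The lag-h idea can be sketched in Python with synthetic monthly data standing in for the department-store example (not the lesson's actual figures):

```python
import numpy as np

def autocorr(x, h):
    """Lag-h autocorrelation: Pearson correlation of x_i with x_{i+h}."""
    return np.corrcoef(x[:-h], x[h:])[0, 1]

# Synthetic monthly "sales" with a 6-month cycle plus noise
rng = np.random.default_rng(2)
months = np.arange(72)  # six years of monthly data
sales = np.sin(2 * np.pi * months / 6) + rng.normal(0, 0.2, 72)

for h in (1, 3, 6, 12):
    print(h, autocorr(sales, h))
# Peaks at lags 6, 12, ... reveal the 6-month period;
# at lag 3 (half a period) the correlation is strongly negative.
```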
Scientific-literacy and galaxy correlation examples
- An example dataset from scientific-literacy tests:
  - countries with higher scientific-literacy scores tend to have higher reading-comprehension scores (positive correlation).
- A galaxy/black-hole dataset:
  - the axes relate to galaxy/black-hole masses and a "loss" measure,
  - the point is correlation between astrophysical quantities (the subtitle details are garbled, but a correlation between properties is implied).
Methodology / workflow shown (Excel-based calculation)
How to compute correlation in Excel (as taught)
1) Create a scatter plot
- Select two columns of data (x and y).
- Insert → scatter plot with points.
- Add/format:
- chart title
- axis labels
- adjust axis limits and font sizes
2) Compute the correlation coefficient
- Option 1: Excel “Data Analysis” tool
- Data → Data Analysis
- choose Correlation
- set input range (including first row labels)
- set output destination
- read the resulting correlation matrix (diagonal should be 1)
- Option 2: Excel function for Pearson correlation
- use the built-in Pearson product-moment correlation function (in Excel, CORREL or PEARSON)
- calculate the correlation directly from the two data ranges
- Option 3 (conceptual): compute via variance/covariance
- compute the standard deviations and covariance using Excel's statistical functions
- plug them into the Pearson r definition
- the lesson highlights careful handling of population vs. sample versions (a potential pitfall with the .P vs. .S function variants, e.g., STDEV.P vs. STDEV.S)
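The population-vs-sample pitfall can be checked in Python, where NumPy's `ddof` argument plays the role of Excel's .P/.S choice; the data here are hypothetical:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

def pearson(x, y, ddof):
    # Covariance and standard deviations computed with the SAME ddof
    cov = np.cov(x, y, ddof=ddof)[0, 1]
    return cov / (np.std(x, ddof=ddof) * np.std(y, ddof=ddof))

# With a consistent convention, the n (or n-1) factors cancel and
# r comes out the same either way; MIXING conventions is the pitfall.
r_pop = pearson(x, y, ddof=0)  # population (.P-style)
r_smp = pearson(x, y, ddof=1)  # sample (.S-style)

print(r_pop, r_smp)  # equal
```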
Researchers / sources featured (names at end)
- Karl Pearson (Pearson’s product-moment correlation coefficient)
- Spearman (Spearman’s rank correlation coefficient)
- Kendall (Kendall’s rank correlation coefficient)
- Murakami (teacher referenced as “Mr. Murakami” introducing parts of the lesson)