Summary of "Statistics 101: Descriptive Statistics, Histograms"

Main ideas and lessons

Quantitative vs categorical data

Categorical data: labels or categories (e.g., phone brand, region). Summarized by counts/frequencies.
Quantitative data: numeric measurements (e.g., age, sales). Summarized by how the values are distributed; a histogram is usually the first visualization to try.

What a histogram is and why it’s useful

A histogram groups (bins) quantitative data into contiguous, non-overlapping intervals (bins/buckets) and displays the count (frequency) or relative frequency for each bin as a vertical bar.

Shows the shape of the distribution: where values cluster, presence of tails and peaks.
Foundational for later statistical topics and exploratory data analysis.
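
As a minimal sketch of the binning idea, the following Python groups a short list of ages into contiguous 10-year bins and counts each bin. The ages, the `start` value, and the `bin_width` value are invented for illustration and do not come from the video.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only.
ages = [18, 19, 22, 25, 31, 34, 38, 41, 47, 52, 63]

start = 18      # lower edge of the first bin (assumed)
bin_width = 10  # every bin covers 10 years (assumed)

def bin_index(value, start, width):
    """Return the 0-based index of the bin [start + i*width, start + (i+1)*width)."""
    return (value - start) // width

counts = Counter(bin_index(a, start, bin_width) for a in ages)

# Report each bin as a half-open interval (lower edge included, upper edge
# excluded) so that no value can land in two bins.
histogram = {
    (start + i * bin_width, start + (i + 1) * bin_width): counts[i]
    for i in range(max(counts) + 1)
}
```

Here `histogram` maps each interval to its frequency, for example `(18, 28)` to 4. Treating the upper edge as excluded is one common convention; other tools may include it instead.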

Bins / buckets — core concepts

Bins are contiguous and mutually exclusive: they butt up against each other with no gaps and no overlap, so each value belongs to exactly one bin.
Bin width (size) and number of bins strongly affect usefulness:
- Too few bins → a “histo-blob”: loss of resolution, shape is hidden.
- Too many bins → many bins with very few observations; histogram becomes noisy.
There is no single correct number of bins — aim for a practical “sweet spot” using common sense, domain knowledge, or software defaults/formulas as guides.
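
One commonly cited formula for a starting number of bins is Sturges' rule, k = ceil(log2(n)) + 1. The sketch below applies it. The rule is not named in the video; it is offered only as one example of the kind of formula that can serve as a guide, and the function names are hypothetical.

```python
import math

def sturges_bins(n):
    """Suggested number of bins for n observations under Sturges' rule."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1

def bin_width_for(min_value, max_value, k):
    """Equal bin width that covers [min_value, max_value] with k bins."""
    return (max_value - min_value) / k

k = sturges_bins(100)  # 8 bins suggested for a sample of 100
```

Treat the result as a starting point to adjust by eye, not as the correct answer.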

Axes and frequencies

Horizontal (x) axis: the quantitative variable (e.g., age).
Vertical (y) axis: frequency, relative frequency, or percent frequency.
Relative frequency = bin count / total observations; percent frequency = relative frequency × 100. Switching between frequency and relative frequency rescales the vertical axis but does not change the histogram’s shape.

Common histogram shapes (how to interpret)

Left-skewed (negative skew): tail on the left; bulk of data on the right.
Right-skewed (positive skew): tail on the right; bulk of data on the left.
Symmetric: roughly mirror image about the center (e.g., bell-shaped/normal).
Bimodal or multimodal: two (or more) distinct peaks — suggests subgroups or mixed processes.
Uniform: nearly equal counts across bins.
Random/no clear pattern: no obvious shape (may be due to poor binning).
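
Skew direction can also be checked numerically. The sketch below computes the moment coefficient of skewness (the third central moment divided by the 1.5 power of the second); a positive value points to a right tail and a negative value to a left tail. This measure is not covered in the video, and the sample data are invented.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5

# Invented samples for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 10]      # one large value pulls the tail right
left_tailed = [-x for x in right_tailed]   # mirror image of the sample above
symmetric = [1, 2, 3, 4, 5]
```

A single number cannot reveal bimodality or gaps, so it complements the histogram rather than replacing it.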

Worked example: smartphone users’ ages

Example dataset: 100 smartphone users in the U.S., with brand (categorical) and age (quantitative).

Binning choices demonstrated:

1 bin (everyone 18+): a “histo-blob” — no information.
3 bins (~20-year width: 18–39, 40–59, 60+): coarse resolution (example counts 39, 45, 16); most users fall in the two younger bins, with a thinner tail at 60+.
6 bins (~10-year width): better resolution and preferred by the presenter (example counts 24, 15, 23, 22, 15, 1).
12 bins (~5-year width): possibly too many — many bins with very small counts (1–7 observations), making interpretation harder.
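
The coarser histograms can be reproduced from the finer one by summing neighbouring bins. The sketch below assumes the six ~10-year bins nest exactly inside the three ~20-year bins, which the quoted counts are consistent with. The helper name `merge_adjacent` is hypothetical.

```python
six_bin_counts = [24, 15, 23, 22, 15, 1]  # 6-bin counts quoted in this summary

def merge_adjacent(counts, group_size):
    """Sum each run of `group_size` neighbouring bins into one wider bin."""
    if len(counts) % group_size != 0:
        raise ValueError("number of bins must be divisible by group_size")
    return [
        sum(counts[i:i + group_size])
        for i in range(0, len(counts), group_size)
    ]

three_bin_counts = merge_adjacent(six_bin_counts, 2)  # [39, 45, 16]
one_bin_counts = merge_adjacent(six_bin_counts, 6)    # [100]
```

Merging only loses detail: the 3-bin counts can be derived from the 6-bin counts, but not the other way around.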

Frequency vs relative frequency:

With 100 observations, convert counts to relative frequency by dividing by 100 (e.g., 24 → 0.24). The histogram’s shape is unchanged by this conversion.
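
A short sketch of this conversion, using the 6-bin counts quoted above. The variable names are illustrative. The last two lists rescale each version so the tallest bar equals 1, which shows that the bar proportions match.

```python
counts = [24, 15, 23, 22, 15, 1]  # 6-bin counts quoted in this summary
total = sum(counts)               # 100 observations

relative = [c / total for c in counts]       # e.g. 24 -> 0.24
percent = [100 * c / total for c in counts]  # e.g. 24 -> 24.0

# Rescale both versions so the tallest bar is 1; matching results mean the
# shape is the same and only the y-axis scale differs.
shape_from_counts = [c / max(counts) for c in counts]
shape_from_relative = [r / max(relative) for r in relative]
```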

Practical step-by-step: create and interpret a histogram

Choose the quantitative variable of interest (e.g., age).
Decide a binning strategy:
- Choose number of bins or bin width (consider domain conventions, convenience, or software defaults).
- Make bins contiguous and non-overlapping so every observation falls in exactly one bin.
Tally observations per bin (frequency).
Optionally compute relative frequency: frequency / total sample size.
Draw a bar for each bin:
- Horizontal span = bin interval.
- Height = frequency or relative frequency.
- Bars touch (no gaps) in a histogram.
Check that the sum of frequencies equals the total sample size.
Inspect the histogram’s shape and consider:
- Skewness, symmetry, modes (peaks), uniformity, or multimodality.
- Whether bin width needs adjustment (too coarse or too fine).
Use histogram insights for further analysis (e.g., identifying subgroups, transformation needs, outliers).
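
The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the presenter's method: the ages and bin edges are invented, the helper names `tally` and `render` are hypothetical, and the bars are drawn sideways as text rather than as vertical bars.

```python
from bisect import bisect_right

def tally(values, edges):
    """Count values per half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] <= v < edges[-1]:
            raise ValueError(f"{v} falls outside the bin range")
        # bisect_right finds the first edge greater than v; the bin index
        # is one position to the left of it.
        counts[bisect_right(edges, v) - 1] += 1
    return counts

def render(edges, counts, symbol="#"):
    """Return one text line per bin; bar length equals the bin's frequency."""
    lines = []
    for lo, hi, c in zip(edges, edges[1:], counts):
        # Labels show hi - 1 as the top of the bin, which assumes integer ages.
        lines.append(f"{lo:>3}-{hi - 1:<3} | {symbol * c} ({c})")
    return "\n".join(lines)

# Hypothetical ages, invented for illustration only.
ages = [18, 21, 24, 29, 33, 35, 36, 42, 44, 51, 58, 67]
edges = [18, 28, 38, 48, 58, 68]  # contiguous, non-overlapping 10-year bins

counts = tally(ages, edges)
assert sum(counts) == len(ages)  # every observation landed in exactly one bin
chart = render(edges, counts)
```

Calling `print(chart)` displays the bars. Changing `edges` and rerunning is a quick way to compare coarser and finer binning on the same data.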

Warnings and tips

Choosing bins involves both art and rule-of-thumb formulas — experiment to find a clear, useful depiction.
Software can pick bins automatically; always verify that automatic choices make sense for your data and question.
Histograms are widely used in statistics, analytics, and data science — understanding them is essential.

Speakers / sources (as identified)

Video presenter / narrator: likely Brandon Foltz (presenter indicated in sponsorship reference).
Sponsor: Great Courses Plus.
Referenced professor/source: Professor Michael Starbird (mentioned in relation to a Great Courses lecture).