Summary of "Lab 2: Interval estimation for population CDF"

Concise summary — main ideas, methods, and lab plan

Lecture focus

Topic: Interval estimation for a population CDF (cumulative distribution function).
Goal: Use sample data to estimate the population CDF and construct interval (confidence) bands around that estimate. Study how sample size and confidence level affect those bands. Also introduce and test plug‑in estimators for population parameters.

Key concepts and definitions

Parameter vs full distribution
- “Parameters” (mean μ, variance σ², median, etc.) give a partial description of a distribution; the PDF/CDF give the full description.
- The PDF and CDF are linked (continuous case): integrating the PDF up to x gives F(x), and differentiating the CDF gives f(x).
Point estimation vs interval estimation
- Point estimate: a single value (e.g., μ̂ = x̄).
- Interval estimate: a range [lower, upper] intended to contain the true value with confidence level 1 − α.
- Trade-off: a point estimate is decisive (a single value) but almost never equals the true value exactly; an interval is less decisive but contains the true value with a stated confidence.
Empirical CDF (ECDF) as the point estimator for F(x)
- Definition: F̂(x) = (1/n) Σ I{Xk ≤ x}, k = 1..n, i.e., the fraction of observations that are ≤ x.
- Construction: sort the sample values, assign probability 1/n to each observation, and take the cumulative sum.
- The ECDF is a nondecreasing step function that rises from 0 to 1, jumping by 1/n at each observation (by more at tied values).
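
The definition above can be sketched in a few lines. This is an illustrative Python version using only the standard library, not the lab's R helpers; the function name `make_ecdf` and the sample values are made up for the example.

```python
import bisect


def make_ecdf(sample):
    """Return a function x -> F_hat(x) = (number of observations <= x) / n."""
    xs = sorted(sample)          # sort once; the ECDF only needs the ordered values
    n = len(xs)

    def ecdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect.bisect_right(xs, x) / n

    return ecdf


sample = [2.3, 0.7, 1.5, 0.7, 3.1]   # made-up data for illustration
F_hat = make_ecdf(sample)

print(F_hat(0.0))   # 0.0 (below the smallest observation)
print(F_hat(0.7))   # 0.4 (two tied observations equal 0.7)
print(F_hat(2.0))   # 0.6
print(F_hat(3.1))   # 1.0 (at the largest observation)
```

Counting with `bisect_right` rather than assigning i/n to the i-th sorted value keeps the result correct when the sample contains ties.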

Constructing interval (confidence) bands for a CDF

General idea
- Form upper and lower bands by adding/subtracting a margin of error (radius) to the ECDF:
  - Upper band: min(1, F̂(x) + margin)
  - Lower band: max(0, F̂(x) − margin)
- Always clip to [0,1] to enforce a valid CDF range.
Margin of error depends on
- Sample size n: larger n → narrower margin.
- Confidence level 1 − α: higher confidence (larger 1 − α) → wider bands.
- The margin formula can come from a uniform bound such as the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, or from other approximations; in every case it shrinks with n (at rate 1/√n for DKW) and grows with the confidence level.
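
As one concrete choice, the margin implied by the DKW inequality can be written out as follows. The lab's helper functions may use a different formula, so treat this as an illustration of how n and α enter.

```latex
P\left( \sup_{x} \bigl| \hat F_n(x) - F(x) \bigr| > \varepsilon \right)
  \le 2 e^{-2 n \varepsilon^{2}}
\quad\Longrightarrow\quad
\varepsilon_{n,\alpha} = \sqrt{\frac{\ln(2/\alpha)}{2n}}
```

Setting the right-hand bound equal to α and solving for ε gives the margin. For example, n = 100 and α = 0.05 give ε ≈ 0.136, while n = 10,000 at the same α gives ε ≈ 0.0136: a hundredfold increase in n shrinks the margin tenfold.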
Practical plotting notes
- Plot the ECDF as a step function. To show the flat segments at 0 and 1, it is common to add an artificial point before the smallest observation and one after the largest; trim those added points when they clutter the plot.
- When computing bands, enforce clipping to keep values within [0,1].
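
The band construction described in this section can be sketched as below. This is a minimal Python illustration that assumes the DKW margin; the names `dkw_margin` and `ecdf_bands` and the sample values are hypothetical and are not the lab's R functions.

```python
import bisect
import math


def dkw_margin(n, alpha):
    """Half-width of a (1 - alpha) uniform band implied by the DKW inequality."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def ecdf_bands(sample, alpha=0.05):
    """Return rows (x, F_hat, lower, upper) at each sorted observation."""
    xs = sorted(sample)
    n = len(xs)
    eps = dkw_margin(n, alpha)
    rows = []
    for x in xs:
        f_hat = bisect.bisect_right(xs, x) / n   # ECDF value at x (handles ties)
        lower = max(0.0, f_hat - eps)            # clip below at 0
        upper = min(1.0, f_hat + eps)            # clip above at 1
        rows.append((x, f_hat, lower, upper))
    return rows


sample = [0.12, 0.47, 0.33, 0.91, 0.65, 0.08, 0.74, 0.29, 0.55, 0.83]  # made-up data
bands = ecdf_bands(sample, alpha=0.05)
for x, f_hat, lo, hi in bands:
    print(f"x={x:.2f}  F_hat={f_hat:.2f}  band=[{lo:.2f}, {hi:.2f}]")
```

With only ten observations the margin is large, so much of the band is clipped at 0 or 1; that is expected and is why the lab uses larger n.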

Plug‑in estimators (second lab topic)

Principle
- If a parameter is a functional of the PDF or CDF (e.g., μ = ∫ x f(x) dx), replace the unknown f or F with an estimate (typically the empirical distribution or a density estimate) to obtain an estimator.
- For expectations, replace the integral by the empirical average (a sum over observations); for μ this yields the sample mean.
Discrete vs continuous
- Integrals become sums when continuous functions are replaced by discrete, sample-based estimates.
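
Written out for the mean and the variance, the substitution looks like this. The notation (F̂ₙ for the ECDF, X̄ for the sample mean) follows the definitions earlier in this summary.

```latex
\begin{align*}
\mu = \int x \, dF(x)
  &\;\longrightarrow\;
  \hat\mu = \int x \, d\hat F_n(x) = \sum_{k=1}^{n} X_k \cdot \frac{1}{n} = \bar X \\
\sigma^2 = \int (x - \mu)^2 \, dF(x)
  &\;\longrightarrow\;
  \hat\sigma^2 = \frac{1}{n} \sum_{k=1}^{n} \bigl( X_k - \bar X \bigr)^2
\end{align*}
```

Because the ECDF puts mass 1/n on each observation, every integral against it collapses to an average. The plug-in variance therefore has divisor n, not n − 1.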
Relation to other methods
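- Different estimation methods (method of moments, maximum likelihood, plug‑in, etc.) reflect different philosophies and need not yield identical estimators, although they sometimes coincide (for example, all three give 1/x̄ for the exponential rate λ). They can be compared on common criteria.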
How to judge/compare estimators
- Use bias, variance, mean squared error (MSE = bias² + variance), consistency, and related criteria to select the best estimator for a task.
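
These criteria can be estimated by simulation. The sketch below compares the two usual variance estimators by Monte Carlo. It assumes a normal population and made-up settings (n = 10, σ = 2), and the function name is hypothetical; it is a Python illustration, not the lab's R code.

```python
import random
import statistics


def compare_variance_estimators(n=10, reps=20000, sigma=2.0, seed=1):
    """Monte Carlo bias, variance and MSE of two variance estimators.

    Population (an assumption of this sketch): Normal(0, sigma^2),
    so the true variance is sigma**2.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    estimates = {"divisor n": [], "divisor n-1": []}
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(x) / n
        ss = sum((v - mean) ** 2 for v in x)        # sum of squared deviations
        estimates["divisor n"].append(ss / n)         # plug-in form
        estimates["divisor n-1"].append(ss / (n - 1))  # unbiased form
    summary = {}
    for name, values in estimates.items():
        bias = statistics.fmean(values) - true_var
        var = statistics.pvariance(values)
        summary[name] = {"bias": bias, "variance": var, "mse": bias ** 2 + var}
    return summary


results = compare_variance_estimators()
for name, s in results.items():
    print(f"{name:12s} bias={s['bias']:+.3f}  var={s['variance']:.3f}  mse={s['mse']:.3f}")
```

In theory the divisor-n estimator has bias −σ²/n, which is −0.4 here. In this normal setting it nevertheless has the smaller MSE, which shows that unbiasedness and low MSE are different goals.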

Lab structure — procedural steps

Set up the environment (use provided R packages/functions; code is encapsulated so you need not reimplement everything).
Choose a known population distribution (simulate from a known CDF/PDF so you can compare estimates to truth).
For a chosen sample size n:
- Draw a random sample from the population.
- Sort the sample values (ascending).
- Construct the empirical CDF: assign probability 1/n to each sample value and compute cumulative sums to form the step function.
Compute the point estimate for F(x) (the ECDF).
Compute the margin of error for the CDF bands using the chosen method (e.g., DKW-based uniform bound); margin depends on n and α.
Form interval bands: F̂(x) ± margin, then clip values to [0,1].
Plot results:
- Plot the true CDF (if known) and the ECDF.
- Plot upper and lower bands around the ECDF to visualize the confidence region.
- Be careful with artificial first/last points used to illustrate 0 and 1; remove or trim them when needed for clarity.
Repeat experiments to study effects:
- Vary n (e.g., n = 100, 1,000, 10,000) and observe the bands narrowing and the ECDF steps becoming finer.
- Vary confidence level (1 − α, e.g., 70%, 95%, 99%) and observe widening of bands as confidence increases.
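
One way to run this experiment numerically is to check how often the band contains the true CDF. The sketch below assumes an Exponential(1) population and the DKW margin; all function names are hypothetical, and it is written in Python for illustration, not in the lab's R setup. Only n = 100 and n = 1,000 are run to keep it fast.

```python
import math
import random


def dkw_margin(n, alpha):
    """Half-width of a (1 - alpha) uniform band implied by the DKW inequality."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def sup_distance(sample, cdf):
    """sup over x of |F_hat(x) - F(x)| for a continuous true CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # the ECDF jumps from (i-1)/n to i/n at x, so check both sides of the jump
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def coverage(n, alpha, reps=300, rate=1.0, seed=0):
    """Fraction of simulated samples whose band contains the true CDF everywhere."""
    rng = random.Random(seed)

    def cdf(x):
        return 1.0 - math.exp(-rate * x)   # true Exponential(rate) CDF

    eps = dkw_margin(n, alpha)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        if sup_distance(sample, cdf) <= eps:
            hits += 1
    return hits / reps


for n in (100, 1000):
    for alpha in (0.30, 0.05, 0.01):
        print(f"n={n:5d}  level={1 - alpha:.2f}  "
              f"margin={dkw_margin(n, alpha):.4f}  coverage={coverage(n, alpha):.3f}")
```

The margin column shows both effects directly: it falls as n grows and rises with the confidence level. The coverage column should come out at or above the nominal level, up to simulation noise.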
Second lab part — plug‑in estimator experiments:
- Simulate many samples, compute plug‑in estimates for parameters of interest (mean, variance, or distribution parameters like exponential rate λ).
- Compare estimated values to true parameter values; evaluate convergence as n grows.
- Pay attention to biased vs unbiased forms: the plug‑in variance uses divisor n and is biased downward, while divisor n − 1 gives the unbiased sample variance.
Diagnostic/comparison measures:
- Compute bias, variance, MSE, or other criteria to compare alternative estimators (method of moments, MLE, plug‑in, etc.).
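
The plug-in experiment can be sketched for the exponential rate. Since μ = 1/λ, replacing μ by the sample mean gives λ̂ = 1/x̄. The true rate, sample sizes, and function names below are made up for illustration, and the code is Python, not the lab's R functions.

```python
import random
import statistics


def rate_plug_in(sample):
    """Plug-in estimate of an exponential rate: 1 / sample mean."""
    return len(sample) / sum(sample)


def simulate(true_rate=2.0, sizes=(10, 100, 1000), reps=2000, seed=42):
    """Return {n: (bias, mse)} of the plug-in rate estimator, by Monte Carlo."""
    rng = random.Random(seed)
    out = {}
    for n in sizes:
        estimates = [
            rate_plug_in([rng.expovariate(true_rate) for _ in range(n)])
            for _ in range(reps)
        ]
        bias = statistics.fmean(estimates) - true_rate
        mse = statistics.fmean([(e - true_rate) ** 2 for e in estimates])
        out[n] = (bias, mse)
    return out


results = simulate()
for n, (bias, mse) in results.items():
    print(f"n={n:5d}  bias={bias:+.4f}  mse={mse:.4f}")
```

The estimator is biased upward for small n (in theory by λ/(n − 1)), and both bias and MSE shrink as n grows, which is the convergence the lab asks you to observe.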

Practical R implementation notes

Use existing R functions/packages to compute the ECDF and confidence bands to avoid reimplementation.
Input format: use data frames; ensure the sample column is sorted in ascending order when the plot relies on row order.
Many helper functions will automatically compute the ECDF, upper/lower bands, and apply clipping to [0,1].
Small implementation detail: adding endpoints to reach 0 and 1 means the plotted data contains slightly more than n points; trim the added endpoints if this matters.
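
The endpoint issue is easy to see in a small sketch. The helper below is hypothetical (the lab's R functions may pad differently): it builds the step vertices, adds one artificial point on each side, and then trims them again.

```python
def ecdf_step_points(sample, pad=0.05, add_endpoints=True):
    """Return (x, y) vertices of the ECDF, optionally with artificial endpoints."""
    xs = sorted(sample)
    n = len(xs)
    pts = [(x, i / n) for i, x in enumerate(xs, start=1)]
    if add_endpoints:
        span = (xs[-1] - xs[0]) or 1.0               # avoid zero width if all values are equal
        pts.insert(0, (xs[0] - pad * span, 0.0))     # artificial point at height 0
        pts.append((xs[-1] + pad * span, 1.0))       # artificial point at height 1
    return pts


sample = [3.0, 1.0, 2.0, 4.0]            # made-up data
padded = ecdf_step_points(sample)
trimmed = padded[1:-1]                   # drop the two artificial endpoints again

print(len(sample), len(padded), len(trimmed))   # 4 6 4
```

The padded version has n + 2 points, which is where a count slightly above n comes from.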

Important conceptual takeaways

The ECDF is a natural nonparametric point estimator for F(x); as the sample size grows, its steps become finer and it converges to the true CDF.
Interval estimation for the CDF is done by adding/subtracting a margin that depends on n and α, then clipping to [0,1]; larger confidence levels produce wider bands.
Plug‑in estimators are straightforward and intuitive: substitute empirical estimates for unknown distribution functions inside population functionals; integrals reduce to sums.
Always check estimator properties (bias, consistency, MSE) and be aware of discrete vs continuous approximations and small‑sample vs large‑sample behavior.

Speakers and sources referenced

Lecturer / Instructor (primary speaker) — unnamed in subtitles (sometimes referred to as “engineer”).
Methods and references mentioned:
- Plug‑in estimators
- Method of moments
- Maximum likelihood
- Dvoretzky–Kiefer–Wolfowitz (DKW) inequality for uniform CDF confidence bands
Tools referenced:
- R (packages and functions for ECDF, plotting, and simulation) — used in lab demonstrations.