Summary of the Video "Time Series | ARIMA and SARIMAX Models" (01.12.2025)
Main Ideas and Concepts
1. Course and Homework Structure
- Homework consists of two parts: a main part and a bonus part.
- Bonus points can allow skipping some future homework.
- Homework involves working with generated time series with known components (trend, seasonality, autocorrelation).
- Students should apply discussed methods (ETS, ARIMA, SARIMAX) on both generated and real series.
- Deadlines and testing schedules were briefly mentioned.
2. Introduction to ARIMA and SARIMAX Models
- ARIMA models extend ARMA by including differencing (integration) to handle non-stationarity.
- SARIMA adds seasonal components.
- SARIMAX includes exogenous variables (external regressors).
- ETS models decompose series explicitly into error, trend, and seasonality, while ARIMA models focus on autocorrelation structures.
- ARIMA models require stationarity; differencing is used to achieve this.
- ARMA models can approximate any stationary time series to arbitrary accuracy given sufficient order (Wold's decomposition theorem).
3. Stationarity
- Stationarity means statistical properties (mean, variance, autocovariance) do not change over time.
- Non-stationary series have trends, seasonality, or changing variance.
- Types of stationarity:
- Strong stationarity: all statistical properties (full joint distribution) are time-invariant.
- Weak stationarity: the mean is constant, the variance is finite and constant, and the autocovariance depends only on the lag, not on time.
- Stationarity is crucial for valid model estimation, hypothesis testing, and forecasting.
- Non-stationary data can cause unstable parameter estimates and unrealistic forecasts.
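The lecture doesn't tie this to any particular tooling; as a minimal NumPy sketch (my own illustration, not from the video), a series with a linear trend has a drifting mean, and a single differencing step removes the trend:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
y = 0.5 * t + rng.normal(size=t.size)  # linear trend + noise: non-stationary mean

# The mean clearly drifts: compare the first and second halves.
drift = y[100:].mean() - y[:100].mean()  # roughly 0.5 * 100 = 50

# First differencing removes the linear trend; the differenced series
# fluctuates around the constant slope 0.5, independent of t.
dy = np.diff(y)
```

This is the "integration" step in ARIMA: the model is fit to `dy`, and forecasts are cumulatively summed back to the original scale.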
4. Testing for Stationarity
- Common tests: Augmented Dickey-Fuller (ADF) and KPSS.
- ADF null hypothesis: series has a unit root (non-stationary).
- KPSS null hypothesis: series is stationary.
- Both tests have limitations and are often used together with visual inspection.
- Stationarity testing is essential in econometrics but less critical in some machine learning contexts where normalization techniques are used.
5. Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
- ACF measures correlation between points separated by lag k.
- PACF measures correlation between points at lag k after removing effects of intermediate lags.
- ACF and PACF plots help identify appropriate AR and MA orders (p and q) in ARIMA models.
- Examples:
- White noise: no significant autocorrelation beyond lag 0.
- AR(1): ACF decays exponentially, PACF cuts off after lag 1.
- MA(1): ACF cuts off after lag 1, PACF decays exponentially.
- ARMA(1,1): more complex patterns in ACF and PACF.
6. Model Selection and Parameter Estimation
- ARIMA model parameters: (p, d, q) where d is the order of differencing.
- Seasonal components add parameters: seasonal orders (P, D, Q) and seasonal period s.
- Exogenous variables (X) can be included in SARIMAX models.
- Model selection involves:
- Differencing to achieve stationarity.
- Using ACF and PACF to choose p and q.
- Using information criteria (AIC, BIC) or cross-validation for model comparison.
- Automatic parameter selection methods exist but expert judgment is important.
- Constraints on parameters ensure stationarity and invertibility (roots outside unit circle).
7. Forecasting with ARIMA
- Forecasts are generated stepwise using past observed and forecasted values.
- Residuals from fitted models are used to estimate noise terms.
- Forecast intervals depend on stationarity and model assumptions.
8. Exogenous Variables in SARIMAX
- Exogenous variables should be known (or reliably predictable) over the forecast horizon and not influenced by the series itself.
- Examples: calendar effects, holidays, promotions, weather variables like wind speed.
- Too many exogenous variables can cause overfitting or computational issues (curse of dimensionality).
- Usually 1–2 exogenous variables are manageable.
9. Practical Advice and Additional Notes
- Seasonality should be removed via seasonal differencing before modeling.
- Limit total differencing order (d + D) to 2 or less for stability.
- Residual diagnostics (normality, no autocorrelation) are important to validate models.
- In large datasets with many time series, automatic parameter selection and machine learning approaches may be preferable.
- ETS and ARIMA models are not directly comparable by AIC/BIC due to differing assumptions.
Methodology / Instructions
Homework Assignment
- Work on generated time series with known parameters (trend, seasonality, autocorrelation).
- Apply ETS, naive, ARMA, ARIMA, SARIMA, and SARIMAX models.
- Use graphical tools (ACF, PACF) for model identification.
- Test models on real series to observe practical challenges.
- Submit homework by the deadline; bonus points available.
Stationarity Testing
- Use ADF and KPSS tests together.
- Visualize series and their statistical properties (mean, variance, autocovariance).
- Difference the series if it is non-stationary.
- Understand weak vs. strong stationarity concepts.
Model Identification
- Plot ACF and PACF of original and differenced series.
- Identify p (AR order) from PACF cutoffs.
- Identify q (MA order) from ACF cutoffs.
- Determine d (differencing order) by stationarity tests and visual inspection.
- For seasonal data, identify seasonal orders (P, D, Q) and season length s.
Parameter Selection
- Use grid search or heuristic rules to limit parameter search space.
- Avoid over-differencing (d + D ≤ 2).
- Use AIC, BIC, or cross-validation to compare models.
- Check residuals for autocorrelation and normality.
Forecasting
- Generate forecasts stepwise, using previous forecasts as inputs for future steps.
- Set unknown future noise terms to zero.
- Construct forecast intervals based on model residuals.
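The stepwise recursion with future noise set to zero can be written out by hand for an AR(1). A deterministic sketch (coefficient and last value are invented for illustration):

```python
phi, last_obs = 0.7, 2.0   # assumed AR(1) coefficient and last observed value

forecasts = []
y_hat = last_obs
for h in range(3):
    y_hat = phi * y_hat    # future noise term replaced by its expectation, zero
    forecasts.append(y_hat)

# forecasts is [1.4, 0.98, 0.686] up to float rounding:
# each step feeds the previous forecast back in, so the path decays toward the mean.
```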
Incorporating Exogenous Variables
- Choose deterministic, external variables unrelated to series values.
- Normalize or transform exogenous variables if necessary.
- Limit the number of exogenous variables to avoid overfitting.
Practical Tips
- Use automatic parameter selection for many series but verify results.
- Expert judgment is essential for final model choice.
- Machine learning models may bypass stationarity assumptions via normalization.
- Residual diagnostics are crucial for model validation.
Speakers / Sources Featured
- Primary speaker: the lecturer conducting the session (name not provided).
- References mentioned:
- Shand’s textbook on ARIMA models (not favored by speaker but recommended).
- Online resources about ARIMA parameter selection (translated into Russian by the speaker).
- A research article by the speaker’s team on forecasting and machine learning integration (offered upon request).
This summary captures the key concepts, methodologies, and practical advice presented in the video on time series analysis, focusing on ARIMA and SARIMAX models, stationarity, and model diagnostics.