SDSC6012 Course 5: Autoregressive Models
#sdsc6012
Autoregressive Model (AR(p))
Definition and Form
A p-th order autoregressive model, denoted AR(p), has the form:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t$$

where:
- $w_t$ is white noise with mean 0 and variance $\sigma_w^2$.
- $\phi_1, \phi_2, \dots, \phi_p$ (with $\phi_p \neq 0$) are the autoregressive coefficients.
Intuitive understanding: The current value is a linear combination of its own past p historical values, plus a random disturbance. It captures the “inertia” or “memory” of the series, meaning the influence of the series’ own history on its current state.
Stationarity and Causality
- Stationarity condition: an AR(p) process is stationary if and only if all roots of its characteristic equation lie outside the unit circle in the complex plane.
  - The characteristic equation is defined as: $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$
  - If any root has modulus $\le 1$, the process variance may grow without bound (explode) or the process may exhibit trends (like a random walk), violating stationarity.
- Causality: a causal process is one whose current value depends only on current and past white noise $w_t, w_{t-1}, w_{t-2}, \dots$, never on future white noise. For AR models, stationarity (all roots strictly outside the unit circle) usually implies causality.
Example: AR(1) Model
- Model: $x_t = \phi x_{t-1} + w_t$
- Stationarity/causality condition: $|\phi| < 1$
- Characteristic equation: $1 - \phi z = 0$, whose root is $z = 1/\phi$; requiring $|z| > 1$ gives $|1/\phi| > 1$, i.e., $|\phi| < 1$. This condition can also be checked numerically, as in the sketch below.
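The root check can be automated with NumPy; a minimal sketch (the helper `ar_roots` and the coefficient value 0.5 are illustrative, not from the notes):

```python
import numpy as np

def ar_roots(phi):
    """Roots of phi(z) = 1 - phi_1*z - ... - phi_p*z^p.

    np.roots expects coefficients ordered from the highest
    power down to the constant term.
    """
    coeffs = [-c for c in reversed(phi)] + [1.0]
    return np.roots(coeffs)

# AR(1) with phi = 0.5: single root z = 1/0.5 = 2, outside the unit circle.
roots = ar_roots([0.5])
print(roots)                      # [2.]
print(np.all(np.abs(roots) > 1))  # True -> stationary and causal
```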
Moving Average Model (MA(q))
Definition and Form
A q-th order moving average model, denoted MA(q), has the form:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$$

where:
- $w_t \sim \mathrm{wn}(0, \sigma_w^2)$ is an independent and identically distributed white noise sequence.
- $\theta_1, \theta_2, \dots, \theta_q$ (with $\theta_q \neq 0$) are the moving average coefficients.
Intuitive understanding: The current value is a linear combination of the current and past q random “shocks” or “innovations”. It captures the short-term impact of external transient events on the series. For example, $\theta_1$ measures the influence of the previous period’s shock $w_{t-1}$ on the current value $x_t$.
Key Characteristics: Short Memory and ACF Truncation
- Short memory: the MA(q) model has only q periods of memory. For lag $h > q$, $x_t$ and $x_{t+h}$ are composed of entirely independent shocks, so its autocorrelation function (ACF) truncates strictly after lag q (becomes 0); see the sketch after this list.
- Flexibility: within q lags, the ACF can exhibit any pattern, but the model cannot describe long-term dependence.
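The cutoff is easy to see from the theoretical ACF; a quick sketch using statsmodels' `ArmaProcess` with an illustrative $\theta_1 = 0.8$:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ArmaProcess takes the full polynomial coefficients, including the
# leading 1: here phi(B) = 1 and theta(B) = 1 + 0.8 B.
ma1 = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 0.8]))

# Theoretical ACF at lags 0..4: rho(1) = 0.8 / (1 + 0.8^2) = 0.4878,
# and exactly zero beyond lag q = 1.
print(np.round(ma1.acf(lags=5), 4))  # [1.     0.4878 0.     0.     0.    ]
```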
Backshift Operator (B) Representation
The backshift operator is defined as: $Bx_t = x_{t-1}$, $B^2 x_t = x_{t-2}$, and in general $B^k x_t = x_{t-k}$.
Using $B$, the MA(q) model can be concisely expressed as:

$$x_t = \theta(B) w_t$$

where $\theta(B)$ is the moving average polynomial:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$
Proof:
Step 1: Core Tool – Backshift Operator (B)
The backshift operator, denoted as B, is a key operator in time series analysis.
- B acts on a time series variable.
- B acting on the value at time t yields the value at the previous time t-1.
Mathematical definition: $Bx_t = x_{t-1}$
By extension, higher powers of B represent multiple backshifts:
- $B^2$ means applying the backshift operation twice: $B^2 x_t = B(Bx_t) = Bx_{t-1} = x_{t-2}$
- $B^q$ means applying the backshift operation q times: $B^q x_t = x_{t-q}$
The essence of the backshift operator is a transformer of time indices, which systematically shifts the entire series backward on the time axis. It is the foundation for building time series models like ARIMA.
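As a concrete illustration, the lag methods of common libraries implement exactly this operator; for example, pandas' `Series.shift(k)` plays the role of $B^k$ (a toy sketch):

```python
import pandas as pd

# A toy series x_1, ..., x_5; Series.shift(k) plays the role of B^k.
x = pd.Series([10, 20, 30, 40, 50])

print(x.shift(1).tolist())  # [nan, 10.0, 20.0, 30.0, 40.0] -> B x_t = x_{t-1}
print(x.shift(2).tolist())  # [nan, nan, 10.0, 20.0, 30.0]  -> B^2 x_t = x_{t-2}
```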
Step 2: Rewriting the MA(q) Model Using the Backshift Operator B
The original equation of the MA(q) model is:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$$

Rewrite each term using the backshift operator B:
- $w_t$ remains unchanged and can be regarded as $B^0 w_t$ (where $B^0 = 1$, the identity operator)
- $w_{t-1}$ becomes $Bw_t$
- $w_{t-2}$ becomes $B^2 w_t$
- $w_{t-q}$ becomes $B^q w_t$
Substituting these into the original model gives:

$$x_t = w_t + \theta_1 B w_t + \theta_2 B^2 w_t + \cdots + \theta_q B^q w_t$$
This step completes the transition from an intuitive time-lag representation to a compact operator representation, preparing for subsequent factorization and polynomial definition.
Step 3: Extracting the Common Factor
Observe that every term on the right-hand side contains the common factor $w_t$; extracting it gives:

$$x_t = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q)\, w_t$$

Extracting the common factor transforms the additive structure of the model into a multiplicative structure, revealing that the core of the model is a linear filtering process applied to the white noise sequence $\{w_t\}$.
Step 4: Defining the Moving Average Polynomial
Based on the form after extracting the common factor, we define a polynomial in the backshift operator B, called the moving average polynomial, denoted $\theta(B)$:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Using this polynomial, the MA(q) model can be expressed very concisely as:

$$x_t = \theta(B) w_t$$
The moving average polynomial completely characterizes the structure and properties of the MA model. The model’s order q, the coefficients of each moving average term, and the model’s stationarity and invertibility conditions are all encapsulated in this polynomial. It is a key bridge connecting time series model theory with operator theory.
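Concretely, applying $\theta(B)$ to white noise is a finite linear filter: in code it is just a convolution of the noise with the coefficient vector $(1, \theta_1, \dots, \theta_q)$. A small sketch (the MA(2) coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)       # white noise w_t
theta = np.array([1.0, 0.6, -0.3])  # (1, theta_1, theta_2): an MA(2) filter

# x_t = theta(B) w_t: each x_t is a weighted sum of w_t, w_{t-1}, w_{t-2},
# which is exactly a convolution of the noise with the coefficients
# (zero initial conditions before t = 1).
x = np.convolve(w, theta)[: len(w)]
print(x[:5])
```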
Invertibility
- Definition: an MA process is invertible if its white noise sequence can be expressed as a linear combination of current and past observed values, i.e., $w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}$, with absolutely summable coefficients $\pi_j$.
- Condition (using MA(1) as an example): the MA(1) process $x_t = w_t + \theta w_{t-1}$ is invertible if and only if $|\theta| < 1$.
  - This is equivalent to the root of its moving average polynomial $\theta(z) = 1 + \theta z$, which is $z = -1/\theta$, having a modulus greater than 1 ($|1/\theta| > 1$).
- Importance: invertibility ensures the uniqueness and identifiability of the model parameters, and facilitates rewriting the model in AR($\infty$) form for forecasting and interpretation. This can be checked directly in code, as shown below.
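statsmodels exposes this check as the `isinvertible` property of `ArmaProcess`, which is based on the roots of $\theta(z)$ (the $\theta$ values below are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# MA(1) with theta = 0.5: root of 1 + 0.5z is z = -2, |z| = 2 > 1.
ma_inv = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 0.5]))
print(ma_inv.isinvertible)     # True

# MA(1) with theta = 2.0: root z = -0.5 lies inside the unit circle.
ma_noninv = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 2.0]))
print(ma_noninv.isinvertible)  # False
```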
Autoregressive Moving Average Model (ARMA(p, q))
Definition and Form
The ARMA(p, q) model combines the AR and MA parts, and has the form:

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$$

Or, using the backshift operator $B$, it can be expressed as:

$$\phi(B) x_t = \theta(B) w_t$$
where:
- $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ is the autoregressive polynomial
- $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$ is the moving average polynomial
Intuitive understanding: The current value is influenced by two aspects simultaneously:
- AR part (inertia): a linear combination of the series’ own past states ($x_{t-1}, \dots, x_{t-p}$), capturing long-term trends and periodic patterns.
- MA part (shocks): a linear combination of current and past random shocks ($w_t, w_{t-1}, \dots, w_{t-q}$), capturing the short-term impact of external events.
It is like a sound in a room with reverberation (echo), which includes both the original sound (MA) and the echoes (AR).
Advantages:
The ARMA model, with relatively few parameters ($p + q$ coefficients plus the noise variance), combines the flexibility of AR models in capturing long-term dependence with that of MA models in capturing short-term patterns, allowing a more concise (parameter-efficient) description of complex stationary time series. In the spirit of the Wold decomposition theorem, ARMA models can approximate any stationary process with arbitrary accuracy.
Stationarity, Causality, and Invertibility
- Stationarity depends on the AR part: it requires that all roots of $\phi(z) = 0$ lie outside the unit circle.
- Causality also depends on the AR part: it is usually satisfied under the stationarity condition.
- Invertibility depends on the MA part: it requires that all roots of $\theta(z) = 0$ lie outside the unit circle. Both root conditions can be checked numerically, as in the sketch below.
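In practice, both conditions reduce to polynomial root checks with `np.roots`; the ARMA(2, 1) coefficients below are illustrative:

```python
import numpy as np

# Illustrative ARMA(2,1): phi(z) = 1 - 0.5z - 0.3z^2, theta(z) = 1 + 0.4z.
# np.roots wants coefficients from the highest power down to the constant.
ar_poly = [-0.3, -0.5, 1.0]  # -0.3 z^2 - 0.5 z + 1
ma_poly = [0.4, 1.0]         #  0.4 z + 1

print(np.abs(np.roots(ar_poly)))  # all moduli > 1 -> stationary/causal AR part
print(np.abs(np.roots(ma_poly)))  # all moduli > 1 -> invertible MA part
```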
Parameter Redundancy
- If the AR polynomial $\phi(B)$ and the MA polynomial $\theta(B)$ have a common factor, the model parameters cannot be uniquely identified, and the model can be simplified to a lower-order ARMA model.
- Therefore, a “good” ARMA model requires that $\phi(B)$ and $\theta(B)$ have no common factors, to ensure the model structure is unique and of the lowest order.
Detailed Explanation
1. Core Properties of the Model
- Universal approximation capability: for any stationary process with autocovariance function $\gamma(h)$, and for any $k > 0$, there exists an ARMA process whose autocovariances match $\gamma(h)$ at all lags $|h| \le k$. This means that ARMA models can, in theory, approximate any other stationary process with arbitrary precision.
2. Intuitive Understanding of Model Components
- Characteristics of the AR(p) Process
- Allows many non-zero coefficients: the current value can depend on values from the distant past ($x_{t-1}$, $x_{t-2}$, …, $x_{t-p}$).
- Coefficients have decay constraints: although there can be many non-zero coefficients $\phi_j$, their values cannot be arbitrary. To ensure stationarity, they must satisfy specific mathematical constraints (equivalently, all roots of the characteristic equation must lie outside the unit circle).
- Decay pattern: this constraint causes the influence of distant past values to exhibit a specific decay pattern as the lag increases, typically exponential decay or damped sinusoidal decay.
For example, in the AR(2) process $x_t = 0.7 x_{t-1} + 0.2 x_{t-2} + w_t$, the coefficient values 0.7 and 0.2 and their combination determine the fluctuation and memory pattern of the series.
- Characteristics of the MA(q) Process
- Allows a limited number of non-zero coefficients: only a finite number (q) of coefficients are allowed to be non-zero.
- Flexible coefficient values: the non-zero coefficients $\theta_j$ can take essentially arbitrary values (no decay constraints as in AR), making the model very flexible in the short term.
- Short memory: the autocorrelation function (ACF) of an MA process truncates abruptly (immediately becomes 0) after lag q, because for lag $h > q$, $x_t$ and $x_{t+h}$ are composed of entirely independent shocks ($\mathrm{Cov}(x_t, x_{t+h}) = 0$).
For example, an MA(1) process $x_t = w_t + \theta_1 w_{t-1}$ has an ACF that truncates after lag 1. The advantage is its ability to flexibly capture any autocorrelation pattern in the short term; the disadvantage is its inability to describe long-term dependence. The contrast with AR decay is illustrated in the sketch after this list.
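The two memory patterns can be compared via theoretical ACFs, reusing the AR(2) coefficients above and an illustrative $\theta_1 = 0.5$ for the MA(1):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# AR(2) from the example above: x_t = 0.7 x_{t-1} + 0.2 x_{t-2} + w_t.
ar2 = ArmaProcess(ar=np.array([1.0, -0.7, -0.2]), ma=np.array([1.0]))
# MA(1) with an illustrative theta_1 = 0.5.
ma1 = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 0.5]))

print(np.round(ar2.acf(lags=6), 3))  # decays gradually, never exactly 0
print(np.round(ma1.acf(lags=6), 3))  # [1.  0.4 0.  0.  0.  0. ] -> cutoff
```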
3. Parameter Redundancy Problem
- Common factor problem: when constructing an ARMA model, its core polynomials are defined as:
  - Autoregressive polynomial: $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$
  - Moving average polynomial: $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$

  where $B$ is the lag (backshift) operator ($Bx_t = x_{t-1}$).
- Ensuring model uniqueness: the polynomials $\phi(B)$ and $\theta(B)$ must have no common factors. For example (with an illustrative coefficient), if $\phi(B) = 1 - 0.5B$ and $\theta(B) = 1 - 0.5B$, they share the common factor $(1 - 0.5B)$, which cancels in $\phi(B)x_t = \theta(B)w_t$ and reduces the model to the white noise process $x_t = w_t$, so the parameters cannot be uniquely identified. Eliminating common factors ensures the resulting model is structurally unique, its parameters are identifiable, and it is of the lowest possible order; a numerical illustration follows.
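The cancellation can be seen numerically: building the redundant ARMA(1, 1) with statsmodels' `ArmaProcess` (using the same illustrative 0.5 coefficient) yields the ACF of white noise:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ARMA(1,1) with phi(B) = theta(B) = 1 - 0.5B: the common factor cancels,
# so the process is indistinguishable from white noise.
redundant = ArmaProcess(ar=np.array([1.0, -0.5]), ma=np.array([1.0, -0.5]))
print(np.round(redundant.acf(lags=5), 4))  # [1. 0. 0. 0. 0.] -> white noise
```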
Model Selection and Python Implementation
How to Distinguish AR and MA?
Ask a question: What is driving this system?
- If the answer is “some recent external events” (e.g., news from yesterday and today), then it is closer to an MA model.
- If the answer is “the system’s own previous state” (e.g., the market continues to fall because it is already in a downward trend), then it is closer to an AR model.
- If both are present, use an ARMA model.
Python Implementation Example (using statsmodels)
```python
import numpy as np
```
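A minimal end-to-end sketch of simulating and fitting an ARMA model with statsmodels (the ARMA(1, 1) specification and its coefficients 0.7 and 0.4 are illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

# Simulate 500 observations from an illustrative ARMA(1,1):
# x_t = 0.7 x_{t-1} + w_t + 0.4 w_{t-1},
# i.e., phi(B) = 1 - 0.7B and theta(B) = 1 + 0.4B.
np.random.seed(42)
process = ArmaProcess(ar=np.array([1.0, -0.7]), ma=np.array([1.0, 0.4]))
x = process.generate_sample(nsample=500)

# Fit an ARMA(1,1): ARIMA(p, d, q) with d = 0 reduces to ARMA(p, q).
result = ARIMA(x, order=(1, 0, 1)).fit()
print(result.summary())                        # estimated phi, theta, sigma^2
print("AIC:", result.aic, "BIC:", result.bic)  # for model comparison
```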
Model Evaluation
- Use the AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare the goodness of fit of different models (lower values are better).
- Test the model residuals: they should behave like white noise (no autocorrelation); otherwise the model has not fully captured the information in the data. A Ljung-Box check is sketched below.
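For the residual check, statsmodels provides the Ljung-Box test; a self-contained sketch continuing the illustrative ARMA(1, 1) example above (the lag choice of 10 is arbitrary):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulate and refit the illustrative ARMA(1,1) from the earlier sketch.
np.random.seed(42)
x = ArmaProcess(ar=np.array([1.0, -0.7]),
                ma=np.array([1.0, 0.4])).generate_sample(nsample=500)
result = ARIMA(x, order=(1, 0, 1)).fit()

# Ljung-Box test on the residuals: large p-values mean no detectable
# autocorrelation, i.e., the residuals behave like white noise.
print(acorr_ljungbox(result.resid, lags=[10]))
```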
