#sdsc6012


Autoregressive Model (AR(p))

Definition and Form

A p-th order autoregressive model, denoted as AR(p), has the form:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t$$

where:

  • $w_t \sim \text{wn}(0, \sigma_w^2)$ is white noise with mean 0 and variance $\sigma_w^2$.

  • $\phi_1, \phi_2, \ldots, \phi_p$ (with $\phi_p \neq 0$) are the autoregressive coefficients.

Intuitive understanding: The current value $x_t$ is a linear combination of the series’ own p most recent past values, plus a random disturbance. It captures the “inertia” or “memory” of the series, i.e., the influence of the series’ own history on its current state.

Stationarity and Causality

  • Stationarity condition: The necessary and sufficient condition for an AR(p) process to be stationary is that all roots of its characteristic equation lie outside the unit circle in the complex plane.

    • The characteristic equation is defined as: $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$
    • If any root has modulus $\leq 1$, the process variance may grow without bound (explode) or the process may exhibit trends (like a random walk), violating stationarity.
  • Causality: A causal process means that the current value $x_t$ depends only on the current and past white noise $w_t, w_{t-1}, \ldots$, not on future white noise. For AR models, stationarity usually implies causality.

Example: AR(1) Model

$$x_t = \phi x_{t-1} + w_t$$

  • Stationarity/causality condition: $|\phi| < 1$

  • Characteristic equation: $1 - \phi z = 0$, with root $z = 1/\phi$; requiring $|1/\phi| > 1$ is equivalent to $|\phi| < 1$.
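
To make the stationarity condition concrete, here is a minimal simulation sketch (the coefficient values 0.95 and 1.0 are illustrative choices, not from the notes): with $|\phi| < 1$ the series keeps reverting toward its mean, while with $\phi = 1$ (a root on the unit circle) it behaves like a random walk and wanders.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 500
w = np.random.normal(size=n)  # shared white noise sequence

def simulate_ar1(phi, w):
    """Simulate x_t = phi * x_{t-1} + w_t starting from x_0 = 0."""
    x = np.zeros(len(w))
    for t in range(1, len(w)):
        x[t] = phi * x[t-1] + w[t]
    return x

stationary = simulate_ar1(0.95, w)   # |phi| < 1: fluctuates around 0
random_walk = simulate_ar1(1.0, w)   # phi = 1: root on the unit circle, wanders

plt.plot(stationary, label='AR(1), phi = 0.95')
plt.plot(random_walk, label='phi = 1 (random walk)')
plt.legend()
plt.show()
```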

Moving Average Model (MA(q))

Definition and Form

A q-th order moving average model, denoted as MA(q), has the form:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$$

where:

  • $w_t, w_{t-1}, \ldots, w_{t-q}$ are independent and identically distributed white noise terms ($w_t \sim \text{wn}(0, \sigma_w^2)$).

  • $\theta_1, \theta_2, \ldots, \theta_q$ (with $\theta_q \neq 0$) are the moving average coefficients.

Intuitive understanding: The current value $x_t$ is a linear combination of the current and past q random “shocks” or “innovations”. It captures the short-term impact of external transient events on the series. For example, $\theta_1$ measures the influence of the previous period’s shock $w_{t-1}$ on the current value $x_t$.

Key Characteristics: Short Memory and ACF Truncation

  • Short memory: The MA(q) model has only q periods of memory. For $k > q$, $x_t$ and $x_{t-k}$ are built from entirely independent shocks, so the autocorrelation function (ACF) cuts off strictly after lag q (becomes 0).

  • Flexibility: Within q periods the ACF can exhibit any pattern, but the model cannot describe long-term dependence.
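
The ACF cut-off can be checked numerically. The sketch below (the coefficients 0.6 and 0.3 are illustrative, not from the notes) simulates an MA(2) series and prints its sample ACF; values beyond lag 2 should be close to zero.

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 5000
w = np.random.normal(size=n)
theta1, theta2 = 0.6, 0.3            # illustrative MA(2) coefficients

# x_t = w_t + theta1 * w_{t-1} + theta2 * w_{t-2}
x = w.copy()
x[1:] += theta1 * w[:-1]
x[2:] += theta2 * w[:-2]

acf_vals = sm.tsa.acf(x, nlags=6)
for lag, r in enumerate(acf_vals):
    print(f"lag {lag}: sample ACF = {r:.3f}")
# Expected: clearly non-zero ACF at lags 1 and 2, values near 0 for lags >= 3.
```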

Backshift Operator (B) Representation

The backshift operator $B$ is defined by $B \cdot x_t = x_{t-1}$, $B \cdot w_t = w_{t-1}$, and $B^k \cdot w_t = w_{t-k}$.
Using $B$, the MA(q) model can be concisely expressed as:

$$x_t = \theta(B) w_t$$

where $\theta(B)$ is the moving average polynomial:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Proof:

Step 1: Core Tool – Backshift Operator (B)

The backshift operator, denoted as B, is a key operator in time series analysis.

  • B acts on a time series variable.

  • B acting on the value at time t yields the value at the previous time t-1.

Mathematical definition:

$$B \cdot x_t = x_{t-1}$$

$$B \cdot w_t = w_{t-1}$$

By extension, higher powers of B represent multiple backshifts:

  • $B^2$ means applying the backshift operation twice:

$$B^2 \cdot w_t = B \cdot (B \cdot w_t) = B \cdot w_{t-1} = w_{t-2}$$

  • $B^q$ means applying the backshift operation q times:

$$B^q \cdot w_t = w_{t-q}$$

The essence of the backshift operator is a transformer of time indices, which systematically shifts the entire series backward on the time axis. It is the foundation for building time series models like ARIMA.
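
In code, applying $B$ corresponds to shifting a series by one time step. A minimal pandas illustration (the toy values are arbitrary):

```python
import pandas as pd

x = pd.Series([10, 20, 30, 40], name='x_t')

# B x_t = x_{t-1} corresponds to shift(1); B^2 x_t = x_{t-2} corresponds to shift(2).
print(pd.DataFrame({'x_t': x, 'B x_t': x.shift(1), 'B^2 x_t': x.shift(2)}))
```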

Step 2: Rewriting the MA(q) Model Using the Backshift Operator B

The original equation of the MA(q) model is:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \ldots + \theta_q w_{t-q}$$

Rewrite each term using the backshift operator B:

  • $w_t$ remains unchanged; it can be regarded as $B^0 \cdot w_t$ (where $B^0 = 1$, the identity operator)

  • $\theta_1 w_{t-1}$ becomes $\theta_1 \cdot (B \cdot w_t)$

  • $\theta_2 w_{t-2}$ becomes $\theta_2 \cdot (B^2 \cdot w_t)$

  • $\theta_q w_{t-q}$ becomes $\theta_q \cdot (B^q \cdot w_t)$

Substituting these into the original model gives:

$$x_t = w_t + \theta_1 B w_t + \theta_2 B^2 w_t + \ldots + \theta_q B^q w_t$$

This step completes the transition from an intuitive time-lag representation to a compact operator representation, preparing for subsequent factorization and polynomial definition.

Step 3: Extracting the Common Factor $w_t$

Observe that every term on the right-hand side contains the common factor $w_t$; extracting it gives:

$$x_t = \left(1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q \right) w_t$$

Extracting the common factor transforms the additive structure of the model into a multiplicative structure, revealing that the core of the model is a linear filtering process applied to the white noise sequence $\{w_t\}$.

Step 4: Defining the Moving Average Polynomial $\theta(B)$

Based on the form after extracting the common factor, we define a polynomial in the backshift operator B, called the moving average polynomial, denoted $\theta(B)$:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Using this polynomial, the MA(q) model can be expressed very concisely as:

$$x_t = \theta(B) w_t$$

The moving average polynomial $\theta(B)$ completely characterizes the structure and properties of the MA model. The model’s order q, the moving average coefficients $\theta_i$, and the invertibility condition are all encapsulated in this polynomial. It is a key bridge connecting time series model theory with operator theory.
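
Viewed computationally, $x_t = \theta(B) w_t$ is a finite impulse response filter applied to white noise. The sketch below (illustrative MA(2) coefficients, not from the notes) checks that `scipy.signal.lfilter` with coefficients $[1, \theta_1, \theta_2]$ reproduces the written-out definition, assuming pre-sample shocks are zero.

```python
import numpy as np
from scipy.signal import lfilter

np.random.seed(0)
w = np.random.normal(size=10)        # white noise
theta = [0.6, 0.3]                   # illustrative MA(2) coefficients

# theta(B) w_t is an FIR filter with coefficients [1, theta_1, theta_2].
x_filter = lfilter([1.0] + theta, [1.0], w)

# The same values written out from the definition, with pre-sample shocks set to 0.
x_loop = np.zeros_like(w)
for t in range(len(w)):
    x_loop[t] = w[t]
    if t >= 1:
        x_loop[t] += theta[0] * w[t-1]
    if t >= 2:
        x_loop[t] += theta[1] * w[t-2]

print(np.allclose(x_filter, x_loop))   # True
```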

Invertibility

  • Definition: An MA process is invertible if its white noise sequence $w_t$ can be expressed as a linear combination of current and past observed values $x_t, x_{t-1}, \ldots$ (i.e., $w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}$), and the coefficients are absolutely summable.

  • Condition (using MA(1) as an example): The MA(1) process $x_t = w_t + \theta w_{t-1}$ is invertible if and only if $|\theta| < 1$.

    • This is equivalent to the root of its moving average polynomial $\theta(z) = 1 + \theta z$, which is $z = -1/\theta$, having a modulus greater than 1 ($|z| > 1$).
  • Importance: Invertibility ensures the uniqueness and identifiability of the model parameters, and facilitates transforming the model into an $\text{AR}(\infty)$ form for forecasting and understanding.
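
For the MA(1) case the $\text{AR}(\infty)$ form can be written out explicitly. Since $x_t = (1 + \theta B) w_t$ and $|\theta| < 1$, the operator can be inverted as a geometric series:

$$w_t = (1 + \theta B)^{-1} x_t = \sum_{j=0}^{\infty} (-\theta)^j x_{t-j} = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \cdots$$

The weights $\pi_j = (-\theta)^j$ are absolutely summable precisely when $|\theta| < 1$, which is why invertibility requires the root of $\theta(z)$ to lie outside the unit circle.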

Autoregressive Moving Average Model (ARMA(p, q))

Definition and Form

The ARMA(p, q) model combines the AR and MA parts, and has the form:

$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \ldots + \theta_q w_{t-q}$$

Or, using the backshift operator $B$, it can be expressed as:

$$\phi(B) x_t = \theta(B) w_t$$

where:

  • $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$ is the autoregressive polynomial

  • $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q$ is the moving average polynomial

Intuitive understanding: The current value $x_t$ is influenced by two aspects simultaneously:

  1. AR part (inertia): A linear combination of the series’ own past states ($x_{t-1}, \ldots, x_{t-p}$), capturing long-term trends and periodic patterns.
  2. MA part (shocks): A linear combination of current and past random shocks ($w_t, w_{t-1}, \ldots, w_{t-q}$), capturing the short-term impact of external events.
    It is like a sound in a room with reverberation: what you hear includes both the original sound (MA) and the echoes (AR).

Advantages:

The ARMA model, with relatively few parameters ($p + q$), combines the flexibility of AR models in capturing long-term dependence and of MA models in capturing short-term patterns, allowing a more concise (parameter-efficient) description of complex stationary time series. By the Wold decomposition theorem, any purely non-deterministic stationary process has an MA($\infty$) representation, and ARMA models can approximate its autocovariance structure with arbitrary accuracy.

Stationarity, Causality, and Invertibility

  • Stationarity depends on the AR part: it requires that the roots of $\phi(B) = 0$ lie outside the unit circle.

  • Causality also depends on the AR part: usually satisfied under stationarity conditions.

  • Invertibility depends on the MA part: it requires that the roots of $\theta(B) = 0$ lie outside the unit circle.
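
These root conditions can be checked numerically with `np.roots`. A minimal sketch, using illustrative ARMA(2,1) coefficients that are not from the notes:

```python
import numpy as np

# Illustrative ARMA(2,1) coefficients (not from the notes):
# phi(z)   = 1 - 0.5 z - 0.3 z^2
# theta(z) = 1 + 0.4 z
ar_roots = np.roots([-0.3, -0.5, 1.0])   # np.roots takes coefficients highest power first
ma_roots = np.roots([0.4, 1.0])

print("AR roots:", ar_roots, "| all outside unit circle:", np.all(np.abs(ar_roots) > 1))
print("MA roots:", ma_roots, "| all outside unit circle:", np.all(np.abs(ma_roots) > 1))
```

For this parameter choice both sets of roots lie outside the unit circle, so the model is stationary (causal) and invertible.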

Parameter Redundancy

  • If the AR polynomial $\phi(B)$ and the MA polynomial $\theta(B)$ have a common factor, the model parameters cannot be uniquely identified, and the model can be simplified to a lower-order ARMA model.

  • Therefore, a “good” ARMA model requires that $\phi(B)$ and $\theta(B)$ have no common factors, to ensure the model structure is unique and of the lowest order.

Detailed Explanation
  1. Core Properties of the Model

  • Universal approximation capability: For any stationary process with autocovariance function $\gamma$ and any $k > 0$, there exists an ARMA process $\{x_t\}$ whose autocovariance function matches $\gamma(h)$ at all lags $h = 0, 1, \ldots, k$. In this sense, ARMA models can, in theory, approximate the second-order structure of any stationary process with arbitrary precision.

  2. Intuitive Understanding of Model Components

    1. Characteristics of the AR(p) Process
      • Allows many non-zero coefficients: The current value $x_t$ can depend on values from the distant past ($x_{t-1}$, $x_{t-2}$, …, $x_{t-p}$).
      • Coefficients have decay constraints: Although there can be many non-zero coefficients $\phi_1, \phi_2, \ldots, \phi_p$, their values cannot be arbitrary. To ensure stationarity, they must satisfy specific mathematical constraints (equivalently, the roots of the characteristic polynomial $\phi(z)$ must lie outside the unit circle, as above).
      • Decay pattern: This constraint causes the influence of distant past values to decay as the lag increases, typically exponentially or as a damped oscillation.

    For example, in the AR(2) process $X_t = 0.7 X_{t-1} + 0.2 X_{t-2} + \varepsilon_t$, the coefficient values 0.7 and 0.2 and their combination determine the fluctuation and memory pattern of the series.

    2. Characteristics of the MA(q) Process
      • Allows a limited number of non-zero coefficients: Only a finite number (q) of coefficients $\theta_1, \theta_2, \ldots, \theta_q$ may be non-zero.
      • Flexible coefficient values: These non-zero coefficients can take essentially arbitrary values (there are no decay constraints as in AR), making the model very flexible over the short term.
      • Short memory: The autocorrelation function (ACF) of an MA process cuts off abruptly (immediately becomes 0) after lag q, because for lag $k > q$, $x_t$ and $x_{t-k}$ are built from entirely independent shocks ($\varepsilon$).

    For example, the MA(1) process $X_t = \varepsilon_t + 0.8\varepsilon_{t-1}$ has an ACF that cuts off after lag 1. The advantage is the ability to flexibly capture any autocorrelation pattern in the short term; the disadvantage is the inability to describe long-term dependence.

  3. Parameter Redundancy Problem

  • Common factor problem: When constructing an ARMA model, its core polynomials are defined as:

    • Autoregressive polynomial: $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$
    • Moving average polynomial: $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q$

    where $B$ is the lag (backshift) operator ($B X_t = X_{t-1}$).

  • Ensuring model uniqueness: The polynomials $\phi(z)$ and $\theta(z)$ must have no common factors. For example, if $\phi(B) = (1-0.5B)$ and $\theta(B) = (1-0.5B)$, they share the common factor $(1-0.5B)$, and the model simplifies to a white noise process, so the parameters are not uniquely identified. Eliminating common factors ensures the resulting model is structurally unique, its parameters are identifiable, and it is of the lowest possible order.
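
A sketch of how such a cancellation looks in practice, with illustrative coefficients that are not from the notes: the AR and MA polynomials are built with a shared factor $(1 - 0.5B)$, and dividing it out leaves a lower-order model.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative example (not from the notes): an "ARMA(2,1)" whose polynomials
# share the factor (1 - 0.5B). Coefficients are stored lowest power first.
common = [1.0, -0.5]                          # 1 - 0.5 B
phi_poly = P.polymul(common, [1.0, -0.3])     # (1 - 0.5B)(1 - 0.3B) = 1 - 0.8B + 0.15B^2
theta_poly = common                           # 1 - 0.5 B

print("phi(B) coefficients:  ", phi_poly)     # [ 1.   -0.8   0.15]
print("theta(B) coefficients:", theta_poly)

# Cancelling the shared factor on both sides of phi(B) x_t = theta(B) w_t
# leaves (1 - 0.3B) x_t = w_t, i.e. the AR(1) model x_t = 0.3 x_{t-1} + w_t.
reduced, remainder = P.polydiv(phi_poly, common)
print("After cancellation:   ", reduced)      # [ 1.  -0.3]
```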

Model Selection and Python Implementation

How to Distinguish AR and MA?

Ask a question: What is driving this system?

  • If the answer is “some recent external events” (e.g., news from yesterday and today), then it is closer to an MA model.

  • If the answer is “the system’s own previous state” (e.g., the market continues to fall because it is already in a downward trend), then it is closer to an AR model.

  • If both are present, use an ARMA model.
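
One practical way to apply this intuition is to compare sample ACFs: an AR series, driven by its own past, has an ACF that decays gradually, while an MA series has an ACF that cuts off. A minimal sketch with illustrative coefficients ($\phi = 0.8$ and $\theta = 0.8$, not from the notes):

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 2000
w = np.random.normal(size=n)

# AR(1): the series feeds back on its own past -> the ACF decays gradually.
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.8 * ar[t-1] + w[t]

# MA(1): only the most recent shock matters -> the ACF cuts off after lag 1.
ma = w.copy()
ma[1:] += 0.8 * w[:-1]

print("AR(1) sample ACF:", np.round(sm.tsa.acf(ar, nlags=5), 2))
print("MA(1) sample ACF:", np.round(sm.tsa.acf(ma, nlags=5), 2))
```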

Python Implementation Example (using statsmodels)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# 1. Generate simulated ARMA(1,1) data
np.random.seed(42)
n = 1000
phi = 0.5 # AR coefficient
theta = 0.5 # MA coefficient
w = np.random.normal(size=n) # White noise

x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t-1] + w[t] + theta * w[t-1]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(x)
plt.title('Simulated ARMA(1,1) Process')
plt.show()

# 2. Fit an ARMA(1,1) model
# Note: sm.tsa.ARMA was removed from recent statsmodels releases;
# ARIMA with d = 0 specifies the same ARMA(1,1) model.
model = sm.tsa.ARIMA(x, order=(1, 0, 1))
fitted_model = model.fit()

# 3. Output model fitting summary
print(fitted_model.summary())
# With n = 1000 observations, the AR(1) and MA(1) coefficient estimates
# should be close to the true values phi = 0.5 and theta = 0.5.
```

Model Evaluation

  • Use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare the goodness-of-fit of different models (lower values are better).

  • Test the residuals of the model; they should behave like white noise (no autocorrelation), otherwise it indicates that the model has not fully captured the information in the data.
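
A sketch of both checks, continuing the simulated ARMA(1,1) example above (the candidate orders compared are illustrative choices; the Ljung-Box test from statsmodels is used for the residual check):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

# Re-create the simulated ARMA(1,1) series from the example above.
np.random.seed(42)
n = 1000
phi, theta = 0.5, 0.5
w = np.random.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t-1] + w[t] + theta * w[t-1]

# Compare a few candidate orders by AIC/BIC (lower is better).
for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1), (2, 0, 2)]:
    res = sm.tsa.ARIMA(x, order=order).fit()
    print(f"ARMA({order[0]},{order[2]}): AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")

# Ljung-Box test on the residuals of the chosen model: large p-values are
# consistent with white-noise residuals (no leftover autocorrelation).
best = sm.tsa.ARIMA(x, order=(1, 0, 1)).fit()
print(acorr_ljungbox(best.resid, lags=[10]))
```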