#sdsc6012


Autoregressive Model (AR(p))

Definition and Form

A p-th order autoregressive model, denoted as AR(p), has the form:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t$$

where:

  • $w_t \sim \text{wn}(0, \sigma_w^2)$ is white noise with mean 0 and variance $\sigma_w^2$.

  • $\phi_1, \phi_2, \ldots, \phi_p$ (with $\phi_p \neq 0$) are the autoregressive coefficients.

Intuitive understanding: The current value $x_t$ is a linear combination of the series’ own p most recent past values, plus a random disturbance. It captures the “inertia” or “memory” of the series, i.e., the influence of the series’ own history on its current state.

Stationarity and Causality

  • Stationarity condition: The necessary and sufficient condition for an AR(p) process to be stationary is that all roots of its characteristic equation lie outside the unit circle in the complex plane.

    • The characteristic equation is defined as: $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0$
    • If any root has modulus $\leq 1$, the process variance may grow without bound (explode) or the process may exhibit trends (like a random walk), violating stationarity.
  • Causality: A causal process means that the current value $x_t$ depends only on the current and past white noise $w_t, w_{t-1}, \ldots$, not on future white noise. For AR models, stationarity usually implies causality.

Example: AR(1) Model

$$x_t = \phi x_{t-1} + w_t$$

  • Stationarity/causality condition: $|\phi| < 1$

  • Characteristic equation: $1 - \phi z = 0$, with root $z = 1/\phi$; requiring $|1/\phi| > 1$ is equivalent to $|\phi| < 1$.
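
To make the stationarity condition concrete, here is a minimal simulation sketch (the coefficient values 0.95 and 1.0 are illustrative choices, not from the notes): with $|\phi| < 1$ the series keeps reverting toward its mean, while with $\phi = 1$ (a root on the unit circle) it behaves like a random walk and wanders.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 500
w = np.random.normal(size=n)  # shared white noise sequence

def simulate_ar1(phi, w):
    """Simulate x_t = phi * x_{t-1} + w_t starting from x_0 = 0."""
    x = np.zeros(len(w))
    for t in range(1, len(w)):
        x[t] = phi * x[t-1] + w[t]
    return x

stationary = simulate_ar1(0.95, w)   # |phi| < 1: fluctuates around 0
random_walk = simulate_ar1(1.0, w)   # phi = 1: root on the unit circle, wanders

plt.plot(stationary, label='AR(1), phi = 0.95')
plt.plot(random_walk, label='phi = 1 (random walk)')
plt.legend()
plt.show()
```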

Moving Average Model (MA(q))

Definition and Form

A q-th order moving average model, denoted as MA(q), has the form:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$$

where:

  • $w_t, w_{t-1}, \ldots, w_{t-q}$ are independent and identically distributed white noise terms ($w_t \sim \text{wn}(0, \sigma_w^2)$).

  • $\theta_1, \theta_2, \ldots, \theta_q$ (with $\theta_q \neq 0$) are the moving average coefficients.

Intuitive understanding: The current value $x_t$ is a linear combination of the current and past q random “shocks” or “innovations”. It captures the short-term impact of external transient events on the series. For example, $\theta_1$ measures the influence of the previous period’s shock $w_{t-1}$ on the current value $x_t$.

Key Characteristics: Short Memory and ACF Truncation

  • Short memory: The MA(q) model has only q periods of memory. For $k > q$, $x_t$ and $x_{t-k}$ are built from entirely independent shocks, so the autocorrelation function (ACF) cuts off strictly after lag q (becomes 0).

  • Flexibility: Within q periods the ACF can exhibit any pattern, but the model cannot describe long-term dependence.
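
The ACF cut-off can be checked numerically. The sketch below (the coefficients 0.6 and 0.3 are illustrative, not from the notes) simulates an MA(2) series and prints its sample ACF; values beyond lag 2 should be close to zero.

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 5000
w = np.random.normal(size=n)
theta1, theta2 = 0.6, 0.3            # illustrative MA(2) coefficients

# x_t = w_t + theta1 * w_{t-1} + theta2 * w_{t-2}
x = w.copy()
x[1:] += theta1 * w[:-1]
x[2:] += theta2 * w[:-2]

acf_vals = sm.tsa.acf(x, nlags=6)
for lag, r in enumerate(acf_vals):
    print(f"lag {lag}: sample ACF = {r:.3f}")
# Expected: clearly non-zero ACF at lags 1 and 2, values near 0 for lags >= 3.
```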

Backshift Operator (B) Representation

The backshift operator $B$ is defined by $B \cdot x_t = x_{t-1}$, $B \cdot w_t = w_{t-1}$, and $B^k \cdot w_t = w_{t-k}$.
Using $B$, the MA(q) model can be concisely expressed as:

$$x_t = \theta(B) w_t$$

where $\theta(B)$ is the moving average polynomial:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Proof:

Step 1: Core Tool – Backshift Operator (B)

The backshift operator, denoted as B, is a key operator in time series analysis.

  • B acts on a time series variable.

  • B acting on the value at time t yields the value at the previous time t-1.

Mathematical definition:

$$B \cdot x_t = x_{t-1}$$

$$B \cdot w_t = w_{t-1}$$

By extension, higher powers of B represent multiple backshifts:

  • $B^2$ means applying the backshift operation twice:

$$B^2 \cdot w_t = B \cdot (B \cdot w_t) = B \cdot w_{t-1} = w_{t-2}$$

  • $B^q$ means applying the backshift operation q times:

$$B^q \cdot w_t = w_{t-q}$$

The essence of the backshift operator is a transformer of time indices, which systematically shifts the entire series backward on the time axis. It is the foundation for building time series models like ARIMA.
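
In code, applying $B$ corresponds to shifting a series by one time step. A minimal pandas illustration (the toy values are arbitrary):

```python
import pandas as pd

x = pd.Series([10, 20, 30, 40], name='x_t')

# B x_t = x_{t-1} corresponds to shift(1); B^2 x_t = x_{t-2} corresponds to shift(2).
print(pd.DataFrame({'x_t': x, 'B x_t': x.shift(1), 'B^2 x_t': x.shift(2)}))
```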

Step 2: Rewriting the MA(q) Model Using the Backshift Operator B

The original equation of the MA(q) model is:

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \ldots + \theta_q w_{t-q}$$

Rewrite each term using the backshift operator B:

  • $w_t$ remains unchanged; it can be regarded as $B^0 \cdot w_t$ (where $B^0 = 1$, the identity operator)

  • $\theta_1 w_{t-1}$ becomes $\theta_1 \cdot (B \cdot w_t)$

  • $\theta_2 w_{t-2}$ becomes $\theta_2 \cdot (B^2 \cdot w_t)$

  • $\theta_q w_{t-q}$ becomes $\theta_q \cdot (B^q \cdot w_t)$

Substituting these into the original model gives:

$$x_t = w_t + \theta_1 B w_t + \theta_2 B^2 w_t + \ldots + \theta_q B^q w_t$$

This step completes the transition from an intuitive time-lag representation to a compact operator representation, preparing for subsequent factorization and polynomial definition.

Step 3: Extracting the Common Factor $w_t$

Observe that every term on the right-hand side contains the common factor $w_t$; extracting it gives:

$$x_t = \left(1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q \right) w_t$$

Extracting the common factor transforms the additive structure of the model into a multiplicative structure, revealing that the core of the model is a linear filtering process applied to the white noise sequence $\{w_t\}$.

Step 4: Defining the Moving Average Polynomial $\theta(B)$

Based on the form after extracting the common factor, we define a polynomial in the backshift operator B, called the moving average polynomial, denoted $\theta(B)$:

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$$

Using this polynomial, the MA(q) model can be expressed very concisely as:

$$x_t = \theta(B) w_t$$

The moving average polynomial $\theta(B)$ completely characterizes the structure and properties of the MA model. The model’s order q, the moving average coefficients $\theta_i$, and the invertibility condition are all encapsulated in this polynomial. It is a key bridge connecting time series model theory with operator theory.
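
Viewed computationally, $x_t = \theta(B) w_t$ is a finite impulse response filter applied to white noise. The sketch below (illustrative MA(2) coefficients, not from the notes) checks that `scipy.signal.lfilter` with coefficients $[1, \theta_1, \theta_2]$ reproduces the written-out definition, assuming pre-sample shocks are zero.

```python
import numpy as np
from scipy.signal import lfilter

np.random.seed(0)
w = np.random.normal(size=10)        # white noise
theta = [0.6, 0.3]                   # illustrative MA(2) coefficients

# theta(B) w_t is an FIR filter with coefficients [1, theta_1, theta_2].
x_filter = lfilter([1.0] + theta, [1.0], w)

# The same values written out from the definition, with pre-sample shocks set to 0.
x_loop = np.zeros_like(w)
for t in range(len(w)):
    x_loop[t] = w[t]
    if t >= 1:
        x_loop[t] += theta[0] * w[t-1]
    if t >= 2:
        x_loop[t] += theta[1] * w[t-2]

print(np.allclose(x_filter, x_loop))   # True
```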

Invertibility

  • Definition: An MA process is invertible if its white noise sequence $w_t$ can be expressed as a linear combination of current and past observed values $x_t, x_{t-1}, \ldots$ (i.e., $w_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}$), and the coefficients are absolutely summable.

  • Condition (using MA(1) as an example): The MA(1) process $x_t = w_t + \theta w_{t-1}$ is invertible if and only if $|\theta| < 1$.

    • This is equivalent to the root of its moving average polynomial $\theta(z) = 1 + \theta z$, which is $z = -1/\theta$, having a modulus greater than 1 ($|z| > 1$).
  • Importance: Invertibility ensures the uniqueness and identifiability of the model parameters, and facilitates transforming the model into an $\text{AR}(\infty)$ form for forecasting and understanding.
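
For the MA(1) case the $\text{AR}(\infty)$ form can be written out explicitly. Since $x_t = (1 + \theta B) w_t$ and $|\theta| < 1$, the operator can be inverted as a geometric series:

$$w_t = (1 + \theta B)^{-1} x_t = \sum_{j=0}^{\infty} (-\theta)^j x_{t-j} = x_t - \theta x_{t-1} + \theta^2 x_{t-2} - \cdots$$

The weights $\pi_j = (-\theta)^j$ are absolutely summable precisely when $|\theta| < 1$, which is why invertibility requires the root of $\theta(z)$ to lie outside the unit circle.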

Autoregressive Moving Average Model (ARMA(p, q))

Definition and Form

The ARMA(p, q) model combines the AR and MA parts, and has the form:

$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \ldots + \theta_q w_{t-q}$$

Or, using the backshift operator $B$, it can be expressed as:

$$\phi(B) x_t = \theta(B) w_t$$

where:

  • $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$ is the autoregressive polynomial

  • $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q$ is the moving average polynomial

Intuitive understanding: The current value $x_t$ is influenced by two aspects simultaneously:

  1. AR part (inertia): A linear combination of the series’ own past states ($x_{t-1}, \ldots, x_{t-p}$), capturing long-term trends and periodic patterns.
  2. MA part (shocks): A linear combination of current and past random shocks ($w_t, w_{t-1}, \ldots, w_{t-q}$), capturing the short-term impact of external events.
    It is like a sound in a room with reverberation: what you hear includes both the original sound (MA) and the echoes (AR).

Advantages:

The ARMA model, with relatively few parameters ($p + q$), combines the flexibility of AR models in capturing long-term dependence and of MA models in capturing short-term patterns, allowing a more concise (parameter-efficient) description of complex stationary time series. By the Wold decomposition theorem, any purely non-deterministic stationary process has an MA($\infty$) representation, and ARMA models can approximate its autocovariance structure with arbitrary accuracy.

Stationarity, Causality, and Invertibility

  • Stationarity depends on the AR part: it requires that the roots of $\phi(B) = 0$ lie outside the unit circle.

  • Causality also depends on the AR part: usually satisfied under stationarity conditions.

  • Invertibility depends on the MA part: it requires that the roots of $\theta(B) = 0$ lie outside the unit circle.
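
These root conditions can be checked numerically with `np.roots`. A minimal sketch, using illustrative ARMA(2,1) coefficients that are not from the notes:

```python
import numpy as np

# Illustrative ARMA(2,1) coefficients (not from the notes):
# phi(z)   = 1 - 0.5 z - 0.3 z^2
# theta(z) = 1 + 0.4 z
ar_roots = np.roots([-0.3, -0.5, 1.0])   # np.roots takes coefficients highest power first
ma_roots = np.roots([0.4, 1.0])

print("AR roots:", ar_roots, "| all outside unit circle:", np.all(np.abs(ar_roots) > 1))
print("MA roots:", ma_roots, "| all outside unit circle:", np.all(np.abs(ma_roots) > 1))
```

For this parameter choice both sets of roots lie outside the unit circle, so the model is stationary (causal) and invertible.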

Parameter Redundancy

  • If the AR polynomial $\phi(B)$ and the MA polynomial $\theta(B)$ have a common factor, the model parameters cannot be uniquely identified, and the model can be simplified to a lower-order ARMA model.

  • Therefore, a “good” ARMA model requires that $\phi(B)$ and $\theta(B)$ have no common factors, to ensure the model structure is unique and of the lowest order.

Detailed Explanation
  1. Core Properties of the Model

  • Universal approximation capability: For any stationary process with autocovariance function $\gamma$ and any $k > 0$, there exists an ARMA process $\{x_t\}$ whose autocovariance function matches $\gamma(h)$ at all lags $h = 0, 1, \ldots, k$. In this sense, ARMA models can, in theory, approximate the second-order structure of any stationary process with arbitrary precision.

  2. Intuitive Understanding of Model Components

    1. Characteristics of the AR(p) Process
      • Allows many non-zero coefficients: The current value $x_t$ can depend on values from the distant past ($x_{t-1}$, $x_{t-2}$, …, $x_{t-p}$).
      • Coefficients have decay constraints: Although there can be many non-zero coefficients $\phi_1, \phi_2, \ldots, \phi_p$, their values cannot be arbitrary. To ensure stationarity, they must satisfy specific mathematical constraints (equivalently, the roots of the characteristic polynomial $\phi(z)$ must lie outside the unit circle, as above).
      • Decay pattern: This constraint causes the influence of distant past values to decay as the lag increases, typically exponentially or as a damped oscillation.

    For example, in the AR(2) process $X_t = 0.7 X_{t-1} + 0.2 X_{t-2} + \varepsilon_t$, the coefficient values 0.7 and 0.2 and their combination determine the fluctuation and memory pattern of the series.

    2. Characteristics of the MA(q) Process
      • Allows a limited number of non-zero coefficients: Only a finite number (q) of coefficients $\theta_1, \theta_2, \ldots, \theta_q$ may be non-zero.
      • Flexible coefficient values: These non-zero coefficients can take essentially arbitrary values (there are no decay constraints as in AR), making the model very flexible over the short term.
      • Short memory: The autocorrelation function (ACF) of an MA process cuts off abruptly (immediately becomes 0) after lag q, because for lag $k > q$, $x_t$ and $x_{t-k}$ are built from entirely independent shocks ($\varepsilon$).

    For example, the MA(1) process $X_t = \varepsilon_t + 0.8\varepsilon_{t-1}$ has an ACF that cuts off after lag 1. The advantage is the ability to flexibly capture any autocorrelation pattern in the short term; the disadvantage is the inability to describe long-term dependence.

  3. Parameter Redundancy Problem

  • Common factor problem: When constructing an ARMA model, its core polynomials are defined as:

    • Autoregressive polynomial: $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$
    • Moving average polynomial: $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \ldots + \theta_q B^q$

    where $B$ is the lag (backshift) operator ($B X_t = X_{t-1}$).

  • Ensuring model uniqueness: The polynomials $\phi(z)$ and $\theta(z)$ must have no common factors. For example, if $\phi(B) = (1-0.5B)$ and $\theta(B) = (1-0.5B)$, they share the common factor $(1-0.5B)$, and the model simplifies to a white noise process, so the parameters are not uniquely identified. Eliminating common factors ensures the resulting model is structurally unique, its parameters are identifiable, and it is of the lowest possible order.
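
A sketch of how such a cancellation looks in practice, with illustrative coefficients that are not from the notes: the AR and MA polynomials are built with a shared factor $(1 - 0.5B)$, and dividing it out leaves a lower-order model.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative example (not from the notes): an "ARMA(2,1)" whose polynomials
# share the factor (1 - 0.5B). Coefficients are stored lowest power first.
common = [1.0, -0.5]                          # 1 - 0.5 B
phi_poly = P.polymul(common, [1.0, -0.3])     # (1 - 0.5B)(1 - 0.3B) = 1 - 0.8B + 0.15B^2
theta_poly = common                           # 1 - 0.5 B

print("phi(B) coefficients:  ", phi_poly)     # [ 1.   -0.8   0.15]
print("theta(B) coefficients:", theta_poly)

# Cancelling the shared factor on both sides of phi(B) x_t = theta(B) w_t
# leaves (1 - 0.3B) x_t = w_t, i.e. the AR(1) model x_t = 0.3 x_{t-1} + w_t.
reduced, remainder = P.polydiv(phi_poly, common)
print("After cancellation:   ", reduced)      # [ 1.  -0.3]
```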

Model Selection and Python Implementation

How to Distinguish AR and MA?

Ask a question: What is driving this system?

  • If the answer is “some recent external events” (e.g., news from yesterday and today), then it is closer to an MA model.

  • If the answer is “the system’s own previous state” (e.g., the market continues to fall because it is already in a downward trend), then it is closer to an AR model.

  • If both are present, use an ARMA model.
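
One practical way to apply this intuition is to compare sample ACFs: an AR series, driven by its own past, has an ACF that decays gradually, while an MA series has an ACF that cuts off. A minimal sketch with illustrative coefficients ($\phi = 0.8$ and $\theta = 0.8$, not from the notes):

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
n = 2000
w = np.random.normal(size=n)

# AR(1): the series feeds back on its own past -> the ACF decays gradually.
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.8 * ar[t-1] + w[t]

# MA(1): only the most recent shock matters -> the ACF cuts off after lag 1.
ma = w.copy()
ma[1:] += 0.8 * w[:-1]

print("AR(1) sample ACF:", np.round(sm.tsa.acf(ar, nlags=5), 2))
print("MA(1) sample ACF:", np.round(sm.tsa.acf(ma, nlags=5), 2))
```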

Python Implementation Example (using statsmodels)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# 1. Generate simulated ARMA(1,1) data
np.random.seed(42)
n = 1000
phi = 0.5 # AR coefficient
theta = 0.5 # MA coefficient
w = np.random.normal(size=n) # White noise

x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t-1] + w[t] + theta * w[t-1]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(x)
plt.title('Simulated ARMA(1,1) Process')
plt.show()

# 2. Fit an ARMA(1,1) model
# Note: sm.tsa.ARMA was removed from recent statsmodels releases;
# ARIMA with d = 0 specifies the same ARMA(1,1) model.
model = sm.tsa.ARIMA(x, order=(1, 0, 1))
fitted_model = model.fit()

# 3. Output model fitting summary
print(fitted_model.summary())
# With n = 1000 observations, the AR(1) and MA(1) coefficient estimates
# should be close to the true values phi = 0.5 and theta = 0.5.
```

Model Evaluation

  • Use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare the goodness-of-fit of different models (lower values are better).

  • Test the residuals of the model; they should behave like white noise (no autocorrelation), otherwise it indicates that the model has not fully captured the information in the data.
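
A sketch of both checks, continuing the simulated ARMA(1,1) example above (the candidate orders compared are illustrative choices; the Ljung-Box test from statsmodels is used for the residual check):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

# Re-create the simulated ARMA(1,1) series from the example above.
np.random.seed(42)
n = 1000
phi, theta = 0.5, 0.5
w = np.random.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t-1] + w[t] + theta * w[t-1]

# Compare a few candidate orders by AIC/BIC (lower is better).
for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1), (2, 0, 2)]:
    res = sm.tsa.ARIMA(x, order=order).fit()
    print(f"ARMA({order[0]},{order[2]}): AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")

# Ljung-Box test on the residuals of the chosen model: large p-values are
# consistent with white-noise residuals (no leftover autocorrelation).
best = sm.tsa.ARIMA(x, order=(1, 0, 1)).fit()
print(acorr_ljungbox(best.resid, lags=[10]))
```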