#sdsc6012


Fundamentals of Time Series Theory

Definition and Properties of Time Series

A time series is a sequence of random variables arranged in chronological order, denoted $\{X_t : t \in T\}$, where $T$ is the time index set. In practical applications, $T$ is typically a discrete set (e.g., $T = \{0, 1, 2, \ldots\}$).

Core Concept: Time series analysis aims to reveal internal dynamic dependencies within the sequence and build predictive models based on historical data.

Example Data Table

import pandas as pd

# Example multivariate time series: daily weather, sales, and stock observations
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
             '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10'],
    'Temperature': [22.5, 24.1, 23.8, 21.2, 20.5, 19.8, 22.3, 23.7, 24.5, 25.2],
    'Humidity': [65, 62, 68, 72, 75, 78, 70, 66, 63, 60],
    'Sales': [150, 168, 142, 135, 158, 172, 165, 148, 156, 162],
    'Stock_Price': [105.2, 106.8, 104.5, 103.1, 107.3, 109.6, 108.2, 106.7, 107.9, 110.4]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

Description: Time series data includes timestamps and multiple observed variables, suitable for multivariate time series analysis.

Stationarity: Strict Definition and Classification

Strictly Stationary Process

A time series $\{X_t\}$ is strictly stationary if its finite-dimensional distributions are invariant under time shifts, i.e., for any time points $t_1, \ldots, t_n$ and any shift $k$:

$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) = F_{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}}(x_1, x_2, \ldots, x_n)$$

where $F$ is the joint distribution function and $n$ is any positive integer.

Weakly Stationary Process

In practical applications, weak stationarity is more commonly used, requiring three conditions:

  1. Constant mean function:

    $$\mathbb{E}[X_t] = \mu \quad \text{for all } t$$

  2. Constant variance function:

    $$\text{Var}(X_t) = \sigma^2 \quad \text{for all } t$$

  3. Autocovariance function depends only on time lag:

    $$\text{Cov}(X_t, X_s) = \gamma(|t-s|) \quad \text{for all } t, s$$

Important Note: Strict stationarity implies weak stationarity (provided the second moments are finite), but the converse is not true unless the process is Gaussian.

Stationarity Testing Methodology

Graphical Testing Methods

Time Series Plot Analysis

Plot the time series with a mean line to visually identify:

  • Trend components

  • Seasonality

  • Heteroscedasticity

import matplotlib.pyplot as plt

# Plot each variable with its mean line to visually inspect trend and variance
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
variables = ['Temperature', 'Humidity', 'Sales', 'Stock_Price']

for i, var in enumerate(variables):
    row, col = i // 2, i % 2
    axes[row, col].plot(df.index, df[var], marker='o', linewidth=2)
    axes[row, col].axhline(y=df[var].mean(), color='r', linestyle='--',
                           label=f'Mean ({df[var].mean():.1f})')
    axes[row, col].set_title(f'{var} - Stationarity Analysis')
    axes[row, col].legend()
    axes[row, col].grid(alpha=0.3)

plt.tight_layout()
plt.show()

Autocorrelation Function Plot

Plot the sample autocorrelation function (SACF). For a stationary series, the SACF decays rapidly to near zero; for a non-stationary series, it decays slowly.
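A minimal sketch of this check with statsmodels, applied to the example df above (the lag count is an illustrative choice):

from statsmodels.graphics.tsaplots import plot_acf

# Sample ACF of the temperature series; bars within the shaded band
# are not significantly different from zero
fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(df['Temperature'], lags=8, ax=ax)
plt.show()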

Statistical Test: Augmented Dickey-Fuller Test

Test Principle

The ADF test estimates the following regression model:

$$\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^{p} \phi_i \Delta X_{t-i} + \varepsilon_t$$

where $\Delta X_t = X_t - X_{t-1}$ denotes the first difference.

Hypothesis Setting

  • Null hypothesis $H_0: \gamma = 0$ (series is non-stationary)

  • Alternative hypothesis $H_1: \gamma < 0$ (series is stationary)

Decision Criterion

If the test statistic is less than the critical value (or p-value < significance level, e.g., 0.05), reject the null hypothesis, indicating stationarity.

from statsmodels.tsa.stattools import adfuller

print("Augmented Dickey-Fuller Test Results:")
print("=" * 50)

for var in variables:
    result = adfuller(df[var])
    print(f"{var}:")
    print(f"  ADF Statistic: {result[0]:.4f}")
    print(f"  p-value: {result[1]:.4f}")

    if result[1] < 0.05:
        print("  -> Series is likely STATIONARY (reject null hypothesis)")
    else:
        print("  -> Series is likely NON-STATIONARY (cannot reject null hypothesis)")
    print("-" * 30)

Note: The ADF test is sensitive to the choice of lag order $p$, typically determined using AIC or BIC criteria.

Gaussian White Noise Process: Ideal Stationary Series

Gaussian white noise $\{\varepsilon_t\}$ is a fundamental process in time series analysis, defined by:

  1. $\mathbb{E}[\varepsilon_t] = 0$ (zero mean)

  2. $\text{Var}(\varepsilon_t) = \sigma^2$ (constant variance)

  3. $\text{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$ (no autocorrelation)

Mathematical Expression: $\varepsilon_t \sim \text{IID } \mathcal{N}(0, \sigma^2)$, where IID denotes independent and identically distributed.

Mathematical Definition

$$X_t \sim \mathcal{N}(0, 1) \quad \text{for all } t$$

Properties

  • Mean: $\mathbb{E}[X_t] = 0$

  • Variance: $\text{Var}(X_t) = 1$

  • Autocovariance function:

    $$\gamma(k) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases}$$

Generation and Verification

import numpy as np

# Generate Gaussian white noise
np.random.seed(42)
n_points = 500
white_noise = np.random.normal(0, 1, n_points)

# Verify statistical properties
print(f"Overall Mean: {white_noise.mean():.4f}")
print(f"Overall Standard Deviation: {white_noise.std():.4f}")
print(f"Variance: {white_noise.var():.4f}")

# Autocorrelation test: sample ACF should be near zero at all non-zero lags
from statsmodels.tsa.stattools import acf
autocorr = acf(white_noise, nlags=10)
print("\nAutocorrelation (lags 1-5):")
for i in range(1, 6):
    print(f"  Lag {i}: {autocorr[i]:.4f}")

Autocovariance and Autocorrelation Functions

Autocovariance Function

For a weakly stationary process, the autocovariance function is defined as:

$$\gamma(k) = \text{Cov}(X_t, X_{t+k}) = \mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)]$$

where $k$ is the lag order.

Autocorrelation Function

The standardized autocovariance function gives the autocorrelation function:

$$\rho(k) = \frac{\gamma(k)}{\gamma(0)} = \frac{\gamma(k)}{\sigma^2}$$

The autocorrelation function satisfies $\rho(0) = 1$, $\rho(k) = \rho(-k)$, and $|\rho(k)| \leq 1$.
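For illustration, the sample autocovariance and autocorrelation can be computed directly from the definitions; a short numpy sketch (the function name is illustrative):

import numpy as np

def sample_acf(x, max_lag):
    # Sample autocorrelation rho_hat(k) = gamma_hat(k) / gamma_hat(0) for k = 0..max_lag
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_c = x - x.mean()
    gamma0 = np.sum(x_c ** 2) / n  # sample autocovariance at lag 0
    return np.array([np.sum(x_c[:n - k] * x_c[k:]) / n / gamma0 for k in range(max_lag + 1)])

print(sample_acf(df['Temperature'].values, max_lag=3))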

Autocovariance of White Noise

For a white noise process:

$$\gamma(k) = \begin{cases} \sigma^2 = 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases}$$

Explanation: White noise has no autocorrelation at any non-zero lag, making it a typical stationary process.

Stationarization Methods for Non-Stationary Series

Differencing

First-Order Differencing

$$\nabla X_t = X_t - X_{t-1}$$

Second-Order Differencing

$$\nabla^2 X_t = \nabla(\nabla X_t) = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$$

Seasonal Differencing

For seasonal series with period $s$:

$$\nabla_s X_t = X_t - X_{t-s}$$

Application Principle: Differencing order should generally not exceed 2, as over-differencing increases variance and reduces interpretability.

Differencing Implementation

# First-order differencing
df['Temperature_Diff'] = df['Temperature'].diff()

# Visualize differencing results
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Original series
ax1.plot(df.index, df['Temperature'], marker='o', color='red', linewidth=2)
ax1.set_title('Original Temperature Series')
ax1.set_ylabel('Temperature (°C)')
ax1.grid(alpha=0.3)

# Differenced series
ax2.plot(df.index[1:], df['Temperature_Diff'][1:], marker='s', color='blue', linewidth=2)
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)
ax2.set_title('First Difference Series (ΔTemperature = Temperature_t - Temperature_{t-1})')
ax2.set_ylabel('Temperature Difference (°C)')
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Differencing statistics
diff_stats = f"Mean: {df['Temperature_Diff'].mean():.2f}°C\n"
diff_stats += f"Std Dev: {df['Temperature_Diff'].std():.2f}°C\n"
diff_stats += f"Min: {df['Temperature_Diff'].min():.2f}°C\n"
diff_stats += f"Max: {df['Temperature_Diff'].max():.2f}°C"
print(diff_stats)

Differencing Calculation Example

print("Temperature Difference Calculation:")
print("=" * 40)
for i in range(1, len(df)):
date_str = df.index[i].strftime('%Y-%m-%d')
temp_diff = df['Temperature'].iloc[i] - df['Temperature'].iloc[i-1]
print(f"Δ({date_str}) = {df['Temperature'].iloc[i]:.1f} - {df['Temperature'].iloc[i-1]:.1f} = {temp_diff:.1f}°C")

Transformation Methods

Logarithmic Transformation

$$Y_t = \log(X_t)$$

Applicable for exponential trends and increasing variance over time.

Box-Cox Transformation

$$Y_t = \begin{cases} \frac{X_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(X_t) & \text{if } \lambda = 0 \end{cases}$$

The parameter $\lambda$ is optimized to handle non-stationarity and heteroscedasticity.
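A minimal sketch with scipy, which estimates $\lambda$ by maximum likelihood; it is applied here to the strictly positive Sales series from the example df:

from scipy import stats
import numpy as np

# Box-Cox requires strictly positive data; scipy returns the transformed series
# and the maximum-likelihood estimate of lambda
sales_boxcox, lam = stats.boxcox(df['Sales'].values)
print(f"Estimated lambda: {lam:.3f}")

# Log transformation corresponds to the special case lambda = 0
sales_log = np.log(df['Sales'].values)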

Theoretical Framework for Data Preprocessing

Data Quality Dimensions

According to data quality management theory, data quality is evaluated through multidimensional metrics:

  1. Accuracy: Degree of consistency between data and the real entities it describes.

  2. Completeness: Extent to which required data is fully recorded, measured by the missing rate (see the sketch after this list):

    $$\text{Missing Rate} = \frac{\text{Number of Missing Values}}{\text{Total Data Points}} \times 100\%$$

  3. Consistency: Uniform representation of data across different sources.

  4. Timeliness: Proximity of data updates to the current time.

  5. Believability: Trustworthiness of data sources and values.

  6. Interpretability: Ease of understanding and using the data.
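The missing rate from item 2 can be computed directly in pandas; a one-line sketch on the example df:

# Percentage of missing values per column (mean of the boolean missing-value mask × 100)
missing_rate = df.isnull().mean() * 100
print(missing_rate)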

Classification of Missing Data Mechanisms

Types of Missing Mechanisms

  • Missing Completely at Random (MCAR): Missingness is independent of both observed and unobserved values.

  • Missing at Random (MAR): Missingness depends only on observed values, not unobserved ones.

  • Missing Not at Random (MNAR): Missingness depends on unobserved values.

Handling Method Selection Criteria

| Missing Mechanism | Recommended Handling Methods |
| --- | --- |
| MCAR | Direct deletion, mean imputation |
| MAR | Regression imputation, multiple imputation |
| MNAR | Model-based methods, selection models |

Missing Value Handling Implementation

# Detect missing values
print("Missing Values Analysis:")
print("=" * 30)
print(df.isnull().sum())

# Handling methods
# 1. Delete missing values
df_drop = df.dropna()

# 2. Impute missing values
df_fill_mean = df.fillna(df.mean(numeric_only=True))  # Mean imputation
df_fill_forward = df.ffill()  # Forward filling

# 3. Interpolation
df_interpolate = df.interpolate()

Theoretical Basis for Noise Data Handling

Noise Statistical Model

Assume the observed data $Y_t$ consists of a true signal $f(t)$ and noise $\varepsilon_t$:

$$Y_t = f(t) + \varepsilon_t$$

where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$.

Mathematical Principles of Smoothing Techniques

Moving Average Method:

$$\hat{f}(t) = \frac{1}{2k+1} \sum_{i=-k}^{k} Y_{t+i}$$

Exponential Smoothing Method:

$$\hat{f}(t) = \alpha Y_t + (1-\alpha)\hat{f}(t-1)$$

where $\alpha \in (0,1)$ is the smoothing parameter.
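Both smoothers are available directly in pandas; a brief sketch on the Temperature series (window size and $\alpha$ are illustrative choices):

# Centered moving average with window 2k+1 = 3
df['Temperature_MA'] = df['Temperature'].rolling(window=3, center=True).mean()

# Simple exponential smoothing with alpha = 0.3 (the recursive form above)
df['Temperature_EWM'] = df['Temperature'].ewm(alpha=0.3, adjust=False).mean()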

Noise Data Handling Implementation

# Binning smoothing: replace each value by the mean (or median) of its bin
def binning_smooth(data, bin_size=3, method='mean'):
    smoothed = []
    for i in range(0, len(data), bin_size):
        bin_data = data[i:i+bin_size]
        if method == 'mean':
            smoothed.extend([np.mean(bin_data)] * len(bin_data))
        elif method == 'median':
            smoothed.extend([np.median(bin_data)] * len(bin_data))
    return smoothed

# Apply binning smoothing
df['Temperature_Smooth'] = binning_smooth(df['Temperature'].values)

Data Integration and Correlation Analysis

Statistical Correlation Theory

Pearson Correlation Coefficient

Population correlation coefficient:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$$

Correlation Test

Test statistic:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2)$$

where the hypotheses are $H_0: \rho = 0$ versus $H_1: \rho \neq 0$.
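This test is available in scipy; a short sketch that returns both the sample correlation and the two-sided p-value (using two columns of the example df):

from scipy import stats

# Pearson correlation between temperature and sales, with significance test
r, p_value = stats.pearsonr(df['Temperature'], df['Sales'])
print(f"r = {r:.3f}, p-value = {p_value:.4f}")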

Correlation Coefficient Calculation Implementation

# Pearson correlation coefficient
corr_matrix = df.corr()
print("Correlation Matrix:")
print("=" * 30)
print(corr_matrix)

# Visualize correlation matrix
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Feature Correlation Matrix')
plt.show()

Chi-Square Test and Contingency Table Analysis

For independence testing of two categorical variables:

Expected Frequency Calculation

$$E_{ij} = \frac{(\text{row}_i \text{ total}) \times (\text{column}_j \text{ total})}{\text{grand total}}$$

Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big((r-1)(c-1)\big)$$

Application Note: Use Fisher’s exact test when >20% of cells have expected frequencies <5.
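A minimal sketch with scipy on a hypothetical 2×2 contingency table of observed counts (the numbers are purely illustrative):

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts for two categorical variables
observed = np.array([[30, 10],
                     [20, 40]])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4g}")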

Feature Engineering and Dimensionality Reduction

Mathematical Basis of Principal Component Analysis (PCA)

Problem Formulation

Given a centered data matrix $X$ ($n \times p$), find the projection direction $w$ that maximizes the projected variance:

$$\max_{w}\; w^T \Sigma w \quad \text{s.t.} \quad w^T w = 1$$

where $\Sigma = \frac{1}{n} X^T X$ is the sample covariance matrix.

Eigenvalue Decomposition Solution

$$\Sigma v_i = \lambda_i v_i, \quad i = 1, 2, \ldots, p$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$ are the eigenvalues and $v_i$ are the corresponding eigenvectors.

Variance Explained Ratio

The proportion of variance explained by the $k$-th principal component is:

$$\frac{\lambda_k}{\sum_{i=1}^p \lambda_i}$$
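A minimal sketch with scikit-learn, standardizing the variables first so they share a common scale (applied to the four numeric columns of the example df):

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize, then project onto the first two principal components
X = StandardScaler().fit_transform(df[['Temperature', 'Humidity', 'Sales', 'Stock_Price']])
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Proportion of variance explained: lambda_k / sum(lambda_i)
print("Explained variance ratio:", pca.explained_variance_ratio_)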

Feature Selection Theory

Filter Methods

Evaluate feature importance using statistics (e.g., correlation coefficient, chi-square statistic, mutual information).

Wrapper Methods

Select optimal feature subsets via subset search and cross-validation. Common algorithms:

  • Forward Selection

  • Backward Elimination

  • Recursive Feature Elimination (RFE)

Embedded Methods

Automatically perform feature selection during model training, e.g.:

  • Lasso Regression: $L_1$ regularization induces sparsity (see the sketch after this list)

  • Decision Trees: Feature importance scoring
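A brief sketch of the embedded approach with a Lasso model; the choice of predictors and target below is purely illustrative:

from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Illustrative setup: predict Sales from the other standardized variables
X = StandardScaler().fit_transform(df[['Temperature', 'Humidity', 'Stock_Price']])
y = df['Sales']

lasso = Lasso(alpha=0.5)
lasso.fit(X, y)

# The L1 penalty drives some coefficients to exactly zero; non-zero features are kept
print(dict(zip(['Temperature', 'Humidity', 'Stock_Price'], lasso.coef_)))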

Feature Engineering Implementation

Polynomial Features

from sklearn.preprocessing import PolynomialFeatures

# Create degree-2 polynomial features (squares and interaction terms)
poly = PolynomialFeatures(degree=2, include_bias=False)
data_poly = poly.fit_transform(df[['Temperature', 'Humidity']])
feature_names = poly.get_feature_names_out(['Temperature', 'Humidity'])

# Combine features (reuse the original DatetimeIndex so rows align in the concat)
df_poly = pd.DataFrame(data_poly, columns=feature_names, index=df.index)
df_extended = pd.concat([df, df_poly], axis=1)

Statistical Feature Creation

# Create new statistical features
# Note: this example assumes a housing dataset (columns RM, LSTAT, PTRATIO, TAX, target),
# not the weather DataFrame used above.
df['RM_LSTAT'] = df['RM'] * df['LSTAT']      # Number of rooms × low-income population ratio
df['RM_PTRATIO'] = df['RM'] / df['PTRATIO']  # Number of rooms ÷ pupil-teacher ratio
df['RM_TAX'] = df['RM'] / df['TAX']          # Number of rooms ÷ property tax

# Calculate correlation with the target variable and keep strongly related features
target_corrs = df.corr()['target'].abs().sort_values(ascending=False)
selected_features = target_corrs[target_corrs >= 0.5].index.tolist()

Data Transformation and Normalization

Normalization Methods

Min-Max Normalization

$$v' = \frac{v - \min_A}{\max_A - \min_A} \times (\text{new\_max}_A - \text{new\_min}_A) + \text{new\_min}_A$$

Z-Score Normalization

$$v' = \frac{v - \mu_A}{\sigma_A}$$

Decimal Scaling Normalization

$$v' = \frac{v}{10^j}$$

where $j$ is the smallest integer such that $\max(|v'|) < 1$.

Normalization Implementation

from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Min-Max normalization
minmax_scaler = MinMaxScaler()
df_minmax = minmax_scaler.fit_transform(df[['Temperature', 'Humidity']])

# Z-Score normalization
std_scaler = StandardScaler()
df_std = std_scaler.fit_transform(df[['Temperature', 'Humidity']])
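Decimal scaling has no built-in scikit-learn transformer; a tiny numpy sketch on the Sales column:

import numpy as np

# Decimal scaling: divide by 10^j, the smallest power of ten that brings |v'| below 1
max_abs = np.abs(df['Sales'].values).max()
j = int(np.floor(np.log10(max_abs))) + 1
df['Sales_DecScaled'] = df['Sales'] / (10 ** j)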

Data Discretization and Concept Hierarchy

Discretization Algorithm Classification

Unsupervised Discretization

  • Equal-Width Binning: Fixed interval width

    $$\text{bin width} = \frac{\max - \min}{N}$$

  • Equal-Frequency Binning: Each bin contains approximately equal samples

Supervised Discretization

  • Entropy-Based Discretization (e.g., ID3 algorithm)

  • ChiMerge Algorithm: Bottom-up merging based on chi-square statistic

Concept Hierarchy Generation Methods

Statistical-Based Methods

Automatically generate hierarchies based on distinct attribute values, with attributes having more distinct values at lower levels.

Domain Knowledge-Based Methods

Define hierarchies explicitly using domain expertise, e.g.:

  • Geographic hierarchy: Street < City < State < Country

  • Temporal hierarchy: Day < Month < Quarter < Year

Discretization Implementation

# Equal-width discretization
df['Temp_Binned'] = pd.cut(df['Temperature'], bins=5,
                           labels=['Low', 'Medium-Low', 'Medium', 'Medium-High', 'High'])

# Equal-depth (equal-frequency) discretization
df['Humidity_Binned'] = pd.qcut(df['Humidity'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Cluster-based discretization (fixed seed for reproducibility)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Sales_Cluster'] = kmeans.fit_predict(df[['Sales']])

Summary

Key Points in Time Series Analysis

  1. Stationarity testing is a prerequisite for time series analysis.

  2. Non-stationary series can be transformed to stationary via methods like differencing.

  3. Autocorrelation analysis helps understand internal series structure.

Critical Steps in Data Preprocessing

  1. Data Cleaning: Handle missing values, noise, and outliers.

  2. Data Integration: Resolve consistency issues in multi-source data.

  3. Data Reduction: Improve efficiency through feature selection and data compression.

  4. Data Transformation: Enhance model performance via normalization and discretization.

Best Practice Recommendations

  • Always start analysis with data exploration and visualization.

  • Choose preprocessing methods based on data characteristics.

  • Iteratively optimize feature engineering strategies.

  • Validate preprocessing impact on final models.