#sdsc6012
Fundamentals of Time Series Theory
Definition and Properties of Time Series
A time series is a sequence of random variables arranged in chronological order, denoted $\{X_t : t \in T\}$, where $T$ is the time index set. In practical applications, $T$ is typically a discrete set (e.g., $T = \{0, 1, 2, \ldots\}$).
Core Concept: Time series analysis aims to reveal internal dynamic dependencies within the sequence and build predictive models based on historical data.
Example Data Table
```python
import pandas as pd

data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
             '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10'],
    'Temperature': [22.5, 24.1, 23.8, 21.2, 20.5, 19.8, 22.3, 23.7, 24.5, 25.2],
    'Humidity': [65, 62, 68, 72, 75, 78, 70, 66, 63, 60],
    'Sales': [150, 168, 142, 135, 158, 172, 165, 148, 156, 162],
    'Stock_Price': [105.2, 106.8, 104.5, 103.1, 107.3, 109.6, 108.2, 106.7, 107.9, 110.4]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
```
Description: Time series data includes timestamps and multiple observed variables, making it suitable for multivariate time series analysis.
Stationarity: Strict Definition and Classification
Strictly Stationary Process
A time series $\{X_t\}$ is strictly stationary if, for any choice of time points $t_1, \ldots, t_n$ and any time shift $k$, its finite-dimensional distributions satisfy:

$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) = F_{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}}(x_1, x_2, \ldots, x_n)$$

where $F$ is the joint distribution function and $n$ is any positive integer.
Weakly Stationary Process
In practical applications, weak stationarity is more commonly used, requiring three conditions:
Constant mean function:

$$\mathbb{E}[X_t] = \mu \quad \text{for all } t$$

Constant variance function:

$$\text{Var}(X_t) = \sigma^2 \quad \text{for all } t$$

Autocovariance function depends only on the time lag:

$$\text{Cov}(X_t, X_s) = \gamma(|t-s|) \quad \text{for all } t, s$$
Important Note: Strict stationarity (with finite second moments) implies weak stationarity, but the converse does not hold in general; the two coincide when the process is Gaussian.
Stationarity Testing Methodology
Graphical Testing Methods
Time Series Plot Analysis
Plot the time series with a mean line to visually identify:
Trend components
Seasonality
Heteroscedasticity
```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
variables = ['Temperature', 'Humidity', 'Sales', 'Stock_Price']

for i, var in enumerate(variables):
    row, col = i // 2, i % 2
    axes[row, col].plot(df.index, df[var], marker='o', linewidth=2)
    axes[row, col].axhline(y=df[var].mean(), color='r', linestyle='--',
                           label=f'Mean ({df[var].mean():.1f})')
    axes[row, col].set_title(f'{var} - Stationarity Analysis')
    axes[row, col].legend()
    axes[row, col].grid(alpha=0.3)

plt.tight_layout()
plt.show()
```
Autocorrelation Function Plot
Plot the sample autocorrelation function (SACF). For a stationary series, the SACF decays rapidly to near zero.
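A minimal sketch using statsmodels' plot_acf on the example DataFrame (the column choice and number of lags are illustrative):

```python
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

# For a stationary series the bars should fall inside the confidence band
# after the first few lags.
fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(df['Temperature'], lags=5, ax=ax, title='SACF - Temperature')
plt.show()
```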
Statistical Test: Augmented Dickey-Fuller Test
Test Principle
The ADF test estimates the following regression model:

$$\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^{p} \phi_i \Delta X_{t-i} + \varepsilon_t$$

where $\Delta X_t = X_t - X_{t-1}$ denotes the first difference.
Hypothesis Setting
$H_0: \gamma = 0$ (a unit root is present; the series is non-stationary)
$H_1: \gamma < 0$ (no unit root; the series is stationary)
Decision Criterion
If the test statistic is less than the critical value (or p-value < significance level, e.g., 0.05), reject the null hypothesis, indicating stationarity.
```python
from statsmodels.tsa.stattools import adfuller

print("Augmented Dickey-Fuller Test Results:")
print("=" * 50)

for var in variables:
    result = adfuller(df[var])
    print(f"{var}:")
    print(f"  ADF Statistic: {result[0]:.4f}")
    print(f"  p-value: {result[1]:.4f}")
    if result[1] < 0.05:
        print("  -> Series is likely STATIONARY (reject null hypothesis)")
    else:
        print("  -> Series is likely NON-STATIONARY (cannot reject null hypothesis)")
    print("-" * 30)
```
Note: The ADF test is sensitive to the choice of lag order $p$, which is typically determined using the AIC or BIC criterion.
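statsmodels can select the lag order automatically; a minimal sketch (autolag='AIC' is the library default, and 'BIC' is also accepted):

```python
from statsmodels.tsa.stattools import adfuller

# Let adfuller pick p by minimizing AIC; result[2] reports the lag actually used.
result = adfuller(df['Temperature'], autolag='AIC')
print(f"ADF Statistic: {result[0]:.4f}, p-value: {result[1]:.4f}, lags used: {result[2]}")
```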
Gaussian White Noise Process: Ideal Stationary Series
Gaussian white noise $\{\varepsilon_t\}$ is a fundamental process in time series analysis, defined by:
$\mathbb{E}[\varepsilon_t] = 0$ (zero mean)
$\text{Var}(\varepsilon_t) = \sigma^2$ (constant variance)
$\text{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$ (no autocorrelation)
Mathematical Expression: $\varepsilon_t \sim \text{IID } \mathcal{N}(0, \sigma^2)$, where IID denotes independent and identically distributed.
Mathematical Definition
$$X_t \sim \mathcal{N}(0, 1) \quad \text{for all } t$$
Properties
Mean: $\mathbb{E}[X_t] = 0$
Variance: $\text{Var}(X_t) = 1$
Autocovariance function:

$$\gamma(k) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases}$$
Generation and Verification
```python
import numpy as np
from statsmodels.tsa.stattools import acf

np.random.seed(42)
n_points = 500
white_noise = np.random.normal(0, 1, n_points)

print(f"Overall Mean: {white_noise.mean():.4f}")
print(f"Overall Standard Deviation: {white_noise.std():.4f}")
print(f"Variance: {white_noise.var():.4f}")

autocorr = acf(white_noise, nlags=10)
print("\nAutocorrelation (lags 1-5):")
for i in range(1, 6):
    print(f"  Lag {i}: {autocorr[i]:.4f}")
```
Autocovariance and Autocorrelation Functions
Autocovariance Function
For a weakly stationary process, the autocovariance function is defined as:
$$\gamma(k) = \text{Cov}(X_t, X_{t+k}) = \mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)]$$

where $k$ is the lag order.
Autocorrelation Function
The standardized autocovariance function gives the autocorrelation function:
$$\rho(k) = \frac{\gamma(k)}{\gamma(0)} = \frac{\gamma(k)}{\sigma^2}$$

The autocorrelation function satisfies $\rho(0) = 1$, $\rho(k) = \rho(-k)$, and $|\rho(k)| \leq 1$.
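Both quantities can be estimated directly from data; a minimal sketch comparing a hand-rolled estimate with statsmodels' acf, reusing the white_noise sample generated above:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def sample_autocovariance(x, k):
    """gamma_hat(k) = (1/n) * sum_{t=1}^{n-k} (x_t - xbar) * (x_{t+k} - xbar)"""
    x = np.asarray(x)
    n = len(x)
    xbar = x.mean()
    return np.sum((x[:n - k] - xbar) * (x[k:] - xbar)) / n

gamma0 = sample_autocovariance(white_noise, 0)
rho_sm = acf(white_noise, nlags=3)   # statsmodels uses the same biased (divide-by-n) estimator
for k in range(4):
    rho_k = sample_autocovariance(white_noise, k) / gamma0   # rho_hat(k) = gamma_hat(k) / gamma_hat(0)
    print(f"lag {k}: manual rho = {rho_k:.4f}, statsmodels rho = {rho_sm[k]:.4f}")
```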
Autocovariance of White Noise
For a white noise process:
$$\gamma(k) = \begin{cases} \sigma^2 = 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases}$$

Explanation: White noise has no autocorrelation at any non-zero lag, making it a typical stationary process.
Stationarization Methods for Non-Stationary Series
Differencing
First-Order Differencing
$$\nabla X_t = X_t - X_{t-1}$$

Second-Order Differencing

$$\nabla^2 X_t = \nabla(\nabla X_t) = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$$
Seasonal Differencing
For a seasonal series with period $s$:

$$\nabla_s X_t = X_t - X_{t-s}$$
Application Principle: Differencing order should generally not exceed 2, as over-differencing increases variance and reduces interpretability.
Differencing Implementation
```python
# First-order differencing of the temperature series
df['Temperature_Diff'] = df['Temperature'].diff()

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

ax1.plot(df.index, df['Temperature'], marker='o', color='red', linewidth=2)
ax1.set_title('Original Temperature Series')
ax1.set_ylabel('Temperature (°C)')
ax1.grid(alpha=0.3)

ax2.plot(df.index[1:], df['Temperature_Diff'][1:], marker='s', color='blue', linewidth=2)
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.3)
ax2.set_title('First Difference Series (ΔTemperature = Temperature_t - Temperature_{t-1})')
ax2.set_ylabel('Temperature Difference (°C)')
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics of the differenced series
diff_stats = f"Mean: {df['Temperature_Diff'].mean():.2f} °C\n"
diff_stats += f"Std Dev: {df['Temperature_Diff'].std():.2f} °C\n"
diff_stats += f"Min: {df['Temperature_Diff'].min():.2f} °C\n"
diff_stats += f"Max: {df['Temperature_Diff'].max():.2f} °C"
print(diff_stats)
```
Differencing Calculation Example
```python
print("Temperature Difference Calculation:")
print("=" * 40)

for i in range(1, len(df)):
    date_str = df.index[i].strftime('%Y-%m-%d')
    temp_diff = df['Temperature'].iloc[i] - df['Temperature'].iloc[i-1]
    print(f"Δ({date_str}) = {df['Temperature'].iloc[i]:.1f} - {df['Temperature'].iloc[i-1]:.1f} = {temp_diff:.1f} °C")
```
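The implementation above covers first-order differencing only. A minimal sketch of second-order and seasonal differencing with pandas .diff(); the weekly period s = 7 and the synthetic series are assumptions for illustration, since the 10-day example df is too short to show a seasonal cycle:

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly pattern plus a mild trend, purely for illustration
idx = pd.date_range('2023-01-01', periods=60, freq='D')
s = pd.Series(np.sin(2 * np.pi * np.arange(60) / 7) + 0.05 * np.arange(60), index=idx)

first_diff = s.diff()            # ∇X_t  = X_t - X_{t-1}
second_diff = s.diff().diff()    # ∇²X_t = X_t - 2X_{t-1} + X_{t-2}
seasonal_diff = s.diff(7)        # ∇_7 X_t = X_t - X_{t-7}
print(seasonal_diff.dropna().head())
```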
Logarithmic Transformation

$$Y_t = \log(X_t)$$

Applicable for exponential trends and variance that increases over time.

Box-Cox Transformation

$$Y_t = \begin{cases} \dfrac{X_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(X_t) & \text{if } \lambda = 0 \end{cases}$$

The parameter $\lambda$ is optimized to handle non-stationarity and heteroscedasticity; see the sketch below.
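A minimal sketch of both transforms, assuming the series is strictly positive (a requirement of the log and Box-Cox transforms); scipy.stats.boxcox estimates $\lambda$ by maximum likelihood:

```python
import numpy as np
from scipy import stats

sales = df['Sales'].values                 # strictly positive in the example data

log_sales = np.log(sales)                  # Y_t = log(X_t)
boxcox_sales, lam = stats.boxcox(sales)    # Box-Cox with lambda estimated by maximum likelihood
print(f"Estimated lambda: {lam:.3f}")
```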
Theoretical Framework for Data Preprocessing
Data Quality Dimensions
According to data quality management theory, data quality is evaluated through multidimensional metrics:
Accuracy: Degree of consistency between data and the real entities it describes.
Completeness: Extent to which required data is fully recorded, measured by:

$$\text{Missing Rate} = \frac{\text{Number of Missing Values}}{\text{Total Data Points}} \times 100\%$$

Consistency: Uniform representation of data across different sources.
Timeliness: Proximity of data updates to the current time.
Believability: Trustworthiness of data sources and values.
Interpretability: Ease of understanding and using the data.
Classification of Missing Data Mechanisms
Types of Missing Mechanisms
Missing Completely at Random (MCAR): Missingness is independent of both observed and unobserved values.
Missing at Random (MAR): Missingness depends only on observed values, not on unobserved ones.
Missing Not at Random (MNAR): Missingness depends on the unobserved values.
Handling Method Selection Criteria
MCAR: Direct deletion, mean imputation
MAR: Regression imputation, multiple imputation
MNAR: Model-based methods, selection models
Missing Value Handling Implementation
```python
# Count missing values per column
print("Missing Values Analysis:")
print("=" * 30)
print(df.isnull().sum())

# Common handling strategies
df_drop = df.dropna()                  # listwise deletion
df_fill_mean = df.fillna(df.mean())    # mean imputation
df_fill_forward = df.ffill()           # forward fill (last observation carried forward)
df_interpolate = df.interpolate()      # linear interpolation
```
Theoretical Basis for Noise Data Handling
Noise Statistical Model
Assume the observed data $Y_t$ consists of the true signal $f(t)$ and noise $\varepsilon_t$:

$$Y_t = f(t) + \varepsilon_t$$

where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$.
Smoothing Techniques Mathematical Principles
Moving Average Method:

$$\hat{f}(t) = \frac{1}{2k+1} \sum_{i=-k}^{k} Y_{t+i}$$

Exponential Smoothing Method:

$$\hat{f}(t) = \alpha Y_t + (1-\alpha)\hat{f}(t-1)$$

where $\alpha \in (0,1)$ is the smoothing parameter.
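A minimal sketch of both smoothers with pandas, using a centred 3-point window ($k = 1$) and $\alpha = 0.3$; both values are illustrative choices:

```python
# Moving average: f_hat(t) = mean of Y_{t-1}, Y_t, Y_{t+1}
df['Sales_MA'] = df['Sales'].rolling(window=3, center=True).mean()

# Simple exponential smoothing: f_hat(t) = alpha * Y_t + (1 - alpha) * f_hat(t-1)
df['Sales_EWM'] = df['Sales'].ewm(alpha=0.3, adjust=False).mean()

print(df[['Sales', 'Sales_MA', 'Sales_EWM']])
```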
Noise Data Handling Implementation
```python
import numpy as np

def binning_smooth(data, bin_size=3, method='mean'):
    """Replace each value with the mean (or median) of its bin."""
    smoothed = []
    for i in range(0, len(data), bin_size):
        bin_data = np.asarray(data[i:i + bin_size])
        if method == 'mean':
            smoothed.extend([bin_data.mean()] * len(bin_data))
        elif method == 'median':
            smoothed.extend([np.median(bin_data)] * len(bin_data))
    return smoothed

df['Temperature_Smooth'] = binning_smooth(df['Temperature'].values)
```
Data Integration and Correlation Analysis
Statistical Correlation Theory
Pearson Correlation Coefficient
Population correlation coefficient:
$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$$
Correlation Test
Test statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2)$$

with hypotheses $H_0: \rho = 0$ versus $H_1: \rho \neq 0$.
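scipy.stats.pearsonr returns both $r$ and the two-sided p-value of this test; a minimal sketch on two columns of the example df:

```python
from scipy import stats

r, p_value = stats.pearsonr(df['Temperature'], df['Sales'])
print(f"r = {r:.4f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the linear correlation is statistically significant")
else:
    print("Cannot reject H0: no evidence of linear correlation")
```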
Correlation Coefficient Calculation Implementation
```python
import seaborn as sns

corr_matrix = df.corr()
print("Correlation Matrix:")
print("=" * 30)
print(corr_matrix)

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Feature Correlation Matrix')
plt.show()
```
Chi-Square Test and Contingency Table Analysis
For independence testing of two categorical variables:
Expected Frequency Calculation
$$E_{ij} = \frac{(\text{row}_i \text{ total}) \times (\text{column}_j \text{ total})}{\text{grand total}}$$

Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big((r-1)(c-1)\big)$$
Application Note: Use Fisher's exact test when more than 20% of cells have expected frequencies below 5.
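A minimal sketch with scipy.stats.chi2_contingency; the 2x3 contingency table below is hypothetical, made up purely for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = customer segment, columns = product preference
observed = np.array([[30, 45, 25],
                     [20, 35, 45]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Chi-square = {chi2:.3f}, dof = {dof}, p-value = {p_value:.4f}")
print("Expected frequencies:\n", expected.round(2))
```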
Feature Engineering and Dimensionality Reduction
Mathematical Basis of Principal Component Analysis (PCA)
Given a centered data matrix $X$ of size $n \times p$, find the projection direction $w$ that maximizes the projected variance:

$$\max_{w} \; w^T \Sigma w \quad \text{s.t.} \quad w^T w = 1$$

where $\Sigma = \frac{1}{n} X^T X$ is the sample covariance matrix.
Eigenvalue Decomposition Solution
$$\Sigma v_i = \lambda_i v_i, \quad i = 1, 2, \ldots, p$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$ are the eigenvalues and $v_i$ the corresponding eigenvectors.
Variance Explained Ratio
The variance explained by the $k$-th principal component is:

$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}$$
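A minimal sketch with scikit-learn, standardizing first so that no single variable dominates the covariance matrix (the column names are those of the example df, and keeping two components is an illustrative choice):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(df[['Temperature', 'Humidity', 'Sales', 'Stock_Price']])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)   # projections onto the first two principal components

# explained_variance_ratio_[k] = lambda_k / sum_i lambda_i
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cumulative:", pca.explained_variance_ratio_.cumsum())
```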
Feature Selection Theory
Filter Methods
Evaluate feature importance using statistics (e.g., correlation coefficient, chi-square statistic, mutual information).
Wrapper Methods
Select optimal feature subsets via subset search and cross-validation. Common algorithms include recursive feature elimination (RFE) and sequential forward/backward selection.
Embedded Methods
Automatically perform feature selection during model training, e.g., Lasso (L1 regularization) or tree-based feature importances.
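A minimal sketch of one method from each family, using Sales as an assumed target, the remaining example columns as predictors, and hyperparameters chosen purely for illustration:

```python
from sklearn.feature_selection import SelectKBest, f_regression, RFE
from sklearn.linear_model import LinearRegression, Lasso

X = df[['Temperature', 'Humidity', 'Stock_Price']]
y = df['Sales']

# Filter: rank features by an F-statistic, keep the best two
filter_sel = SelectKBest(score_func=f_regression, k=2).fit(X, y)
print("Filter scores:", dict(zip(X.columns, filter_sel.scores_.round(2))))

# Wrapper: recursive feature elimination around a linear model
rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
print("RFE selected:", list(X.columns[rfe.support_]))

# Embedded: Lasso shrinks the coefficients of weak features toward zero during training
lasso = Lasso(alpha=1.0).fit(X, y)
print("Lasso coefficients:", dict(zip(X.columns, lasso.coef_.round(3))))
```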
Feature Engineering Implementation
Polynomial Features
```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=False)
data_poly = poly.fit_transform(df[['Temperature', 'Humidity']])
feature_names = poly.get_feature_names_out(['Temperature', 'Humidity'])

# Keep the original index so the new columns align with df
df_poly = pd.DataFrame(data_poly, columns=feature_names, index=df.index)
df_extended = pd.concat([df, df_poly], axis=1)
```
Statistical Feature Creation
```python
# Note: this example assumes a housing-price DataFrame with columns
# RM, LSTAT, PTRATIO, TAX and a 'target' column, not the weather df above.
df['RM_LSTAT'] = df['RM'] * df['LSTAT']
df['RM_PTRATIO'] = df['RM'] / df['PTRATIO']
df['RM_TAX'] = df['RM'] / df['TAX']

# Keep features whose absolute correlation with the target is at least 0.5
target_corrs = df.corr()['target'].abs().sort_values(ascending=False)
selected_features = target_corrs[target_corrs >= 0.5].index.tolist()
```
Data Transformation and Normalization
Normalization Methods
Min-Max Normalization
$$v' = \frac{v - \min_A}{\max_A - \min_A} \times (\text{new\_max}_A - \text{new\_min}_A) + \text{new\_min}_A$$

Z-Score Normalization

$$v' = \frac{v - \mu_A}{\sigma_A}$$

Decimal Scaling Normalization

$$v' = \frac{v}{10^{j}}$$

where $j$ is the smallest integer such that $\max(|v'|) < 1$.
Normalization Implementation
```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

minmax_scaler = MinMaxScaler()
df_minmax = minmax_scaler.fit_transform(df[['Temperature', 'Humidity']])

std_scaler = StandardScaler()
df_std = std_scaler.fit_transform(df[['Temperature', 'Humidity']])
```
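scikit-learn has no transformer for decimal scaling; a minimal numpy sketch of $v' = v / 10^j$:

```python
import numpy as np

def decimal_scaling(x):
    """Divide by the smallest power of ten that maps every value into (-1, 1)."""
    x = np.asarray(x, dtype=float)
    j = int(np.floor(np.log10(np.abs(x).max()))) + 1
    return x / (10 ** j), j

scaled, j = decimal_scaling(df['Stock_Price'].values)
print(f"j = {j}, scaled range: [{scaled.min():.3f}, {scaled.max():.3f}]")
```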
Data Discretization and Concept Hierarchy
Discretization Algorithm Classification
Unsupervised Discretization
Equal-Width Binning: Fixed interval width

$$\text{bin width} = \frac{\max - \min}{N}$$

Equal-Frequency Binning: Each bin contains approximately the same number of samples
Supervised Discretization
Entropy-Based Discretization (e.g., ID3 algorithm)
ChiMerge Algorithm: Bottom-up merging based on the chi-square statistic
Concept Hierarchy Generation Methods
Statistical-Based Methods
Automatically generate hierarchies based on distinct attribute values, with attributes having more distinct values at lower levels.
Domain Knowledge-Based Methods
Define hierarchies explicitly using domain expertise, e.g., street < city < province < country, or day < month < quarter < year (see the sketch below).
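A minimal sketch of a time hierarchy (day < month < quarter < year) rolled up from the example df's DatetimeIndex; the column names are illustrative:

```python
# Roll each timestamp up the hierarchy: day -> month -> quarter -> year
hierarchy = pd.DataFrame({
    'day': df.index.date,
    'month': df.index.to_period('M').astype(str),
    'quarter': df.index.to_period('Q').astype(str),
    'year': df.index.year,
}, index=df.index)
print(hierarchy.head())
```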
Discretization Implementation
```python
# Equal-width binning into five labelled intervals
df['Temp_Binned'] = pd.cut(df['Temperature'], bins=5,
                           labels=['Low', 'Medium-Low', 'Medium', 'Medium-High', 'High'])

# Equal-frequency binning into quartiles
df['Humidity_Binned'] = pd.qcut(df['Humidity'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Cluster-based discretization (random_state and n_init added for reproducibility)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Sales_Cluster'] = kmeans.fit_predict(df[['Sales']])
```
Summary
Key Points in Time Series Analysis
Stationarity testing is a prerequisite for time series analysis.
Non-stationary series can be transformed to stationary via methods like differencing.
Autocorrelation analysis helps understand internal series structure.
Critical Steps in Data Preprocessing
Data Cleaning: Handle missing values, noise, and outliers.
Data Integration: Resolve consistency issues in multi-source data.
Data Reduction: Improve efficiency through feature selection and data compression.
Data Transformation: Enhance model performance via normalization and discretization.
Best Practice Recommendations
Always start analysis with data exploration and visualization.
Choose preprocessing methods based on data characteristics.
Iteratively optimize feature engineering strategies.
Validate preprocessing impact on final models.