Advanced Time Series Analysis with SARIMAX: A Case Study on Avocado Prices
In this post, we walk through time series analysis with the Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX) model. Our case study fits the model to avocado prices and uses it to forecast future values.
Introduction to Time Series Analysis
Time series analysis is a crucial statistical technique for analyzing data points collected or recorded at specific time intervals. It's widely used in various domains, including finance, economics, environmental science, and more. The primary goal is to develop models that capture the underlying patterns in the data to make future predictions.
Why SARIMAX?
SARIMAX is an extension of the ARIMA model that supports seasonal effects and exogenous variables. It can model complex time series data that exhibit seasonal patterns, which is essential for many real-world applications. SARIMAX stands for:
- Seasonal: Captures seasonality in the data.
- AutoRegressive: Involves regressing the variable on its own lagged values.
- Integrated: Represents differencing of raw observations to make the time series stationary.
- Moving Average: Models each observation's error term as a linear combination of the current and past error terms.
- Exogenous: Allows incorporating external factors that might influence the target variable.
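Written in backshift notation, these five pieces combine into a single equation. The form below is the regression-with-SARIMA-errors convention, which is one common way to write the model; the symbols are introduced here for illustration and do not appear elsewhere in this post.

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,\bigl(y_t - \beta^\top x_t\bigr)
  = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
```

Here $B$ is the backshift operator ($B y_t = y_{t-1}$), $s$ is the seasonal period, $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts, $d$ and $D$ are the non-seasonal and seasonal differencing orders, $x_t$ holds the exogenous regressors, and $\varepsilon_t$ is white noise. The orders $(p, d, q)$ and $(P, D, Q, s)$ are what the `order` and `seasonal_order` arguments specify later in the post.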
Data Preparation and Exploratory Data Analysis (EDA)
For our analysis, we used avocado price data. The initial step involved data cleaning and preparation, ensuring we had a complete dataset without missing values. EDA helped us understand the data better, revealing trends, seasonality, and outliers.
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
data = pd.read_csv('avocado.csv')
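# Data cleaning and preparation
data['Date'] = pd.to_datetime(data['Date'])
data = data.set_index('Date')
data = data.sort_index()
# Assumption: the file holds several rows per date (e.g. one per region),
# so average them into a single weekly series before modelling
data = data[['AveragePrice']].resample('W').mean()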
# EDA
data['AveragePrice'].plot(figsize=(14, 7))
plt.title('Avocado Prices Over Time')
plt.ylabel('Average Price ($)')
plt.xlabel('Date')
plt.show()
Model Selection: SARIMAX
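Choosing the right model parameters is critical for accurate forecasting. We tested several SARIMAX models with different combinations of parameters and compared them using the Akaike Information Criterion (AIC), which rewards goodness of fit and penalizes extra parameters. Among models fitted to the same data, a lower AIC is preferred. Candidates with different differencing orders are fitted to differently transformed data, so AIC comparisons across them are only a rough guide.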
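For reference, AIC is computed as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. A minimal sketch follows; the log-likelihood values are made up for illustration and are not taken from our fits.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical log-likelihoods, for illustration only
simple = aic(log_likelihood=305.0, n_params=2)    # 2*2 - 2*305 = -606
complex_ = aic(log_likelihood=306.0, n_params=5)  # 2*5 - 2*306 = -602

print(simple, complex_)
```

The larger model fits slightly better (higher log-likelihood) but still ends up with the higher AIC because of its three extra parameters. The example also shows why AIC can be negative: a log-likelihood is positive whenever the fitted densities exceed 1, which is common for series with small variance.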
Here are some models we tested:
- ARIMA(0, 0, 0)x(0, 0, 0, 52): AIC = 1026.82
- ARIMA(0, 1, 1)x(0, 1, 1, 52): AIC = -381.87
- ARIMA(1, 0, 0)x(1, 0, 1, 52): AIC = -607.52
From our evaluations, ARIMA(0, 1, 0)x(0, 0, 1, 52), with an AIC of -614.61, emerged as the best model.
import statsmodels.api as sm
# Fit the SARIMAX model
model = sm.tsa.statespace.SARIMAX(data['AveragePrice'],
                                  order=(0, 1, 0),
                                  seasonal_order=(0, 0, 1, 52),
                                  enforce_stationarity=False,
                                  enforce_invertibility=False)
results = model.fit()
print(results.summary())
Forecasting and Visualization
Once the model is fitted, we can use it to make future predictions. We forecasted the next 100 time steps of avocado prices (weeks, given the seasonal period of 52) and plotted the results along with their 95% confidence intervals.
# Forecasting
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
# Plotting the results
ax = data['AveragePrice'].plot(label='Observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Avocado Prices in $')
plt.legend()
plt.show()
Conclusion
In this case study, we applied the SARIMAX model to forecasting avocado prices. We compared candidate parameter sets by AIC and used the lowest-scoring model to produce forecasts with confidence intervals. Time series analysis with SARIMAX is a powerful tool for understanding and forecasting trends in time-dependent data.
Stay tuned for more in-depth analyses and tutorials on advanced data science techniques! Feel free to reach out with any questions or comments!