For TAs:
1. The differencing and ADF test have already been done in the EDA section. Here, I directly check the ACF and PACF plots of the stationary series.
2. I combined the ARIMA and SARIMA sections. By checking the ACF and PACF plots, I can determine whether a seasonal component exists, select appropriate orders of terms, and choose the optimal model through model diagnostics.
Here, we will apply the ARIMA or SARIMA model to our 10 time series datasets in order to make forecasts. Below is a brief introduction to both models, including the parameters p, d, q, P, D, and Q.
ARIMA Model:
ARIMA (AutoRegressive Integrated Moving Average) is used for non-seasonal time series data that exhibit patterns over time. The parameters are:
- p: The order of the autoregressive (AR) term.
- d: The degree of differencing to make the series stationary.
- q: The order of the moving average (MA) term.
SARIMA Model:
SARIMA (Seasonal ARIMA) is an extension of the ARIMA model that deals with seasonality in time series data. The additional parameters for seasonal components are:
- P: The order of the seasonal autoregressive (AR) term.
- D: The degree of seasonal differencing.
- Q: The order of the seasonal moving average (MA) term.
- s: The number of periods in each season.
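As a minimal sketch of where these parameters enter a model call, the chunk below fits an illustrative SARIMA(1,1,1)x(0,1,1)[12] with base R's stats::arima() (our own fits later use the forecast package's Arima(), which takes the same order and seasonal arguments); the orders and the built-in AirPassengers series are for illustration only, not one of our datasets:

```r
# order = c(p, d, q); seasonal = list(order = c(P, D, Q), period = s)
fit <- arima(AirPassengers,
             order    = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
coef(fit)   # ar1, ma1, and sma1 estimates
AIC(fit)
```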
Data and Model Selection:
In the EDA phase, we performed differencing and seasonal differencing to transform the original series into stationary series. The stationarity of the differenced series was validated using the Augmented Dickey-Fuller (ADF) test. Therefore, we can now decide whether to apply the ARIMA or SARIMA model and select the appropriate values for d or D.
U.S. Dollar Index: First-order differencing, ARIMA model, d=1.
Trade Balance: Both seasonal and ordinary differencing, SARIMA model, d=1, D=1, s=4.
Global Commodity Price: First-order differencing, ARIMA model, d=1.
House Price: First-order differencing, ARIMA model, d=1.
International Visitors: Both seasonal and ordinary differencing, SARIMA model, d=1, D=1, s=12.
After performing the necessary ordinary or seasonal differencing, all series are stationary. We will now present the ACF and PACF plots of these differenced series to determine the optimal values for p (AR order), q (MA order), P (seasonal AR order), and Q (seasonal MA order).
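The differencing steps described above can be reproduced with base R's diff(); the logged AirPassengers series below is only an illustration of the mechanics, not one of our datasets:

```r
x    <- log(AirPassengers)        # example monthly series (s = 12)
d1   <- diff(x)                   # ordinary first difference (d = 1)
d12  <- diff(x, lag = 12)         # seasonal difference (D = 1)
both <- diff(diff(x, lag = 12))   # D = 1 followed by d = 1
length(x) - length(both)          # s + 1 = 13 observations are lost
```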
ACF and PACF Plots
The following displays the ACF and PACF plots for the stationary series after differencing.
library(gridExtra)
acf <- ggAcf(diff(dxy_ts), 50) + ggtitle("ACF Plot for USD Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(dxy_ts), 50) + ggtitle("PACF Plot for USD Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(diff(balance_ts, lag = 4))) + ggtitle("ACF Plot for Trade Balance") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(diff(balance_ts, lag = 4))) + ggtitle("PACF Plot for Trade Balance") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(gdp_ts)) + ggtitle("ACF Plot for GDP") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(gdp_ts)) + ggtitle("PACF Plot for GDP") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(unem_ts)) + ggtitle("ACF Plot for Unemployment Rate") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(unem_ts)) + ggtitle("PACF Plot for Unemployment Rate") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(diff(cpi_ts, lag = 12))) + ggtitle("ACF Plot for CPI") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(diff(cpi_ts, lag = 12))) + ggtitle("PACF Plot for CPI") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(sp5_ts), 50) + ggtitle("ACF Plot for S&P 500 Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(sp5_ts), 50) + ggtitle("PACF Plot for S&P 500 Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(xau_ts), 50) + ggtitle("ACF Plot for Gold Price") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(xau_ts), 50) + ggtitle("PACF Plot for Gold Price") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(gsci_ts), 50) + ggtitle("ACF Plot for S&P GSCI Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(gsci_ts), 50) + ggtitle("PACF Plot for S&P GSCI Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(house_ts)) + ggtitle("ACF Plot for House Price Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(house_ts)) + ggtitle("PACF Plot for House Price Index") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(diff(diff(visitors_ts, lag = 12))) + ggtitle("ACF Plot for Number of International Visitors") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(diff(diff(visitors_ts, lag = 12))) + ggtitle("PACF Plot for Number of International Visitors") + theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") + geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
The ACF plot helps determine the q parameter by identifying the number of significant lags in the moving average (MA) component. The PACF plot helps determine the p parameter by identifying the number of significant lags in the autoregressive (AR) component. For seasonal models, ACF and PACF can also be used to determine the seasonal parameters Q and P.
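As a sanity check on this reading of the plots, a simulated AR(1) series (an illustration, not one of our datasets) shows the expected pattern: the lag-1 partial autocorrelation is significant while later PACF lags mostly fall inside the approximate 95% band:

```r
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 2000)   # simulated AR(1) process
p <- pacf(x, plot = FALSE)$acf
band <- 1.96 / sqrt(length(x))   # approximate 95% significance band
p[1] > band                      # lag-1 partial autocorrelation is significant
```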
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
constant 1e-04 1e-04 0.8257 0.409
sigma^2 estimated as 2.26922e-05 on 5034 degrees of freedom
AIC = -7.854818 AICc = -7.854818 BIC = -7.852226
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ar1 0.0145 0.0141 1.0260 0.3049
constant 0.0001 0.0001 0.8118 0.4169
sigma^2 estimated as 2.268745e-05 on 5033 degrees of freedom
AIC = -7.85463 AICc = -7.854629 BIC = -7.850743
For both ARIMA(0,1,0) and ARIMA(1,1,0), the model diagnostics results are very similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are below the 0.05 significance level for lags greater than 10, implying that some autocorrelation remains in the residuals.
Since the AR(1) term in the ARIMA(1,1,0) model is not significant at the 10% level, and the ARIMA(0,1,0) model has lower AIC, AICc, and BIC values, we choose ARIMA(0,1,0) as the optimal model. Note that the intercept of the ARIMA(0,1,0) model is also insignificant.
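The Ljung-Box checks quoted throughout these diagnostics can be reproduced with base R's Box.test(); the AR(1) fit to the built-in lh series below is only a sketch of the mechanics (fitdf discounts the degrees of freedom used by the p + q estimated coefficients):

```r
fit <- arima(lh, order = c(1, 0, 0))   # illustrative AR(1) fit
res <- residuals(fit)
# p-values above 0.05 indicate no significant remaining autocorrelation
Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)$p.value
```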
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ma1 0.3844 0.1179 3.2618 0.0017
sma1 -1.0000 0.1739 -5.7491 0.0000
sigma^2 estimated as 186230777 on 72 degrees of freedom
AIC = 22.12471 AICc = 22.127 BIC = 22.21812
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
For both SARIMA(0,1,1)x(0,1,1)[4] and SARIMA(1,0,1)x(2,1,0)[4], the model diagnostics results are very similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows no significant autocorrelation at any lag.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are all above the 0.05 significance level, implying that no autocorrelation remains in the residuals and that the model fits well.
Coefficient Significance: All model coefficients are significant.
Therefore, we decide to use SARIMA(0,1,1)x(0,1,1)[4] as the optimal model, since it has lower AIC, AICc, and BIC.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
constant 212.9192 38.3573 5.5509 0
sigma^2 estimated as 114760 on 77 degrees of freedom
AIC = 14.53976 AICc = 14.54043 BIC = 14.60019
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ma1 -0.921 0.0442 -20.8594 0
sigma^2 estimated as 113730.5 on 76 degrees of freedom
AIC = 14.5559 AICc = 14.55659 BIC = 14.61677
For both ARIMA(0,1,0) and ARIMA(0,2,1), the model diagnostics results are very similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but there is a significant oscillation in 2020, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows no significant autocorrelation at any lag.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are all above the 0.05 significance level, implying that no autocorrelation remains in the residuals and that the model fits well.
Since the MA(1) term in the ARIMA(0,2,1) model is significant at the 5% level, we decide to use ARIMA(0,2,1) as the optimal model.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
constant -0.0015 0.006 -0.246 0.8059
sigma^2 estimated as 0.008057314 on 226 degrees of freedom
AIC = -1.965677 AICc = -1.965598 BIC = -1.935501
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ar1 -0.5485 0.2146 -2.5556 0.0113
ma1 0.6962 0.1832 3.7993 0.0002
constant -0.0014 0.0064 -0.2196 0.8264
sigma^2 estimated as 0.007811542 on 224 degrees of freedom
AIC = -1.978774 AICc = -1.9783 BIC = -1.918423
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ma1 0.1550 0.0722 2.1459 0.0329
constant -0.0014 0.0068 -0.2091 0.8345
sigma^2 estimated as 0.00789841 on 225 degrees of freedom
AIC = -1.976678 AICc = -1.976442 BIC = -1.931414
The model diagnostics results for all three models are similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but there is a significant oscillation in 2020, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows no significant autocorrelation at any lag.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
However, the Ljung-Box test results differ. For the ARIMA(0,1,0) model, the Ljung-Box Test p-values are above the 0.05 significance level only after lag 2, while for the other two models all p-values are above the threshold. This implies that no autocorrelation remains in the residuals of the latter two models, which therefore fit well.
Coefficient Significance: The AR and MA coefficients are significant, although the constant terms are not.
Since both the AR(1) and MA(1) terms in the ARIMA(1,1,1) model are significant, we choose this model as the optimal one.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
The model diagnostics results for both models are similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
However, the Ljung-Box test results and coefficient significance are different. For the SARIMA(1,1,2)x(0,1,1)[12] model, the Ljung-Box Test p-values are all above the 0.05 significance level, and all coefficients are significant. For the SARIMA(2,2,3)x(1,0,0)[12] model, the Ljung-Box Test p-values are above the 0.05 significance level only after lag 20, and the majority of the coefficients are not significant.
Therefore, we choose the SARIMA(1,1,2)x(0,1,1)[12] model as the optimal one.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
The model diagnostics results for both models are very similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but the magnitude of the residuals after 2020 is noticeably larger, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box test results for both models are unsatisfactory, as the p-values fall below the 0.05 significance level. This indicates that autocorrelation remains in the residuals and that the models need improvement.
Coefficient Significance: All model coefficients are significant.
Since the ARIMA(4,1,4) model has lower AIC, AICc and BIC, we choose it as the optimal model.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
constant 0.0018 7e-04 2.4406 0.0148
sigma^2 estimated as 0.0005449249 on 1038 degrees of freedom
AIC = -4.673136 AICc = -4.673132 BIC = -4.663615
This model is chosen by both manual search and auto.arima(). The diagnostic results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are all above the 0.05 significance level, implying that no autocorrelation remains in the residuals and that the model fits well.
Therefore, the ARIMA(0,1,0) model is the optimal model.
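The manual search mentioned above can be sketched as a small AIC grid over candidate orders using base R's arima(); auto.arima() in the forecast package automates this with smarter stepwise rules. The DAX log-returns series here is a stand-in for illustration, not one of our datasets:

```r
x <- diff(log(EuStockMarkets[, "DAX"]))   # illustrative stationary series
best <- NULL
for (p in 0:2) for (q in 0:2) {
  # skip candidates that fail to converge
  fit <- tryCatch(arima(x, order = c(p, 0, q)), error = function(e) NULL)
  if (!is.null(fit) && (is.null(best) || AIC(fit) < AIC(best))) best <- fit
}
best$arma[1:2]   # (p, q) of the AIC-minimising candidate
```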
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
constant 1e-04 3e-04 0.4152 0.678
sigma^2 estimated as 0.0002048122 on 2534 degrees of freedom
AIC = -5.653962 AICc = -5.653961 BIC = -5.649356
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
sar1 0.0073 0.0205 0.3552 0.7225
constant 0.0001 0.0003 0.4089 0.6826
sigma^2 estimated as 0.000204801 on 2533 degrees of freedom
AIC = -5.653223 AICc = -5.653221 BIC = -5.646314
The model diagnostics results for both models are similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but the magnitude of the residuals around 2020 is noticeably larger, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
However, the Ljung-Box test results are different. For the ARIMA(0,1,0) model, most of the Ljung-Box Test p-values are above the 0.05 significance level. For the SARIMA(0,1,0)x(1,0,0)[252] model, only the p-values after lag 100 are above the threshold, implying that autocorrelation remains in the residuals of the latter model. Moreover, its seasonal AR(1) term is not significant.
Therefore, we choose the ARIMA(0,1,0) model as the optimal one.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
The model diagnostics results for all three models are similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but there is a significant oscillation in 2020, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows no significant autocorrelation at any lag.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
However, the Ljung-Box test results are different. For the ARIMA(3,1,0) model, the Ljung-Box Test p-values are above the 0.05 significance level only after lag 6, while for the other two models all p-values are above the threshold. This implies that no autocorrelation remains in the residuals of the latter two models, which therefore fit well.
Coefficient significance: For the ARIMA(3,1,0) model, all three coefficients are significant. For the ARIMA(3,1,2) model, only the coefficient of the MA1 term is insignificant, while the other four are significant. For the ARIMA(2,2,2) model, only the coefficient of the AR2 term is significant, and the other three are insignificant.
Therefore, we choose the ARIMA(3,1,2) model as the optimal one.
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
Coefficients:
Estimate SE t.value p.value
ma1 0.2515 0.0667 3.7693 2e-04
sma1 -1.0000 0.0919 -10.8779 0e+00
sigma^2 estimated as 0.03367981 on 225 degrees of freedom
AIC = -0.3681259 AICc = -0.3678899 BIC = -0.3228623
start_line <- grep("Coefficients", model_output)  # Locate where coefficient details start
end_line <- length(model_output)                  # Last line of output
cat(model_output[start_line:end_line], sep = "\n")
The model diagnostics results for both models are similar. The results are as follows:
The Residual Plot shows nearly consistent fluctuation around zero, but there is a significant oscillation in 2020, indicating the need for a more advanced model to account for special events.
The Autocorrelation Function (ACF) of the residuals shows little remaining autocorrelation.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
However, the Ljung-Box test results are different. For the SARIMA(0,1,1)x(0,1,1)[12] model, the Ljung-Box Test p-values are all above the 0.05 significance level. For the SARIMA(2,0,2)x(0,0,2)[12] model, all p-values are below the threshold.
Therefore, we choose the SARIMA(0,1,1)x(0,1,1)[12] model as the optimal model, where all coefficients are significant.
The equation of the best model for each time series:
U.S. Dollar Index:
ARIMA(0,1,0):
\[
(1 - B)x_t = w_t
\]
where \(x_t\) is the original time series and \(w_t\) is the Gaussian white noise process.
fit <- Arima(dxy_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
autoplot(forecast_result) +
  labs(title = "ARIMA(0,1,0) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(balance_ts, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 4))
forecast_result <- forecast(fit, h = 8)
autoplot(forecast_result) +
  labs(title = "SARIMA(0,1,1)x(0,1,1)[4] Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(gdp_ts, order = c(0,2,1))
forecast_result <- forecast(fit, h = 12)
autoplot(forecast_result) +
  labs(title = "ARIMA(0,2,1) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(unem_ts, order = c(1,1,1))
forecast_result <- forecast(fit, h = 12)
autoplot(forecast_result) +
  labs(title = "ARIMA(1,1,1) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(cpi_ts, order = c(1,1,2), seasonal = list(order = c(0,1,1), period = 12))
forecast_result <- forecast(fit, h = 36)
autoplot(forecast_result) +
  labs(title = "SARIMA(1,1,2)x(0,1,1)[12] Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(sp5_ts, order = c(4,1,4), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
autoplot(forecast_result) +
  labs(title = "ARIMA(4,1,4) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(xau_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 52)
autoplot(forecast_result) +
  labs(title = "ARIMA(0,1,0) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(gsci_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
autoplot(forecast_result) +
  labs(title = "ARIMA(0,1,0) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(house_ts, order = c(3,1,2))
forecast_result <- forecast(fit, h = 12)
autoplot(forecast_result) +
  labs(title = "ARIMA(3,1,2) Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
fit <- Arima(visitors_ts, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 12))
forecast_result <- forecast(fit, h = 36)
autoplot(forecast_result) +
  labs(title = "SARIMA(0,1,1)x(0,1,1)[12] Forecast", x = "Time", y = "Predicted Values") +
  theme_minimal()
The forecast plots for all series show that our optimal models perform well. The prediction intervals, shown as the shaded blue areas, represent the range within which each series is expected to move; their widening as the forecast horizon extends reflects increasing uncertainty.
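This widening can be checked directly: under an ARIMA(0,1,0) (random-walk) fit, the h-step forecast standard error grows like the square root of h. A base-R sketch with the built-in Nile series (an illustration, not one of our datasets):

```r
fit  <- arima(Nile, order = c(0, 1, 0))   # random-walk model
pred <- predict(fit, n.ahead = 10)
all(diff(pred$se) > 0)   # standard errors rise monotonically with horizon
```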
Comparison with Benchmark Methods
We first write a helper function that prints the accuracy metrics and plots the ARIMA forecast against the Mean, Naïve, SNaïve, and Drift benchmark methods.
plot_forecasts <- function(forecast_result, ts, h, fit) {
  print(accuracy(forecast_result))
  # Plot the forecasts using Mean, Naïve, SNaïve, Drift Methods, and the ARIMA Fit
  autoplot(ts) +
    autolayer(meanf(ts, h = h), series = "Mean", PI = FALSE) +
    autolayer(naive(ts, h = h), series = "Naïve", PI = FALSE) +
    autolayer(snaive(ts, h = h), series = "SNaïve", PI = FALSE) +
    autolayer(rwf(ts, drift = TRUE, h = h), series = "Drift", PI = FALSE) +
    autolayer(forecast(fit, h = h), series = "Fit", PI = FALSE) +
    xlab("Date") + ylab("Predicted Values") +
    guides(colour = guide_legend(title = "Forecast Methods")) +
    theme_minimal()
}
fit <- Arima(dxy_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
plot_forecasts(forecast_result, dxy_ts, 252, fit)
ME RMSE MAE MPE MAPE
Training set 8.733295e-07 0.004763563 0.003531088 -5.079e-05 0.07875213
MASE ACF1
Training set 0.05939571 0.01503592
fit <- Arima(balance_ts, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 4))
forecast_result <- forecast(fit, h = 8)
plot_forecasts(forecast_result, balance_ts, 8, fit)
ME RMSE MAE MPE MAPE MASE
Training set -759.8004 13207.78 9733.733 0.4284101 6.512908 0.4244237
ACF1
Training set -0.02815605
fit <- Arima(gdp_ts, order = c(0,2,1))
forecast_result <- forecast(fit, h = 12)
plot_forecasts(forecast_result, gdp_ts, 12, fit)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 32.67852 332.9493 147.9156 0.1147268 0.756791 0.1612916 -0.1421698
fit <- Arima(unem_ts, order = c(1,1,1))
forecast_result <- forecast(fit, h = 12)
plot_forecasts(forecast_result, unem_ts, 12, fit)
ME RMSE MAE MPE MAPE MASE
Training set -0.001293724 0.08819836 0.03253087 -0.1672727 1.841263 0.1628627
ACF1
Training set 0.0003963382
fit <- Arima(cpi_ts, order = c(1,1,2), seasonal = list(order = c(0,1,1), period = 12))
forecast_result <- forecast(fit, h = 36)
plot_forecasts(forecast_result, cpi_ts, 36, fit)
ME RMSE MAE MPE MAPE MASE
Training set 0.01658077 0.6646226 0.4823797 0.00423208 0.2025314 0.07696065
ACF1
Training set 0.02124335
fit <- Arima(sp5_ts, order = c(4,1,4), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
plot_forecasts(forecast_result, sp5_ts, 252, fit)
ME RMSE MAE MPE MAPE MASE
Training set -0.02174175 27.98292 17.39177 -0.02943003 0.7786464 0.05055928
ACF1
Training set -0.01040333
fit <- Arima(xau_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 52)
plot_forecasts(forecast_result, xau_ts, 52, fit)
ME RMSE MAE MPE MAPE MASE
Training set 5.803826e-06 0.02333315 0.01745149 0.0001973932 0.2473152 0.122605
ACF1
Training set 0.003597321
fit <- Arima(gsci_ts, order = c(0,1,0), include.drift = TRUE)
forecast_result <- forecast(fit, h = 252)
plot_forecasts(forecast_result, gsci_ts, 252, fit)
ME RMSE MAE MPE MAPE
Training set 2.379987e-06 0.01430894 0.01014429 -0.000263987 0.1665067
MASE ACF1
Training set 0.05461529 -0.001120247
fit <- Arima(house_ts, order = c(3,1,2))
forecast_result <- forecast(fit, h = 12)
plot_forecasts(forecast_result, house_ts, 12, fit)
ME RMSE MAE MPE MAPE MASE
Training set 0.520081 4.429123 2.728648 0.1197514 0.6335488 0.1106083
ACF1
Training set -0.01834026
fit <- Arima(visitors_ts, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 12))
forecast_result <- forecast(fit, h = 36)
plot_forecasts(forecast_result, visitors_ts, 36, fit)
ME RMSE MAE MPE MAPE MASE
Training set -0.0005064716 0.178516 0.06267839 -0.01008101 0.4338407 0.2188027
ACF1
Training set -0.00867727
Our models generally outperform the benchmark methods, although the seasonal naïve method sometimes captures seasonality better. Comparing our forecasts with those of the benchmark methods shows that our models offer more precise and reliable predictions, and the accuracy metrics (ME, RMSE, MAE, MPE, MAPE, MASE, ACF1) support this conclusion.
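The accuracy tables above are computed on the training set; a stricter comparison holds out the end of a series and scores each method out of sample. A base-R sketch on the built-in AirPassengers series (the series and the airline-style orders are illustrative, not our data or final models):

```r
x     <- AirPassengers
train <- window(x, end = c(1959, 12))          # all but the last year
test  <- window(x, start = c(1960, 1))         # 12-month holdout
naive_fc <- rep(tail(train, 1), length(test))  # naive (last-value) benchmark
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
sarima_fc <- predict(fit, n.ahead = length(test))$pred
rmse <- function(e) sqrt(mean(e^2))
c(naive = rmse(test - naive_fc), sarima = rmse(test - sarima_fc))
```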