This section applies modern deep learning methods to the USD index and compares them with traditional time-series models. The goal is to build forecasting models, critically evaluate their performance, and reflect on how model choice influences forecasting accuracy and usefulness.
Relative Performance
Among the three deep learning models, the LSTM achieved the lowest MSE on the validation set, indicating the best forecasting accuracy. The GRU and RNN performed similarly, though the GRU showed slightly better results in both MSE and visual trend alignment. This is consistent with the known limitations of simple RNNs in capturing long-term dependencies, which the LSTM is explicitly designed to handle through its gating mechanisms. Visual inspection of the prediction plots confirms that the LSTM tracks trend changes and smooth fluctuations in the USD index more closely.
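All three recurrent models consume fixed-length windows of past observations and predict the next step. A minimal NumPy sketch of how such supervised (window, next-value) pairs can be built; the window length of 10 and the toy series are illustrative assumptions, not the values used in this project:

```python
import numpy as np

def make_windows(series, window=10):
    """Turn a 1-D series into (samples, window) inputs and next-step targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Toy series standing in for the scaled USD index
usd = np.linspace(100.0, 101.0, 50)
X, y = make_windows(usd, window=10)
print(X.shape, y.shape)  # (40, 10) (40,)
```

Each row of `X` is then reshaped to `(window, n_features)` before being fed to an RNN, GRU, or LSTM layer.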
Effectiveness of Regularization
The use of dropout and L1L2 regularization significantly improved training stability and generalization. Without these techniques, the models tended to overfit the training data: training loss kept decreasing while validation performance worsened. With a dropout rate of 0.2, the models converged more reliably and avoided overtraining.
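To make the two mechanisms concrete, the sketch below shows in plain NumPy what they add to training: an L1L2 penalty term on the weights that is added to the loss, and dropout that zeroes a fraction of activations and rescales the rest (inverted dropout). The coefficients and sizes are illustrative assumptions, not the project's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # stand-in for one layer's weight matrix
l1, l2 = 1e-4, 1e-4           # illustrative penalty coefficients

# L1L2 regularization adds this term to the training loss,
# shrinking weights toward zero and discouraging overfitting
penalty = l1 * np.abs(W).sum() + l2 * (W ** 2).sum()

# Dropout (rate 0.2): during training, drop ~20% of activations
# and rescale the survivors so the expected activation is unchanged
rate = 0.2
h = rng.normal(size=8)        # stand-in for a hidden activation vector
mask = rng.random(8) >= rate
h_dropped = np.where(mask, h / (1.0 - rate), 0.0)
print(penalty > 0, h_dropped.shape)
```

In the actual models these are handled by the framework (e.g., a dropout layer and a kernel regularizer); the sketch only makes the arithmetic visible.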
Comparison with Traditional Models
Compared to the ARIMA(0,1,0) model from the Univariate TS Models section, the LSTM produced a lower MSE. While ARIMA performed reasonably well for short-term predictions and was easy to interpret, it struggled to adapt to nonlinear patterns and structural shifts in the series. The deep learning models captured more complex dynamics but required greater computational effort and tuning. In summary, the LSTM demonstrated superior forecasting potential for this dataset, especially when flexibility and robustness are prioritized over interpretability.
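It is worth noting that ARIMA(0,1,0) without drift reduces to a random-walk (naive persistence) forecast: each one-step-ahead prediction is simply the previous observed value. A short sketch of that baseline and its MSE, on toy numbers rather than the actual USD series:

```python
import numpy as np

series = np.array([100.0, 100.4, 100.1, 100.6, 100.9, 100.7])

# ARIMA(0,1,0) one-step-ahead forecast: y_hat[t] = y[t-1]
preds = series[:-1]
actual = series[1:]
mse = np.mean((actual - preds) ** 2)
print(round(mse, 4))  # 0.126
```

Any model that cannot beat this persistence baseline on validation MSE adds no forecasting value, which makes it a useful sanity check for the deep learning results.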
Forecasting Performance Reflection
Through this modeling process, I have gained a deeper understanding of how different forecasting approaches perform on financial time series data like the USD index. Quantitatively, LSTM and GRU models outperformed both RNN and the ARIMA model from the Univariate TS Models section in terms of MSE, demonstrating better ability to capture temporal dependencies and subtle fluctuations. RNN, despite being simpler, was less effective, especially for longer-term dynamics.
Qualitatively, deep learning models provided more adaptive and flexible forecasts. They were better at handling changing trends and nonlinearity, which are common in real-world financial data. However, this came at the cost of interpretability and required more computation and careful tuning. In contrast, ARIMA was much easier to implement and explain, but its performance was limited to short-term, linear relationships.
Overall, this comparison taught me that no single model is universally best. Each comes with trade-offs. For simple, interpretable forecasts, ARIMA remains a strong baseline. But when modeling complex, evolving patterns, deep learning models like LSTM or GRU offer clear advantages, making them more appropriate for high-variance time series like the USD index.
In the multivariate setting, the LSTM achieved the lowest MSE on the validation set, followed closely by the GRU. The RNN had the highest error and produced more erratic forecasts. This suggests that LSTM and GRU are better at capturing long-term dependencies in the USD series, especially when external variables such as the S&P 500 index and the Bitcoin price are incorporated. Visual inspection of the prediction plots confirms that the LSTM and GRU track turning points and slow trends more accurately.
In this project, I compared a range of traditional and deep learning forecasting models using both univariate and multivariate inputs to predict the USD index. The results highlight several important lessons about model complexity, input structure, and practical forecasting implications.
First, increasing model complexity improved performance only when it was well matched to the input data. For example, the univariate GRU achieved the lowest MSE (0.0266), outperforming even multivariate models such as the multivariate LSTM (MSE = 0.4693) and the multivariate RNN (MSE = 1.0046). This shows that a more complex model, or a richer input, does not guarantee better performance if the data or structure is not aligned with the model’s strengths. The GRU’s efficiency in capturing medium- to long-term dependencies made it especially effective for univariate USD forecasting, where the temporal structure was prominent and the external variables added limited marginal value.
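Given validation MSEs like these, model selection reduces to a one-line comparison. The dictionary below simply reuses the values reported above:

```python
# Validation MSEs reported in the text
results = {
    "GRU (Univariate)": 0.0266,
    "LSTM (Multivariate)": 0.4693,
    "RNN (Multivariate)": 1.0046,
}

# Pick the model with the smallest validation MSE
best = min(results, key=results.get)
print(best)  # GRU (Univariate)
```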
Second, multivariate modeling did not consistently improve performance. While SARIMAX and GRU (Multivariate) outperformed their univariate counterparts, LSTM and RNN actually performed worse with additional input variables. This inconsistency may be due to noise or weak correlations between the USD index and the added variables (S&P 500 and Bitcoin), which might have introduced complexity without improving signal quality. It also highlights that multivariate models require more careful tuning and feature selection to avoid performance degradation.
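One cheap screen for this problem is to check how strongly each candidate exogenous series co-moves with the target before including it. A hedged sketch on synthetic stand-in data (not the real USD, S&P 500, or Bitcoin series), correlating first differences so trends do not inflate the numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
usd = rng.normal(size=200).cumsum()            # toy stand-in for the USD index
sp500 = usd + rng.normal(scale=5.0, size=200)  # loosely related candidate
btc = rng.normal(size=200).cumsum()            # unrelated candidate

def ret(x):
    # Work with first differences (returns) rather than levels
    return np.diff(x)

corrs = {}
for name, x in [("sp500", sp500), ("btc", btc)]:
    corrs[name] = np.corrcoef(ret(usd), ret(x))[0, 1]
print({k: round(v, 3) for k, v in corrs.items()})
```

Candidates with near-zero return correlation are likely to add noise rather than signal, which is one plausible explanation for the multivariate LSTM and RNN results above.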
From a practical standpoint, if I were to select a model for real-world forecasting, I would prioritize GRU with univariate input. It is stable, relatively simple to implement, and consistently accurate. If more exogenous variables were available—and clearly relevant—I might revisit multivariate GRU or SARIMAX with more feature engineering. However, based on current results, a strong univariate model can be highly effective.
If I had used only univariate models, I might have overlooked risks such as overfitting to internal patterns or missing external shocks. In this case, however, multivariate inputs did not significantly improve accuracy, which teaches me that variable quality matters more than variable quantity.
Overall, this comparison taught me to focus on aligning model choice with data characteristics. I would explain to stakeholders that while complex, multivariate models are appealing, simpler models like GRU can often provide more reliable forecasts, especially when the target variable exhibits strong autocorrelation and limited external dependencies. Choosing the right model is not just about complexity, but about matching the model’s strengths to the nature of the forecasting task.