Based on the data and previous univariate analysis, we will begin conducting multivariate analysis in this section. Our data science questions include:
What factors interact with the US Dollar Index?
What exogenous variables affect the US Dollar Index?
What factors are influenced by the US Dollar Index?
Before constructing Multivariate Time Series Models, we will first conduct a literature review (see the Introduction part) and analyze the mechanisms among variables from an economic perspective.
Interacting Variables
Trade Balance
Trade Deficit → Dollar Depreciation: Higher imports than exports lead to a trade deficit, requiring capital inflows but increasing dollar supply, causing depreciation.
Dollar Depreciation → Trade Deficit Reduction: A weaker dollar boosts US exports and reduces imports, helping balance the deficit.
The expansion of the trade deficit leads to the depreciation of the US dollar.
A trade deficit means that the US imports more than it exports. According to the Balance of Payments Identity (Current Account + Financial Account = 0), a trade deficit (i.e., a negative current account) means that the financial account must be positive. This is reflected in capital inflows. Typically, the US issues government bonds and attracts foreign investment to balance the trade deficit. However, the increased supply of dollars in the foreign exchange market will cause the dollar to depreciate, or foreign investors may become concerned about the US fiscal situation, leading to reduced demand for the dollar and further depreciation.
The depreciation of the dollar leads to a reduction in the trade deficit.
When the dollar depreciates, US goods and services become relatively cheaper in international markets, which may increase US exports. At the same time, imported goods become relatively expensive, leading to a reduction in US imports. Therefore, the depreciation of the dollar generally reduces the trade deficit.
Global Commodity Market
A stronger dollar leads to a decrease in global commodity prices.
Since the dollar is the global reserve and trading currency, commodities (such as oil, gold, copper, aluminum, etc.) are typically priced in dollars. When the dollar appreciates, the cost for holders of other currencies to buy dollar-priced commodities increases, which leads to a reduction in demand and lowers commodity prices. Additionally, investors may shift funds from physical assets like commodities into dollar-denominated assets, further driving down commodity prices.
Changes in commodity prices also influence the US Dollar Index.
A rise in commodity prices, especially oil and metal prices, often puts pressure on global inflation. This, in turn, affects the Dollar Index. Moreover, commodity price fluctuations often reflect changes in the global economy. For example, an increase in oil prices may indicate a slowdown in global economic growth or supply-demand imbalances. These factors could lead to higher demand for the dollar as a safe-haven currency, thus pushing up the Dollar Index.
Stock Market
The risk and return associated with stocks are relatively high, and the dollar can serve as a safe-haven asset. Therefore, there is a competitive relationship between the two. Generally speaking, the stock market and the Dollar Index have an inverse relationship. When global investors have a higher risk appetite or when the stock market is strong, capital tends to flow into the stock market. This reduces demand for the dollar and leads to a depreciation of the dollar. On the other hand, when the dollar appreciates, investors may prefer to move funds into low-risk assets like bonds. This leads to capital outflows from the stock market and puts downward pressure on it, especially on overvalued stocks. However, a stronger dollar may also reflect a robust economy, which could push stock prices higher.
Gold Market
The dollar and gold also tend to show inverse fluctuations due to their competitive relationship.
A stronger dollar leads to a decrease in gold prices.
First, gold prices are usually denominated in dollars. When the dollar appreciates, the cost of gold in other currencies increases, reducing demand and lowering gold prices. Secondly, a stronger dollar may be due to the Federal Reserve raising interest rates. In this case, investors are more likely to invest in higher-yielding assets (such as dollar-denominated bonds) and reduce holdings in non-interest-bearing assets like gold, leading to a decline in gold prices.
An increase in gold prices leads to the depreciation of the dollar.
When the dollar weakens or global economic uncertainty rises, the dollar may no longer be considered a safe-haven asset, and gold becomes the preferred choice. Investors may tend to buy gold as a hedge against inflation or currency devaluation, leading to decreased demand for the dollar and causing it to depreciate.
GDP
Economic growth drives dollar appreciation.
When a country’s GDP grows strongly, it typically reflects good economic performance and attracts foreign investors’ attention. This capital inflow (e.g., foreign direct investment, securities investment) increases demand for the country’s currency, which in our case is the US dollar, leading to dollar appreciation.
Unemployment Rate
A higher unemployment rate leads to dollar depreciation.
Mechanism 1
The unemployment rate mainly affects the economy, which in turn influences the Dollar Index. There is an inverse relationship between the unemployment rate and GDP growth. A higher unemployment rate typically indicates an economic recession or depression, where businesses reduce hiring, which is often accompanied by a decrease in consumption and investment. This reduces the ability to attract foreign investment into the US, leading to reduced demand for dollar-denominated assets and a depreciation of the dollar.
Mechanism 2
When the unemployment rate is high, the Federal Reserve may adopt loose monetary policies (e.g., lowering interest rates or quantitative easing) to stimulate the economy and reduce unemployment. Such loose monetary policies typically lead to dollar depreciation because lower interest rates reduce the relative attractiveness of the dollar, making it less likely to attract foreign capital inflows, thereby decreasing demand for the dollar.
CPI
An increase in the CPI drives dollar appreciation.
When the CPI rises, it indicates inflation. In this case, the Federal Reserve may adopt a tightening monetary policy, such as raising interest rates, to control inflation. Higher interest rates make the dollar more attractive because they can attract foreign capital inflows, increasing demand for the dollar and thus driving up its value.
Expectations are also a factor. If inflation expectations rise, markets may anticipate that the Federal Reserve will take more tightening measures, which could lead to dollar appreciation.
Exogenous Variables
Real Estate Market
A stronger dollar typically indicates a relatively strong US economy, and interest rates may rise. Higher mortgage rates increase borrowing costs, which can suppress demand for housing and lead to a decline in housing prices.
However, a stronger dollar can also boost confidence among homebuyers and global investors. This leads to increased capital inflows into US real estate, pushing housing prices up.
Therefore, the impact may be a combined effect.
Tourism
When the dollar appreciates, foreign tourists need to exchange more of their local currency for dollars, which increases their travel costs. Some tourists may opt for other travel destinations, leading to a decrease in the number of tourists visiting the US. Therefore, US tourism may decline with a stronger dollar.
Code
library(tidyverse)library(ggplot2)library(forecast)library(astsa) library(xts)library(zoo)library(tseries)library(fpp2)library(fma)library(lubridate)library(tidyverse)library(TSstudio)library(quantmod)library(tidyquant)library(plotly)library(readr)library(dplyr)library(kableExtra)library(knitr)library(patchwork)library(vars)# Load datainvisible(getSymbols("DX-Y.NYB", src ="yahoo", from ="2005-01-01", to ="2024-12-31"))dxy <-data.frame(Date =index(`DX-Y.NYB`), Open =`DX-Y.NYB`[, "DX-Y.NYB.Open"], High =`DX-Y.NYB`[, "DX-Y.NYB.High"], Low =`DX-Y.NYB`[, "DX-Y.NYB.Low"], Close =`DX-Y.NYB`[, "DX-Y.NYB.Close"])colnames(dxy) <-c("Date", "Open", "High", "Low", "Close")dxy <-na.omit(dxy)bea <-read.csv("./data/bea.csv")bea$time <-as.Date(bea$time)gdp <-read.csv("./data/gdp.csv")gdp$time <-as.Date(gdp$time)gdp$total <- gdp$consumption + gdp$investment + gdp$net_export + gdp$governmentdata_unem <-read.csv("./data/unem.csv", header=TRUE)data_unem$time <-as.Date(data_unem$time)data_cpi <-read.csv("./data/cpi.csv", header=TRUE)data_cpi$time <-as.Date(data_cpi$time)invisible(getSymbols("^GSPC", src ="yahoo", from ="2005-01-01", to ="2024-12-31"))invisible(getSymbols("BTC-USD", src ="yahoo", from ="2015-01-01", to ="2024-12-31"))xau <-read.csv("./data/xau.csv")xau$Date <-as.Date(xau$Date)gsci <-read.csv("./data/gsci.csv")[2:2518,]gsci$Date <-as.Date(gsci$Date)house <-read.csv("./data/house.csv", header=TRUE)house$time <-as.Date(house$time)visitors <-read.csv("./data/visitors.csv", header=TRUE)visitors$time <-as.Date(visitors$time)egg <-read.csv("./data/eggs_price.csv")egg$Date <-as.Date(egg$Date)ir <-read.csv("./data/interest_rate.csv")ir$Date <-as.Date(ir$Date)oil <-read.csv("./data/oil_price.csv")oil$Date <-as.Date(oil$Date)set.seed(123)library(kableExtra)# ARIMA funtionARIMA.c =function(p1, p2, q1, q2, data) { d =1 i =1 temp =data.frame() ls =matrix(rep(NA, 6*100), nrow =100)for (p in p1:p2) {for (q in q1:q2) {if (p + d + q <=9) { model <-tryCatch({Arima(data, order =c(p, d, q), include.drift =TRUE) }, error =function(e) {return(NULL) })if (!is.null(model)) { ls[i, ] =c(p, d, q, model$aic, model$bic, model$aicc) i = i +1 } } } } temp =as.data.frame(ls)names(temp) =c("p", "d", "q", "AIC", "BIC", "AICc") temp =na.omit(temp)return(temp)}# SARIMA functionSARIMA.c =function(p1, p2, q1, q2, P1, P2, Q1, Q2, s, data) { d =1 D =1 i =1 temp =data.frame() ls =matrix(rep(NA, 9*100), nrow =100)for (p in p1:p2) {for (q in q1:q2) {for (P in P1:P2) {for (Q in Q1:Q2) {if (p + d + q + P + D + Q <=9) { model <-tryCatch({Arima(data, order =c(p, d, q), seasonal =list(order =c(P,D,Q), period = s)) }, error =function(e) {return(NULL) })if (!is.null(model)) { ls[i, ] =c(p, d, q, P, D, Q, model$aic, model$bic, model$aicc) i = i +1 } } } } } } temp =as.data.frame(ls)names(temp) =c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc") temp =na.omit(temp)return(temp)}highlight_output =function(output, type="ARIMA") { highlight_row <-c(which.min(output$AIC), which.min(output$BIC), which.min(output$AICc)) knitr::kable(output, align ='c', caption =paste("Comparison of", type, "Models")) %>%kable_styling(full_width =FALSE, position ="center") %>%row_spec(highlight_row, bold =TRUE, background ="#FFFF99") # Highlight row in yellow}# Define a function to fit SARIMA and handle errorsfit_sarima <-function(xtrain, p, d, q, P, D, Q, s) { fit <-tryCatch({Arima(xtrain, order =c(p, d, q), seasonal =list(order =c(P, D, Q), period = s),include.drift =FALSE, lambda =0, method ="ML") }, error =function(e) {return(NULL) # Return NULL if an error occurs })return(fit) # Return the fitted model (or NULL)}
The model with p=1 or p=2 looks good. However, only a few variables are significant in the model with p=6, suggesting that VAR(6) may not be the best choice.
Code
fun.var <-function(ts, year, p, s){ fit <-VAR(ts, p=p, type='both') fcast <-predict(fit, n.ahead = s) f1<-fcast$fcst$USD f2<-fcast$fcst$Deficit f3<-fcast$fcst$CommodityPrice f4<-fcast$fcst$CPI ff<-data.frame(f1[,1],f2[,1],f3[,1],f4[,1]) ff<-ts(ff,start=c(year,1),frequency = s)return(ff)}data <- ts_df1n=nrow(data)n_var =ncol(data)h <-4# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 1rmse2 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 2# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 12 months test_start_year <- st + (i-1) +1/h #starting year for predication, i.e. xtest######## VAR(1) ############ ff1 <-fun.var(xtrain, test_start_year, p=1, s=h)######## VAR(2) ############ ff2 <-fun.var(xtrain, test_start_year, p=2, s=h)##### collecting errors ###### a = h*i-h+1 b= h*i rmse1[c(a:b),] <-abs(ff1-xtest) rmse2[c(a:b),] <-abs(ff2-xtest)}rmse_combined <-as.data.frame(rbind(rmse1, rmse2))colnames(rmse_combined) =c("USD","Deficit","CommodityPrice","CPI")rmse_combined$Model <-c(rep("VAR(1)", n-k),rep("VAR(2)", n-k))rmse_combined$Date <-time(data)[(k+1):n]# Create the USD RMSE plot with a legendggplot(data = rmse_combined, aes(x = Date, y = USD, color = Model)) +geom_line() +labs(title ="CV Error for USD",x ="Date",y ="Error",color ="Model" ) +theme_minimal()
# Fit a VAR(1) model including both a constant and trendfit <-VAR(data, p =1, type ="both")forecast(fit, h=h) %>%autoplot() +scale_x_continuous(breaks =2015:2027) +xlab("Year") +theme_minimal()
I believe the forecast is very good. The VAR(1) model has effectively captured the pattern of four varibales.
ir_q <- ir %>%mutate(quarter =floor_date(Date, "quarter")) %>%group_by(quarter) %>%summarise(IR =mean(IR, na.rm =TRUE))unem_q <- data_unem %>%mutate(quarter =floor_date(time, "quarter")) %>%group_by(quarter) %>%summarise(unem =mean(unem, na.rm =TRUE))df2 <-data.frame(Date = dxy_q$quarter, USD =log(dxy_q$Close), InterestRate =log(ir_q$IR),CPI =log(cpi_q$cpi),UnemploymentRate =log(unem_q$unem),GDP =log(gdp$total) )plot_usd <-plot_ly(df2, x =~Date, y =~USD, type ='scatter', mode ='lines', name ='U.S. Dollar')plot_ir <-plot_ly(df2, x =~Date, y =~InterestRate, type ='scatter', mode ='lines', name ='Interest Rate') plot_cpi <-plot_ly(df2, x =~Date, y =~CPI, type ='scatter', mode ='lines', name ='CPI') plot_unem <-plot_ly(df2, x =~Date, y =~UnemploymentRate, type ='scatter', mode ='lines', name ='unemployment Rate') plot_gdp <-plot_ly(df2, x =~Date, y =~GDP, type ='scatter', mode ='lines', name ='GDP') subplot(plot_usd, plot_ir, plot_cpi, plot_unem, plot_gdp, nrows =5, shareX =TRUE) %>%layout(title ="Trend of U.S. Dollar and Related Variables", showlegend =FALSE,xaxis =list(title ='Date'),yaxis =list(title ='U.S. Dollar'),yaxis2 =list(title ='Interest Rate'),yaxis3 =list(title ='CPI'),yaxis4 =list(title ='Unemployment Rate'),yaxis5 =list(title ='GDP'))
The trend of the USD and interest rate is very similar.
Code
ts_df2 <-ts(df2[,-1], start =c(2005,1), frequency =4)VARselect(ts_df2, lag.max=6, type="both")
The model with p=1 or p=2 looks good. However, only a few variables are significant in the model with p=3. Next, we will perform cross-validation to determine the best model.
Code
fun.var.2<-function(ts, year, p, s){ fit <-VAR(ts, p=p, type='both') fcast <-predict(fit, n.ahead = s) f1<-fcast$fcst$USD f2<-fcast$fcst$InterestRate f3<-fcast$fcst$CPI f4<-fcast$fcst$UnemploymentRate f5<-fcast$fcst$GDP ff<-data.frame(f1[,1],f2[,1],f3[,1],f4[,1],f5[,1]) ff<-ts(ff,start=c(year,1),frequency = s)return(ff)}data <- ts_df2n=nrow(data)n_var =ncol(data)h <-4# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 1rmse2 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 2rmse3 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 3# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 12 months test_start_year <- st + (i-1) +1/h #starting year for predication, i.e. xtest######## VAR(1) ############ ff1 <-fun.var.2(xtrain, test_start_year, p=1, s=h)######## VAR(2) ############ ff2 <-fun.var.2(xtrain, test_start_year, p=2, s=h)######## VAR(3) ############ ff3 <-fun.var.2(xtrain, test_start_year, p=3, s=h)##### collecting errors ###### a = h*i-h+1 b= h*i rmse1[c(a:b),] <-abs(ff1-xtest) rmse2[c(a:b),] <-abs(ff2-xtest) rmse3[c(a:b),] <-abs(ff3-xtest)}rmse_combined <-as.data.frame(rbind(rmse1, rmse2, rmse3))colnames(rmse_combined) =c("USD","InterestRate","CPI","UnemploymentRate","GDP")rmse_combined$Model <-c(rep("VAR(1)", n-k),rep("VAR(2)", n-k),rep("VAR(3)", n-k))rmse_combined$Date <-time(data)[(k+1):n]# Create the USD RMSE plot with a legendggplot(data = rmse_combined, aes(x = Date, y = USD, color = Model)) +geom_line() +labs(title ="CV Error for USD",x ="Date",y ="Error",color ="Model" ) +theme_minimal()
# Fit a VAR(1) model including both a constant and trendfit <-VAR(data, p =1, type ="both")forecast(fit, h=h) %>%autoplot() +xlab("Year") +theme_minimal()
I believe the forecast is very good. The VAR(1) model has effectively captured the pattern of five varibales.
VAR: USD ~ Stock Price + Bitcoin Price
Now, let’s analyze how the US Dollar Index interacts with other financial assets, such as the S&P 500 Index and Bitcoin.
# USD and S&P500 available on trading days# Bitcoin prices (2015-2024) available daily# Extract trading daysdxy_15 <- dxy[2523:dim(dxy)[1],]sp5_15 <- GSPC[2518:dim(GSPC)[1],]trading_days <-as.Date(intersect(as.Date(rownames(dxy_15)),as.Date(index(sp5_15))))dxy_15 <-filter(dxy_15, Date %in% trading_days)sp5_15 <- sp5_15[trading_days]btc_15 <-`BTC-USD`[trading_days]df3 <-data.frame(Date = dxy_15$Date, USD =log(dxy_15$Close), SP500 =log(sp5_15$GSPC.Close),Bitcoin =log(btc_15[,"BTC-USD.Close"]) )colnames(df3) <-c("Date", "USD", "SP500", "Bitcoin")plot_usd <-plot_ly(df3, x =~Date, y =~USD, type ='scatter', mode ='lines', name ='U.S. Dollar')plot_sp5 <-plot_ly(df3, x =~Date, y =~SP500, type ='scatter', mode ='lines', name ='S&P500 Index') plot_btc <-plot_ly(df3, x =~Date, y =~Bitcoin, type ='scatter', mode ='lines', name ='Bitcoin') subplot(plot_usd, plot_sp5, plot_btc, nrows =3, shareX =TRUE) %>%layout(title ="Trend of U.S. Dollar and Related Variables", showlegend =FALSE,xaxis =list(title ='Date'),yaxis =list(title ='U.S. Dollar'),yaxis2 =list(title ='S&P500 Index'),yaxis3 =list(title ='Bitcoin'))
Code
ts_df3 <-ts(df3[,-1], start =c(2015,1), frequency =252)VARselect(ts_df3, lag.max=6, type="both")
The results looks similar. Next, we will perform cross-validation to determine the best model.
Code
fun.var.3<-function(ts, year, p, s){ fit <-VAR(ts, p=p, type='both') fcast <-predict(fit, n.ahead = s) f1<-fcast$fcst$USD f2<-fcast$fcst$SP500 f3<-fcast$fcst$Bitcoin ff<-data.frame(f1[,1],f2[,1],f3[,1]) ff<-ts(ff,start=c(year,1),frequency = s)return(ff)}data <- ts_df3n=nrow(data)n_var =ncol(data)h <-252# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 1rmse2 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 2# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 12 months test_start_year <- st + (i-1) +1/h #starting year for predication, i.e. xtest######## VAR(2) ############ ff1 <-fun.var.3(xtrain, test_start_year, p=2, s=h)######## VAR(5) ############ ff2 <-fun.var.3(xtrain, test_start_year, p=5, s=h)##### collecting errors ###### a = h*i-h+1 b= h*i rmse1[c(a:b),] <-abs(ff1-xtest) rmse2[c(a:b),] <-abs(ff2-xtest)}rmse_combined <-as.data.frame(rbind(rmse1, rmse2))colnames(rmse_combined) =c("USD","S&P500","Bitcoin")rmse_combined$Model <-c(rep("VAR(2)", n-k),rep("VAR(5)", n-k))rmse_combined$Date <-time(data)[(k+1):n]# Create the USD RMSE plot with a legendggplot(data = rmse_combined, aes(x = Date, y = USD, color = Model)) +geom_line() +labs(title ="CV Error for USD",x ="Date",y ="Error",color ="Model" ) +theme_minimal()
The cross-validation results are very similar. Considering the principle of parsimony, we select VAR(2) as the best model.
# Fit a VAR(2) model including both a constant and trendfit <-VAR(data, p =2, type ="both")forecast(fit, h=h) %>%autoplot() +scale_x_continuous(breaks =2015:2026) +xlab("Year") +theme_minimal()
fun.var.4<-function(ts, year, p, s){ fit <-VAR(ts, p=p, type='both') fcast <-predict(fit, n.ahead = s) f1<-fcast$fcst$USD f2<-fcast$fcst$OilPrice f3<-fcast$fcst$GoldPrice ff<-data.frame(f1[,1],f2[,1],f3[,1]) ff<-ts(ff,start=c(year,1),frequency = s)return(ff)}data <- ts_df4n=nrow(data)n_var =ncol(data)h <-12# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 1rmse2 <-matrix(NA, nrow = n-k, ncol = n_var) # RMSE for Model 2# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 12 months test_start_year <- st + (i-1) +1/h #starting year for predication, i.e. xtest######## VAR(1) ############ ff1 <-fun.var.4(xtrain, test_start_year, p=1, s=h)######## VAR(2) ############ ff2 <-fun.var.4(xtrain, test_start_year, p=2, s=h)##### collecting errors ###### a = h*i-h+1 b= h*i rmse1[c(a:b),] <-abs(ff1-xtest) rmse2[c(a:b),] <-abs(ff2-xtest)}rmse_combined <-as.data.frame(rbind(rmse1, rmse2))colnames(rmse_combined) =c("USD","Oil Price","Gold Price")rmse_combined$Model <-c(rep("VAR(1)", n-k),rep("VAR(2)", n-k))rmse_combined$Date <-time(data)[(k+1):n]# Create the USD RMSE plot with a legendggplot(data = rmse_combined, aes(x = Date, y = USD, color = Model)) +geom_line() +labs(title ="CV Error for USD",x ="Date",y ="Error",color ="Model" ) +theme_minimal()
The cross-validation result of VAR(1) is not significantly better than that of VAR(2). Therefore, we proceed with VAR(2) as suggested by VARselect(), since it has a lower AIC.
# Fit a VAR(2) model including both a constant and trendfit <-VAR(data, p =2, type ="both")forecast(fit, h=h) %>%autoplot() +xlab("Year") +theme_minimal()
SARIMAX: USD ~ Egg Price
In this analysis, we will explore how the domestic market, specifically egg prices, affects the USD.
df5 <-data.frame(Date = dxy_m$month, dxy =log(dxy_m$Close), egg =log(egg$Price))plot_usd <-plot_ly(df5, x =~Date, y =~dxy, type ='scatter', mode ='lines', name ='U.S. Dollar')plot_egg <-plot_ly(df5, x =~Date, y =~egg, type ='scatter', mode ='lines', name ='Egg Price') subplot(plot_usd, plot_egg, nrows =2, shareX =TRUE) %>%layout(title ="Trend of U.S. Dollar and Egg Price", showlegend =FALSE,xaxis =list(title ='Date'),yaxis =list(title ='U.S. Dollar'),yaxis2 =list(title ='Egg Price'))
The trend of the USD and egg prices is very similar. Thus, we can fit an ARIMAX model using the USD as the dependent variable and egg prices as the exogenous variable.
Code
ts_df5 <-ts(df5, start =c(2005, 1), frequency =12)auto.arima(ts_df5[,"dxy"], xreg = ts_df5[,"egg"])
# Fit a linear modellm_fit.5<-lm(dxy ~ egg, data = df5)res.5<-ts(residuals(lm_fit.5), start =c(2005, 1), frequency =12)# ACF and PACF plots of the residualsggtsdisplay(diff(res.5), main ="Differenced residuals") # p=0:1, d=1, q=0:1, P=0:3, D=1, Q=0:1
Coefficients:
Estimate SE t.value p.value
ar1 0.2471 0.0641 3.8530 2e-04
sma1 -1.0000 0.0612 -16.3362 0e+00
sigma^2 estimated as 0.0003659517 on 225 degrees of freedom
AIC = -4.890285 AICc = -4.890049 BIC = -4.845022
The best model for the residuals of the linear model is ARIMA(1,1,0)(0,1,1)[12].
The Residual Plot of the ARIMA model shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows mostly independence.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are mostly above the 0.05 significance level, implying that a few autocorrelations are left in the residuals.
All coefficients are significant at the 5% significant level.
Code
# Define parametersdata <- ts_df5n <-nrow(data) # Total observationsh <-12# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = num_iter, ncol = h) # RMSE for Model 1rmse2 <-matrix(NA, nrow = num_iter, ncol = h) # RMSE for Model 2# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 12 months# Fit auto.arima() model model.1<-Arima(xtrain[,"dxy"], order =c(1,1,2), seasonal =list(order =c(2,0,1), period =12),xreg = xtrain[, "egg"],optim.control =list(maxit =10000)) f.1<-forecast(model.1, xreg = xtest[, "egg"], h = h) rmse1[i,] <- (f.1$mean - xtest[,"dxy"])^2###### Fit manual model model.2<-Arima(xtrain[, "dxy"], order =c(1,1,0),seasonal =list(order =c(0,1,1), period =12),xreg = xtrain[, "egg"],optim.control =list(maxit =10000)) f.2<-forecast(model.2, xreg = xtest[, "egg"], h = h) rmse2[i,] <- (f.2$mean - xtest[,"dxy"])^2} # Compute RMSE across all iterationsrmse1_avg <-sqrt(colMeans(rmse1, na.rm =TRUE))rmse2_avg <-sqrt(colMeans(rmse2, na.rm =TRUE))# Create a DataFrame for better visualizationerror_table <-data.frame(Horizon =1:h,RMSE_Model1 = rmse1_avg,RMSE_Model2 = rmse2_avg)# **Improved RMSE Plot using ggplot2**ggplot(error_table, aes(x = Horizon)) +geom_line(aes(y = RMSE_Model1, color ="Regression with ARIMA(1,1,2)(2,0,1)[12] errors"), size =1) +geom_line(aes(y = RMSE_Model2, color ="Regression with ARIMA(1,1,0)(0,1,1)[12] errors"), size =1) +labs(title ="RMSE Comparison for 12-Step Forecasts",x ="Forecast Horizon (Months Ahead)",y ="Root Mean Squared Error (RMSE)") +scale_color_manual(name ="Models", values =c("red", "blue")) +theme_minimal()
ARIMA(1,1,2)(2,0,1)[12] looks better.
Fit and summarize the best model:
Code
best_model <-Arima(data[,"dxy"], order =c(1,1,2), seasonal =list(order =c(2,0,1), period =12),xreg = data[, "egg"],optim.control =list(maxit =10000)) summary(best_model)
Series: data[, "dxy"]
Regression with ARIMA(1,1,2)(2,0,1)[12] errors
Coefficients:
ar1 ma1 ma2 sar1 sar2 sma1 xreg
-0.9684 1.2983 0.3595 -0.704 -0.1496 0.5654 0.0019
s.e. 0.0264 0.0678 0.0635 0.449 0.0765 0.4505 0.0125
sigma^2 = 0.0002711: log likelihood = 645.57
AIC=-1275.13 AICc=-1274.5 BIC=-1247.32
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.0009160148 0.01618699 0.01272872 0.01965128 0.28394 0.2172217
ACF1
Training set -0.01524275
arima_egg <-auto.arima(ts_df5[,"egg"])f_egg <-forecast(arima_egg, h = h)f.5<-forecast(best_model, xreg = f_egg$mean, h = h)autoplot(f.5) +labs(title ="12-Month Forecast of Log-Transformed U.S. Dollar",x ="Time", y ="Log-transformed USD") +theme_minimal()
I believe the forecast is very good. By adding the exogenous variable (egg price) to the SARIMA model, the SARIMAX model has effectively captured the pattern of USD.
ARIMAX: USD ~ House Price + International Visitors
Here, we will examine how the real estate and tourism market affect the USD.
visitors_q <- visitors %>%mutate(quarter =floor_date(time, "quarter")) %>%group_by(quarter) %>%summarise(count =mean(count, na.rm =TRUE))df6 <-data.frame(Date = dxy_q$quarter, dxy =log(dxy_q$Close), house =log(house$index),visitors =log(visitors_q$count))plot_usd <-plot_ly(df6, x =~Date, y =~dxy, type ='scatter', mode ='lines', name ='U.S. Dollar')plot_house <-plot_ly(df6, x =~Date, y =~house, type ='scatter', mode ='lines', name ='House Price') plot_visitors <-plot_ly(df6, x =~Date, y =~visitors, type ='scatter', mode ='lines', name ='Visitors') subplot(plot_usd, plot_house, plot_visitors, nrows =3, shareX =TRUE) %>%layout(title ="Trend of U.S. Dollar and Related Variables ", showlegend =FALSE,xaxis =list(title ='Date'),yaxis =list(title ='U.S. Dollar'),yaxis2 =list(title ='House Price'),yaxis3 =list(title ="Num of Intl Visitors"))
Both the USD and house prices show an upward trend. Thus, we can fit an ARIMAX model using the USD as the dependent variable, house prices and international visitors as the exogenous variables.
# Fit a linear modellm_fit.6<-lm(dxy ~ house + visitors, data = df6)res.6<-ts(residuals(lm_fit.6), start =c(2005, 1), frequency =4)# ACF and PACF plots of the residualsggtsdisplay(diff(res.6), main ="Differenced Residuals") # p=0:2, d=1, q=0:3
The best model for the residuals of the linear model is ARIMA(0,1,1), since it has lower AIC and AICc values.
For both models:
The Residual Plot of the ARIMA model shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.
The Autocorrelation Function (ACF) of the residuals shows perfectly independence.
The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.
The Ljung-Box Test p-values are all above the 0.05 significance level, implying that no autocorrelations are left in the residuals.
All coefficients are significant at the 5% significant level.
Code
# Define parametersdata <- ts_df6n <-nrow(data) # Total observationsh <-4# h: Forecast horizon# k: Initial training set# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 4k <-floor(n /3/ h) * hnum_iter <- (n - k) / h # Number of rolling iterations# Initialize matrices for RMSErmse1 <-matrix(NA, nrow = num_iter, ncol = h) # RMSE for Model 1rmse2 <-matrix(NA, nrow = num_iter, ncol = h) # RMSE for Model 2# Define rolling start timest <-tsp(data)[1] + (k -1) / h # Walk-Forward Validation Loopfor (i in1:num_iter) { xtrain <-window(data, end = st + i -1) xtest <-window(data, start = st + (i -1) +1/h, end = st + i) # Test set for the next 4 months# Fit auto.arima() model model.1<-Arima(xtrain[,"dxy"], order =c(0,1,1), xreg = xtrain[, c("house", "visitors")],optim.control =list(maxit =10000)) f.1<-forecast(model.1, xreg = xtest[, c("house", "visitors")], h = h) rmse1[i,] <- (f.1$mean - xtest[,"dxy"])^2###### Fit manual model model.2<-Arima(xtrain[, "dxy"], order =c(1,1,2),xreg = xtrain[, c("house", "visitors")],optim.control =list(maxit =10000)) f.2<-forecast(model.2, xreg = xtest[, c("house", "visitors")], h = h) rmse2[i,] <- (f.2$mean - xtest[,"dxy"])^2} # Compute RMSE across all iterationsrmse1_avg <-sqrt(colMeans(rmse1, na.rm =TRUE))rmse2_avg <-sqrt(colMeans(rmse2, na.rm =TRUE))# Create a DataFrame for better visualizationerror_table <-data.frame(Horizon =1:h,RMSE_Model1 = rmse1_avg,RMSE_Model2 = rmse2_avg)# **Improved RMSE Plot using ggplot2**ggplot(error_table, aes(x = Horizon)) +geom_line(aes(y = RMSE_Model1, color ="Regression with ARIMA(0,1,1) errors"), size =1) +geom_line(aes(y = RMSE_Model2, color ="Regression with ARIMA(1,1,2) errors"), size =1) +labs(title ="RMSE Comparison for 4-Step Forecasts",x ="Forecast Horizon (Months Ahead)",y ="Root Mean Squared Error (RMSE)") +scale_color_manual(name ="Models", values =c("red", "blue")) +theme_minimal()
Series: data[, "dxy"]
Regression with ARIMA(0,1,1) errors
Coefficients:
ma1 house visitors
0.4995 0.3585 -0.0118
s.e. 0.0878 0.2114 0.0071
sigma^2 = 0.0007382: log likelihood = 174.14
AIC=-340.28 AICc=-339.74 BIC=-330.8
Training set error measures:
ME RMSE MAE MPE MAPE
Training set 2.440354e-05 0.02648176 0.02082905 0.0001847146 0.4645378
MASE ACF1
Training set 0.3713074 0.03777751
With the parameters from the model: \[
\begin{gathered}
y_t=\beta x_t+n_t \\
y_t=0.3585 \text { House Price } -0.0118 \text{International Visitors} +n_t
\end{gathered}
\]
arima_house <-auto.arima(ts_df6[,"house"])f_house <-forecast(arima_house, h = h)arima_visitors <-auto.arima(ts_df6[,"visitors"])f_visitors <-forecast(arima_visitors, h = h)xreg <-cbind(f_house$mean, f_visitors$mean)colnames(xreg) <-c("house", "visitors")f.6<-forecast(best_model, xreg = xreg, h = h)autoplot(f.6) +labs(title ="4-Quarter Forecast of Log-Transformed U.S. Dollar",x ="Time", y ="Log-transformed USD") +theme_minimal()
I believe the forecast is very good. By adding the exogenous variables (house price and the number of international visitors) to the ARIMA model, the ARIMAX model has effectively captured the pattern of USD.
Conclusion
🔹 First, to represent international trade, I built a VAR(1) model including the trade deficit, commodity prices, and the CPI.
The parameter matrix shows that a widening trade deficit leads to dollar depreciation, which aligns with economic intuition — a negative trade balance puts downward pressure on the currency. Commodity prices and the USD are also negatively correlated, likely reflecting that rising global commodity prices increase import costs. CPI, however, is positively related to the dollar, possibly due to expectations of tighter monetary policy in response to inflation.
🔹 Second, for macroeconomic fundamentals, I modeled the USD alongside interest rates, CPI, unemployment rate, and GDP.
The model suggests the dollar strengthens when interest rates, inflation, and GDP increase — which is consistent with capital inflows attracted by higher returns and a stronger economy. Unemployment, by contrast, is negatively related to the USD, reflecting that economic slack weakens confidence in the currency.
🔹 Third, I explored financial market dynamics using a VAR(2) model with S&P 500 and Bitcoin prices.
At lag one, the dollar and Bitcoin appear positively correlated — this may indicate that both are being used as alternative safe-haven assets. The S&P 500, however, is negatively related to the dollar, supporting the idea that when investors are optimistic about risky assets, the dollar tends to weaken.
🔹 Fourth, I examined the global commodity market with oil and gold prices in a VAR(2) framework.
As expected, gold is negatively related to the USD, consistent with its role as a traditional hedge when the dollar weakens. Oil also shows a negative relationship — rising oil prices may signal global inflation or geopolitical risks that weigh on the dollar. This finding also aligns with the earlier observation that the USD is negatively related to the global commodity price index.
🔹 Fifth, I introduced a SARIMAX model using egg prices as a proxy for domestic food inflation and supply-side shocks. This model predicted a slight upward trend in the USD.
🔹 Finally, to capture the real economy, I applied an ARIMAX(0,1,1) model using U.S. house prices and the number of international visitors.
House prices were positively related to the dollar, reflecting stronger domestic demand and investment sentiment. In contrast, international tourism had a negative effect — perhaps because a stronger dollar reduces the attractiveness of the U.S. as a travel destination. This model also produced an upward trend in the USD forecast.
Overall, the models suggest that the USD is influenced by a complex interplay of trade balances, macroeconomic fundamentals, financial market dynamics, commodity prices, and domestic economic indicators. The forecasts indicate a generally upward trend in the USD, driven by strong fundamentals and positive sentiment in the U.S. economy.