ARMA/ARIMA/SARIMA Models

Dataset 1 - Electricity Production

1) Data Review

From EDA, we looked at ACF graphs and checked ADF to see if the data was stationary. The data was not stationary so we’ll take the log to try to make it at least weakly stationary.

Log Transform energy production outputs - and plot

Check if data is now stationary using ACF plot

It’s now weakly stationary! No further differencing necessary.

2) ACF/PACF Plots

ACF/PACF plots are used to select potential p and q values for an ARMA(p,q) model. ARMA is used because no differencing of the data occurred. From these plots, some potential p and q values are selected to fit an ARMA model with. Separate models have to be fit for each energy source, thus separate ACF/PACF plots are shown below.

Coal

Moving average order (found from ACF): q - 1,2,3,4
Autoregressive term (found from pACF): p - 1

Nuclear

Moving average order (found from ACF): q - 1,2,3,4,5
Autoregressive term (found from pACF): p - 1

Wind

Moving average order (found from ACF): q - 1,2,3,4,5
Autoregressive term (found from pACF): p - 1

Solar

Moving average order (found from ACF): q - 1,2,3
Autoregressive term (found from pACF): p - 1

3) Fit ARIMA(p,d,q) models

Coal

p	q	AIC
1	1	1319.547
1	2	1325.199
1	3	1312.109
1	4	1313.924

Nuclear

p	q	AIC
1	1	1213.296
1	2	1212.388
1	3	1214.309
1	4	1225.694
1	5	1229.118

Wind

p	q	AIC
1	1	1144.565
1	2	1147.529
1	3	1148.876
1	4	1147.240
1	5	1150.403

Solar

p	q	AIC
1	1	1069.721
1	2	1063.944
1	3	1073.950

4) Model Diagnostics

With the above exploration, a best fit model is chosen for each energy source by minimizing AIC.

Coal

The ARMA(p,q) model that minimizes AIC is ARMA(1,3).


Call:
arma(x = coal_ts, order = c(1, 3))

Model:
ARMA(1,3)

Residuals:
       Min         1Q     Median         3Q        Max 
-467564239 -111368527   52069246  112738236  326924565 

Coefficient(s):
            Estimate  Std. Error    t value Pr(>|t|)    
ar1        1.074e+00   9.793e-03    109.664   <2e-16 ***
ma1       -3.294e-01   2.462e-01     -1.338   0.1810    
ma2       -3.552e-02   6.203e-01     -0.057   0.9543    
ma3        5.782e-01   2.625e-01      2.203   0.0276 *  
intercept -2.978e+08   1.120e+04 -26602.741   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit:
sigma^2 estimated as 2.75e+16,  Conditional Sum-of-Squares = 7.701038e+17,  AIC = 1312.11

ARMA(1,3) for coal produces an AIC value of 1312.11 and a CSS value of 7.7x10^17. Using the model summary and plot of the model against the data above, it seems that the model fits the data pretty well but not perfectly. The randomness and sensitivity of the original data makes it hard for the model to truly fit well without overfitting. To achieve a better fit would require overfitting the training data and wouldn’t generalize well to new data.

\[ \phi(B) = 1 - 1.07(B) \] \[ \theta(B) = 1 + 3.29*10^-1 *(B) + 3.552*10^-2*(B^2) - 5.782*10^-1*(B^3) \]

Nuclear

The ARMA(p,q) model that minimizes AIC is ARMA(1,2).


Call:
arma(x = nuclear_ts, order = c(1, 2))

Model:
ARMA(1,2)

Residuals:
       Min         1Q     Median         3Q        Max 
-114143177  -26807890    5091034   22515762   52763712 

Coefficient(s):
            Estimate  Std. Error   t value Pr(>|t|)    
ar1        8.976e-01   3.176e-03   282.600   <2e-16 ***
ma1        5.650e-02   1.914e-01     0.295   0.7678    
ma2       -3.256e-01   1.744e-01    -1.868   0.0618 .  
intercept  1.657e+08   7.670e+03 21598.292   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit:
sigma^2 estimated as 1.298e+15,  Conditional Sum-of-Squares = 3.764894e+16,  AIC = 1212.39

ARMA(1,2) for nuclear produces an AIC value of 1212.39 and a CSS value of 3.76x10^16. It seems that the model fits the data pretty well but not perfectly. To achieve a better fit would require overfitting the training data and wouldn’t generalize well to new data.

\[ \phi(B) = 1 - 8.976*10^-1(B) \] \[ \theta(B) = 1 - 5.65*10^-2 *(B) + 3.256*10^-1*(B^2) \]

Wind

The ARMA(p,q) model that minimizes AIC is ARMA(1,1).


Call:
arma(x = wind_ts, order = c(1, 1))

Model:
ARMA(1,1)

Residuals:
      Min        1Q    Median        3Q       Max 
-22040549  -6078188  -4049714   3768737  33558013 

Coefficient(s):
           Estimate  Std. Error  t value Pr(>|t|)    
ar1       1.105e+00   1.202e-02   91.912   <2e-16 ***
ma1       4.621e-01   1.890e-01    2.444   0.0145 *  
intercept 6.648e+06   3.527e+04  188.479   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit:
sigma^2 estimated as 1.659e+14,  Conditional Sum-of-Squares = 4.976633e+15,  AIC = 1144.57

ARMA(1,1) for wind produces an AIC value of 1144.57 and a CSS value of 4.97x10^15. The model fits this data much better than the previous data as it follows a steadier upward trend over time as wind production has been more widely adopted by US states. Additionally, wind power is less prone to political shocks so is easier to predict and fit.

\[ \phi(B) = 1 - 1.05(B) \] \[ \theta(B) = 1 + 4.62*10^-1 *(B) \]

Solar

The ARMA(p,q) model that minimizes AIC is ARMA(1,2).


Call:
arma(x = solar_ts, order = c(1, 2))

Model:
ARMA(1,2)

Residuals:
     Min       1Q   Median       3Q      Max 
-8923843   -53422   331446  1505395 10832340 

Coefficient(s):
            Estimate  Std. Error  t value Pr(>|t|)    
ar1        1.268e+00   1.864e-02   68.028  < 2e-16 ***
ma1        1.018e+00   1.741e-01    5.849 4.96e-09 ***
ma2       -2.671e-01   2.142e-01   -1.247    0.212    
intercept -5.627e+05         NaN      NaN      NaN    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit:
sigma^2 estimated as 1.255e+13,  Conditional Sum-of-Squares = 3.82509e+14,  AIC = 1063.94

Simiarly, ARMA(1,2) for solar produces an AIC value of 1,063.94 and a CSS value of 3.82x10^9. The model fits this data much better than the previous data as it follows a steadier upward trend over time as solar production has been more widely adopted by US states. Additionally, solar power is less prone to political shocks so is easier to predict and fit.

\[ \phi(B) = 1 - 1.268(B) \] \[ \theta(B) = 1 - 1.018(B) + 2.671*10^-1*(B^2) \]

5) Fit an ARMA(p,q) model using auto.arima()

Coal

Series: coal_ts 
ARIMA(2,2,2) 

Coefficients:
          ar1      ar2      ma1     ma2
      -0.2910  -0.8112  -0.7750  0.8498
s.e.   0.1896   0.1227   0.2165  0.3065

sigma^2 = 2.109e+16:  log likelihood = -606.73
AIC=1223.45   AICc=1225.95   BIC=1230.46

Auto.arima() chooses AR(2,2) as the best model, very different from my chosen model of AR(1,3). This is likely because auto.arima() considers a more comprehensive set of parameters (i.e. I didn’t check p=2 as a possible value) and I chose my model based on AIC alone whereas auto.arima() chooses models based on AIC, BIC, and AICc.

Nuclear

Series: nuclear_ts 
ARIMA(0,1,0) with drift 

Coefficients:
         drift
      12988794
s.e.   7504730

sigma^2 = 1.79e+15:  log likelihood = -587.85
AIC=1179.71   AICc=1180.13   BIC=1182.57

The model chosen here is (0,1,0) which is bizarre. This only applies differencing, but doesn’t use AutoRegressive modeling or Moving Average modeling. This is likely because the data is hard to fit.

Wind

Series: wind_ts 
ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.4339
s.e.   0.1699

sigma^2 = 2.172e+14:  log likelihood = -537.34
AIC=1078.68   AICc=1079.13   BIC=1081.49

The model chosen is ARIMA(0,2,1), which is essentially an MA() model with differencing. The search algorithm decided to difference twice, where I didn’t because I already saw weak stationarity in initial exploration.

Solar

Series: solar_ts 
ARIMA(0,2,1) 

Coefficients:
         ma1
      0.8289
s.e.  0.1120

sigma^2 = 2.766e+13:  log likelihood = -506.9
AIC=1017.81   AICc=1018.25   BIC=1020.61

6) Forecast

Forecast each energy source for the next three years - using manually selected models, but including 2 orders of differencing for each.

Coal

Forecasting coal production three years into the future, it’s expected to continue declining along the lines of its current downward trend. There’s quite a large confidence band around this estimate, as many other factors will affect coal production - such as the geopolitical climate surrounding coal.

Nuclear

Forecasting nuclear production three years into the future, it’s expected to stay fairly steady along the 1.5x10^9 tons line. There’s quite a large confidence band around this estimate, as many other factors will affect wind production - especially political sentiment around its usage.

Wind

Forecasting wind production three years into the future, it’s expected to continually exponentially growing as per its existing trend. There’s a much smaller confidence band around this esimate, as its seen a steady increase over this timeframe. Infrastructure improvements in the space will affect the estimate greatly.

Solar

Forecasting solar production three years into the future, it’s expected to continually exponentially growing as per its existing trend. There’s a much smaller confidence band around this esimate, as its seen a steady increase over this timeframe. Infrastructure improvements in the space will affect the estimate greatly.

7) Benchmark Methods

Compare the models to baseline benchmark methods to ensure they’re performing above.

Coal

Nuclear

Wind

Solar

Note: Across all four energy sources, the fitted model does perform better than benchmark methods.

Dataset 2 - Petroleum Exports vs. Imports

1) Data Review

From EDA, we looked at ACF graphs and checked ADF to see if the data was stationary. The data was not stationary so we take the log and difference to make it stationary.

Log Transform petroleum imports, exports, and production - and plot

Since the data is still not stationary, exports, imports, and production will be differenced.

Imports


    Augmented Dickey-Fuller Test

data:  import_ts %>% diff()
Dickey-Fuller = -3.7192, Lag order = 2, p-value = 0.04149
alternative hypothesis: stationary

Exports

Must be differenced twice. A first difference results in a p-value of 0.32 by the ADF test: indicating that we can’t reject the null hypothesis of no stationarity. Stationarity is only achieved with a second difference.


    Augmented Dickey-Fuller Test

data:  export_ts %>% diff() %>% diff()
Dickey-Fuller = -4.5144, Lag order = 2, p-value = 0.01
alternative hypothesis: stationary

Production Must be differenced twice. A first difference results in a p-value of 0.99 by the ADF test: indicating that we can’t reject the null hypothesis of no stationarity. Stationarity is only achieved with a second difference.


    Augmented Dickey-Fuller Test

data:  production_ts %>% diff() %>% diff()
Dickey-Fuller = -4.0549, Lag order = 2, p-value = 0.02134
alternative hypothesis: stationary

All three variables were differenced once or twice in order to achieve stationarity. The stationary property of the data is confirmed by the ACF plots of each, as well as the Augmented-Dickey-Fuller Test of the differenced series.

2) ACF/PACF Plots

ACF/PACF plots are used to select potential p and q values for an ARIMA(p,d,q) model. From these plots, some potential p and q values are selected to fit an ARIMA model with. Separate models have to be fit for each variable thus separate ACF/PACF plots are shown below.

Import

Moving average order (found from ACF): q - 1,2,3
Autoregressive term (found from pACF): p - 1

Exports

Moving average order (found from ACF): q - 1,2,3
Autoregressive term (found from pACF): p - 1

Production

Moving average order (found from ACF): q - 1
Autoregressive term (found from pACF): p - 1

3) Fit ARIMA(p,d,q) models

Imports

p	d	q	AIC	BIC	AICc
1	1	1	1.764891	5.171374	3.028049
1	2	1	1.782784	5.055911	3.116117
1	3	1	10.774584	13.908151	12.186349
1	1	2	3.519834	8.061811	5.742056
1	2	2	3.708435	8.072605	6.061376
1	3	2	7.919693	12.097783	10.419693
1	1	3	3.394707	9.072178	6.924119
1	2	3	5.588815	11.044027	9.338815
1	3	3	8.751611	13.974223	12.751611

Exports

p	d	q	AIC	BIC	AICc
1	1	1	10.408080	13.81456	11.67124
1	2	1	11.644777	14.91790	12.97811
1	3	1	19.463857	22.59742	20.87562
1	1	2	9.090324	13.63230	11.31255
1	2	2	12.512548	16.87672	14.86549
1	3	2	21.466731	25.64482	23.96673
1	1	3	11.089958	16.76743	14.61937
1	2	3	11.108577	16.56379	14.85858
1	3	3	18.917698	24.14031	22.91770

Production

p	d	q	AIC	BIC	AICc
1	1	0	-42.47036	-40.19937	-41.87036
1	2	0	-40.83047	-38.64838	-40.19889
1	3	0	-29.22643	-27.13739	-28.55977
1	1	1	-40.55816	-37.15168	-39.29500
1	2	1	-38.89113	-35.61800	-37.55780
1	3	1	-33.56462	-30.43105	-32.15285
2	1	0	-40.56932	-37.16284	-39.30617
2	2	0	-38.93721	-35.66408	-37.60387
2	3	0	-28.70668	-25.57311	-27.29491
2	1	1	-38.76965	-34.22767	-36.54743
2	2	1	-37.03220	-32.66803	-34.67926
2	3	1	-32.16038	-27.98229	-29.66038

4) Model Diagnostics

With the above exploration, a best fit model for each variable is chosen by minimizing AIC, AICc, and BIC

Imports

The ARIMA model that minimizes AIC and AICc is ARIMA(1,1,1)


Call:
arima(x = import_ts, order = c(1, 1, 1))

Coefficients:
         ar1      ma1
      0.5586  -0.1272
s.e.  0.4354   0.5220

sigma^2 estimated as 0.04819:  log likelihood = 2.12,  aic = 1.76

Training set error measures:
                     ME      RMSE       MAE       MPE     MAPE      MASE
Training set 0.05626467 0.2149074 0.1541133 0.7196743 1.833747 0.7871504
                   ACF1
Training set -0.0981568

With this model, we get a training RMSE of 0.215, so the model fits the training data fairly well without overfitting due to its bizarre behavior.

\[ \phi(B) = 1 - 0.5586(B) \] \[ \theta(B) = 1 - 0.4354(B) \]

Exports

The consensus across AIC, BIC, and AICc is that the best fit model is ARIMA(1,1,2).


Call:
arima(x = export_ts, order = c(1, 1, 2))

Coefficients:
          ar1     ma1     ma2
      -0.8486  1.7381  0.9117
s.e.   0.1701  0.3518  0.3438

sigma^2 estimated as 0.05395:  log likelihood = -0.55,  aic = 9.09

Training set error measures:
                     ME      RMSE       MAE       MPE     MAPE      MASE
Training set 0.07329069 0.2273772 0.1747664 0.9874671 2.719147 0.7576492
                   ACF1
Training set -0.0979957

\[ \phi(B) = 1 + 0.8486(B) \] \[ \theta(B) = 1 - 1.7101(B) - 0.9117(B^2) \] With this model, we get a training RMSE of 0.2273. As evidenced by the plot, the model does a pretty terrible job with this model fit - likely due to the complex adn random nature of the data. Let’s see what auto.arima() would have chosen for this model.

Production

The consensus across AIC, BIC, and AICc is that the best fit model is ARIMA(1,1,0).


Call:
arima(x = production_ts, order = c(1, 1, 0))

Coefficients:
         ar1
      0.8427
s.e.  0.1421

sigma^2 estimated as 0.007357:  log likelihood = 23.24,  aic = -42.47

Training set error measures:
                     ME       RMSE        MAE       MPE     MAPE      MASE
Training set 0.01526092 0.08398331 0.06473024 0.1645849 0.705565 0.7023005
                    ACF1
Training set 0.008829977

\[ \phi(B) = 1 - 0.8427(B) \]

With this model, we get a training RMSE of 0.08398331, so the model fits the training data fairly well without overfitting due to its bizarre behavior.

5) Fit an ARIMA(p,d,q) model using auto.arima()

Imports

Series: import_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.1189
s.e.  0.0450

sigma^2 = 0.04869:  log likelihood = 2.63
AIC=-1.26   AICc=-0.66   BIC=1.01

The model chosen is ARIMA(0,1,0) with drift. While auto.arima() agrees with my choice of differencing, it chose different parameters for p and q, and experimented with drift whereas I did not.

Exports

Series: export_ts 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1   drift
      0.4822  0.1498
s.e.  0.1514  0.0766

sigma^2 = 0.06913:  log likelihood = -1
AIC=7.99   AICc=9.26   BIC=11.4

The model chosen is ARIMA(0,1,1) with drift. This is slightly different than my choice, likely because auto.arima() experimented with a wider range of parameters and experimented with drift.

Production

Series: production_ts 
ARIMA(2,0,0) with non-zero mean 

Coefficients:
         ar1      ar2    mean
      1.7223  -0.8478  9.1494
s.e.  0.1344   0.1229  0.1388

sigma^2 = 0.007907:  log likelihood = 23.34
AIC=-38.69   AICc=-36.58   BIC=-33.98

The model chosen is ARIMA(2,0,0) with non-zero mean - or an AR(2) model. AR is more suitable when the data has a clear pattern of autoregression and less randomness. This potentially makes sense given the lack of seasonality and large overall upward trend in production since 1970.

6) Forecast

Forecasting petroleum imports, exports, and production for the next three years.

All three forecasts have a wide confidence band - indicating that based on past data alone we cannot generate a good estimate with high confidence. Imports are expected to increase slightly in the next 3 years, but have mostly stabilized. Exports are expected to increase for a year and then plateau/decrease for the next two years. Production is expected to continue exponentially increasing during the next three years.

7) Benchmark

The models perform better than most of the models, expect the drift model. It’s clear that the models would benefit from incorporating a drift term to account for the intense drift especially production and exports suffer from.

Dataset 3 - Cost of Energy

1) Data Review

From EDA, we looked at ACF graphs and checked ADF to see if the data was stationary. The data was not stationary so we take the log and difference to make it stationary.

Log Transform - and plot

Taking the log of the data barely changed it! We’ll have to difference it. According to the ADF test, the data requires 2 differences. We can experiment with this during model building.


    Augmented Dickey-Fuller Test

data:  cost_ts %>% diff()
Dickey-Fuller = -2.6121, Lag order = 3, p-value = 0.3332
alternative hypothesis: stationary

2) ACF/PACF Plots

Import

Moving average order (found from ACF): q - 1,2,3,4,5
Autoregressive term (found from pACF): p - 0,1

3) Fit ARIMA(p,d,q) models

p	d	q	AIC	BIC	AICc
0	2	1	-39.93109	-36.60397	-39.59776
0	3	1	-17.18553	-13.91036	-16.84267
0	2	2	-38.52660	-33.53591	-37.84088
0	3	2	-29.18752	-24.27477	-28.48164
0	2	3	-36.60530	-29.95105	-35.42883
0	3	3	-27.35465	-20.80430	-26.14253
0	2	4	-37.89350	-29.57569	-36.07532
0	3	4	-26.43893	-18.25100	-24.56393
0	2	5	-36.23746	-26.25609	-33.61246
0	3	5	-26.28811	-16.46259	-23.57843
1	2	1	-38.52145	-33.53077	-37.83574
1	3	1	-22.09809	-17.18533	-21.39221
1	2	2	-39.77465	-33.12040	-38.59818
1	3	2	-27.31762	-20.76727	-26.10550
1	2	3	-37.90967	-29.59186	-36.09149
1	3	3	-28.53833	-20.35040	-26.66333
1	2	4	-36.48416	-26.50279	-33.85916
1	3	4	-27.12777	-17.30225	-24.41809
1	2	5	-34.48427	-22.83934	-30.87137
1	3	5	-25.13525	-13.67215	-21.40192

4) Model Diagnostics

With the above exploration, a best fit model for each variable is chosen by minimizing AIC, AICc, and BIC

The ARIMA model that minimizes AIC, BIC, and AICc is ARIMA(0,2,1)


Call:
arima(x = cost_ts, order = c(0, 2, 1))

Coefficients:
          ma1
      -0.8515
s.e.   0.1333

sigma^2 estimated as 0.01836:  log likelihood = 21.97,  aic = -39.93

Training set error measures:
                       ME      RMSE        MAE        MPE     MAPE      MASE
Training set -0.007565158 0.1321637 0.09492191 -0.1831388 7.305725 0.9102723
                   ACF1
Training set 0.06219478

With this model, we get a training RMSE of 0.243, so the model fits the training data fairly well without overfitting. The random bumps and dips in the data have been largely smoothed, while the trend line is maintained.

\[ \theta(B) = 1 + 0.8616(B) \]

5) Fit an ARIMA(p,d,q) model using auto.arima()

Series: cost_ts 
ARIMA(0,1,0) with drift 

Coefficients:
       drift
      0.0572
s.e.  0.0207

sigma^2 = 0.01756:  log likelihood = 24.6
AIC=-45.19   AICc=-44.87   BIC=-41.81

The model chosen by auto.arima() is ARIMA(0,1,0) with drift. This is similar to the model parameters that I manually chose - except that auto.arima() chose to perform a second difference of the data and didn’t see the need for a moving average term.

6) Forecast

Forecasting electricity costs for the next three years.

According to our model, electricity costs are going to continue to rise over the next three years. Though there is ia decently large confidence band around the estimate, it is predicted that the slight dip in electricity costs during the Great Recession in 2008 were short lived and the market is going to recover. Since this data is old, I can confirm that electricity costs have continued to rise since the dataset stopped collecting data and the model forecast was correct.

7) Benchmark

The model performs better than all the benchmark methods - except drift which performs about the same as the model. Again, this model could be improved by including a drift term and then should outperform even the drift benchmark.

Dataset 4 - CO2 Emissions

1) Data Review

From EDA, we looked at ACF graphs and checked ADF to see if the data was stationary. The data was not stationary so we take the log and difference to make it stationary.

Log Transform coal, natural gas, petro, and hydrocarbon gas - and plot

Taking the log of the data changed it significantly! An ADF test reveals the data is almost stationary. According to the ADF test, coal and petroleum require another differencing, while hydro and natural gas are already stationary. We can experiment with this during model building.


    Augmented Dickey-Fuller Test

data:  coal_ts %>% diff()
Dickey-Fuller = -17.759, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary


    Augmented Dickey-Fuller Test

data:  hydro_ts
Dickey-Fuller = -4.6126, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary


    Augmented Dickey-Fuller Test

data:  natural_ts
Dickey-Fuller = -5.1735, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary


    Augmented Dickey-Fuller Test

data:  petro_ts %>% diff()
Dickey-Fuller = -14.139, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary

2) ACF/PACF Plots

ACF/PACF plots are used to select potential p and q values for a SARIMA(p,d,q)(P,D,Q) model. SARIMA is used because the data contains seasonal components and had differencing. From these plots, some potential p,q,P,Q values are selected to fit a SARIMA model with. Separate models have to be fit for each energy source, thus separate ACF/PACF plots are shown below.

Coal

Differencing: d = 0,1; D = 0,1
Moving average order: q = 0,1; Q = 0,1,2,3
Autoregressive terms: p = 0,1, P = 0,1

Hydrocarbon

Differencing: d = 0,1; D = 0,1
Moving average order: q = 0,1,2,3,4; Q = 0,1,2,3
Autoregressive terms: p = 0,1, P = 0,1,2

Natural Gas

Differencing: d = 0,1; D = 0,1
Moving average order (ACF): q = 0,1,2; Q = 0,1,2
Autoregressive terms (PACF): p = 0,1,2,3; P = 0,1,2,3

Petroleum

Differencing: d = 0,1; D = 0,1
Moving average order: q = 0,1; Q = 0,1
Autoregressive terms: p = 0,1,2,3,4; P = 0,1,2

3) Fit SARIMA models and Model Diagnostics

Coal

p	d	q	P	D	Q	AIC	BIC	AICc
0	1	0	0	1	0	-1713.934	-1709.559	-1713.928
0	1	0	1	1	0	-1808.766	-1800.016	-1808.746
0	1	0	0	1	1	-1933.114	-1924.364	-1933.094
0	1	0	1	1	1	-1934.556	-1921.431	-1934.515
0	1	0	0	1	2	-1935.279	-1922.154	-1935.238
0	1	0	1	1	2	-1933.530	-1916.030	-1933.461
0	1	1	0	1	0	-1756.164	-1747.414	-1756.144
0	1	1	1	1	0	-1840.869	-1827.744	-1840.828
0	1	1	0	1	1	-1954.295	-1941.170	-1954.254
0	1	1	1	1	1	-1957.260	-1939.760	-1957.191
0	1	1	0	1	2	-1958.498	-1940.998	-1958.429
0	1	1	1	1	2	-1956.844	-1934.969	-1956.741
1	1	0	0	1	0	-1749.433	-1740.683	-1749.412
1	1	0	1	1	0	-1832.690	-1819.565	-1832.649
1	1	0	0	1	1	-1946.043	-1932.918	-1946.002
1	1	0	1	1	1	-1949.070	-1931.570	-1949.001
1	1	0	0	1	2	-1950.206	-1932.706	-1950.138
1	1	0	1	1	2	-1948.466	-1926.591	-1948.363
1	1	1	0	1	0	-1755.139	-1742.014	-1755.097
1	1	1	1	1	0	-1862.508	-1845.008	-1862.440
1	1	1	0	1	1	-1983.036	-1965.536	-1982.967
1	1	1	1	1	1	-1983.376	-1961.500	-1983.272
1	1	1	0	1	2	-1984.070	-1962.195	-1983.967

Hydrocarbon

p	d	q	P	D	Q	AIC	BIC	AICc
0	1	0	0	1	0	-873.1263	-868.7513	-873.1194
0	1	0	1	1	0	-1002.8832	-994.1332	-1002.8627
0	1	0	2	1	0	-1111.3376	-1098.2125	-1111.2964
0	1	0	0	1	1	-1203.8541	-1195.1040	-1203.8335
0	1	0	1	1	1	-1201.9027	-1188.7776	-1201.8616
0	1	0	2	1	1	-1204.4410	-1186.9409	-1204.3722
0	1	0	0	1	2	-1201.9142	-1188.7891	-1201.8730
0	1	0	1	1	2	-1201.4663	-1183.9662	-1201.3976
0	1	0	2	1	2	-1203.6509	-1181.7757	-1203.5476
0	1	0	0	1	3	-1203.5805	-1186.0804	-1203.5117
0	1	0	1	1	3	-1202.5083	-1180.6331	-1202.4050
0	1	1	0	1	0	-1060.3161	-1051.5660	-1060.2955
0	1	1	1	1	0	-1186.1759	-1173.0508	-1186.1347
0	1	1	2	1	0	-1287.0822	-1269.5821	-1287.0135
0	1	1	0	1	1	-1347.1337	-1334.0086	-1347.0925
0	1	1	1	1	1	-1345.9415	-1328.4414	-1345.8728
0	1	1	2	1	1	-1346.4694	-1324.5943	-1346.3661
0	1	1	0	1	2	-1346.0930	-1328.5929	-1346.0243
0	1	1	1	1	2	-1346.8532	-1324.9781	-1346.7500
0	1	1	0	1	3	-1346.6384	-1324.7633	-1346.5351
0	1	2	0	1	0	-1061.4756	-1048.3506	-1061.4345
0	1	2	1	1	0	-1189.6328	-1172.1327	-1189.5641
0	1	2	2	1	0	-1295.2742	-1273.3991	-1295.1710
0	1	2	0	1	1	-1358.8380	-1341.3379	-1358.7693
0	1	2	1	1	1	-1357.1601	-1335.2849	-1357.0568
0	1	2	0	1	2	-1357.2290	-1335.3539	-1357.1258
0	1	3	0	1	0	-1061.0342	-1043.5341	-1060.9655
0	1	3	1	1	0	-1188.4295	-1166.5543	-1188.3262
0	1	3	0	1	1	-1357.7752	-1335.9001	-1357.6719
0	1	4	0	1	0	-1080.6914	-1058.8163	-1080.5881
1	1	0	0	1	0	-975.5913	-966.8413	-975.5708
1	1	0	1	1	0	-1099.9589	-1086.8339	-1099.9178
1	1	0	2	1	0	-1202.5387	-1185.0386	-1202.4699
1	1	0	0	1	1	-1285.5188	-1272.3937	-1285.4776
1	1	0	1	1	1	-1283.9443	-1266.4442	-1283.8755
1	1	0	2	1	1	-1285.8820	-1264.0069	-1285.7787
1	1	0	0	1	2	-1284.0408	-1266.5407	-1283.9721
1	1	0	1	1	2	-1284.2978	-1262.4227	-1284.1946
1	1	0	0	1	3	-1285.6719	-1263.7968	-1285.5687
1	1	1	0	1	0	-1061.1029	-1047.9778	-1061.0618
1	1	1	1	1	0	-1189.1817	-1171.6816	-1189.1130
1	1	1	2	1	0	-1295.2879	-1273.4128	-1295.1846
1	1	1	0	1	1	-1362.6442	-1345.1441	-1362.5755
1	1	1	1	1	1	-1360.9720	-1339.0969	-1360.8687
1	1	1	0	1	2	-1361.0473	-1339.1722	-1360.9440
1	1	2	0	1	0	-1085.5112	-1068.0111	-1085.4425
1	1	2	1	1	0	-1199.6741	-1177.7990	-1199.5708
1	1	2	0	1	1	-1366.4262	-1344.5511	-1366.3230
1	1	3	0	1	0	-1083.6212	-1061.7461	-1083.5179

Natural Gas

p	d	q	P	D	Q	AIC	BIC	AICc
0	1	0	0	1	0	-1542.895	-1538.520	-1542.889
0	1	0	1	1	0	-1658.087	-1649.337	-1658.066
0	1	0	2	1	0	-1725.448	-1712.323	-1725.407
0	1	0	3	1	0	-1753.627	-1736.127	-1753.558
0	1	0	0	1	1	-1792.413	-1783.663	-1792.392
0	1	0	1	1	1	-1792.921	-1779.796	-1792.880
0	1	0	2	1	1	-1792.694	-1775.194	-1792.625
0	1	0	3	1	1	-1790.751	-1768.876	-1790.648
0	1	0	0	1	2	-1793.287	-1780.162	-1793.246
0	1	0	1	1	2	-1792.405	-1774.904	-1792.336
0	1	0	2	1	2	-1790.737	-1768.861	-1790.633
0	1	1	0	1	0	-1630.824	-1622.074	-1630.803
0	1	1	1	1	0	-1734.020	-1720.895	-1733.979
0	1	1	2	1	0	-1785.860	-1768.359	-1785.791
0	1	1	3	1	0	-1810.176	-1788.301	-1810.073
0	1	1	0	1	1	-1831.400	-1818.275	-1831.358
0	1	1	1	1	1	-1832.111	-1814.611	-1832.043
0	1	1	2	1	1	-1831.441	-1809.566	-1831.337
0	1	1	0	1	2	-1832.460	-1814.960	-1832.391
0	1	1	1	1	2	-1831.015	-1809.140	-1830.912
0	1	2	0	1	0	-1669.981	-1656.855	-1669.939
0	1	2	1	1	0	-1772.096	-1754.595	-1772.027
0	1	2	2	1	0	-1830.306	-1808.431	-1830.203
0	1	2	0	1	1	-1870.805	-1853.305	-1870.736
0	1	2	1	1	1	-1872.027	-1850.152	-1871.924
0	1	2	0	1	2	-1872.589	-1850.714	-1872.486
1	1	0	0	1	0	-1582.644	-1573.894	-1582.623
1	1	0	1	1	0	-1697.097	-1683.972	-1697.056
1	1	0	2	1	0	-1756.762	-1739.262	-1756.693
1	1	0	3	1	0	-1783.229	-1761.354	-1783.126
1	1	0	0	1	1	-1814.581	-1801.456	-1814.540
1	1	0	1	1	1	-1814.929	-1797.428	-1814.860
1	1	0	2	1	1	-1814.363	-1792.487	-1814.259
1	1	0	0	1	2	-1815.239	-1797.739	-1815.170
1	1	0	1	1	2	-1814.007	-1792.132	-1813.904
1	1	1	0	1	0	-1691.456	-1678.331	-1691.415
1	1	1	1	1	0	-1787.334	-1769.834	-1787.265
1	1	1	2	1	0	-1844.790	-1822.915	-1844.687
1	1	1	0	1	1	-1892.790	-1875.290	-1892.721
1	1	1	1	1	1	-1894.348	-1872.473	-1894.244
1	1	1	0	1	2	-1894.850	-1872.975	-1894.747
1	1	2	0	1	0	-1690.210	-1672.710	-1690.141
1	1	2	1	1	0	-1786.554	-1764.679	-1786.451
1	1	2	0	1	1	-1890.932	-1869.057	-1890.829
2	1	0	0	1	0	-1632.125	-1618.999	-1632.083
2	1	0	1	1	0	-1736.742	-1719.242	-1736.673
2	1	0	2	1	0	-1790.699	-1768.824	-1790.595
2	1	0	0	1	1	-1836.138	-1818.638	-1836.069
2	1	0	1	1	1	-1837.311	-1815.436	-1837.208
2	1	0	0	1	2	-1837.705	-1815.830	-1837.602
2	1	1	0	1	0	-1689.826	-1672.326	-1689.758
2	1	1	1	1	0	-1786.016	-1764.141	-1785.913
2	1	1	0	1	1	-1890.921	-1869.045	-1890.817
2	1	2	0	1	0	-1704.664	-1682.789	-1704.561
3	1	0	0	1	0	-1640.016	-1622.516	-1639.948
3	1	0	1	1	0	-1740.415	-1718.540	-1740.312
3	1	0	0	1	1	-1839.331	-1817.456	-1839.228
3	1	1	0	1	0	-1697.614	-1675.739	-1697.511

Petroleum

p	d	q	P	D	Q	AIC	BIC	AICc
0	1	0	0	1	0	107.05453	111.42955	107.06137
0	1	0	1	1	0	-84.86274	-76.11269	-84.84219
0	1	0	2	1	0	-201.31645	-188.19138	-201.27529
0	1	0	0	1	1	-266.32053	-257.57048	-266.29998
0	1	0	1	1	1	-270.27867	-257.15359	-270.23750
0	1	0	2	1	1	-269.47817	-251.97808	-269.40945
0	1	1	0	1	0	-199.14932	-190.39927	-199.12878
0	1	1	1	1	0	-403.85305	-390.72798	-403.81188
0	1	1	2	1	0	-496.00647	-478.50637	-495.93775
0	1	1	0	1	1	-572.47243	-559.34735	-572.43126
0	1	1	1	1	1	-582.56629	-565.06619	-582.49756
0	1	1	2	1	1	-581.20532	-559.33020	-581.10205
1	1	0	0	1	0	-48.31077	-39.56072	-48.29022
1	1	0	1	1	0	-251.91514	-238.79007	-251.87397
1	1	0	2	1	0	-348.57302	-331.07293	-348.50430
1	1	0	0	1	1	-424.90521	-411.78013	-424.86404
1	1	0	1	1	1	-430.57355	-413.07345	-430.50482
1	1	0	2	1	1	-429.03517	-407.16005	-428.93190
1	1	1	0	1	0	-208.42018	-195.29510	-208.37901
1	1	1	1	1	0	-403.12235	-385.62225	-403.05362
1	1	1	2	1	0	-496.40992	-474.53479	-496.30665
1	1	1	0	1	1	-571.40102	-553.90092	-571.33229
1	1	1	1	1	1	-580.83795	-558.96283	-580.73468
2	1	0	0	1	0	-95.61907	-82.49400	-95.57791
2	1	0	1	1	0	-316.82671	-299.32661	-316.75798
2	1	0	2	1	0	-412.05095	-390.17583	-411.94768
2	1	0	0	1	1	-486.30003	-468.79993	-486.23130
2	1	0	1	1	1	-499.14862	-477.27350	-499.04535
2	1	1	0	1	0	-212.31668	-194.81658	-212.24795
2	1	1	1	1	0	-406.03542	-384.16029	-405.93215
2	1	1	0	1	1	-570.32544	-548.45032	-570.22217
3	1	0	0	1	0	-114.39490	-96.89480	-114.32617
3	1	0	1	1	0	-338.03855	-316.16343	-337.93528
3	1	0	0	1	1	-507.26295	-485.38782	-507.15968
3	1	1	0	1	0	-211.01216	-189.13704	-210.90889
4	1	0	0	1	0	-127.23810	-105.36298	-127.13483

4) Model Diagnostics

With the above exploration, a best fit model is chosen for each energy source by minimizing AIC.

Coal

The SARIMA model that minimizes AIC and AICc is SARIMA(1,1,1)(0,1,2)

Series: coal_ts 
ARIMA(1,1,1)(0,1,2)[12] 

Coefficients:
         ar1      ma1     sma1     sma2
      0.7153  -0.9330  -0.7101  -0.0771
s.e.  0.0468   0.0262   0.0463   0.0433

sigma^2 = 0.001933:  log likelihood = 997.03
AIC=-1984.07   AICc=-1983.97   BIC=-1962.19

Training set error measures:
                      ME       RMSE        MAE         MPE      MAPE      MASE
Training set -0.00220553 0.04333907 0.02966952 -0.05188677 0.6181671 0.5017093
                    ACF1
Training set -0.02887885

Hydrocarbon

The SARIMA model that minimizes AIC and AICc is SARIMA(1,1,2)(0,1,1)

Series: hydro_ts 
ARIMA(1,1,2)(0,1,1)[12] 

Coefficients:
         ar1      ma1     ma2     sma1
      0.6838  -1.2657  0.3178  -0.8718
s.e.  0.1003   0.1177  0.0981   0.0302

sigma^2 = 0.005473:  log likelihood = 688.21
AIC=-1366.43   AICc=-1366.32   BIC=-1344.55

Training set error measures:
                      ME       RMSE        MAE        MPE     MAPE      MASE
Training set 0.003801311 0.07292737 0.05689759 0.09963854 2.964171 0.6705054
                    ACF1
Training set 0.002831291

Natural Gas

The SARIMA model that minimizes AIC and AICc is SARIMA(1,1,1)(0,1,2)

Series: natural_ts 
ARIMA(1,1,1)(0,1,2)[12] 

Coefficients:
         ar1      ma1     sma1     sma2
      0.5435  -0.9253  -0.6784  -0.0909
s.e.  0.0438   0.0183   0.0439   0.0449

sigma^2 = 0.002253:  log likelihood = 952.42
AIC=-1894.85   AICc=-1894.75   BIC=-1872.97

Training set error measures:
                      ME       RMSE        MAE        MPE      MAPE     MASE
Training set 0.002460763 0.04678813 0.03593403 0.04625378 0.7828864 0.655979
                    ACF1
Training set 0.004392229

Petroleum

The SARIMA model that minimizes AIC, BIC, and AICc is SARIMA(0,1,1)(1,1,1)

Series: petro_ts 
ARIMA(0,1,1)(1,1,1)[12] 

Coefficients:
          ma1     sar1     sma1
      -0.8639  -0.1659  -0.8360
s.e.   0.0226   0.0469   0.0289

sigma^2 = 0.02081:  log likelihood = 295.28
AIC=-582.57   AICc=-582.5   BIC=-565.07

Training set error measures:
                       ME     RMSE       MAE        MPE     MAPE      MASE
Training set -0.001821531 0.142328 0.1073413 -0.7744743 6.447119 0.6979237
                   ACF1
Training set 0.01685106

5) Fit a SARIMA(p,d,q)(P,D,Q) model using auto.arima()

Coal

Series: coal_ts 
ARIMA(1,1,1)(0,1,2)[12] 

Coefficients:
         ar1      ma1     sma1     sma2
      0.7153  -0.9330  -0.7101  -0.0771
s.e.  0.0468   0.0262   0.0463   0.0433

sigma^2 = 0.001933:  log likelihood = 997.03
AIC=-1984.07   AICc=-1983.97   BIC=-1962.19

The model chosen was ARIMA(1,1,1)(0,1,2) which is the same as I chose via manual fitting!

Hydrocarbon

Series: hydro_ts 
ARIMA(2,0,1)(2,1,2)[12] with drift 

Coefficients:
         ar1      ar2      ma1     sar1     sar2     sma1     sma2  drift
      1.2627  -0.2778  -0.8241  -0.4107  -0.1018  -0.4327  -0.3460  6e-04
s.e.  0.0810   0.0733   0.0587   0.2109   0.0598   0.2094   0.1941  5e-04

sigma^2 = 0.005468:  log likelihood = 693.26
AIC=-1368.51   AICc=-1368.2   BIC=-1329.12

The model chosen was ARIMA(2,0,1)(0,1,2) which is different from my chosen model of ARIMA(1,1,2)(0,1,1).

Natural Gas

Series: natural_ts 
ARIMA(2,1,1)(1,1,1)[12] 

Coefficients:
         ar1      ar2      ma1    sar1     sma1
      0.5500  -0.0177  -0.9225  0.1084  -0.7961
s.e.  0.0468   0.0442   0.0201  0.0573   0.0396

sigma^2 = 0.002258:  log likelihood = 952.25
AIC=-1892.51   AICc=-1892.36   BIC=-1866.26

The model chosen was ARIMA(2,1,1)(1,1,1) which was different from my chosen model of ARIMA(1,1,1)(0,1,2).

Petroleum

Series: petro_ts 
ARIMA(0,1,1)(0,0,2)[12] 

Coefficients:
          ma1    sma1    sma2
      -0.8842  0.0798  0.1886
s.e.   0.0214  0.0443  0.0378

sigma^2 = 0.0244:  log likelihood = 262.41
AIC=-516.83   AICc=-516.76   BIC=-499.24

The model chosen was ARIMA(0,1,1)(0,0,2) which was slightly different from my chosen model of ARIMA(0,1,1)(1,1,1). The seasonal effects were handled differently by auto.arima()

6) Forecast

Forecasting CO2 Emissions by source for the next 5 years

Coal

Hydrocarbon

Natural Gas

Petroleum

According to our models, coal emissions are expected to continue their steep downwards trend over the next 5 years, while CO2 Emissions from Petroleum are also expected to decrease over the next 5 years - albeit slightly less dramatically. CO2 emissions from both Hydrocarbon Gas and Natural Gas are forecasted to increase over the next 5 years.

7) Benchmark

Compare the models to baseline benchmark methods to ensure they’re performing above.

Coal

All benchmark methods predict CO2 emissions to stay steady or increase over the next 5 years. It looks like our model outperforms the benchmark methods.

Hydrocarbon

Both fit and seasonal naive methods predict an upward trend in CO2 emissions, with seasonal variation preserved. Drift, mean, and naive methods predict constant emissions and our model is far better.

Natural Gas

Drift, Mean and Naive methods predict constant emissions over the next 5 years. Seasonal naive predicts increasing CO2 emissions with seasonal variation, but the fit outperforms seasonal naive and it follows the trend slightly better.

Petroleum

The fit and seasonal naive preserve the seasonal variation but the model fit forecasts decreasing CO2 emissions (following the trend) and seasonal naive predicts increasing seasonal variation. Drift, mean, and naive benchmark methods again severely underperform.

8) Seasonal Cross Validation

1 Step Ahead

Coal ARIMA(1,1,1)(0,1,2) for both auto.arima() and manual fitting.

 [1] 0.04619373 0.04570981 0.04725047 0.04745566 0.04854170 0.04904679
 [7] 0.04993501 0.05057058 0.05137274 0.05206507 0.05282980 0.05354680
[13] 0.05429527 0.05502300 0.05576439 0.05649678 0.05723510 0.05796952
[19] 0.05870651 0.05944180 0.06017821 0.06091388 0.06165004 0.06238588
[25] 0.06312193 0.06430790 0.06632118 0.06833454 0.07034785 0.07236119
[31] 0.07437451 0.07638784 0.07840117 0.08041450 0.08242783 0.08444115
[37] 0.08645448 0.08846781 0.09048114 0.09249447 0.09450779 0.09652112
[43] 0.09853445 0.10054778 0.10256111 0.10457443 0.10658776 0.10860109
[49] 0.11061442 0.11262775 0.11464107 0.11665440 0.11866773 0.12068106
[55] 0.12269439 0.12470771 0.12672104 0.12873437 0.13074770 0.13276103

Hydrocarbon ARIMA(2,0,1)(0,1,2) for auto.arima() and ARIMA(1,1,2)(0,1,1) from manual fitting.

 [1] 0.07974090 0.21231107 0.34844198 0.43904916 0.46729056 0.43355642
 [7] 0.35332484 0.25158786 0.15561699 0.08803235 0.06183177 0.07836086
[13] 0.12837006 0.19553350 0.26127758 0.30958432 0.33059942 0.32230591
[19] 0.29008470 0.24452063 0.19819837 0.16239099 0.14446263 0.14653472
[25] 0.16559137 0.19482700 0.22575927 0.25050036 0.26361288 0.26314655
[31] 0.25069987 0.23060751 0.20855614 0.19003452 0.17901631 0.17716960
[37] 0.18372372 0.19594748 0.21004964 0.22223107 0.22961376 0.23083438
[43] 0.22619708 0.21740091 0.20695879 0.19748680 0.19105279 0.18873644
[49] 0.19048299 0.19525160 0.20138694 0.20709667 0.21090569 0.21197878
[55] 0.21024853 0.20634046 0.20133780 0.19646233 0.19275939 0.19086346

 [1] 0.08971719 0.09637873 0.09879890 0.09967010 0.09997514 0.10007317
 [7] 0.10009549 0.10009014 0.10007468 0.10005553 0.10003503 0.10001403
[13] 0.09999286 0.09997162 0.09995036 0.09992909 0.09990782 0.09988655
[19] 0.09986527 0.09984400 0.09982272 0.09980145 0.09978017 0.09975890
[25] 0.09973762 0.09971635 0.09969508 0.09967380 0.09965253 0.09963125
[31] 0.09960998 0.09958870 0.09956743 0.09954615 0.09952488 0.09950361
[37] 0.09948233 0.09946106 0.09943978 0.09941851 0.09939723 0.09937596
[43] 0.09935468 0.09933341 0.09931213 0.09929086 0.09926959 0.09924831
[49] 0.09922704 0.09920576 0.09918449 0.09916321 0.09914194 0.09912066
[55] 0.09909939 0.09907812 0.09905684 0.09903557 0.09901429 0.09899302

From this, it appears that ARIMA(2,0,1)(0,1,2) has much more variable RMSE, but overall is a better fit model

Natural gas ARIMA(2,1,1)(1,1,1) for auto.arima() and ARIMA(1,1,1)(0,1,2) from manual fitting.

 [1] 0.05279172 0.17922373 0.31912281 0.42672446 0.48067602 0.48246156
 [7] 0.44856163 0.40070412 0.35771745 0.33088375 0.32303682 0.33045894
[13] 0.34610679 0.36277882 0.37530316 0.38141445 0.38148409 0.37755569
[19] 0.37219683 0.36756755 0.36491146 0.36448378 0.36579586 0.36800119
[25] 0.37026257 0.37199741 0.37296817 0.37324094 0.37306752 0.37275048
[31] 0.37253697 0.37256333 0.37285056 0.37333544 0.37391662 0.37449699
[37] 0.37501116 0.37543457 0.37577755 0.37607094 0.37635031 0.37664385
[43] 0.37696642 0.37731945 0.37769487 0.37808058 0.37846534 0.37884186
[49] 0.37920762 0.37956414 0.37991518 0.38026490 0.38061650 0.38097161
[55] 0.38133027 0.38169145 0.38205372 0.38241579 0.38277683 0.38313660

 [1] 0.05169864 0.08091653 0.09517965 0.10221910 0.10576896 0.10763298
 [7] 0.10868248 0.10933841 0.10980416 0.11017801 0.11050744 0.11081540
[13] 0.11111299 0.11140556 0.11169571 0.11198468 0.11227308 0.11256121
[19] 0.11284921 0.11313715 0.11342505 0.11371294 0.11400082 0.11428870
[25] 0.11457658 0.11486445 0.11515233 0.11544020 0.11572808 0.11601595
[31] 0.11630383 0.11659170 0.11687958 0.11716745 0.11745533 0.11774320
[37] 0.11803108 0.11831895 0.11860683 0.11889470 0.11918258 0.11947045
[43] 0.11975833 0.12004620 0.12033408 0.12062195 0.12090983 0.12119770
[49] 0.12148558 0.12177345 0.12206133 0.12234920 0.12263708 0.12292495
[55] 0.12321283 0.12350070 0.12378858 0.12407645 0.12436433 0.12465220

From this analysis, it looks like ARIMA(2,1,1)(1,1,1) is a better fit model.

Petroleum ARIMA(0,1,1)(0,0,2) for auto.arima() and ARIMA(0,1,1)(1,1,1) from manual fitting.

 [1] 0.08598773 0.08658462 0.08718152 0.08777841 0.08837531 0.08897220
 [7] 0.08956909 0.09016599 0.09076288 0.09135978 0.09195667 0.09255356
[13] 0.09315046 0.09374735 0.09434425 0.09494114 0.09553803 0.09613493
[19] 0.09673182 0.09732872 0.09792561 0.09852250 0.09911940 0.09971629
[25] 0.10031319 0.10091008 0.10150697 0.10210387 0.10270076 0.10329766
[31] 0.10389455 0.10449144 0.10508834 0.10568523 0.10628213 0.10687902
[37] 0.10747591 0.10807281 0.10866970 0.10926660 0.10986349 0.11046038
[43] 0.11105728 0.11165417 0.11225107 0.11284796 0.11344485 0.11404175
[49] 0.11463864 0.11523554 0.11583243 0.11642932 0.11702622 0.11762311
[55] 0.11822001 0.11881690 0.12041907 0.12202312 0.12362717 0.12523122

 [1] 0.08598773 0.08658462 0.08718152 0.08777841 0.08837531 0.08897220
 [7] 0.08956909 0.09016599 0.09076288 0.09135978 0.09195667 0.09255356
[13] 0.09315046 0.09374735 0.09434425 0.09494114 0.09553803 0.09613493
[19] 0.09673182 0.09732872 0.09792561 0.09852250 0.09911940 0.09971629
[25] 0.10031319 0.10091008 0.10150697 0.10210387 0.10270076 0.10329766
[31] 0.10389455 0.10449144 0.10508834 0.10568523 0.10628213 0.10687902
[37] 0.10747591 0.10807281 0.10866970 0.10926660 0.10986349 0.11046038
[43] 0.11105728 0.11165417 0.11225107 0.11284796 0.11344485 0.11404175
[49] 0.11463864 0.11523554 0.11583243 0.11642932 0.11702622 0.11762311
[55] 0.11822001 0.11881690 0.12041907 0.12202312 0.12362717 0.12523122

From this analysis, it looks like both models perform similarly and have the same RMSE.

12 steps ahead forecast

Coal ARIMA(1,1,1)(0,1,2) for both auto.arima() and manual fitting.

[1] 525

[1] 0.3327111

Hydrocarbon ARIMA(2,0,1)(0,1,2) for auto.arima() and ARIMA(1,1,2)(0,1,1) from manual fitting.

[1] 525

[1] 0.4455012

[1] 0.4738333

From this analysis, ARIMA(2,0,1)(0,1,2) from auto.arima() has a much lower RMSE.

Natural gas ARIMA(2,1,1)(1,1,1) for auto.arima() and ARIMA(1,1,1)(0,1,2) from manual fitting.

[1] 525

[1] 0.4328701

[1] 0.4238351

From this analysis, ARIMA(1,1,1)(0,1,2) from manual fitting has a much lower RMSE and is a better fit model for this data.

Petroleum ARIMA(0,1,1)(0,0,2) for auto.arima() and ARIMA(0,1,1)(1,1,1) from manual fitting.

[1] 525

[1] 0.367206

[1] 0.367206

From this analysis, it appears that both models perform the same and have the same RMSE. Either is a valid model to work with.

Source code for the above analysis: Github