Using the ARIMA Model for Forecasting
An important part of applying machine learning models to time series data is having a good baseline model for comparison. In this post, I want to learn about the AutoRegressive Integrated Moving Average (ARIMA) modeling approach for time series data. This is not considered a machine learning model, but was developed separately in the fields of statistics and econometrics and can serve as a good, competitive baseline for comparison with ML tools [1, 2].
The ARIMA model is comprised of several components, which refer to different letters in the name, that is [3]:
 Autoregression (AR): An autoregressive model will identify and use relationships between the current entry in the time series and lagged (i.e. previous) entries. This basically means that the model knows that past entries in the time series will affect the current value [4]. Typically an autoregressive model is a linear regression model that tries to fit coefficients to \(p\) previous observations [4]:

Integrated (I): Using differences between entries in the time series to make it stationary (i.e. the mean of the distribution is fixed). We can compute one difference as \(y_t  y_{t1}\), or do secondorder differencing, i.e. \((y_t  y_{t1})  (y_{t1}  y_{t2})\), or some higher order difference, \(d\) [4].

Moving average (MA): Using the relationship between the current observation and the residual error in a moving average computed over \(q\) lagged entries in the series. Mathematically, this can be expressed as [4]:
The full ARIMA model combines all of these terms together linearly. Generally, when calling an implementation of ARIMA in Python or some other coding language, we will need to specify the parameters \(p\), \(d\), \(q\), as discussed above[3,4]. Specifically, each of these parameters will define [3,4]:
 \(p\): the number of autoregressive terms (the AR order)
 \(d\): the differencing order (for the I component)
 \(q\): the number of moving average terms (the MA order)
There are several ways to choose the values of \(p\), \(d\) and \(q\). For example, to choose \(p\) and \(q\), we could use different autocorrelation functions to determine across how many previous time steps there is a significant correlation with the current time step [2, 3]. For the MA order (\(q\)), we can use the autocorrelation function (ACF) to determine the optimal number of terms to include in a moving average [4]. Specifically, we would look for the number of previous entries in the time series that had a significant autocorrelation value [3]. For the AR order (\(p\)), we could use the partial autocorrelation function (PACF), which computes the correlation between two points in the time series while excluding the influence of other points in the series [4]. Choosing the differencing order is a little less straightforward  [2] recommends testing different values of \(d\) and comparing the performance of the model each time using RMSE as the metric [2].
References
[1] Joseph, M. Modern Time Series Forecasting with Python : Explore IndustryReady Time Series Forecasting Using Modern Machine Learning and Deep Learning. 1st ed. Birmingham: Packt Publishing, Limited, 2022.
[2] “Autoregressive integrated moving average.” Wikipedia. https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average Visited 18 Feb 2023.
[3] Brownlee, J. “How to Create an ARIMA Model for Time Series Forecasting in Python.” Machine Learning Mastery, 10 Dec 2020. https://machinelearningmastery.com/arimafortimeseriesforecastingwithpython/ Visited 18 Feb 2023.
[4] Maklin, C. “ARIMA Model Python Example  Time Series Forecasting.” Towards Data Science, 25 May 2019. https://towardsdatascience.com/machinelearningpart19timeseriesandautoregressiveintegratedmovingaveragemodelarimac1005347b0d7 Visited 18 Feb 2023.