One critical aspect of time series analysis is to determine if we are dealing with a unit root or not. If we fail to consider this, we may end up with all of our tests and conclusions invalidated!1
To begin understanding how to test our data for a unit root, let us consider the following process as our base scenario:
\[
y_t = \alpha + \rho y_{t-1} + \epsilon_t
\]
In this model, checking for a unit root is translated into testing . We can define our hypothesis as:
\[
H_0: \rho = 1\\
H_1: |\rho| < 1
\]
Seeing this simple problem, we can be tempted to run OLS on the base model and see if using a t-test. The problem with this approach is that under the null hypothesis, and are non-stationary, so we cannot simply apply the central limit theorem, and thus our t-test is invalid.
We can modify our base model to our advantage by subtracting on both sides of the equation:
Notice that in the last equation, under the null hypothesis (i.e. ) the variable disappears and makes stationary. Keep in mind that and are still non-stationary, it is just that the difference between and is stationary. To make this more clear, observe that under the null hypothesis . The right-hand side of the equation is stationary!
Additionally, under the alternative hypothesis (i.e. ) the variable does not disappear, but this is no longer a problem because is stationary (by definition of the alternative hypothesis).
We still cannot use a t-test on our last equation to verify if , but we are getting closer. For convenience, we define :
The problem with this is that under the null hypothesis is still non-stationary, and thus does not have a t-distribution. Thanks to David Dickey and Wayne Fuller, we now know that it has a defined distribution (i.e. the Dickey-Fuller distribution). This means that we can simply calculate a t-statistic and compare it to the critical values of a Dickey-Fuller table.
The Dickey-Fuller test is one of the simplest tests to determine if a time series has a unit root or not. The only difference with the reasoning presented above is that the test involves 3 alternative hypotheses2:
An autoregressive process with parameter , mean equal to zero and no trend:
\[
\Delta y_t = (\rho - 1) y_{t-1} + \epsilon_t
\]
An autoregressive process with parameter , non-zero mean and no trend:
\[
\Delta y_t = \alpha + (\rho - 1) y_{t-1} + \epsilon_t
\]
An autoregressive process with parameter , non-zero mean and trend:
\[
\Delta y_t = \alpha + \beta t + (\rho - 1) y_{t-1} + \epsilon_t
\]
Monte Carlo method to get a Dickey-Fuller table
Description of the method
In simple words, to get our Dickey-Fuller distributions and our critical values, we need to simulate many processes with a unit root (i.e. with ) and then we need to try to assess how likely it is to observe due to the randomness of the simulation.