# Dickey-Fuller table using Monte Carlo


One critical aspect of time series analysis is determining whether we are dealing with a unit root. If we fail to account for this, all of our tests and conclusions may be invalidated!¹
To begin understanding how to test our data for a unit root, let us consider the following $AR(1)$ process as our base scenario: $y_t = \alpha + \rho y_{t-1} + \epsilon_t$
To begin understanding how to test our data for a unit root, let us consider the following $AR(1)$ process as our base scenario: $y_t = \alpha + \rho y_{t-1} + \epsilon_t$

In this model, checking for a unit root is translated into testing $\rho = 1$. We can define our hypothesis as:

$H_0: \rho = 1\\ H_1: |\rho| < 1$

Looking at this simple problem, we might be tempted to run OLS on the base model and check whether $\rho = 1$ with a t-test. The problem with this approach is that under the null hypothesis $y_t$ and $y_{t-1}$ are non-stationary, so we cannot simply apply the central limit theorem, and thus our t-test is invalid.
We can modify our base model to our advantage by subtracting $y_{t-1}$ on both sides of the equation:

\begin{align} y_t - y_{t-1} &= \alpha + \rho y_{t-1} - y_{t-1} + \epsilon_t\\ \Delta y_t &= \alpha + (\rho - 1) y_{t-1} + \epsilon_t \end{align}

Notice that in the last equation, under the null hypothesis (i.e. $\rho = 1$) the variable $y_{t-1}$ disappears and $\Delta y_t$ becomes stationary. Keep in mind that $y_t$ and $y_{t-1}$ are still non-stationary; it is only the difference between $y_t$ and $y_{t-1}$ that is stationary. To make this clearer, observe that under the null hypothesis $\Delta y_t = \alpha + \epsilon_t$. The right-hand side of the equation is stationary!
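A quick numerical check of this point (a sketch not in the original post; the drift $\alpha = 0.5$ and unit noise variance are arbitrary choices):

```python
import numpy as np

# Illustration: under H0 (rho = 1), the level series y_t is a random walk
# with drift, but its first difference is just alpha + noise.
rng = np.random.default_rng(42)
alpha, n = 0.5, 100_000
eps = rng.standard_normal(n)
y = np.cumsum(alpha + eps)  # y_t = alpha + y_{t-1} + eps_t
dy = np.diff(y)             # Delta y_t = alpha + eps_t

# The differences have a stable mean (~alpha) and variance (~1): stationary.
print(dy.mean(), dy.var())
```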
Additionally, under the alternative hypothesis (i.e. $|\rho| < 1$) the variable $y_{t-1}$ does not disappear, but this is no longer a problem because $y_{t-1}$ is stationary (by definition of the alternative hypothesis).

We still cannot use a t-test on our last equation to verify if $(\rho - 1) = 0$, but we are getting closer. For convenience, we define $\delta = \rho - 1$:

$\Delta y_t = \alpha + \delta y_{t-1} + \epsilon_t$

The problem is that under the null hypothesis $y_{t-1}$ is still non-stationary, and thus $\hat{\delta}$ does not have a t-distribution. Thanks to David Dickey and Wayne Fuller, we now know that it has a well-defined distribution (i.e. the Dickey-Fuller distribution). This means that we can simply calculate a t-statistic and compare it to the critical values of a Dickey-Fuller table.

The Dickey-Fuller test is one of the simplest tests to determine whether a time series has a unit root. The only difference from the reasoning presented above is that the test considers three model variants²:

• Model (a): an autoregressive process with parameter $|\rho| < 1$, mean equal to zero and no trend: $\Delta y_t = (\rho - 1) y_{t-1} + \epsilon_t$
• Model (b): an autoregressive process with parameter $|\rho| < 1$, non-zero mean and no trend: $\Delta y_t = \alpha + (\rho - 1) y_{t-1} + \epsilon_t$
• Model (c): an autoregressive process with parameter $|\rho| < 1$, non-zero mean and trend: $\Delta y_t = \alpha + \beta t + (\rho - 1) y_{t-1} + \epsilon_t$

# Monte Carlo method to get a Dickey-Fuller table

## Description of the method

In simple words, to get our Dickey-Fuller distributions and our critical values, we need to simulate many $AR(1)$ processes with a unit root (i.e. with $\rho = 1$) and then assess which values of the t-statistic of $\hat{\delta}$ arise purely from the randomness of the simulation.

Here is a step-by-step outline of the process³:

1. Simulate an $AR(1)$ process of length $N$ with $\rho = 1$.
2. Calculate $\Delta y_t = y_t - y_{t-1}$.
3. Run one regression for each model:
• Model (a): $\Delta y_t = \delta y_{t-1} + \epsilon_t$ → save the t-statistic of $\hat{\delta}$.
• Model (b): $\Delta y_t = \alpha + \delta y_{t-1} + \epsilon_t$ → save the t-statistic of $\hat{\delta}$.
• Model (c): $\Delta y_t = \alpha + \beta t + \delta y_{t-1} + \epsilon_t$ → save the t-statistic of $\hat{\delta}$.
4. Repeat steps 1-3 as many times as necessary (probably at least 10,000 or 20,000 times).
5. To calculate the critical value of a model (for a given $N$) at significance level $p\%$, take the $p$-th percentile of the corresponding group of t-statistics.

## Implementing the code

Here is a small Python program that accomplishes the steps listed above⁴.
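(The original listing did not survive here; the following is a minimal self-contained sketch of steps 1-4 using plain NumPy OLS. Function names like `dickey_fuller_tstats` are illustrative, not from the original code.)

```python
import numpy as np

def t_stat_last(X, y):
    """OLS of y on X; return the t-statistic of the last column's coefficient."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of beta-hat
    return beta[-1] / np.sqrt(cov[-1, -1])

def dickey_fuller_tstats(n, reps, seed=0):
    """Simulate `reps` random walks of length `n` and collect the
    t-statistic of delta-hat for models (a), (b) and (c)."""
    rng = np.random.default_rng(seed)
    stats = {"a": [], "b": [], "c": []}
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(n))       # AR(1) with rho = 1
        dy = np.diff(y)                             # Delta y_t
        ylag = y[:-1]                               # y_{t-1}
        ones = np.ones(n - 1)
        trend = np.arange(1, n)
        stats["a"].append(t_stat_last(ylag[:, None], dy))
        stats["b"].append(t_stat_last(np.column_stack([ones, ylag]), dy))
        stats["c"].append(t_stat_last(np.column_stack([ones, trend, ylag]), dy))
    return {m: np.asarray(t) for m, t in stats.items()}
```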

To get our table, we can simply do:
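(The table-building code is also missing here; a sketch of the percentile step, assuming the simulated t-statistics for one model and one $N$ are collected in an array, could look like this. The placeholder input is only for demonstration.)

```python
import numpy as np

def critical_values(tstats, levels=(10, 5, 2.5, 1)):
    """Map each significance level (in percent) to the corresponding
    left-tail percentile of the simulated t-statistics."""
    return {lvl: float(np.percentile(tstats, lvl)) for lvl in levels}

# Placeholder input: in practice `tstats` holds the simulated t-statistics
# for one model and one sample size N.
tstats = np.linspace(-5.0, 5.0, 10_001)
print(critical_values(tstats))
```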

| Model   | N   | 10%   | 5%    | 2.5%  | 1%    |
|---------|-----|-------|-------|-------|-------|
| Model a | 25  | -1.61 | -1.96 | -2.28 | -2.66 |
|         | 50  | -1.61 | -1.95 | -2.24 | -2.61 |
|         | 100 | -1.62 | -1.95 | -2.24 | -2.59 |
|         | 250 | -1.62 | -1.95 | -2.24 | -2.58 |
|         | 500 | -1.62 | -1.94 | -2.23 | -2.57 |
| Model b | 25  | -2.64 | -2.99 | -3.32 | -3.73 |
|         | 50  | -2.60 | -2.92 | -3.22 | -3.57 |
|         | 100 | -2.58 | -2.89 | -3.16 | -3.49 |
|         | 250 | -2.57 | -2.87 | -3.13 | -3.45 |
|         | 500 | -2.57 | -2.87 | -3.13 | -3.44 |
| Model c | 25  | -3.24 | -3.61 | -3.95 | -4.39 |
|         | 50  | -3.18 | -3.50 | -3.80 | -4.15 |
|         | 100 | -3.15 | -3.46 | -3.73 | -4.06 |
|         | 250 | -3.14 | -3.43 | -3.69 | -4.00 |
|         | 500 | -3.13 | -3.42 | -3.67 | -3.98 |

To plot each distribution, we can use matplotlib:
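(The plotting snippet is missing here as well; below is a sketch in which placeholder normal draws stand in for the simulated t-statistics of each model.)

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

# Placeholder draws standing in for the simulated t-statistics; the real
# arrays come from the Monte Carlo step described above.
rng = np.random.default_rng(0)
samples = {
    "model a": rng.standard_normal(10_000) - 0.4,
    "model b": rng.standard_normal(10_000) - 1.5,
    "model c": rng.standard_normal(10_000) - 2.2,
}

fig, ax = plt.subplots()
for label, t in samples.items():
    ax.hist(t, bins=100, density=True, histtype="step", label=label)
ax.set_xlabel("t-statistic")
ax.set_ylabel("density")
ax.legend()
fig.savefig("dickey_fuller_distributions.png")
```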

You can find all the code here and play with an online Jupyter Notebook here.
You can also find a Matlab version of the code here.

1. The presence of unit roots makes our time series non-stationary, which in turn makes inference much harder and less straightforward.

2. The reason behind this is that the Dickey-Fuller distribution depends on the type of model (a, b or c) and the length of the series.

3. I got the idea of the description from here (the description is for a similar test).

4. On my laptop, 100,000 repetitions took about 15-20 minutes.