Maximum Likelihood
It is perfectly valid to fit the Phat distribution to a univariate dataset by Maximum Likelihood Estimation (MLE), that is, by minimizing the negative log-likelihood. This is available via the fit method (which is inherited from the statsmodels GenericLikelihoodModel).
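As a minimal illustration of what minimizing a negative log-likelihood means, the sketch below uses a plain Gaussian rather than the Phat distribution. The helper `gaussian_nll` and the simulated sample are assumptions made for illustration; they are not part of phat or statsmodels. It checks that the closed-form Gaussian MLE (sample mean and population standard deviation) gives a lower mean negative log-likelihood than nearby parameter values.

```python
import math
import random
import statistics


def gaussian_nll(data, mu, sigma):
    """Mean negative log-likelihood of `data` under a Normal(mu, sigma^2) model."""
    n = len(data)
    # Per-observation constant term: 0.5*log(2*pi) + log(sigma)
    const = 0.5 * math.log(2 * math.pi) + math.log(sigma)
    # Sum of squared deviations scaled by 2*sigma^2
    scaled_sq = sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
    return const + scaled_sq / n


# Simulated "returns" (hypothetical data, not the S&P 500 series)
random.seed(0)
sample = [random.gauss(0.0005, 0.01) for _ in range(5000)]

# Closed-form Gaussian MLE: sample mean and population (ddof=0) standard deviation
mu_hat = statistics.fmean(sample)
sigma_hat = statistics.pstdev(sample)

best = gaussian_nll(sample, mu_hat, sigma_hat)
worse_mu = gaussian_nll(sample, mu_hat + 0.002, sigma_hat)
worse_sigma = gaussian_nll(sample, mu_hat, sigma_hat * 1.2)

print(f"NLL at MLE:           {best:.6f}")
print(f"NLL with shifted mu:  {worse_mu:.6f}")
print(f"NLL with wider sigma: {worse_sigma:.6f}")
```

A numerical optimizer does the same thing when no closed form exists: it searches for the parameters that make this value as small as possible.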
However, there is one major issue concerning the tails that must be considered.
First, let's attempt to fit the Phat distribution to the familiar daily returns of the S&P 500 index.
[2]:
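import yfinance as yf
import phat as ph

# Download the S&P 500 index history and compute daily simple returns
sp = yf.download('^GSPC')
sp_ret = sp.Close.pct_change()[1:]  # drop the first (NaN) return
# Fit all four parameters by maximum likelihood
res = ph.Phat.fit(sp_ret)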
[*********************100%***********************] 1 of 1 completed
Optimization terminated successfully.
Current function value: -3.184565
Iterations: 160
Function evaluations: 275
[3]:
res.params
[3]:
array([0.0005961 , 0.00354794, 0.07451353, 0.06369988])
[4]:
res.summary()
[4]:
| Dep. Variable: | Close | Log-Likelihood: | 62558. |
|---|---|---|---|
| Model: | PhatFit | AIC: | -1.251e+05 |
| Method: | Maximum Likelihood | BIC: | -1.251e+05 |
| Date: | Fri, 23 Jul 2021 | | |
| Time: | 08:17:39 | | |
| No. Observations: | 19644 | | |
| Df Residuals: | 19643 | | |
| Df Model: | 0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0006 | 5.14e-05 | 11.589 | 0.000 | 0.000 | 0.001 |
| x1 | 0.0035 | 3.42e-05 | 103.760 | 0.000 | 0.003 | 0.004 |
| xi_l | 0.0745 | 0.009 | 8.373 | 0.000 | 0.057 | 0.092 |
| xi_r | 0.0637 | 0.009 | 7.464 | 0.000 | 0.047 | 0.080 |
Both the left and right tail indices are much smaller than the estimates we obtained using the POT and Hill Double Bootstrap techniques. The tails are underfit because extreme events make up too small a share of the dataset for their contribution to the likelihood to offset the gains from optimizing the fit in the body. Hence, we end up with thinner fitted tails that mask greater risk.
Instead, we can estimate the tails separately and pass them as fixed values to our fit method. This results in just two free parameters, \(\mu\) and \(\sigma\), in the Gaussian body.
[ ]:
# Estimate both tail indices first, then hold them fixed during the fit
xi_left, xi_right = ph.two_tailed_hill_double_bootstrap(sp_ret)
res = ph.Phat.fit(sp_ret, xi_left, xi_right)
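To give a sense of how a tail index can be estimated from the extremes alone, here is a minimal sketch of the plain Hill estimator with a fixed number of tail observations `k`. This is not the library's double bootstrap, which, as the name suggests, additionally chooses the tail cutoff by resampling. The function `hill_estimator`, the choice of `k`, and the simulated Pareto sample are assumptions for illustration only.

```python
import math
import random


def hill_estimator(data, k):
    """Plain Hill estimate of the tail index xi from the k largest observations."""
    ordered = sorted(data, reverse=True)  # descending order statistics
    threshold = ordered[k]                # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    # Average log-excess of the k largest values over the threshold
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


# Simulated Pareto sample with known tail index xi = 1 / alpha (hypothetical data)
random.seed(42)
alpha = 3.0
sample = [random.paretovariate(alpha) for _ in range(20000)]

xi_hat = hill_estimator(sample, k=2000)
print(f"Hill estimate: {xi_hat:.3f} (true xi = {1 / alpha:.3f})")
```

Only the largest observations enter the estimate, so the body of the distribution cannot dilute it. For a left tail, the same function could be applied to the negated data.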
[7]:
res.summary()
[7]:
| Dep. Variable: | Close | Log-Likelihood: | 61926. |
|---|---|---|---|
| Model: | PhatFit | AIC: | -1.238e+05 |
| Method: | Maximum Likelihood | BIC: | -1.238e+05 |
| Date: | Fri, 23 Jul 2021 | | |
| Time: | 08:18:30 | | |
| No. Observations: | 19644 | | |
| Df Residuals: | 19643 | | |
| Df Model: | 0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0006 | 4.8e-05 | 12.858 | 0.000 | 0.001 | 0.001 |
| x1 | 0.0032 | 3.2e-05 | 98.791 | 0.000 | 0.003 | 0.003 |
[8]:
res.params
[8]:
array([0.00061775, 0.00316159])
The difference may not appear very meaningful, but with the tails fixed we do get a greater mean and a lesser volatility: the mean (const) rises from about 0.00060 to 0.00062, and the volatility (x1) falls from about 0.0035 to 0.0032.