Maximum Likelihood

It is perfectly valid to fit the Phat distribution to a univariate dataset using Maximum Likelihood Estimation (MLE), that is, by minimizing the negative log-likelihood. This process is available via the fit method, which inherits from the statsmodels GenericLikelihoodModel.
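To make the objective concrete, here is a minimal sketch of a negative log-likelihood for a plain Gaussian, using only the standard library. It is not the Phat likelihood and not the library's implementation. The function name `gaussian_nll` and the simulated data are assumptions made for illustration.

```python
import math
import random


def gaussian_nll(data, mu, sigma):
    """Negative log-likelihood of i.i.d. Gaussian observations."""
    n = len(data)
    sq_dev = sum((x - mu) ** 2 for x in data)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sq_dev / (2 * sigma ** 2)


# Simulated "returns" (illustrative values, not market data)
rng = random.Random(42)
data = [rng.gauss(0.0005, 0.01) for _ in range(5000)]

# For a Gaussian the minimizer of the NLL is known in closed form:
# the sample mean and the (biased) sample standard deviation.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

best = gaussian_nll(data, mu_hat, sigma_hat)
avg_nll = best / len(data)  # per-observation objective
```

Any other parameter pair gives a larger value of `gaussian_nll` on the same data. The Phat fit has no closed form, so a numerical optimizer searches for the minimum instead. The "Current function value" reported by the optimizer below appears to be the per-observation objective: 62558 / 19644 is approximately 3.1846, matching the printed value up to sign.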

There is, however, one major issue concerning the tails that must be considered.

First, let’s attempt to fit the Phat distribution to our familiar dataset of S&P 500 index returns.

[2]:
import yfinance as yf
import phat as ph

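# Download S&P 500 index history and compute simple returns
sp = yf.download('^GSPC')
sp_ret = sp.Close.pct_change()[1:]  # drop the first row, which is NaN

# Fit all four Phat parameters by maximum likelihood
res = ph.Phat.fit(sp_ret)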
[*********************100%***********************]  1 of 1 completed
Optimization terminated successfully.
         Current function value: -3.184565
         Iterations: 160
         Function evaluations: 275
[3]:
res.params
[3]:
array([0.0005961 , 0.00354794, 0.07451353, 0.06369988])
[4]:
res.summary()
[4]:
PhatFit Results
Dep. Variable:     Close                Log-Likelihood:   62558.
Model:             PhatFit              AIC:              -1.251e+05
Method:            Maximum Likelihood   BIC:              -1.251e+05
Date:              Fri, 23 Jul 2021
Time:              08:17:39
No. Observations:  19644
Df Residuals:      19643
Df Model:          0

          coef   std err        z   P>|z|   [0.025   0.975]
const   0.0006  5.14e-05   11.589   0.000    0.000    0.001
x1      0.0035  3.42e-05  103.760   0.000    0.003    0.004
xi_l    0.0745     0.009    8.373   0.000    0.057    0.092
xi_r    0.0637     0.009    7.464   0.000    0.047    0.080

We can see that both the left and right tail indices are much smaller than the values we estimated using the POT and Hill Double Bootstrap techniques. This underfitting of the tails occurs because extreme events are too few for their contribution to the likelihood to offset the gains from fitting the body more closely. Hence, we end up with thinner fitted tails that mask greater risk.
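By contrast, tail-only estimators never see the body at all. As a minimal sketch, the basic Hill estimator uses just the k largest observations. This is the plain Hill estimator with a hand-picked k, not the double bootstrap procedure and not the library's code. The function name `hill_estimator`, the simulated Pareto sample, and the choice k=2000 are all assumptions for illustration.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index xi from the k largest values."""
    ordered = sorted(sample, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Mean log-excess of the top k observations over the threshold
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


# Simulated Pareto sample with tail exponent alpha, so the true xi is 1 / alpha
rng = random.Random(0)
alpha = 3.0
sample = [rng.paretovariate(alpha) for _ in range(20000)]

xi_hat = hill_estimator(sample, k=2000)
```

For a left tail, the same function can be applied to the negated data. The estimate depends on k, and selecting k from the data is the part that a double bootstrap procedure is meant to automate.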

Instead, we can estimate the tails separately and pass them as fixed values to our fit method. This results in just two free parameters, \(\mu\) and \(\sigma\), in the Gaussian body.

[ ]:
# Estimate each tail index separately, then hold both fixed during the fit
xi_left, xi_right = ph.two_tailed_hill_double_bootstrap(sp_ret)
res = ph.Phat.fit(sp_ret, xi_left, xi_right)
[7]:
res.summary()
[7]:
PhatFit Results
Dep. Variable:     Close                Log-Likelihood:   61926.
Model:             PhatFit              AIC:              -1.238e+05
Method:            Maximum Likelihood   BIC:              -1.238e+05
Date:              Fri, 23 Jul 2021
Time:              08:18:30
No. Observations:  19644
Df Residuals:      19643
Df Model:          0

          coef   std err        z   P>|z|   [0.025   0.975]
const   0.0006   4.8e-05   12.858   0.000    0.001    0.001
x1      0.0032   3.2e-05   98.791   0.000    0.003    0.003
[8]:
res.params
[8]:
array([0.00061775, 0.00316159])

The difference may not appear too meaningful, but with the tails fixed we do get a slightly greater mean (about 0.000618 versus 0.000596) and a lower volatility (about 0.00316 versus 0.00355).
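The general idea of holding a shape parameter fixed while maximizing the likelihood over location and scale can be sketched with the standard library alone. The example below uses a location-scale Student-t as a stand-in for the Phat distribution, with the degrees of freedom playing the role of the fixed tail parameter. It is not the library's implementation: `t_nll`, `golden_min`, `fit_with_fixed_shape`, the coordinate-descent optimizer, and the simulated data are all assumptions for illustration.

```python
import math
import random


def t_nll(data, loc, scale, df):
    """Negative log-likelihood of a location-scale Student-t sample."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return -sum(
        log_norm - (df + 1) / 2 * math.log1p(((x - loc) / scale) ** 2 / df)
        for x in data
    )


def golden_min(f, lo, hi, tol=1e-5):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


def fit_with_fixed_shape(data, df, sweeps=5):
    """Fit loc and scale by coordinate descent; df is never updated."""
    ordered = sorted(data)
    n = len(ordered)
    q25, q75 = ordered[n // 4], ordered[3 * n // 4]
    loc = ordered[n // 2]      # start at the median
    scale = (q75 - q25) / 2    # rough starting value from the quartiles
    for _ in range(sweeps):
        # Search for loc within the interquartile range, scale held fixed
        loc = golden_min(lambda m: t_nll(data, m, scale, df), q25, q75)
        # Search for scale on a log grid around the current value, loc held fixed
        log_scale = golden_min(
            lambda s: t_nll(data, loc, math.exp(s), df),
            math.log(scale) - 3, math.log(scale) + 3,
        )
        scale = math.exp(log_scale)
    return loc, scale


# Simulated Student-t data with known parameters (illustrative values)
rng = random.Random(7)
true_loc, true_scale, df = 1.0, 2.0, 3.0
data = [
    true_loc + true_scale * rng.gauss(0, 1)
    / math.sqrt(rng.gammavariate(df / 2, 2 / df))
    for _ in range(2000)
]

loc_hat, scale_hat = fit_with_fixed_shape(data, df)
```

Because the shape parameter never enters the search, the body parameters cannot trade tail thickness for a closer fit to the bulk of the data, which is the behavior the fixed-tail fit above relies on.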