{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Maximum Likelihood #\n", "\n", "The Phat distribution can be fit to a univariate dataset with Maximum Likelihood Estimation (MLE) by minimizing the negative log-likelihood. This is available via the `fit` method (which inherits from the `statsmodels` `GenericLikelihoodModel`).\n", "\n", "However, there is one major issue concerning the tails that must be considered.\n", "\n", "First, let's attempt to fit the Phat distribution to our familiar distribution of S&P 500 index level returns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import seaborn as sns; sns.set(style = 'whitegrid')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[*********************100%***********************] 1 of 1 completed\n", "Optimization terminated successfully.\n", " Current function value: -3.184565\n", " Iterations: 160\n", " Function evaluations: 275\n" ] } ], "source": [ "import yfinance as yf\n", "import phat as ph\n", "\n", "# Download the S&P 500 index history and compute simple returns,\n", "# dropping the first (NaN) observation\n", "sp = yf.download('^GSPC')\n", "sp_ret = sp.Close.pct_change()[1:]\n", "\n", "# Fit all four Phat parameters by maximum likelihood\n", "res = ph.Phat.fit(sp_ret)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.0005961 , 0.00354794, 0.07451353, 0.06369988])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.params" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
PhatFit Results
Dep. Variable: Close Log-Likelihood: 62558.
Model: PhatFit AIC: -1.251e+05
Method: Maximum Likelihood BIC: -1.251e+05
Date: Fri, 23 Jul 2021
Time: 08:17:39
No. Observations: 19644
Df Residuals: 19643
Df Model: 0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [0.025 0.975]
const 0.0006 5.14e-05 11.589 0.000 0.000 0.001
x1 0.0035 3.42e-05 103.760 0.000 0.003 0.004
xi_l 0.0745 0.009 8.373 0.000 0.057 0.092
xi_r 0.0637 0.009 7.464 0.000 0.047 0.080
" ], "text/plain": [ "\n", "\"\"\"\n", " PhatFit Results \n", "==============================================================================\n", "Dep. Variable: Close Log-Likelihood: 62558.\n", "Model: PhatFit AIC: -1.251e+05\n", "Method: Maximum Likelihood BIC: -1.251e+05\n", "Date: Fri, 23 Jul 2021 \n", "Time: 08:17:39 \n", "No. Observations: 19644 \n", "Df Residuals: 19643 \n", "Df Model: 0 \n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 0.0006 5.14e-05 11.589 0.000 0.000 0.001\n", "x1 0.0035 3.42e-05 103.760 0.000 0.003 0.004\n", "xi_l 0.0745 0.009 8.373 0.000 0.057 0.092\n", "xi_r 0.0637 0.009 7.464 0.000 0.047 0.080\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both the left and right tail indices are much smaller than those we estimated using the [POT](pot.ipynb) and [Hill Double Bootstrap](dblbs.ipynb) techniques. The tails are underfit because the contribution of extreme events to the likelihood is not large enough to offset the gains from optimizing the fit in the body. Hence, we end up with thinner tails that mask greater risk.\n", "\n", "Instead, we can estimate the tails separately and pass them as fixed values to the `fit` method. This leaves just two free parameters, $\\mu$ and $\\sigma$, in the Gaussian body."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xi_left, xi_right = ph.two_tailed_hill_double_bootstrap(sp_ret)\n", "res = ph.Phat.fit(sp_ret, xi_left, xi_right)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
PhatFit Results
Dep. Variable: Close Log-Likelihood: 61926.
Model: PhatFit AIC: -1.238e+05
Method: Maximum Likelihood BIC: -1.238e+05
Date: Fri, 23 Jul 2021
Time: 08:18:30
No. Observations: 19644
Df Residuals: 19643
Df Model: 0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [0.025 0.975]
const 0.0006 4.8e-05 12.858 0.000 0.001 0.001
x1 0.0032 3.2e-05 98.791 0.000 0.003 0.003
" ], "text/plain": [ "\n", "\"\"\"\n", " PhatFit Results \n", "==============================================================================\n", "Dep. Variable: Close Log-Likelihood: 61926.\n", "Model: PhatFit AIC: -1.238e+05\n", "Method: Maximum Likelihood BIC: -1.238e+05\n", "Date: Fri, 23 Jul 2021 \n", "Time: 08:18:30 \n", "No. Observations: 19644 \n", "Df Residuals: 19643 \n", "Df Model: 0 \n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 0.0006 4.8e-05 12.858 0.000 0.001 0.001\n", "x1 0.0032 3.2e-05 98.791 0.000 0.003 0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.summary()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.00061775, 0.00316159])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The difference may not appear too meaningful, but with the tails fixed we do get a slightly greater mean (about 0.00062 versus 0.00060) and a lower volatility (about 0.0032 versus 0.0035) in the Gaussian body." ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 4 }