Introduction

P.H.A.T. - Pareto Hybrids with Asymmetric Tails

The Phat distribution is an attempt to address the problem of fat tails in two-tailed data. It is a two-tailed, fully continuous, well-defined, asymmetric power-law probability distribution.

It is a mixture model of two hybrid Pareto distributions, as described by Julie Carreau and Yoshua Bengio in 2009, with:

  • Gaussian body

  • distinct Pareto power laws in either tail.
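The idea of a Gaussian body joined to a Pareto tail can be sketched as follows. This is a minimal illustration of a single right-tailed hybrid Pareto density, assuming the junction point is chosen so that both the density and its first derivative are continuous there; under that assumption the junction and the tail scale follow from the Lambert W function. The names `lambert_w` and `hybrid_pareto_pdf` are hypothetical helpers and not part of phat, and the way the package mixes two such components is not shown here.

```python
import math

def lambert_w(z, tol=1e-12):
    """Principal branch of the Lambert W function for z >= 0 (Newton's method)."""
    w = math.log1p(z)  # upper bound on W(z) for z >= 0, so Newton descends monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def hybrid_pareto_pdf(y, mu, sigma, xi):
    """Right-tailed hybrid Pareto density: Gaussian body, generalized Pareto tail.

    Assumes xi > 0 and a junction that keeps the density and its slope continuous.
    """
    z = (1.0 + xi) ** 2 / (2.0 * math.pi)
    t = math.sqrt(lambert_w(z))      # junction in standardized units
    junction = mu + sigma * t
    beta = sigma * (1.0 + xi) / t    # scale of the Pareto tail
    # Normalizer: Gaussian mass below the junction plus the full mass of the tail
    gamma = 1.0 + 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    if y <= junction:
        dens = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    else:
        dens = (1.0 / beta) * (1.0 + xi * (y - junction) / beta) ** (-1.0 / xi - 1.0)
    return dens / gamma
```

Evaluating `hybrid_pareto_pdf` just below and just above the junction should give nearly identical values, which is a quick way to check the stitching.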

The distribution requires only 4 parameters:

  • \(\mu, \sigma\) in the Gaussian body

  • \(\xi_{\text{left}}, \xi_{\text{right}}\), the inverse tail indices (\(1/\alpha\)) of the left and right Pareto tails.

The phat-tails package provides several methods to fit the parameters of the Phat distribution to a given time-series dataset and to produce a forecast from the results.

Installation

Install via pip:

$ pip install phat-tails

Quickstart

All pertinent classes and functions are imported via the module phat.

[130]:
import phat as ph

The probability distribution is found in the main class, Phat, which mimics the structure of the continuous distributions found in scipy.stats.

We pass the four parameters to instantiate the distribution. For simplicity, we will show the distribution with equal tail indices initially.

[131]:
mean, sig, shape = 0, 1, 1/5
phat_dist = ph.Phat(mean, sig, shape, shape)

Below is a complete rendering of the distribution, broken down into its component Gaussian body and Pareto tails.

../_images/notebooks_intro_8_0.png

Below we demonstrate the ability to generate asymmetric tails. We overlay two different Phat distributions, one with symmetric tail indices of \(\alpha=2\) and the other with asymmetric tail indices, \(\alpha_{\text{left}}=2\) and \(\alpha_{\text{right}}=20\).

We can see that the left tails are identical. In the right tails, the distributions appear to differ only modestly. This difference, however, leads to dramatically different effects.

[133]:
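mean, sig = 0, 1
# Symmetric tails: alpha = 2 on both sides (shape = 1/alpha)
shape_l1, shape_r1 = 1/2, 1/2
dist1 = ph.Phat(mean, sig, shape_l1, shape_r1)
# Asymmetric tails: alpha_left = 2, alpha_right = 20
shape_l2, shape_r2 = 1/2, 1/20
dist2 = ph.Phat(mean, sig, shape_l2, shape_r2)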
../_images/notebooks_intro_11_0.png
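
To put rough numbers on how much the two right tails differ, the sketch below compares the survival function of a pure Pareto law, P(X > x) = (x / x_min)^(-alpha), for the two right-tail indices used above. It uses a plain Pareto law with an assumed scale `x_min = 1` rather than the Phat distribution itself, so the values are indicative only; `pareto_sf` is a hypothetical helper.

```python
def pareto_sf(x, alpha, x_min=1.0):
    """Survival function P(X > x) of a Pareto law with tail index alpha."""
    if x <= x_min:
        return 1.0
    return (x / x_min) ** (-alpha)

# Tail probabilities at a few multiples of the (assumed) scale x_min = 1
for x in (2.0, 5.0, 10.0):
    heavy = pareto_sf(x, alpha=2)
    light = pareto_sf(x, alpha=20)
    print(f"x={x:>4}: alpha=2 -> {heavy:.3e}, alpha=20 -> {light:.3e}")
```

At ten times the scale, the two tail probabilities differ by many orders of magnitude even though the densities look similar near the body.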

The Phat class provides common methods such as pdf, cdf, sf, and ppf. It can also calculate the negative log-likelihood and the first and second moments. Derivations are found here.

[135]:
mean, sig, shape_l, shape_r = 0, 1, 1/5, 1/4
phat_dist = ph.Phat(mean, sig, shape_l, shape_r)
phat_dist.pdf(10)
[135]:
array([0.00482994])
[136]:
phat_dist.cdf([.05,1,-0.1])
[136]:
array([0.51144212, 0.70624103, 0.47567736])
[137]:
phat_dist.sf([.05])
[137]:
array([0.48855788])
[138]:
import numpy as np
assert np.allclose(phat_dist.sf([.05]), 1 - phat_dist.cdf([.05]))  # compare floats with a tolerance
[139]:
phat_dist.ppf(np.linspace(0,1,5))
[139]:
array([       -inf, -1.63735173,  0.00569209,  1.68013031,         inf])
[140]:
phat_dist.nll(1) # Negative Log-Likelihood
[140]:
array([1.8510368])
[141]:
phat_dist.mean()
[141]:
0.0796142959815449
[142]:
phat_dist.std()
[142]:
3.7926873955033087
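
The relationships among these methods can be illustrated with a stand-in distribution. The sketch below uses the standard library's `statistics.NormalDist` in place of Phat (an assumption made so the example runs without the package installed), where `inv_cdf` plays the role of ppf.

```python
import math
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # stand-in for a fitted distribution

x = 0.05
cdf = dist.cdf(x)
sf = 1.0 - cdf                  # survival function is the complement of the CDF
nll = -math.log(dist.pdf(x))    # negative log-likelihood of a single observation

# The quantile function (ppf) inverts the CDF
roundtrip = dist.inv_cdf(cdf)

# Compare floats with a tolerance rather than with ==
assert math.isclose(sf + cdf, 1.0)
assert math.isclose(roundtrip, x, abs_tol=1e-9)
```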

It can also generate random variates (and standardized random variates).

[143]:
phat_dist.rvs(20)
[143]:
array([ 4.10522047,  3.55472489, -0.93678543,  0.51636282,  1.34579594,
        1.01540556, -0.80886831, -0.67534635,  0.09117024,  6.42834252,
        0.35292732,  3.1609973 ,  4.03059888,  5.0758071 , -3.4068779 ,
       -0.5631759 ,  0.76542791, -6.26470075, -5.32952901,  0.60021098])
[144]:
phat_dist.std_rvs(20)
[144]:
array([-0.12796173,  0.43906166, -0.737519  , -0.51000483,  0.17890436,
       -0.08411178,  0.28586079, -0.2721797 , -0.22500339,  0.22734977,
       -0.25146567,  0.64438798, -1.50587683,  0.1999281 , -0.29127155,
        0.35417953, -0.01986807,  0.5101493 ,  0.59852587, -0.38408617])
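
One common way to draw from a heavy-tailed law is inverse transform sampling: push uniform draws through the quantile function. The sketch below does this for a pure Pareto law. It is only an illustration of the technique; the source does not describe how `rvs` is implemented in phat, and `pareto_ppf` and `pareto_rvs` are hypothetical helpers.

```python
import random

def pareto_ppf(u, alpha, x_min=1.0):
    """Quantile function of a Pareto law: inverts F(x) = 1 - (x / x_min) ** (-alpha)."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)

def pareto_rvs(n, alpha, x_min=1.0, seed=None):
    """Draw n Pareto variates by pushing uniform draws through the quantile function."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u is never zero
    return [pareto_ppf(rng.random(), alpha, x_min) for _ in range(n)]

sample = pareto_rvs(10_000, alpha=5.0, seed=42)
```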

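Importantly, Phat reports moments as undefined when a tail is too heavy for them to exist: the mean requires \(\alpha > 1\) and the variance requires \(\alpha > 2\). Below, \(\alpha = 1\) in both tails, so both are returned as nan.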

[145]:
shape_l, shape_r, mean, sig = 1, 1, 0, 1
phat_dist = ph.Phat(mean, sig, shape_l, shape_r)
[146]:
phat_dist.mean()
/Users/spindicate/Documents/programming/investing/analysis/options/phat/src/phat/utils.py:75: RuntimeWarning: invalid value encountered in matmul
  return (self.p @ stack)
[146]:
nan
[147]:
phat_dist.var()
[147]:
nan
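
The rule behind these results is that a power-law tail with index \(\alpha = 1/\xi\) has a finite moment of order k only when \(\alpha > k\). A minimal sketch of that check, using a hypothetical helper `moment_is_finite` that is not part of phat:

```python
def moment_is_finite(order, xi_left, xi_right):
    """True if the moment of the given order is finite for both power-law tails.

    For xi > 0 the condition xi * order < 1 is the same as alpha > order,
    with alpha = 1 / xi.
    """
    return all(xi * order < 1.0 for xi in (xi_left, xi_right))

print(moment_is_finite(1, 1.0, 1.0))     # mean with alpha = 1 in both tails
print(moment_is_finite(2, 1 / 5, 1 / 4)) # variance with alpha = 5 and alpha = 4
```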

Phat has a fit method, which generates a standard Maximum Likelihood Estimate (MLE), although this is not the recommended approach to fitting this distribution.
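
For intuition about what a maximum likelihood estimate of a tail index looks like, the sketch below fits a pure Pareto law with a known threshold, for which the MLE has a closed form. This is not the `fit` method of Phat and makes no claim about its internals; `pareto_alpha_mle` is a hypothetical helper and the data are synthetic.

```python
import math
import random

def pareto_alpha_mle(data, x_min):
    """Closed-form MLE of the Pareto tail index for observations above a known x_min."""
    tail = [x for x in data if x > x_min]
    if not tail:
        raise ValueError("no observations above x_min")
    return len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic check: draw from a Pareto law with a known index and recover it
rng = random.Random(0)
true_alpha = 4.0
data = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20_000)]

alpha_hat = pareto_alpha_mle(data, x_min=1.0)
xi_hat = 1.0 / alpha_hat  # shape parameter in the xi = 1 / alpha convention
```

In practice the threshold is unknown and the estimate is sensitive to it, which is one reason threshold-selection methods exist alongside plain MLE.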

In addition to the main distribution class, the package also provides:

Dependencies

  • Python versions: 3.9

  • numpy 1.19.5

  • numba 0.53.*

  • scipy 1.7.*

  • scikit-learn 0.24.*

  • statsmodels 0.12.*

  • tensorflow 2.5.0

  • tensorflow-probability 0.12.2

  • matplotlib 3.5.1

  • arch 4.19

  • pmdarima 1.8.2

  • tqdm 4.61.2

Also see the requirements and compatibility specifications for TensorFlow and Numba.

Suggested

Also Check Out

  • tail-estimation

    • built as part of Ivan Voitalov et al. (2019) on tail index estimation techniques for power-law phenomena in scale-free networks

    • code from this package is utilized in the two_tailed_hill_double_bootstrap function

  • thresholdmodeling, a package for manual Peak-over-Threshold (PoT) analysis.

Enhancements

Potential enhancements under consideration:

  • truncated Pareto tails

  • additional tail index estimation techniques

  • integration with Heston or other stochastic volatility models

  • incorporation of Phat innovations into the fit of AR-GARCH or ARMA-GARCH models via a custom model

  • generalization to additional GARCH models

  • better optimization of the Garchcaster.forecast method