Neural Networks

As outlined in Carreau and Bengio (2009), the parameters of the Phat distribution can also be fit using a simple neural network. For a univariate model, the need for such a structure may not be obvious, but it can be extended to add additional free parameters (such as the mixture weights between the Carbens) and to support conditional models with exogenous variables.

First, we will demonstrate the technique simply on a Gaussian distribution.

TensorFlow is required.

Fitting a Standard Gaussian

A conditional density model is estimated by providing one or many independent variables, \(X\), and a dependent variable, \(Y\). In our case, we are looking to fit an unconditional, univariate dependent variable. In TensorFlow, we must provide both \(X\) and \(Y\) input tensors, so we can simply set \(X=0\) for every sample of \(Y\):

\[\begin{split}X_i = 0, \quad i = 1 \ldots n \\ Y_i = \text{dependent variable}\end{split}\]
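As a concrete illustration, the pairing looks like the following (a minimal sketch; it is an assumption that the DataSplit class used below constructs an equivalent zero-valued \(X\) internally):

import numpy as np
import scipy.stats as scist

y = scist.norm(0, 1).rvs(size=5)   # a few samples of the dependent variable
x = np.zeros_like(y)               # the matching "independent" variable: zero for every sample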

In this example, we generate 100,000 samples from a standard Gaussian and fit the distribution via the negative log-likelihood. phat-tails provides a custom DataSplit class we can use to split the data for training purposes.

[2]:
import numpy as np
import scipy.stats as scist
import matplotlib.pyplot as plt

import phat as ph

n = 100000
y_data = scist.norm(0, 1).rvs(size=n)
data = ph.DataSplit(y_data)

Below we can see the histogram of our samples, which clearly resembles the PDF of the Gaussian.

[3]:
plt.rcParams['patch.edgecolor'] = 'C0'  # set the edge color before plotting so it applies to the bars
plt.hist(y_data, bins=100)
plt.show()
../_images/notebooks_nn_fit_5_0.png

We have built a very simple neural network, the DN class, that takes in both \(X\) and \(Y\) variables, passes \(X\) through one hidden layer (with a tanh activation), then to a parameter layer with two nodes, \(\mu\) and \(\sigma\), the parameters of the Normal distribution. \(\sigma\) is passed through a customized nnelu activation, a relu-like function restricted to strictly positive values so that the standard deviation is always valid.

The loss function is the Gaussian negative log-likelihood.
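For reference, a Gaussian negative log-likelihood loss can be written in a few lines of TensorFlow. This is a minimal sketch of the idea only, not the gnll_loss implementation imported below:

import numpy as np
import tensorflow as tf

def gaussian_nll(y_true, pvec):
    # pvec is the network output: column 0 is mu, column 1 is sigma (> 0)
    mu = pvec[:, 0:1]
    sigma = pvec[:, 1:2]
    # -log N(y | mu, sigma), averaged over the batch
    nll = 0.5 * tf.math.log(2.0 * np.pi) \
        + tf.math.log(sigma) \
        + 0.5 * tf.square((y_true - mu) / sigma)
    return tf.reduce_mean(nll)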

[4]:
import tensorflow as tf
from phat.learn.normnet import DN, gnll_loss

dn = DN(neurons=200)
lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=250,
    decay_rate=0.8
)
dn.compile(
    loss=gnll_loss,
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    metrics=['mean', 'std']
)
dn.build_graph().summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 1)]          0
__________________________________________________________________________________________________
h1 (Dense)                      (None, 200)          400         input_1[0][0]
__________________________________________________________________________________________________
mu (Dense)                      (None, 1)            201         h1[0][0]
__________________________________________________________________________________________________
sigma (Dense)                   (None, 1)            201         h1[0][0]
__________________________________________________________________________________________________
pvec (Concatenate)              (None, 2)            0           mu[0][0]
                                                                 sigma[0][0]
==================================================================================================
Total params: 802
Trainable params: 802
Non-trainable params: 0
__________________________________________________________________________________________________
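The printed summary corresponds to a small functional graph. As a rough sketch (this is not the library's DN implementation; the layer names and the use of softplus in place of the custom nnelu activation are assumptions), an equivalent architecture could be built as:

import tensorflow as tf
from tensorflow.keras import layers

def build_density_net(neurons: int = 200) -> tf.keras.Model:
    x_in = layers.Input(shape=(1,))                                  # the constant-zero X input
    h1 = layers.Dense(neurons, activation='tanh', name='h1')(x_in)   # single hidden layer
    mu = layers.Dense(1, name='mu')(h1)
    # softplus stands in for nnelu to keep sigma strictly positive
    sigma = layers.Dense(1, activation='softplus', name='sigma')(h1)
    pvec = layers.Concatenate(name='pvec')([mu, sigma])
    return tf.keras.Model(inputs=x_in, outputs=pvec)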

We can see the graph visually via plot_model.

[5]:
dn.plot_model()
[5]:
../_images/notebooks_nn_fit_9_0.png
[6]:
stop_loss = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, verbose=0)
history = dn.fit(
    data.train, epochs=3,
    validation_data=data.test,
    callbacks=[stop_loss], batch_size=32, verbose=0
)

The model minimized the loss almost immediately, resulting in the parameters below. They are shown next to the values returned by scipy’s fit function, which uses maximum likelihood estimation (MLE):

[7]:
import pandas as pd

mean, std = dn.predict(np.zeros(1))[0]
m, s = scist.norm.fit(data.raw.y)

df = pd.DataFrame([[mean, std], [m, s]], index=['ANN', 'MLE'], columns=['mean', 'std'])
df.style.format({'mean': '{:.4f}', 'std': '{:.4f}'})
[7]:
  mean std
ANN 0.0007 1.0004
MLE -0.0006 0.9979

The fit for both mean and standard deviation is fairly close, though we should be cognizant that, in terms of daily returns, the delta of 0.0013 still translates to a 0.32% CAGR.

Fitting S&P 500 Daily Returns

We will repeat the same process now for S&P 500 daily returns.

[9]:
import yfinance as yf

sp = yf.download('^GSPC')
sp_rets = sp.Close.pct_change()[1:]
sp_rets.plot()
plt.show()
../_images/notebooks_nn_fit_15_1.png
[10]:
data = ph.DataSplit(sp_rets.values)

dn = DN()
lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=100,
    decay_rate=0.9
)
dn.compile(
    loss=gnll_loss,
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    metrics=['mean', 'std']
)
stop_loss = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0,
    patience=20, verbose=1, mode='auto'
)
history = dn.fit(
    data.train, epochs=100,
    validation_data=data.test,
    callbacks=[stop_loss], batch_size=32, verbose=0
)
Epoch 00064: early stopping
[11]:
mean, std = dn.predict(np.zeros(1))[0]
m, s = scist.norm.fit(sp_rets)

df = pd.DataFrame([[mean, std], [m, s]], index=['ANN', 'MLE'], columns=['mean', 'std'])
df.style.format({'mean': '{:.5f}', 'std': '{:.4f}'})
[11]:
  mean std
ANN 0.00035 0.0133
MLE 0.00037 0.0131

In this instance, the delta between the estimates accounts for just 0.004% CAGR.

A visualization of the gradient descent (towards the mean) is available via loss_progress.

[13]:
import matplotlib
import matplotlib.animation
from IPython.core.display import HTML

matplotlib.use("Agg")

Writer = matplotlib.animation.writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='rskene'), bitrate=1800)

anime = dn.loss_progress(history)
anime.save('nnet_norm_fit_sp.mp4', writer=writer)

HTML(anime.to_html5_video())
[13]:
../_images/notebooks_nn_fit_20_1.png

Fitting the Phat

Failure of a Standard Loss Function

Of course, daily returns on the S&P 500 are not Gaussian … or, at best, if they are Gaussian it means we are living in a one-in-\(10^{100+}\) universe where a dozen or more six-sigma events have occurred in the past 100 years. Fans of the Many Worlds interpretation would agree this is entirely possible.

Nevertheless, we will explore the fit of the Phat distribution, utilizing a network similar to that employed in Carreau and Bengio (2009). We will first test our model against a generative version of Phat, with parameters chosen to mirror those of daily S&P 500 returns.

We will use the negative log-likelihood of the entire Phat distribution, the standard loss function used to fit most probability distributions.

[14]:
genmod = ph.Phat(.0003, .0032, .17, .19)

n = 60000
y = genmod.rvs(size=n, seed=16433)
data = ph.DataSplit(y)
../_images/notebooks_nn_fit_23_1.png

We developed a second custom network for this process, PhatNetBeta, containing a few key changes (a conceptual sketch of the architecture follows the list).

  1. We reduced the nodes in the hidden layer to just one. The input values are all zero, so their weights contribute nothing; the bias of the hidden layer is the only meaningful input to the parameter layer.

  2. The hidden layer has no activation function. The \(X\) values provided are all zero, so there is nothing to “activate”.

  3. We added two additional parameters, shape_l and shape_r, representing the tail indices of the left- and right-tailed Pareto distributions incorporated in the Phat distribution.

  4. The loss function is now the negative log-likelihood of the Phat distribution (available as BodyLoss, referencing the fact that it over-fits the body of the distribution).

  5. The PhatMetric class instantiates a metric for any one of the Phat parameters, in the body or in either tail (the tail must be specified).

  6. A number of operations were pushed to a lower level for convenience.
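As a rough sketch of these changes (this is not the library's PhatNetBeta implementation; the layer names and the softplus used to keep \(\sigma\) and the shape parameters positive are assumptions), the architecture could look like:

import tensorflow as tf
from tensorflow.keras import layers

def build_phatnet_sketch() -> tf.keras.Model:
    # X is always zero, so a single linear hidden neuron suffices:
    # only its bias feeds the parameter layer.
    x_in = layers.Input(shape=(1,))
    h1 = layers.Dense(1, activation=None, name='h1')(x_in)
    mu = layers.Dense(1, name='mu')(h1)
    # softplus stands in for whatever positivity constraint the library applies
    sigma = layers.Dense(1, activation='softplus', name='sigma')(h1)
    shape_l = layers.Dense(1, activation='softplus', name='shape_l')(h1)
    shape_r = layers.Dense(1, activation='softplus', name='shape_r')(h1)
    pvec = layers.Concatenate(name='pvec')([mu, sigma, shape_l, shape_r])
    return tf.keras.Model(inputs=x_in, outputs=pvec)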

[16]:
from phat.learn.phatnet import PhatNetBeta, PhatMetric, BodyLoss

dn = PhatNetBeta(neurons=1)
dn.plot_model()
[16]:
../_images/notebooks_nn_fit_25_0.png
[17]:
metrics = [
    PhatMetric('mean_left'),  PhatMetric('std_left'),
    PhatMetric('shape_left'), PhatMetric('shape_right'),
]
dn.compile(loss=BodyLoss(), optimizer='adam', metrics=metrics)
history = dn.fit(
    data.train,
    validation_data=data.test,
    epochs=100,
    batch_size=32,
    verbose=0
)
Epoch 00017: early stopping

We can compare results with the generative model:

[18]:
mean_, std_, shl_, shr_ = dn.predict([0])[0]
phat_fit = ph.Phat(*dn.predict([0])[0])

df = pd.read_csv('phat_fit_no_tails_comp.csv')
df.style.format('{:.4}')
[18]:
  Gen Model Neural Net Fit
0 0.0003 -0.0003001
1 0.0032 0.003883
2 0.17 3.899e-08
3 0.19 4.822e-08
4 -0.001064 -0.001746
5 0.001684 0.001145
6 0.008784 0.01043
7 0.008807 0.01043

As with the MLE fit, the neural net approach leads to significant underestimation of the tail indices, driving them to near zero.

We can see why below, where we compare the change in the log-likelihood for changes in each parameter. The mean has a clear absolute minimum, the loss declines asymptotically in the standard deviation, and the loss is roughly linear in the tail indices across a very narrow range.

So \(\xi\) can be turned all the way down to zero to the benefit of the loss function.

../_images/notebooks_nn_fit_32_1.png

Below we can see just how sinister the lack of a tail index can be. The fitted model, with no tail indices, appears to be as good a fit for the random samples as the actual model that created them, if not better!

It is only deep in the tails that the larger tail index plays a role.

../_images/notebooks_nn_fit_34_1.png

PhatLoss: A Custom Loss Function

As with our MLE, we can incorporate a tail estimation method into our process in order to alleviate underfitting in the tails. To do so, we have to again amend the network model, including the use of a new custom loss function, PhatLoss. The updated network is available as PhatNet.

First, we will generate some data.

[21]:
genmod = ph.Phat(.0003, .0032, .25, .29)

n = 60000
y = genmod.rvs(size=n)
data = ph.DataSplit(y)

Then, we will find our estimate of each tail index.

[22]:
xi_l, xi_r = ph.two_tailed_hill_double_bootstrap(data.raw.y)

For our neural net, we want to train the shape parameters against our bootstrapped estimates and the remaining parameters against our Phat distribution. This means we must employ two different loss functions:

  1. For each of the shape parameters, we use the asymptotic error discussed earlier; however, we take the log of both values to create a more useful gradient (the same concept as the log-likelihood):

    \[\text{AMLSE} = E[(\log\hat{\xi} - \log\xi)^2]\]

    where \(\xi\) is now the estimate derived from the double bootstrap method and \(\hat{\xi}\) is the value resulting from the gradient descent.

    The AMLSE of the left and right tails is then averaged. This is the TailLoss.

  2. For the body parameters, \(\mu\) and \(\sigma\), we continue to use the negative log-likelihood of the resulting Phat distribution. Note this means the current shape parameters must also be provided to the \(\mu\) and \(\sigma\) loss calculation. This is the BodyLoss used earlier.

We then combine these two losses, at each step, into a single loss driver according to the formula:

\[\text{Loss} = \frac{\text{Loss}_{\textit{body}}}{\text{Loss}_{\textit{tail}}+1}\]

The above relationship was established empirically. As we’ll see, it produces a loss curve that scales the relative importance of the Body and Tail losses and allows for asymptotic behavior in either without negating it in the other. This seems to produce good convergence, although it does have drawbacks related to the scale of the data, discussed in Caution on Scaling below.
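To make the combination concrete, here is a minimal sketch of the two components and how they are merged, using hypothetical helper names; the library's BodyLoss, TailLoss, and PhatLoss are the actual implementations:

import tensorflow as tf

def tail_amlse(xi_hat_l, xi_hat_r, xi_boot_l, xi_boot_r):
    # squared error of the log tail indices, averaged over the two tails
    left = tf.square(tf.math.log(xi_hat_l) - tf.math.log(xi_boot_l))
    right = tf.square(tf.math.log(xi_hat_r) - tf.math.log(xi_boot_r))
    return (left + right) / 2.0

def combined_loss(body_nll, tail_loss):
    # Loss = Loss_body / (Loss_tail + 1)
    return body_nll / (tail_loss + 1.0)

Here body_nll is the negative log-likelihood of the Phat distribution under the current parameters (the BodyLoss above), and the bootstrapped \(\xi\) estimates enter only through the tail term.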

Below we can see the surface of the resulting loss function in terms of its constituents, the negative log-likelihood of the Phat distribution and the average AMLSE of the two tail indices.

PhatNet

Here we see the model.

[24]:
from datetime import datetime as dt
import tensorflow as tf

nnet = ph.PhatNet()

To compile the model, we pass our custom loss function PhatLoss as well as a pair of custom metrics, NLL and TwoTailedAMLSE, to monitor results. Both PhatLoss and TwoTailedAMLSE take the estimated shape parameters as arguments.

[25]:
from phat.learn.phatnet import NLL, TwoTailedAMLSE

metrics = [NLL(), TwoTailedAMLSE(xi_l, xi_r)]
optimizer = tf.keras.optimizers.Adam(learning_rate=5*10**-5)
nnet.compile(loss=ph.PhatLoss(xi_l, xi_r), optimizer=optimizer, metrics=metrics)

We have added some customization to the fit method, including a number of callbacks. For instance, a TensorBoard log is created simply by passing the logdir keyword.
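A hypothetical call might look like the following; only the logdir keyword itself is part of the documented interface, and the timestamped path is purely illustrative:

from datetime import datetime as dt

logdir = f"logs/phatnet/{dt.now().strftime('%Y%m%d-%H%M%S')}"   # illustrative log location
history = nnet.fit(
    data.train,
    validation_data=data.test,
    epochs=100,
    verbose=0,
    logdir=logdir
)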

[26]:
history = nnet.fit(
    data.train,
    epochs=100,
    validation_data=data.test,
    verbose=0
)
Epoch 00028: early stopping
[27]:
results = pd.read_csv('phat_w_tails_results.csv')
results.style.format('{:.4}')
[27]:
  Trained Actual
mean 0.0001879 0.0003
std 0.003583 0.0032
shape_l 0.3316 0.25
shape_r 0.3574 0.29

Above we see a much improved fit of the tail indices while not sacrificing accuracy in the body parameters.

Caution on Scaling

The model inputs are all 0, so the usual concerns regarding normalization/standardization/activation in the hidden layer do not apply. Still, the scale of the target y values does impact performance in an important way.

If the scale of the y values is too large, our custom loss function will work in the exact opposite fashion we expect. To see why, recall our loss function:

\[\text{Loss} = \frac{\text{Loss}_{\textit{body}}}{\text{Loss}_{\textit{tail}}+1}\]

We can see that if \(\Delta \text{Loss}_{\textit{tail}} \gg \Delta\text{Loss}_{\textit{body}}\), then an increase in both will lead to a decreasing loss value: the tail loss sits in the denominator, so if it grows much faster than the body loss in the numerator, the ratio shrinks even as both components worsen.

This can result if the scaling of y is too large.
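As a purely illustrative numerical example (the numbers are made up, not taken from the experiment), suppose the body loss worsens from 2 to 4 while the tail loss worsens from 10 to 30:

\[\frac{2}{10 + 1} \approx 0.18 \qquad \longrightarrow \qquad \frac{4}{30 + 1} \approx 0.13\]

Both components get worse, yet the combined loss falls, so gradient descent is rewarded for degrading the fit.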

To demonstrate, we’ll repeat the prior experiment and simply increase the standard deviation of our Phat distribution by a factor of 100.

[28]:
import numpy as np

genmod = ph.Phat(.0003, .32, .25, .29)

n = 60000
y = genmod.rvs(size=n)
data = ph.DataSplit(y)

xi_l, xi_r = ph.two_tailed_hill_double_bootstrap(data.raw.y)

Note that the tail estimates are essentially unchanged. The location of each tail will be impacted, but not the index.
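This is expected: the Hill estimator underlying the double bootstrap is built from ratios of upper order statistics, so multiplying every observation by a constant \(c > 0\) cancels out of each ratio (we assume here the standard form of the Hill estimator):

\[\hat{\xi} = \frac{1}{k}\sum_{i=1}^{k} \log\frac{X_{(n-i+1)}}{X_{(n-k)}} = \frac{1}{k}\sum_{i=1}^{k} \log\frac{c\,X_{(n-i+1)}}{c\,X_{(n-k)}}\]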

[29]:
from phat.learn.phatnet import NLL, TwoTailedAMLSE

nnet = ph.PhatNet()
metrics = [NLL(), TwoTailedAMLSE(xi_l, xi_r)]
optimizer = tf.keras.optimizers.Adam(learning_rate=5*10**-5)
nnet.compile(loss=ph.PhatLoss(xi_l, xi_r), optimizer=optimizer, metrics=metrics)
history = nnet.fit(
    data.train,
    epochs=100,
    validation_data=data.test,
    verbose=0,
)

As we can see below, this results in a markedly different loss region, where the total loss improves as both the negative log-likelihood and the AMLSE increase!