vivek-v-rao/Return-Mixtures-of-Normals

Return-Mixtures-of-Normals

Tools to fit Gaussian mixture models to univariate (and, optionally, multivariate) asset returns and to explore model selection with information criteria. Built on scikit-learn's GaussianMixture.

Features

  • Fit 1d Gaussian mixtures per symbol with AIC/BIC selection.
  • Optional EWMA volatility standardization (RiskMetrics-style).
  • Plot fitted densities, best-fit density, and optional KDE overlay.
  • Optional multivariate Gaussian mixture fit on common-date returns.
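
As a rough illustration of the first feature, the sketch below fits 1d mixtures with k = 1..k_max components using scikit-learn's GaussianMixture and picks k by BIC. The function name select_mixture_1d, the n_init and seed settings, and the synthetic two-regime data are assumptions made for this example; they are not taken from xreturns_mix.py.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_mixture_1d(returns, k_max=3, seed=0):
    """Fit mixtures with 1..k_max components; return AIC/BIC scores and the BIC-best model."""
    x = np.asarray(returns, dtype=float).reshape(-1, 1)  # sklearn expects shape (n, 1)
    scores, models = {}, {}
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(x)
        scores[k] = (gm.aic(x), gm.bic(x))
        models[k] = gm
    k_bic = min(scores, key=lambda k: scores[k][1])  # lower BIC is better
    return scores, models[k_bic]


# Synthetic returns (hypothetical): a calm regime 80% of the time, a volatile one otherwise.
rng = np.random.default_rng(0)
n = 4000
calm = rng.random(n) < 0.8
x = np.where(calm, rng.normal(0.1, 0.7, n), rng.normal(-0.3, 2.3, n))

scores, best = select_mixture_1d(x, k_max=3)
for k, (aic, bic) in scores.items():
    print(f"k={k}  aic={aic:12.3f}  bic={bic:12.3f}")
print("bic selects k =", best.n_components)
```

Using several restarts (n_init) is a hedge against EM converging to a poor local optimum, which can otherwise leave a larger k with a lower log-likelihood than a smaller one.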

Requirements

  • Python 3.10+
  • numpy
  • pandas
  • scikit-learn
  • matplotlib (for plotting)
  • scipy (optional, for KDE overlay)

Files

  • xreturns_mix.py: main script to fit mixtures to return series.
  • xfit_mix_1d.py: 1d mixture simulation and fit demo.
  • xfit_mix_mv.py: multivariate mixture simulation and fit demo.
  • xreturn_stats_flat.py: return summary stats and correlations.
  • mixture.py: mixture utilities.
  • stats.py: return utilities and I/O helpers.

Usage

Run the main script:

python xreturns_mix.py

Edit toggles near the top of xreturns_mix.py to control behavior:

  • k_max: max number of 1d mixture components per symbol.
  • k_mv_max: max number of multivariate mixture components.
  • symbols: list of tickers to include, or None for all.
  • ewma_lambda: EWMA decay factor (e.g., 0.94) used to standardize returns by EWMA volatility, or None to fit raw returns.
  • best_plot_criterion: "lik", "AIC", "BIC", or None.
  • show_best_density_plot: show or hide the best-fit plot.
  • show_kde_plot: overlay a KDE if SciPy is available.
  • density_suffix: set to ".png" to save plots per symbol.
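
For readers unfamiliar with the ewma_lambda toggle, here is a minimal sketch of RiskMetrics-style standardization. It assumes a zero-mean variance recursion seeded with the average squared return, and scales each return by a volatility forecast built only from earlier returns. The function name ewma_standardize is hypothetical, and the script's actual seeding and lag conventions may differ.

```python
import math


def ewma_standardize(returns, lam=0.94):
    """Divide each return by an EWMA volatility forecast built from earlier returns."""
    r = [float(v) for v in returns]
    # Seed the variance with the average squared return (an assumption of this sketch).
    var = sum(v * v for v in r) / len(r)
    z = []
    for v in r:
        z.append(v / math.sqrt(var))           # forecast uses only past data
        var = lam * var + (1.0 - lam) * v * v  # RiskMetrics recursion
    return z


# A quiet stretch followed by a volatile one: the standardized series shrinks
# the later, larger returns as the volatility forecast catches up.
raw = [0.5, -0.4, 0.6, -0.5, 3.0, -2.5, 2.8, -3.1]
print([round(v, 3) for v in ewma_standardize(raw)])
```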

Notes

  • Multivariate fits use only dates where all assets have returns.
  • AIC/BIC selection for 1d fits uses the Gaussian mixture log-likelihood.
  • Correlation summaries for multivariate fits report mean and mean absolute off-diagonal correlations per component.
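
The link between log-likelihood and the information criteria can be checked by hand. The sketch below uses the standard definitions AIC = 2p - 2 loglik and BIC = p ln(n) - 2 loglik, and assumes a 1d mixture with k components has p = 3k - 1 free parameters (k means, k variances, k - 1 weights). Plugging in the SPY fit_k2 log-likelihood and observation count from the sample output reproduces the reported AIC and BIC to within rounding. The helper names are hypothetical.

```python
import math


def n_params_1d(k):
    """Free parameters of a 1d Gaussian mixture: k means, k variances, k - 1 weights."""
    return 3 * k - 1


def aic_bic(loglik, n_params, n_obs):
    """Standard AIC and BIC from a maximized log-likelihood."""
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n_obs) - 2 * loglik
    return aic, bic


# SPY fit_k2 from the sample output: loglik = -9716.3974, n = 6552
aic, bic = aic_bic(-9716.3974, n_params_1d(2), 6552)
print(f"aic={aic:.4f}  bic={bic:.4f}")
```

Because ln(n) exceeds 2 for any realistic sample size, BIC penalizes each extra parameter more heavily than AIC, which is why BIC can select fewer components.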

Sample Output

A run of python xreturns_mix.py gave the results below. For the joint daily log returns (in percent) of SPY, TLT, and VXX on the n = 2008 dates where all three have returns, the Bayesian Information Criterion (BIC) selects a 2-component normal mixture, while AIC selects 3 components. The low-volatility state has weight 0.83 and a daily SPY standard deviation of 0.73%; in it, the mean SPY return is positive, the mean VXX return is negative, and the mean TLT return is close to zero. The high-volatility state has weight 0.17 and a daily SPY standard deviation of 2.46%; in it, the mean SPY return is negative, while the mean VXX and TLT returns are positive.

prices file: etfs_adj_close.csv
#obs, symbols, columns: 6553 3 3
return_type: log
ret_scale: 100.0
ewma_lambda: None
max # of mixture components: 3
max # of mv mixture components: 3
#obs, first, last: 6553 2000-01-03 2026-01-22

multivariate fit: n=2008 d=3
model selection over k=1..3 (aic, bic)
k= 1  aic=   22246.900  bic=   22297.344
k= 2  aic=   20689.604  bic=   20796.097
k= 3  aic=   20651.131  bic=   20813.673
bic selects k=2
aic selects k=3

weights, means, stds per component (sorted by descending weight)
                  weight  mean_SPY  mean_TLT  mean_VXX  sd_SPY  sd_TLT  sd_VXX
label  component                                                              
fit_k2 1          0.8297    0.1451   -0.0295   -0.7859  0.7297  0.7980  2.9582
       2          0.1703   -0.4099    0.1070    2.5955  2.4559  1.6307  8.2447

correlations per component (stacked blocks)
                          SPY    TLT    VXX
label  component var_i                     
fit_k2 1         SPY    1.000 -0.008 -0.750
                 TLT   -0.008  1.000  0.060
                 VXX   -0.750  0.060  1.000
       2         SPY    1.000 -0.244 -0.748
                 TLT   -0.244  1.000  0.179
                 VXX   -0.748  0.179  1.000

off-diagonal correlation summary by component
            mean  mean_abs
component                 
1         -0.233     0.273
2         -0.271     0.390

SPY 6552 returns from 2000-01-04 to 2026-01-22
SPY selected k (aic, bic): 2, 2
SPY moments (mean, sd, skew, excess kurtosis)
             mean      sd    skew  ex_kurt      loglik         aic         bic
label                                                                         
empirical  0.0308  1.2212 -0.2099  11.5358         NaN         NaN         NaN
fit_k1     0.0308  1.2212  0.0000   0.0000 -10605.9472  21215.8944  21229.4695
fit_k2     0.0308  1.2212 -0.4469   4.7684  -9716.3974  19442.7947  19476.7324
fit_k3     0.0308  1.2212 -0.3185   3.5960  -9736.1796  19488.3592  19542.6594

SPY fit_k2 parameters
                      weight    mean      sd
label      component                        
SPY_fit_k2 1          0.7850  0.1071  0.7081
           2          0.2150 -0.2478  2.2375

SPY fit_k3 parameters
                      weight    mean      sd
label      component                        
SPY_fit_k3 1          0.6981  0.1181  0.6282
           2          0.1581  0.5251  1.8868
           3          0.1438 -0.9366  1.8161

TLT 5908 returns from 2002-07-31 to 2026-01-22
TLT selected k (aic, bic): 2, 2
TLT moments (mean, sd, skew, excess kurtosis)
             mean      sd    skew  ex_kurt     loglik         aic         bic
label                                                                        
empirical  0.0148  0.9049 -0.0194   3.4006        NaN         NaN         NaN
fit_k1     0.0148  0.9049  0.0000   0.0000 -7792.8223  15589.6446  15603.0127
fit_k2     0.0148  0.9049 -0.1708   1.3111 -7621.5794  15253.1589  15286.5792
fit_k3     0.0148  0.9049 -0.0390   1.0985 -7644.5557  15305.1113  15358.5838

TLT fit_k2 parameters
                      weight    mean      sd
label      component                        
TLT_fit_k2 1          0.6330  0.0743  0.6349
           2          0.3670 -0.0880  1.2326

TLT fit_k3 parameters
                      weight    mean      sd
label      component                        
TLT_fit_k3 1          0.5529  0.1191  0.5638
           2          0.2525 -0.6767  0.9304
           3          0.1946  0.6158  1.0814

VXX 2008 returns from 2018-01-26 to 2026-01-22
VXX selected k (aic, bic): 2, 2
VXX moments (mean, sd, skew, excess kurtosis)
             mean      sd    skew  ex_kurt     loglik         aic         bic
label                                                                        
empirical -0.2100  4.5225  1.3619   7.2757        NaN         NaN         NaN
fit_k1    -0.2100  4.5225  0.0000   0.0000 -5879.4140  11762.8280  11774.0378
fit_k2    -0.2100  4.5225  1.0868   4.4273 -5610.2475  11230.4951  11258.5195
fit_k3    -0.2100  4.5225  1.2001   4.8141 -5614.3020  11244.6040  11289.4432

VXX fit_k2 parameters
                      weight    mean      sd
label      component                        
VXX_fit_k2 1          0.8003 -0.9408  2.7218
           2          0.1997  2.7192  7.8749

VXX fit_k3 parameters
                      weight    mean      sd
label      component                        
VXX_fit_k3 1          0.4852  0.1485  2.3958
           2          0.3361 -2.5545  2.4717
           3          0.1787  3.2256  8.0550

mixture fit summary:
        n_obs  k_bic  k_aic         bic         aic
symbol                                             
SPY      6552      2      2  19476.7324  19442.7947
TLT      5908      2      2  15286.5792  15253.1589
VXX      2008      2      2  11258.5195  11230.4951

time elapsed: 1.657 seconds
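
The fit_k rows of the moments tables appear to be the theoretical moments of the fitted mixtures. As a check under that assumption, the sketch below computes the mean, standard deviation, skewness, and excess kurtosis of a normal mixture from its component weights, means, and standard deviations, using the central moments of each normal component about the overall mean. The function name mixture_moments is hypothetical. Applied to the SPY fit_k2 parameters above, it agrees with the reported row to about three decimal places; the small differences come from the rounded inputs.

```python
import math


def mixture_moments(weights, means, sds):
    """Mean, sd, skewness, and excess kurtosis of a univariate normal mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    m2 = m3 = m4 = 0.0
    for w, m, s in zip(weights, means, sds):
        d = m - mean  # component mean relative to the mixture mean
        v = s * s     # component variance
        m2 += w * (v + d * d)
        m3 += w * (3.0 * d * v + d ** 3)
        m4 += w * (3.0 * v * v + 6.0 * d * d * v + d ** 4)
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2 - 3.0


# SPY fit_k2 parameters from the sample output
mean, sd, skew, ex_kurt = mixture_moments(
    [0.7850, 0.2150], [0.1071, -0.2478], [0.7081, 2.2375]
)
print(f"mean={mean:.4f} sd={sd:.4f} skew={skew:.4f} ex_kurt={ex_kurt:.4f}")
```

With a single component the function returns zero skewness and zero excess kurtosis, matching the fit_k1 rows.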

The VXX density plot shows the positive skew of VXX returns (empirical skewness 1.36). A 2-component normal mixture fits much better than a single normal distribution: its BIC is 11258.5, versus 11774.0 for the normal.

[Figure: VXX density]
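
The density comparison can also be sketched numerically from the fitted parameters. The example below evaluates the VXX fit_k2 mixture density and the single normal density (mean -0.2100, sd 4.5225) at a few return levels, using values from the sample output. The helper names are hypothetical, and this is not the plotting code used by the script.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a normal mixture: weighted sum of component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# VXX fit_k2 parameters from the sample output
vxx_w, vxx_mu, vxx_sd = [0.8003, 0.1997], [-0.9408, 2.7192], [2.7218, 7.8749]

for x in (-15.0, 0.0, 15.0):
    mix = mixture_pdf(x, vxx_w, vxx_mu, vxx_sd)
    nrm = normal_pdf(x, -0.2100, 4.5225)
    print(f"x={x:6.1f}  mixture={mix:.6f}  normal={nrm:.6f}")
```

The wide second component gives the mixture noticeably more density in the right tail than the single normal, which is consistent with the positive skew of VXX returns.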
