Tools to fit Gaussian mixture models to univariate asset returns and explore model selection with information criteria. Built on scikit-learn's GaussianMixture.
- Fit 1d Gaussian mixtures per symbol with AIC/BIC selection.
- Optional EWMA volatility standardization (RiskMetrics-style).
- Plot fitted densities, best-fit density, and optional KDE overlay.
- Optional multivariate Gaussian mixture fit on common-date returns.
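
The AIC/BIC selection step listed above can be sketched directly with scikit-learn. This is a minimal illustration, not the code in xreturns_mix.py: the function name `select_mixture_1d`, the `n_init` and `seed` settings, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_mixture_1d(x, k_max=3, seed=0):
    """Fit 1..k_max component mixtures to a 1d sample and pick k by AIC and BIC.

    Hypothetical helper for illustration; not taken from the repository.
    """
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # scikit-learn expects shape (n, d)
    fits = {}
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        fits[k] = {"model": gm, "aic": gm.aic(X), "bic": gm.bic(X)}
    # Lower is better for both criteria
    k_aic = min(fits, key=lambda k: fits[k]["aic"])
    k_bic = min(fits, key=lambda k: fits[k]["bic"])
    return fits, k_aic, k_bic


# Synthetic two-regime "returns" (illustrative data, not from the price file)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.1, 0.7, 4000), rng.normal(-0.3, 2.3, 1000)])
fits, k_aic, k_bic = select_mixture_1d(x)
print("aic selects k =", k_aic, "| bic selects k =", k_bic)
```

BIC penalizes each extra parameter by log(n) rather than 2, so it tends to choose the same or a smaller k than AIC.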
- Python 3.10+
- numpy
- pandas
- scikit-learn
- matplotlib (for plotting)
- scipy (optional, for KDE overlay)
- xreturns_mix.py: main script to fit mixtures to return series.
- xfit_mix_1d.py: 1d mixture simulation and fit demo.
- xfit_mix_mv.py: multivariate mixture simulation and fit demo.
- xreturn_stats_flat.py: return summary stats and correlations.
- mixture.py: mixture utilities.
- stats.py: return utilities and I/O helpers.
Run the main script:
python xreturns_mix.py

Edit toggles near the top of xreturns_mix.py to control behavior:
- k_max: max number of 1d mixture components per symbol.
- k_mv_max: max number of multivariate mixture components.
- symbols: list of tickers to include, or None for all.
- ewma_lambda: 0.94 to standardize returns by EWMA volatility, or None.
- best_plot_criterion: "lik", "AIC", "BIC", or None.
- show_best_density_plot: show or hide the best-fit plot.
- show_kde_plot: overlay a KDE if SciPy is available.
- density_suffix: set to ".png" to save plots per symbol.
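
As a rough guide to what the ewma_lambda toggle implies, here is a minimal sketch of RiskMetrics-style standardization. It assumes the usual recursion var[t] = lam * var[t-1] + (1 - lam) * r[t-1]**2 and seeds the recursion with the full-sample variance; the script may seed or align the recursion differently, and the name `ewma_standardize` is hypothetical.

```python
import numpy as np


def ewma_standardize(returns, lam=0.94):
    """Divide each return by an EWMA volatility forecast built from past returns.

    Hypothetical helper; the seeding choice below is an assumption.
    """
    r = np.asarray(returns, dtype=float)
    var = np.empty_like(r)
    var[0] = r.var()  # assumption: seed with the full-sample variance
    for t in range(1, len(r)):
        # Forecast for day t uses only information up to day t-1
        var[t] = lam * var[t - 1] + (1.0 - lam) * r[t - 1] ** 2
    return r / np.sqrt(var)


# Illustrative data: a calm regime followed by a volatile one
rng = np.random.default_rng(1)
r = np.concatenate([rng.normal(0, 0.7, 500), rng.normal(0, 2.5, 500)])
z = ewma_standardize(r, lam=0.94)
print("raw sd by half:", r[:500].std(), r[500:].std())
print("standardized sd by half:", z[:500].std(), z[500:].std())
```

With lam = 0.94 the weights on past squared returns decay by 6% per day, so the volatility estimate adapts within a few weeks of a regime change.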
- Multivariate fits use only dates where all assets have returns.
- AIC/BIC selection for 1d fits uses the Gaussian mixture log-likelihood.
- Correlation summaries for multivariate fits report mean and mean absolute off-diagonal correlations per component.
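
The first and last notes can be illustrated with a short sketch. It assumes returns are held in a pandas DataFrame with one column per symbol and NaN where a symbol has no return; the helper names are hypothetical. The correlation matrix used below is component 1 of the sample output in this README.

```python
import numpy as np
import pandas as pd


def common_date_returns(returns: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows (dates) where every asset has a return."""
    return returns.dropna(how="any")


def offdiag_summary(corr):
    """Mean and mean absolute value of the off-diagonal correlations."""
    c = np.asarray(corr, dtype=float)
    vals = c[np.triu_indices_from(c, k=1)]  # each pair counted once
    return vals.mean(), np.abs(vals).mean()


# Toy frame: the first date is dropped because VXX has no return
demo = pd.DataFrame({"SPY": [0.1, 0.2, -0.3], "VXX": [np.nan, -1.0, 2.0]})
common = common_date_returns(demo)

# Component 1 correlations (SPY, TLT, VXX) from the sample output
corr_1 = np.array([
    [1.000, -0.008, -0.750],
    [-0.008, 1.000, 0.060],
    [-0.750, 0.060, 1.000],
])
mean_1, mean_abs_1 = offdiag_summary(corr_1)
print(round(mean_1, 3), round(mean_abs_1, 3))
```

The two numbers agree with the reported summary for component 1 (-0.233 and 0.273) up to rounding of the printed correlations.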
A run of python xreturns_mix.py gave the results below. For the joint daily
returns of SPY, TLT, and VXX, the Bayesian Information Criterion (BIC) selects
a 2-component normal mixture, while AIC selects 3 components. In the low-volatility
state, which has weight 83% and an SPY daily standard deviation of 0.73%, mean SPY
returns are positive, mean VXX returns are negative, and mean TLT returns are close
to zero. In the high-volatility state, which has weight 17% and an SPY daily standard
deviation of 2.46%, mean SPY returns are negative, while mean VXX and TLT returns are
positive.
prices file: etfs_adj_close.csv
#obs, symbols, columns: 6553 3 3
return_type: log
ret_scale: 100.0
ewma_lambda: None
max # of mixture components: 3
max # of mv mixture components: 3
#obs, first, last: 6553 2000-01-03 2026-01-22
multivariate fit: n=2008 d=3
model selection over k=1..3 (aic, bic)
k= 1 aic= 22246.900 bic= 22297.344
k= 2 aic= 20689.604 bic= 20796.097
k= 3 aic= 20651.131 bic= 20813.673
bic selects k=2
aic selects k=3
weights, means, stds per component (sorted by descending weight)
weight mean_SPY mean_TLT mean_VXX sd_SPY sd_TLT sd_VXX
label component
fit_k2 1 0.8297 0.1451 -0.0295 -0.7859 0.7297 0.7980 2.9582
2 0.1703 -0.4099 0.1070 2.5955 2.4559 1.6307 8.2447
correlations per component (stacked blocks)
SPY TLT VXX
label component var_i
fit_k2 1 SPY 1.000 -0.008 -0.750
TLT -0.008 1.000 0.060
VXX -0.750 0.060 1.000
2 SPY 1.000 -0.244 -0.748
TLT -0.244 1.000 0.179
VXX -0.748 0.179 1.000
off-diagonal correlation summary by component
mean mean_abs
component
1 -0.233 0.273
2 -0.271 0.390
SPY 6552 returns from 2000-01-04 to 2026-01-22
SPY selected k (aic, bic): 2, 2
SPY moments (mean, sd, skew, excess kurtosis)
mean sd skew ex_kurt loglik aic bic
label
empirical 0.0308 1.2212 -0.2099 11.5358 NaN NaN NaN
fit_k1 0.0308 1.2212 0.0000 0.0000 -10605.9472 21215.8944 21229.4695
fit_k2 0.0308 1.2212 -0.4469 4.7684 -9716.3974 19442.7947 19476.7324
fit_k3 0.0308 1.2212 -0.3185 3.5960 -9736.1796 19488.3592 19542.6594
SPY fit_k2 parameters
weight mean sd
label component
SPY_fit_k2 1 0.7850 0.1071 0.7081
2 0.2150 -0.2478 2.2375
SPY fit_k3 parameters
weight mean sd
label component
SPY_fit_k3 1 0.6981 0.1181 0.6282
2 0.1581 0.5251 1.8868
3 0.1438 -0.9366 1.8161
TLT 5908 returns from 2002-07-31 to 2026-01-22
TLT selected k (aic, bic): 2, 2
TLT moments (mean, sd, skew, excess kurtosis)
mean sd skew ex_kurt loglik aic bic
label
empirical 0.0148 0.9049 -0.0194 3.4006 NaN NaN NaN
fit_k1 0.0148 0.9049 0.0000 0.0000 -7792.8223 15589.6446 15603.0127
fit_k2 0.0148 0.9049 -0.1708 1.3111 -7621.5794 15253.1589 15286.5792
fit_k3 0.0148 0.9049 -0.0390 1.0985 -7644.5557 15305.1113 15358.5838
TLT fit_k2 parameters
weight mean sd
label component
TLT_fit_k2 1 0.6330 0.0743 0.6349
2 0.3670 -0.0880 1.2326
TLT fit_k3 parameters
weight mean sd
label component
TLT_fit_k3 1 0.5529 0.1191 0.5638
2 0.2525 -0.6767 0.9304
3 0.1946 0.6158 1.0814
VXX 2008 returns from 2018-01-26 to 2026-01-22
VXX selected k (aic, bic): 2, 2
VXX moments (mean, sd, skew, excess kurtosis)
mean sd skew ex_kurt loglik aic bic
label
empirical -0.2100 4.5225 1.3619 7.2757 NaN NaN NaN
fit_k1 -0.2100 4.5225 0.0000 0.0000 -5879.4140 11762.8280 11774.0378
fit_k2 -0.2100 4.5225 1.0868 4.4273 -5610.2475 11230.4951 11258.5195
fit_k3 -0.2100 4.5225 1.2001 4.8141 -5614.3020 11244.6040 11289.4432
VXX fit_k2 parameters
weight mean sd
label component
VXX_fit_k2 1 0.8003 -0.9408 2.7218
2 0.1997 2.7192 7.8749
VXX fit_k3 parameters
weight mean sd
label component
VXX_fit_k3 1 0.4852 0.1485 2.3958
2 0.3361 -2.5545 2.4717
3 0.1787 3.2256 8.0550
mixture fit summary:
n_obs k_bic k_aic bic aic
symbol
SPY 6552 2 2 19476.7324 19442.7947
TLT 5908 2 2 15286.5792 15253.1589
VXX 2008 2 2 11258.5195 11230.4951
time elapsed: 1.657 seconds
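
The multivariate AIC and BIC values in the output above can be sanity-checked with a little arithmetic. Both criteria share the term -2 * loglik, so BIC - AIC = p * (log(n) - 2), where p is the number of free parameters. The sketch below assumes full covariance matrices (scikit-learn's default `covariance_type="full"`), which the output does not state explicitly; the helper name is hypothetical.

```python
import math


def n_params_full_gmm(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance mixture."""
    weights = k - 1                 # weights sum to one
    means = k * d
    covs = k * d * (d + 1) // 2     # symmetric d x d matrix per component
    return weights + means + covs


n, d = 2008, 3  # from "multivariate fit: n=2008 d=3"
reported = {    # k: (aic, bic) copied from the output above
    1: (22246.900, 22297.344),
    2: (20689.604, 20796.097),
    3: (20651.131, 20813.673),
}
gaps = {}
for k, (aic, bic) in reported.items():
    p = n_params_full_gmm(k, d)
    gaps[k] = (bic - aic, p * (math.log(n) - 2.0))
    print(f"k={k} p={p} reported gap={gaps[k][0]:.3f} implied gap={gaps[k][1]:.3f}")
```

The reported and implied gaps agree to the printed precision for k = 1, 2, 3, which is consistent with the full-covariance assumption and parameter counts of 9, 19, and 29.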
Here is a plot of the probability density of VXX returns showing its positive skew. A 2-component normal mixture fits much better than a single normal distribution (BIC of 11258.5 versus 11774.0).
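
The skew of the fitted VXX mixture can be recovered from the component parameters alone, using the standard closed-form central moments of a normal mixture. The sketch below is illustrative (the function name `mixture_moments` is hypothetical and not necessarily what mixture.py provides); the inputs are the VXX fit_k2 weights, means, and standard deviations from the output above.

```python
import numpy as np


def mixture_moments(weights, means, sds):
    """Mean, sd, skewness, and excess kurtosis of a 1d normal mixture."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    s2 = np.asarray(sds, dtype=float) ** 2
    m = np.sum(w * mu)
    d = mu - m  # component means relative to the mixture mean
    var = np.sum(w * (s2 + d**2))
    m3 = np.sum(w * (d**3 + 3.0 * d * s2))
    m4 = np.sum(w * (d**4 + 6.0 * d**2 * s2 + 3.0 * s2**2))
    return m, np.sqrt(var), m3 / var**1.5, m4 / var**2 - 3.0


# VXX fit_k2 parameters from the sample output
mean, sd, skew, ex_kurt = mixture_moments(
    weights=[0.8003, 0.1997],
    means=[-0.9408, 2.7192],
    sds=[2.7218, 7.8749],
)
print(f"mean={mean:.4f} sd={sd:.4f} skew={skew:.4f} ex_kurt={ex_kurt:.4f}")
```

Up to rounding of the printed parameters, these values agree with the fit_k2 row of the VXX moments table (mean -0.2100, sd 4.5225, skew 1.0868, excess kurtosis 4.4273). The positive skew comes from the high-volatility component having a positive mean while the dominant component has a negative one.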
