Commit ac6d720: Update README.md (1 parent 3fb6094)

1 file changed: 23 additions & 14 deletions

File tree

README.md

Lines changed: 23 additions & 14 deletions
Original file line numberDiff line numberDiff line change
It is particularly useful in high-dimensional settings where the number of features far exceeds the number of data points.

By constraining the covariance matrices in this manner, we achieve a balance between flexibility and computational efficiency while avoiding overfitting.
We consider a dataset represented as $X \in \mathbb{R}^{n \times m}$ with **$n$** data points and **$m$** features, potentially with $m \gg n$. A Gaussian Mixture Model with $K$ components over elements $x^{(i)} \in \mathbb{R}^m$ is defined as:

$$p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)\text{,}$$

with non-negative mixture weights $\pi_k$ that sum to one. Each Gaussian component has mean $\mu_k \in \mathbb{R}^m$ and covariance matrix of the form:

$$\Sigma_k = \text{diag}(d_k) + L_k L_k^\top\text{,}$$

where:
- $d_k \in \mathbb{R}^m$ is the diagonal vector capturing independent feature noise,
- $L_k \in \mathbb{R}^{m \times k}$ (with rank $k \ll m$; this $k$ denotes the factor rank, distinct from the component index) captures the dominant covariance structure via a low-rank factor.

The EM algorithm for fitting GMMs alternates between an **E**xpectation step and a **M**aximization step:
1. **E-step:** Compute responsibilities based on current parameters:

$$\gamma_{ik} = \frac{\pi_k \mathcal{N}(x^{(i)} \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x^{(i)} \mid \mu_j, \Sigma_j)}\text{.}$$

2. **M-step:** Update the parameters by maximizing the expected complete-data log-likelihood:

$$\pi_k = \frac{n_k}{n}, \quad \text{where } n_k = \sum_{i=1}^{n} \gamma_{ik}\text{,}$$

$$\mu_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik}\, x^{(i)}\text{,}$$

$$\Sigma_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik}\, (x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^\top\text{.}$$
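The two steps can be sketched end to end in plain NumPy (a naive full-covariance version for exposition only; the function and variable names are ours, and this is not the package's optimized implementation):

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at each row of X."""
    m = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,ij->i', diff, np.linalg.solve(Sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(Sigma)
    return np.exp(-0.5 * (quad + logdet + m * np.log(2.0 * np.pi)))

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a K-component GMM with full covariances."""
    n, _ = X.shape
    K = len(pi)

    # E-step: responsibilities gamma[i, k]
    dens = np.column_stack([pi[j] * gauss_pdf(X, mu[j], Sigma[j])
                            for j in range(K)])
    gamma = dens / dens.sum(axis=1, keepdims=True)

    # M-step: closed-form updates from the equations above
    nk = gamma.sum(axis=0)                    # effective counts n_k
    pi = nk / n
    mu = (gamma.T @ X) / nk[:, None]
    Sigma = np.stack([(gamma[:, j, None] * (X - mu[j])).T @ (X - mu[j]) / nk[j]
                      for j in range(K)])
    return pi, mu, Sigma

# Tiny smoke run on synthetic data
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
pi = np.full(2, 0.5)
mu = rng.standard_normal((2, 3))
Sigma = np.stack([np.eye(3)] * 2)
for _ in range(5):
    pi, mu, Sigma = em_step(X, pi, mu, Sigma)
print(round(pi.sum(), 6))   # -> 1.0
```

Each iteration costs $\mathcal{O}(n K m^2)$ dominated by the dense solves, which is exactly the bottleneck the low-rank structure below removes.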

Computing responsibilities requires inverting $\Sigma_k$. Since this can be prohibitively expensive in large dimensions, we leverage the structure of the covariance matrices to calculate the responsibilities using an $\mathcal{O}(k^3)$ matrix inversion rather than a full $\mathcal{O}(m^3)$ inversion.

Additionally, since updating $\Sigma_k$ directly is intractable in high dimensions, we perform an **inner EM loop** during maximization using **Factor Analysis**, in which $L_k$ and $d_k$ are iteratively estimated to approximate the full covariance structure.
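The text does not name the trick, but the standard way to reach $\mathcal{O}(k^3)$ is the Woodbury identity, $(D + L L^\top)^{-1} = D^{-1} - D^{-1} L\,(I_k + L^\top D^{-1} L)^{-1} L^\top D^{-1}$ with $D = \text{diag}(d)$, so only a $k \times k$ system is ever factorized. A sketch under that assumption (our function name, not the package's):

```python
import numpy as np

def woodbury_solve(d, L, B):
    """Solve (diag(d) + L L^T) X = B via the Woodbury identity.
    Only the k x k 'capacitance' matrix I + L^T D^{-1} L is factorized."""
    Dinv_B = B / d[:, None]                  # D^{-1} B
    Dinv_L = L / d[:, None]                  # D^{-1} L
    cap = np.eye(L.shape[1]) + L.T @ Dinv_L  # k x k
    return Dinv_B - Dinv_L @ np.linalg.solve(cap, L.T @ Dinv_B)

# Agreement with a dense solve on a small instance:
rng = np.random.default_rng(2)
m, k = 50, 3
d = rng.uniform(0.5, 2.0, size=m)
L = rng.standard_normal((m, k))
b = rng.standard_normal((m, 1))

x = woodbury_solve(d, L, b)
print(np.allclose(x, np.linalg.solve(np.diag(d) + L @ L.T, b)))  # -> True
```

The companion matrix determinant lemma, $\log\det\Sigma_k = \log\det(I_k + L_k^\top D^{-1} L_k) + \sum_i \log d_{k,i}$, yields the Gaussian normalizing constant at the same cost.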
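As an illustration of what such an inner loop can look like, here is a factor-analysis EM sketch in NumPy that fits $S \approx L L^\top + \text{diag}(d)$ to a (responsibility-weighted) scatter matrix $S$; the updates are the classical Rubin-Thayer ones, and the function is our illustration, not the package's actual routine:

```python
import numpy as np

def fa_inner_em(S, L, d, n_iter=50):
    """Fit S ~ L L^T + diag(d) by EM for factor analysis.
    S is a (weighted) scatter/covariance matrix; dense for clarity."""
    k = L.shape[1]
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(d)
        beta = np.linalg.solve(Sigma, L).T         # L^T Sigma^{-1}  (k x m)
        Ezz = np.eye(k) - beta @ L + beta @ S @ beta.T
        L = S @ beta.T @ np.linalg.inv(Ezz)        # factor update
        d = np.diag(S - L @ (beta @ S)).copy()     # diagonal update
    return L, d

def loglik(S, L, d):
    """Gaussian log-likelihood of S under the fitted covariance (up to
    constants); EM increases this monotonically."""
    Sigma = L @ L.T + np.diag(d)
    return -np.linalg.slogdet(Sigma)[1] - np.trace(np.linalg.solve(Sigma, S))

rng = np.random.default_rng(3)
m, k = 8, 2
L0 = rng.standard_normal((m, k))
d0 = rng.uniform(0.5, 1.5, size=m)
S = L0 @ L0.T + np.diag(d0)                  # target of exactly model form

L_init = rng.standard_normal((m, k))
d_init = np.ones(m)
L_fit, d_fit = fa_inner_em(S, L_init, d_init)
print(loglik(S, L_fit, d_fit) >= loglik(S, L_init, d_init))  # -> True
```

In the full algorithm this inner loop would run once per component per M-step, with $S$ replaced by the $\gamma_{ik}$-weighted scatter of that component, so the dense $\Sigma_k$ update is never needed.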

This low-rank plus diagonal structure is particularly advantageous in settings such as **time-series analysis (e.g., finance), text modeling, gene expression analysis, and compressed sensing**, where $m \gg n$ leads to singular or poorly conditioned full covariance estimates. Our package leverages efficient matrix decompositions and batched computations to ensure scalability, making it well-suited for large-scale, high-dimensional datasets.
