# Formulary {.unnumbered .unlisted}
## Statistical Distributions
### Normal (Gaussian) Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$
- **Description**: The normal distribution is a continuous probability
distribution characterized by a bell-shaped curve. It is defined by
the mean ($\mu$) and standard deviation ($\sigma$). A quick numerical
check follows below.
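
As a check of the density above, this sketch evaluates the formula
directly and compares it with `scipy.stats.norm`; `numpy` and `scipy`
are assumed available, and the values of $\mu$, $\sigma$, and $x$ are
illustrative.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 1.5  # illustrative parameter values

# Direct evaluation of the density formula
manual = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Library reference value
assert np.isclose(manual, norm.pdf(x, loc=mu, scale=sigma))
```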
### Binomial Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
$$
- **Description**: The binomial distribution represents the number of
successes in a fixed number of independent Bernoulli trials, with a
constant probability of success $p$ in each trial. Here, $n$ is the
number of trials and $k$ is the number of successes. A check against
`scipy.stats` follows below.
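
A minimal sketch comparing a direct evaluation of the PMF with
`scipy.stats.binom`; the values of $n$, $p$, and $k$ are illustrative.

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.3, 4  # illustrative: 10 trials, success probability 0.3, 4 successes

# Binomial coefficient times p^k (1-p)^(n-k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)
assert abs(manual - binom.pmf(k, n, p)) < 1e-12
```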
### Poisson Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
$$
- **Description**: The Poisson distribution represents the probability
of a given number of events occurring in a fixed interval of time or
space, given the average number of times the event occurs over that
interval. Here, $\lambda$ is the average number of events, $k$ is
the number of occurrences, and $e$ is Euler's number. See the sketch
below.
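
A minimal sketch, again with illustrative values, comparing a direct
evaluation of the PMF with `scipy.stats.poisson`.

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 3.5, 2  # illustrative: average rate 3.5, 2 occurrences

manual = lam**k * exp(-lam) / factorial(k)
assert abs(manual - poisson.pmf(k, lam)) < 1e-12
```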
### Exponential Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0
$$
- **Description**: The exponential distribution represents the time
between events in a Poisson process. It is defined by the rate
parameter $\lambda$.
### Uniform Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \begin{cases}
\frac{1}{b - a} & a \le x \le b \\
0 & \text{otherwise}
\end{cases}
$$
- **Description**: The uniform distribution describes an equal
probability for all values in the interval $[a, b]$. It is a
continuous distribution.
### Bernoulli Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = x) = p^x (1 - p)^{1-x} \quad \text{for } x \in \{0, 1\}
$$
- **Description**: The Bernoulli distribution is a discrete
distribution representing the outcome of a single binary experiment
with success probability $p$.
### Beta Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} \quad \text{for } 0 \le x \le 1
$$
- **Description**: The beta distribution is a continuous distribution
defined on the interval $[0, 1]$, parameterized by $\alpha$ and
$\beta$, and is useful in Bayesian statistics.
### Gamma Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x \ge 0
$$
- **Description**: The gamma distribution is a continuous distribution
defined by shape parameter $\alpha$ and rate parameter $\beta$. It
generalizes the exponential distribution.
### Chi-Squared Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x \ge 0
$$
- **Description**: The chi-squared distribution is a special case of
the gamma distribution with $\alpha = k/2$ and $\beta = 1/2$, often
used in hypothesis testing and confidence intervals. The sketch below
verifies this special case numerically.
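
The special-case claim can be checked with `scipy.stats`; note that a
rate of $\beta = 1/2$ corresponds to `scale=2` in scipy's
parameterization. The grid and degrees of freedom are illustrative.

```python
import numpy as np
from scipy.stats import chi2, gamma

k = 5                         # illustrative degrees of freedom
x = np.linspace(0.1, 10, 50)  # evaluation grid

# Chi-squared(k) equals Gamma(alpha = k/2, beta = 1/2), i.e. scale = 2
assert np.allclose(chi2.pdf(x, df=k), gamma.pdf(x, a=k / 2, scale=2))
```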
### Student's t-Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
$$
- **Description**: The t-distribution is used to estimate population
parameters when the sample size is small and the population variance
is unknown. It is defined by the degrees of freedom $\nu$.
### F-Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right) \left(1 + \frac{d_1}{d_2} x\right)^{(d_1 + d_2)/2}}
$$
- **Description**: The F-distribution is used to compare two variances
and is defined by two degrees of freedom, $d_1$ and $d_2$.
### Multinomial Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}
$$
- **Description**: The multinomial distribution generalizes the
binomial distribution to more than two outcomes. It describes the
probabilities of counts among categories.
### Geometric Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = (1 - p)^{k-1} p \quad \text{for } k \in \{1, 2, 3, \ldots\}
$$
- **Description**: The geometric distribution represents the number of
trials needed to get the first success in a sequence of independent
Bernoulli trials with success probability $p$.
### Hypergeometric Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}
$$
- **Description**: The hypergeometric distribution describes the
probability of $k$ successes in $n$ draws from a finite population
of size $N$ containing $K$ successes, without replacement. A check
against `scipy.stats` follows below.
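
A sketch checking the PMF against `scipy.stats.hypergeom`, whose
argument order (population size, successes in the population, draws)
differs from the notation above; all values are illustrative.

```python
from math import comb
from scipy.stats import hypergeom

N, K, n, k = 50, 10, 12, 3  # population, successes in population, draws, observed successes

manual = comb(K, k) * comb(N - K, n - k) / comb(N, n)
# scipy's parameter order is (M, n, N) = (population, successes, draws)
assert abs(manual - hypergeom.pmf(k, N, K, n)) < 1e-12
```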
### Log-Normal Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \quad \text{for } x > 0
$$
- **Description**: The log-normal distribution describes a variable
whose logarithm is normally distributed. It is useful in modeling
positively skewed data.
## Machine Learning Models
### Linear Regression {.unnumbered .unlisted}
- **Formula**: $$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
$$
- **Description**: Predicts a continuous target variable based on
linear relationships between the target and one or more predictor
variables. A minimal least-squares sketch follows below.
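
A least-squares sketch on simulated data (the coefficients and noise
level are illustrative); the intercept $\beta_0$ enters as a column of
ones.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # two predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so beta[0] plays the role of the intercept
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1.0, 2.0, -0.5]
```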
### Logistic Regression {.unnumbered .unlisted}
- **Formula**: $$
\text{logit}(P(Y=1)) = \ln\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$$
- **Description**: Predicts a binary outcome based on linear
relationships between the predictor variables and the log-odds of
the outcome. See the sketch below.
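
A sketch using `scikit-learn` on simulated data; the true coefficients
are illustrative, and because the default fit applies an L2 penalty,
they are only approximately recovered.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
p = 1 / (1 + np.exp(-(-0.5 + 2.0 * X[:, 0])))  # true log-odds: -0.5 + 2 x_1
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # roughly -0.5 and [2.0, 0.0]
```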
### Generalized Linear Model (GLM) {.unnumbered .unlisted}
- **Formula**: $$
g(E(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$$
- **Description**: A generalized linear model is a flexible
generalization of ordinary linear regression that allows for the
dependent variable $Y$ to have a distribution other than normal. The
link function $g$ relates the expected value of the response
variable $E(Y)$ to the linear predictors. $\beta_0$ is the
intercept, and $\beta_i$ are the coefficients for the predictor
variables $x_i$. A short `statsmodels` sketch follows below.
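
A sketch of one common GLM, a Poisson regression with a log link,
assuming `statsmodels` is available; the coefficients are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.3 + 0.8 * x))  # E(Y) = exp(0.3 + 0.8 x), log link

X = sm.add_constant(x)  # adds the intercept column
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)  # approximately [0.3, 0.8]
```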
### Generalized Additive Model (GAM) {.unnumbered .unlisted}
- **Formula**: $$
g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)
$$
- **Description**: A generalized additive model is an extension of
generalized linear models in which the linear predictor depends on
unknown smooth functions of the predictor variables, allowing for
non-linear relationships between the dependent and independent
variables. Here, $g$ is the link function, $E(Y)$ is the expected
value of the response variable $Y$, $\beta_0$ is the intercept, and
$f_i$ are smooth functions of the predictor variables $x_i$.
### Decision Tree {.unnumbered .unlisted}
- **Formula**: Recursive binary splitting
- **Description**: Splits the data into subsets based on the value of
input features. Each internal node represents a "test" on an
attribute, each branch represents the outcome of the test, and each
leaf node represents a class label or continuous value.
### Random Forest {.unnumbered .unlisted}
- **Formula**: Aggregated decision trees
- **Description**: Combines the predictions of multiple decision trees
to improve accuracy and control over-fitting. Each tree is trained
on a bootstrapped sample of the data and uses a random subset of
features.
### Support Vector Machine (SVM) {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)
$$
- **Description**: Finds the hyperplane that best separates the
classes in the feature space. The formula represents the decision
boundary, where $\mathbf{w}$ is the weight vector and $b$ is the
bias. The sketch below checks this rule against a fitted model.
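
A `scikit-learn` sketch on two simulated clusters, checking that
$\text{sign}(\mathbf{w} \cdot \mathbf{x} + b)$ reproduces the fitted
classifier's predictions; the data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# The sign of w . x + b reproduces the fitted decision rule
manual = (X @ w + b > 0).astype(int)
assert (manual == clf.predict(X)).all()
```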
### K-Nearest Neighbors (KNN) {.unnumbered .unlisted}
- **Formula**: $$
\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i
$$
- **Description**: Classifies a data point based on the majority class
among its $k$ nearest neighbors; for regression, it predicts the
average of the $k$ nearest neighbors' values, which is the case the
formula above shows.
### Naive Bayes {.unnumbered .unlisted}
- **Formula**: $$
P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
$$
- **Description**: Assumes independence between predictors. It uses
Bayes' theorem to predict the probability of a class given the
predictors.
### Principal Component Analysis (PCA) {.unnumbered .unlisted}
- **Formula**: $$
Z = XW
$$
- **Description**: Reduces the dimensionality of the data by
transforming the original variables into new uncorrelated variables
(principal components), ordered by the amount of variance they
capture. The sketch below reproduces this transformation numerically.
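
A sketch checking that `scikit-learn`'s PCA matches $Z = XW$ once $X$
is centered; the simulated data and component count are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pca = PCA(n_components=2).fit(X)
W = pca.components_.T  # columns are the principal directions

# Z = XW after centering, which fit() performs internally
Z_manual = (X - X.mean(axis=0)) @ W
assert np.allclose(Z_manual, pca.transform(X))
```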
### K-Means Clustering {.unnumbered .unlisted}
- **Formula**: $$
\arg \min_S \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2
$$
- **Description**: Partitions the data into $k$ clusters by minimizing
the sum of squared distances between the data points and the cluster
centroids $\mu_i$. See the sketch below.
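
A sketch recomputing the objective above from a fitted `scikit-learn`
model on two simulated clusters; the fitted `inertia_` attribute stores
the same sum.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum of squared distances to the assigned centroids mu_i
sse = sum(np.sum((X[km.labels_ == i] - c) ** 2)
          for i, c in enumerate(km.cluster_centers_))
assert np.isclose(sse, km.inertia_)
```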
### Neural Networks {.unnumbered .unlisted}
- **Formula**: $$
a^{(l)} = \sigma(z^{(l)})
$$ $$
z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)}
$$
- **Description**: Composed of layers of interconnected nodes
(neurons). Each neuron's output is a weighted sum of its inputs
passed through an activation function $\sigma$. The parameters
$W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$. A
forward-pass sketch follows below.
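
A forward-pass sketch in plain `numpy` for a two-layer network; the
logistic sigmoid stands in for $\sigma$, and the layer sizes and random
weights are illustrative.

```python
import numpy as np

def sigma(z):
    """Activation function; here the logistic sigmoid."""
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)  # input a^(0)

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer, 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer, 1 unit

a1 = sigma(W1 @ x + b1)   # z^(1) = W^(1) a^(0) + b^(1)
a2 = sigma(W2 @ a1 + b2)  # z^(2) = W^(2) a^(1) + b^(2)
print(a2)
```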
### Convolutional Neural Networks (CNN) {.unnumbered .unlisted}
- **Formula**: $$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t - \tau) \, d\tau
$$
- **Description**: Uses convolutional layers to apply filters to the
input, which helps in capturing spatial hierarchies in data,
particularly useful for image and video processing.
### Recurrent Neural Networks (RNN) {.unnumbered .unlisted}
- **Formula**: $$
h_t = \sigma(W_h h_{t-1} + W_x x_t + b)
$$
- **Description**: Designed to recognize patterns in sequences of data
by maintaining a hidden state $h_t$ that captures information from
previous time steps.
### Gradient Boosting Machines (GBM) {.unnumbered .unlisted}
- **Formula**: $$
F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)
$$
- **Description**: Builds an additive model in a forward stage-wise
manner. Each base learner $h_m$ is trained to reduce the residual
error of the ensemble's previous predictions. A residual-fitting
sketch follows below.
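
A residual-fitting sketch for the squared-error case, using shallow
`scikit-learn` trees as the base learners $h_m$; the data, tree depth,
learning rate, and number of rounds are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta, F = 0.1, np.full(200, y.mean())  # learning rate and initial prediction F_0
for _ in range(100):
    h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)  # fit the residuals
    F = F + eta * h.predict(X)  # F_m(x) = F_{m-1}(x) + eta * h_m(x)

print(np.mean((y - F) ** 2))  # training error shrinks as rounds accumulate
```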
### Long Short-Term Memory Networks (LSTM) {.unnumbered .unlisted}
- **Formula**: $$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$
- **Description**: A type of RNN that can learn long-term dependencies
by using gates to control the flow of information. A single-step
sketch follows below.
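
A single-step sketch of the gate equations in plain `numpy`; the
weight shapes and random initialization are illustrative, and `*` is
elementwise, as in the equations above.

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; each W[g] maps the concatenated [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigma(W["f"] @ z + b["f"])        # forget gate
    i_t = sigma(W["i"] @ z + b["i"])        # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde      # new cell state
    o_t = sigma(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
n_h, n_x = 4, 3  # hidden and input sizes (illustrative)
W = {g: rng.normal(size=(n_h, n_h + n_x)) for g in "fiCo"}
b = {g: np.zeros(n_h) for g in "fiCo"}
h, C = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
print(h)
```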