# Formulary {.unnumbered .unlisted}
## Statistical Distributions
### Normal (Gaussian) Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$
- **Description**: The normal distribution is a continuous probability
distribution characterized by a bell-shaped curve. It is defined by
the mean ($\mu$) and standard deviation ($\sigma$). A quick numerical
check follows below.
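
As a check of the density above, this sketch evaluates the formula
directly and compares it with `scipy.stats.norm`; `numpy` and `scipy`
are assumed available, and the values of $\mu$, $\sigma$, and $x$ are
illustrative.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 1.5  # illustrative parameter values

# Direct evaluation of the density formula
manual = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Library reference value
assert np.isclose(manual, norm.pdf(x, loc=mu, scale=sigma))
```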
### Binomial Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
$$
- **Description**: The binomial distribution represents the number of
successes in a fixed number of independent Bernoulli trials, with a
constant probability of success $p$ in each trial. Here, $n$ is the
number of trials and $k$ is the number of successes. A check against
`scipy.stats` follows below.
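
A minimal sketch comparing a direct evaluation of the PMF with
`scipy.stats.binom`; the values of $n$, $p$, and $k$ are illustrative.

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.3, 4  # illustrative: 10 trials, success probability 0.3, 4 successes

# Binomial coefficient times p^k (1-p)^(n-k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)
assert abs(manual - binom.pmf(k, n, p)) < 1e-12
```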
### Poisson Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
$$
- **Description**: The Poisson distribution represents the probability
of a given number of events occurring in a fixed interval of time or
space, given the average number of times the event occurs over that
interval. Here, $\lambda$ is the average number of events, $k$ is
the number of occurrences, and $e$ is Euler's number. See the sketch
below.
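
A minimal sketch, again with illustrative values, comparing a direct
evaluation of the PMF with `scipy.stats.poisson`.

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 3.5, 2  # illustrative: average rate 3.5, 2 occurrences

manual = lam**k * exp(-lam) / factorial(k)
assert abs(manual - poisson.pmf(k, lam)) < 1e-12
```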
### Exponential Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0
$$
- **Description**: The exponential distribution represents the time
between events in a Poisson process. It is defined by the rate
parameter $\lambda$.
### Uniform Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \begin{cases}
\frac{1}{b - a} & a \le x \le b \\
0 & \text{otherwise}
\end{cases}
$$
- **Description**: The uniform distribution describes an equal
probability for all values in the interval $[a, b]$. It is a
continuous distribution.
### Bernoulli Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = x) = p^x (1 - p)^{1-x} \quad \text{for } x \in \{0, 1\}
$$
- **Description**: The Bernoulli distribution is a discrete
distribution representing the outcome of a single binary experiment
with success probability $p$.
### Beta Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} \quad \text{for } 0 \le x \le 1
$$
- **Description**: The beta distribution is a continuous distribution
defined on the interval $[0, 1]$, parameterized by $\alpha$ and
$\beta$, and is useful in Bayesian statistics.
### Gamma Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x \ge 0
$$
- **Description**: The gamma distribution is a continuous distribution
defined by shape parameter $\alpha$ and rate parameter $\beta$. It
generalizes the exponential distribution.
### Chi-Squared Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x \ge 0
$$
- **Description**: The chi-squared distribution is a special case of
the gamma distribution with $\alpha = k/2$ and $\beta = 1/2$, often
used in hypothesis testing and confidence intervals. The sketch below
verifies this special case numerically.
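
The special-case claim can be checked with `scipy.stats`; note that a
rate of $\beta = 1/2$ corresponds to `scale=2` in scipy's
parameterization. The grid and degrees of freedom are illustrative.

```python
import numpy as np
from scipy.stats import chi2, gamma

k = 5                         # illustrative degrees of freedom
x = np.linspace(0.1, 10, 50)  # evaluation grid

# Chi-squared(k) equals Gamma(alpha = k/2, beta = 1/2), i.e. scale = 2
assert np.allclose(chi2.pdf(x, df=k), gamma.pdf(x, a=k / 2, scale=2))
```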
### Student's t-Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
$$
- **Description**: The t-distribution is used to estimate population
parameters when the sample size is small and the population variance
is unknown. It is defined by the degrees of freedom $\nu$.
### F-Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right) \left(1 + \frac{d_1}{d_2} x\right)^{(d_1 + d_2)/2}}
$$
- **Description**: The F-distribution is used to compare two variances
and is defined by two degrees of freedom, $d_1$ and $d_2$.
### Multinomial Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}
$$
- **Description**: The multinomial distribution generalizes the
binomial distribution to more than two outcomes. It describes the
probabilities of counts among categories.
### Geometric Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = (1 - p)^{k-1} p \quad \text{for } k \in \{1, 2, 3, \ldots\}
$$
- **Description**: The geometric distribution represents the number of
trials needed to get the first success in a sequence of independent
Bernoulli trials with success probability $p$.
### Hypergeometric Distribution {.unnumbered .unlisted}
- **Formula**: $$
P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}
$$
- **Description**: The hypergeometric distribution describes the
probability of $k$ successes in $n$ draws from a finite population
of size $N$ containing $K$ successes, without replacement. A check
against `scipy.stats` follows below.
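
A sketch checking the PMF against `scipy.stats.hypergeom`, whose
argument order (population size, successes in the population, draws)
differs from the notation above; all values are illustrative.

```python
from math import comb
from scipy.stats import hypergeom

N, K, n, k = 50, 10, 12, 3  # population, successes in population, draws, observed successes

manual = comb(K, k) * comb(N - K, n - k) / comb(N, n)
# scipy's parameter order is (M, n, N) = (population, successes, draws)
assert abs(manual - hypergeom.pmf(k, N, K, n)) < 1e-12
```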
### Log-Normal Distribution {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \quad \text{for } x > 0
$$
- **Description**: The log-normal distribution describes a variable
whose logarithm is normally distributed. It is useful in modeling
positively skewed data.
## Machine Learning Models
### Linear Regression {.unnumbered .unlisted}
- **Formula**: $$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
$$
- **Description**: Predicts a continuous target variable based on
linear relationships between the target and one or more predictor
variables. A minimal least-squares sketch follows below.
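
A least-squares sketch on simulated data (the coefficients and noise
level are illustrative); the intercept $\beta_0$ enters as a column of
ones.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # two predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so beta[0] plays the role of the intercept
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1.0, 2.0, -0.5]
```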
### Logistic Regression {.unnumbered .unlisted}
- **Formula**: $$
\text{logit}(P(Y=1)) = \ln\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$$
- **Description**: Predicts a binary outcome based on linear
relationships between the predictor variables and the log-odds of
the outcome. See the sketch below.
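
A sketch using `scikit-learn` on simulated data; the true coefficients
are illustrative, and because the default fit applies an L2 penalty,
they are only approximately recovered.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
p = 1 / (1 + np.exp(-(-0.5 + 2.0 * X[:, 0])))  # true log-odds: -0.5 + 2 x_1
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # roughly -0.5 and [2.0, 0.0]
```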
### Generalized Linear Model (GLM) {.unnumbered .unlisted}
- **Formula**: $$
g(E(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$$
- **Description**: A generalized linear model is a flexible
generalization of ordinary linear regression that allows for the
dependent variable $Y$ to have a distribution other than normal. The
link function $g$ relates the expected value of the response
variable $E(Y)$ to the linear predictors. $\beta_0$ is the
intercept, and $\beta_i$ are the coefficients for the predictor
variables $x_i$. A short `statsmodels` sketch follows below.
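
A sketch of one common GLM, a Poisson regression with a log link,
assuming `statsmodels` is available; the coefficients are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.3 + 0.8 * x))  # E(Y) = exp(0.3 + 0.8 x), log link

X = sm.add_constant(x)  # adds the intercept column
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)  # approximately [0.3, 0.8]
```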
### Generalized Additive Model (GAM) {.unnumbered .unlisted}
- **Formula**: $$
g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)
$$
- **Description**: A generalized additive model is an extension of
generalized linear models in which the linear predictor depends on
unknown smooth functions of the predictor variables, allowing for
non-linear relationships between the dependent and independent
variables. Here, $g$ is the link function, $E(Y)$ is the expected
value of the response variable $Y$, $\beta_0$ is the intercept, and
$f_i$ are smooth functions of the predictor variables $x_i$.
### Decision Tree {.unnumbered .unlisted}
- **Formula**: Recursive binary splitting
- **Description**: Splits the data into subsets based on the value of
input features. Each internal node represents a "test" on an
attribute, each branch represents the outcome of the test, and each
leaf node represents a class label or continuous value.
### Random Forest {.unnumbered .unlisted}
- **Formula**: Aggregated decision trees
- **Description**: Combines the predictions of multiple decision trees
to improve accuracy and control over-fitting. Each tree is trained
on a bootstrapped sample of the data and uses a random subset of
features.
### Support Vector Machine (SVM) {.unnumbered .unlisted}
- **Formula**: $$
f(x) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)
$$
- **Description**: Finds the hyperplane that best separates the
classes in the feature space. The formula represents the decision
boundary, where $\mathbf{w}$ is the weight vector and $b$ is the
bias. The sketch below checks this rule against a fitted model.
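
A `scikit-learn` sketch on two simulated clusters, checking that
$\text{sign}(\mathbf{w} \cdot \mathbf{x} + b)$ reproduces the fitted
classifier's predictions; the data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# The sign of w . x + b reproduces the fitted decision rule
manual = (X @ w + b > 0).astype(int)
assert (manual == clf.predict(X)).all()
```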
### K-Nearest Neighbors (KNN) {.unnumbered .unlisted}
- **Formula**: $$
\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i
$$
- **Description**: Classifies a data point based on the majority class
among its $k$ nearest neighbors; for regression, it predicts the
average of the $k$ nearest neighbors' values, which is the case the
formula above shows.
### Naive Bayes {.unnumbered .unlisted}
- **Formula**: $$
P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
$$
- **Description**: Assumes independence between predictors. It uses
Bayes' theorem to predict the probability of a class given the
predictors.
### Principal Component Analysis (PCA) {.unnumbered .unlisted}
- **Formula**: $$
Z = XW
$$
- **Description**: Reduces the dimensionality of the data by
transforming the original variables into new uncorrelated variables
(principal components), ordered by the amount of variance they
capture. The sketch below reproduces this transformation numerically.
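
A sketch checking that `scikit-learn`'s PCA matches $Z = XW$ once $X$
is centered; the simulated data and component count are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pca = PCA(n_components=2).fit(X)
W = pca.components_.T  # columns are the principal directions

# Z = XW after centering, which fit() performs internally
Z_manual = (X - X.mean(axis=0)) @ W
assert np.allclose(Z_manual, pca.transform(X))
```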
### K-Means Clustering {.unnumbered .unlisted}
- **Formula**: $$
\arg \min_S \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2
$$
- **Description**: Partitions the data into $k$ clusters by minimizing
the sum of squared distances between the data points and the cluster
centroids $\mu_i$. See the sketch below.
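
A sketch recomputing the objective above from a fitted `scikit-learn`
model on two simulated clusters; the fitted `inertia_` attribute stores
the same sum.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum of squared distances to the assigned centroids mu_i
sse = sum(np.sum((X[km.labels_ == i] - c) ** 2)
          for i, c in enumerate(km.cluster_centers_))
assert np.isclose(sse, km.inertia_)
```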
### Neural Networks {.unnumbered .unlisted}
- **Formula**: $$
a^{(l)} = \sigma(z^{(l)})
$$ $$
z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)}
$$
- **Description**: Composed of layers of interconnected nodes
(neurons). Each neuron's output is a weighted sum of its inputs
passed through an activation function $\sigma$. The parameters
$W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$. A
forward-pass sketch follows below.
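
A forward-pass sketch in plain `numpy` for a two-layer network; the
logistic sigmoid stands in for $\sigma$, and the layer sizes and random
weights are illustrative.

```python
import numpy as np

def sigma(z):
    """Activation function; here the logistic sigmoid."""
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)  # input a^(0)

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer, 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer, 1 unit

a1 = sigma(W1 @ x + b1)   # z^(1) = W^(1) a^(0) + b^(1)
a2 = sigma(W2 @ a1 + b2)  # z^(2) = W^(2) a^(1) + b^(2)
print(a2)
```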
### Convolutional Neural Networks (CNN) {.unnumbered .unlisted}
- **Formula**: $$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t - \tau) \, d\tau
$$
- **Description**: Uses convolutional layers to apply filters to the
input, which helps in capturing spatial hierarchies in data,
particularly useful for image and video processing.
### Recurrent Neural Networks (RNN) {.unnumbered .unlisted}
- **Formula**: $$
h_t = \sigma(W_h h_{t-1} + W_x x_t + b)
$$
- **Description**: Designed to recognize patterns in sequences of data
by maintaining a hidden state $h_t$ that captures information from
previous time steps.
### Gradient Boosting Machines (GBM) {.unnumbered .unlisted}
- **Formula**: $$
F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)
$$
- **Description**: Builds an additive model in a forward stage-wise
manner. Each base learner $h_m$ is trained to reduce the residual
error of the ensemble's previous predictions. A residual-fitting
sketch follows below.
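
A residual-fitting sketch for the squared-error case, using shallow
`scikit-learn` trees as the base learners $h_m$; the data, tree depth,
learning rate, and number of rounds are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta, F = 0.1, np.full(200, y.mean())  # learning rate and initial prediction F_0
for _ in range(100):
    h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)  # fit the residuals
    F = F + eta * h.predict(X)  # F_m(x) = F_{m-1}(x) + eta * h_m(x)

print(np.mean((y - F) ** 2))  # training error shrinks as rounds accumulate
```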
### Long Short-Term Memory Networks (LSTM) {.unnumbered .unlisted}
- **Formula**: $$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$
- **Description**: A type of RNN that can learn long-term dependencies
by using gates to control the flow of information. A single-step
sketch follows below.
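
A single-step sketch of the gate equations in plain `numpy`; the
weight shapes and random initialization are illustrative, and `*` is
elementwise, as in the equations above.

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; each W[g] maps the concatenated [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigma(W["f"] @ z + b["f"])        # forget gate
    i_t = sigma(W["i"] @ z + b["i"])        # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde      # new cell state
    o_t = sigma(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
n_h, n_x = 4, 3  # hidden and input sizes (illustrative)
W = {g: rng.normal(size=(n_h, n_h + n_x)) for g in "fiCo"}
b = {g: np.zeros(n_h) for g in "fiCo"}
h, C = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
print(h)
```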