https://youtu.be/9vM4p9NN0Ts?si=H3hK7m7wrETW7N0R

Language Modeling

LMs are generative models: once we have a model of the data distribution, we can sample from that distribution to generate new data.
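A minimal sketch of "model the distribution, then sample from it", using the simplest possible model. It assumes a single categorical variable estimated by relative frequency (maximum likelihood) from toy character data. The names `fit_categorical` and `sample` and the string `"abracadabra"` are hypothetical illustrations and do not come from the lecture.

```python
import random
from collections import Counter

def fit_categorical(data):
    """Estimate p(x) by relative frequency (maximum likelihood) from observed samples."""
    counts = Counter(data)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def sample(dist, k, rng):
    """Draw k new samples from the fitted distribution."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=k)

data = list("abracadabra")       # hypothetical training data
dist = fit_categorical(data)     # p('a') = 5/11, p('b') = p('r') = 2/11, ...
rng = random.Random(0)           # seeded for reproducibility
new_data = sample(dist, 5, rng)  # "generated" data drawn from the model
```

A real LM models whole sequences rather than independent characters, but the workflow is the same: fit a distribution, then draw from it.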

Autoregressive (AR) language models (chain rule of probability):

$p(x_1, \dots, x_n) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1, x_2) \cdots = \prod_{i=1}^{n} p(x_i \mid x_{1:i-1})$
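The factorization can be made concrete with a toy conditional model. This is a sketch under stated assumptions: `next_token_probs` is a hypothetical stand-in for $p(x_i \mid x_{1:i-1})$ over a binary vocabulary (a Pólya-urn-style rule, not a neural LM). `joint_prob` applies the chain rule by summing log-conditionals, and `sample_sequence` generates one token at a time, each conditioned on the prefix sampled so far.

```python
import math
import random

VOCAB = (0, 1)  # hypothetical binary vocabulary

def next_token_probs(prefix):
    """Hypothetical toy model of p(x_i | x_{1:i-1}).

    Polya-urn-style rule: the more 1s seen so far, the likelier the next token is 1.
    A real LM would compute this distribution with a neural network over the prefix.
    """
    p_one = (1 + sum(prefix)) / (2 + len(prefix))
    return {0: 1 - p_one, 1: p_one}

def joint_prob(seq):
    """Chain rule: p(x_1..x_n) = prod_i p(x_i | x_{1:i-1}), accumulated in log space."""
    log_p = 0.0
    for i, x in enumerate(seq):
        log_p += math.log(next_token_probs(seq[:i])[x])
    return math.exp(log_p)

def sample_sequence(n, rng):
    """Autoregressive sampling: draw x_1, then x_2 given x_1, and so on."""
    seq = []
    for _ in range(n):
        probs = next_token_probs(seq)
        tokens = list(probs)
        seq.append(rng.choices(tokens, weights=[probs[t] for t in tokens])[0])
    return seq

p = joint_prob([1, 1, 1])                   # ≈ 1/2 * 2/3 * 3/4 = 0.25
draw = sample_sequence(5, random.Random(0))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long sequences. Because each conditional sums to 1, the joint probabilities over all length-$n$ sequences also sum to 1.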

Steps: