Empirical risk minimization (ERM) and plug-in methods are two ways of fitting a model, *once the form of the model is chosen*.

> [!warning]
> This is not to be confused with the formulation of the models themselves, although certain models require one approach over the other.

ERM directly *minimizes the loss* over the **observed responses** $\mathbf{y}$; purely discriminative methods use ERM to learn the discriminants.

- For example, the SVM is purely discriminative and has to be trained with ERM.

Plug-in methods take an (estimated) distribution of $X$ and/or $Y$, then *minimize the risk* under that distribution. To obtain this distribution,

- **Generative** models estimate the joint distribution of $X, Y$, then derive $Y ~|~ X$ from it. For example, [[Naive Bayes|naive Bayes]] and [[Linear Discriminant Analysis and Generalizations#Linear Discriminant Analysis|LDA]] learn $X ~|~ Y$ per class and invert it using Bayes' rule.
- **Conditional** approaches estimate $Y ~|~ X$ directly, without modeling the distribution of $X$.
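The contrast can be sketched on a toy problem. Below, a minimal illustration (all data and names are invented for this example): the ERM route fits a logistic discriminant by gradient descent on the empirical loss, while the plug-in route estimates the class-conditional Gaussians $X ~|~ Y$ and a prior, then derives $P(Y ~|~ X)$ with Bayes' rule, roughly the LDA recipe in one dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary classification data: X | Y=k ~ N(2k - 1, 1).
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(loc=2.0 * y - 1.0, scale=1.0, size=n)

# --- ERM route: directly minimize the empirical logistic loss
# over the observed responses y, via gradient descent.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))  # predicted P(Y=1 | x)
    w -= lr * np.mean((p - y) * X)          # gradient of mean loss in w
    b -= lr * np.mean(p - y)                # gradient of mean loss in b

# --- Plug-in route (generative): estimate X | Y per class and the
# class prior, then derive P(Y | X) by Bayes' rule.
mu = np.array([X[y == k].mean() for k in (0, 1)])   # class means
var = X.var()                                        # shared variance (LDA-style)
prior = np.array([(y == k).mean() for k in (0, 1)])  # class priors

def posterior1(x):
    # Unnormalized class likelihoods under the fitted Gaussians,
    # weighted by the priors; normalize to get P(Y=1 | x).
    lik = np.exp(-((x - mu) ** 2) / (2 * var)) * prior
    return lik[1] / lik.sum()

# Both routes yield a decision rule over the same hypothesis class.
erm_pred = (1.0 / (1.0 + np.exp(-(w * X + b))) > 0.5).astype(int)
plug_pred = (np.array([posterior1(x) for x in X]) > 0.5).astype(int)
```

On this toy data the two decision rules largely agree, since both approximate the same linear boundary; they differ in what is estimated along the way (a loss minimizer versus a distribution).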