This post supplements the linear regression slides. Please see the slides for the setup.
The Gaussian linear regression model is

$$
y_i = \theta^T x_i + \varepsilon_i, \qquad i = 1, \dots, n,
$$

where $x_i \in \mathbf{R}^d$ and $y_i \in \mathbf{R}$ are the features and label of the $i$th training example, $\theta \in \mathbf{R}^d$ is the vector of coefficients, and the errors $\varepsilon_1, \dots, \varepsilon_n$ are independent $N(0, \sigma^2)$ random variables.
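As a concrete illustration, here is a minimal sketch that simulates data from this model with NumPy; the dimension, sample size, and parameter values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 3                              # sample size and feature dimension (arbitrary)
theta_true = np.array([1.0, -2.0, 0.5])    # coefficients, made up for the example
sigma = 0.3                                # noise standard deviation, made up

X = rng.normal(size=(n, d))                # feature vectors x_1, ..., x_n as rows
eps = rng.normal(scale=sigma, size=n)      # i.i.d. N(0, sigma^2) errors
y = X @ theta_true + eps                   # labels y_i = theta^T x_i + eps_i
```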
We fit the model to training data by estimating the parameters with maximum likelihood. To motivate maximum likelihood, consider two possible sets of model parameters. One way to decide which set of parameters fits the training data better is to compare the probability of observing the training data under (the distributions associated with) the two sets of parameters: the higher the probability of observing the training data, the better the parameters fit. For the Gaussian linear regression model, the probability (density) of observing the training data is

$$
L(\theta, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right).
$$

We call this probability, viewed as a function of the parameters, the likelihood.
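To make the comparison concrete, the sketch below (continuing the simulated data above, with SciPy assumed available) evaluates the log of this probability under two candidate parameter settings, both made up for the example; the setting with the higher value fits the training data better.

```python
from scipy.stats import norm

def log_likelihood(theta, sigma, X, y):
    # Sum over examples of log N(y_i; theta^T x_i, sigma^2).
    return norm.logpdf(y, loc=X @ theta, scale=sigma).sum()

# Two candidate parameter settings (made up for the example).
theta_a, sigma_a = np.array([1.0, -2.0, 0.5]), 0.3
theta_b, sigma_b = np.array([0.0, 0.0, 0.0]), 0.3

print(log_likelihood(theta_a, sigma_a, X, y))  # higher log-likelihood => better fit
print(log_likelihood(theta_b, sigma_b, X, y))
```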
The maximum likelihood estimator (MLE) is the parameter value that maximizes the likelihood, i.e., the parameter value that best fits the training data in the sense that the probability of observing the training data under the MLE is at least as high as under any other parameter value. In practice, we usually maximize the log of the likelihood (the log-likelihood) or, equivalently, minimize the negative of the log-likelihood (the negative log-likelihood). For the Gaussian linear regression model, the negative log-likelihood is

$$
-\log L(\theta, \sigma^2) = \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta^T x_i)^2 + \frac{n}{2} \log(2\pi\sigma^2).
$$
The second term does not depend on $\theta$, and the factor $1/(2\sigma^2)$ does not change the minimizer over $\theta$, so we can drop both to obtain the least squares cost function $\sum_{i=1}^n (y_i - \theta^T x_i)^2$. Thus least squares gives the maximum likelihood estimate of $\theta$ in the Gaussian linear regression model.
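As a sanity check, the sketch below (continuing the example above) minimizes the negative log-likelihood over $\theta$ numerically, holding $\sigma^2$ fixed at an arbitrary value, and compares the result to the ordinary least squares solution; the two estimates should agree up to numerical tolerance.

```python
from scipy.optimize import minimize

sigma_fixed = 0.3   # sigma is held fixed; it does not change the minimizer over theta

def negative_log_likelihood(theta):
    residuals = y - X @ theta
    return (residuals ** 2).sum() / (2 * sigma_fixed ** 2) \
        + n / 2 * np.log(2 * np.pi * sigma_fixed ** 2)

theta_mle = minimize(negative_log_likelihood, x0=np.zeros(d)).x   # numerical MLE of theta
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)                  # ordinary least squares

print(theta_mle)
print(theta_ls)
```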
Posted on September 15, 2024 from San Francisco, CA.