DATASCI 415: Statistical Learning and Data Mining
University of Michigan
including slides by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani, Jonathan Taylor
simple and multiple linear regression
variable selection
qualitative features and interactions
(ordinary) least squares as maximum likelihood
Assume
The likelihood function
outputs the probability (for a continuous response, the probability density) of observing the training data, viewed as a function of the parameters
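For concreteness, the assumption and the resulting likelihood can be written as in the sketch below. This is the standard Gaussian linear model; the notation (n observations, p predictors, coefficients \beta, noise variance \sigma^2, and \theta = (\beta, \sigma^2) for all parameters together) is assumed here and may differ from the lecture's.

```latex
% Assumed model: linear mean with i.i.d. Gaussian noise
\[
  y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),
  \qquad i = 1, \dots, n.
\]

% Likelihood: joint density of y_1, ..., y_n given the x_i,
% regarded as a function of \theta = (\beta, \sigma^2)
\[
  L(\theta)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(
      -\frac{\bigl(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\bigr)^2}{2\sigma^2}
    \right).
\]
```

The product form follows from the independence of the noise terms: the joint density factorizes into one Gaussian density per observation.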
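Idea: find the parameters that maximize the probability of observing the training data:
$$\hat{\theta} = \arg\max_{\theta} L(\theta),$$
where \theta denotes the model parameters and L(\theta) the likelihood,
i.e. find the parameters under which the training data are most probable.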
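In practice, it is often more convenient to find the parameters that (equivalently) minimize the negative log-likelihood:
$$\hat{\theta} = \arg\min_{\theta} \; -\log L(\theta),$$
where \theta denotes the model parameters and L(\theta) the likelihood. The two problems have the same solution because the logarithm is strictly increasing.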
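Under the Gaussian linear model (an assumption: y_i = x_i^T beta + noise, with i.i.d. N(0, sigma^2) noise), the negative log-likelihood is (n/2) log(2 pi sigma^2) + RSS(beta) / (2 sigma^2), where RSS(beta) is the residual sum of squares. The first term does not involve beta, so minimizing over beta is the same as minimizing RSS, which is ordinary least squares. The sketch below checks this numerically on simulated data. The function names, the simulated data, and the choice of plain gradient descent as the optimizer are all illustrative, not taken from the course.

```python
import numpy as np


def gaussian_nll(beta, X, y, sigma=1.0):
    """Negative log-likelihood of y under y_i ~ N(x_i^T beta, sigma^2), i.i.d."""
    resid = y - X @ beta
    n = y.shape[0]
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + resid @ resid / (2 * sigma**2)


def fit_by_nll_gradient_descent(X, y, sigma=1.0, n_steps=2000):
    """Minimize the Gaussian NLL over beta with plain gradient descent."""
    beta = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of X^T X / sigma^2
    # (the Lipschitz constant of the gradient of this quadratic objective).
    lipschitz = np.linalg.eigvalsh(X.T @ X)[-1] / sigma**2
    step = 1.0 / lipschitz
    for _ in range(n_steps):
        # Gradient of the NLL with respect to beta.
        grad = -X.T @ (y - X @ beta) / sigma**2
        beta = beta - step * grad
    return beta


# Simulated data (hypothetical): intercept plus two Gaussian predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.3, size=n)

# Ordinary least squares via a standard linear-algebra solver.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Maximum likelihood via direct minimization of the negative log-likelihood.
beta_mle = fit_by_nll_gradient_descent(X, y)

print("OLS estimate:", beta_ols)
print("MLE estimate:", beta_mle)
print("max abs difference:", np.max(np.abs(beta_ols - beta_mle)))
```

The value of sigma passed to the optimizer does not change the minimizing beta. It only rescales the objective, which is why the coefficient estimate can be computed without knowing the noise variance.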