STATS 413

Optimality of the conditional expectation

This post supplements the supervised learning slides. Please see the slides for the setup.

We wish to show that the conditional expectation $E[Y \mid X = x]$ is the minimum mean squared error (MSE) prediction function of $Y$ from $X$; i.e. the function $f^*(x) = E[Y \mid X = x]$ satisfies

$$
E\big[(Y - f^*(X))^2\big] \le E\big[(Y - f(X))^2\big] \quad \text{for any (other) function } f.
$$

First, we note that the problem of finding the minimum MSE prediction function of $Y$ from $X$ is equivalent to the problem of finding, for each $x$, the minimum MSE constant prediction of $Y_x \sim Y \mid X = x$ (a random variable distributed as $Y$ given $X = x$); i.e. finding the constant $\mu_x \in \mathbb{R}$ such that

$$
E\big[(Y_x - \mu_x)^2\big] \le E\big[(Y_x - c)^2\big] \quad \text{for any (other) constant } c \in \mathbb{R}.
$$

This is because the minimum MSE prediction function $f^*$ must equal $\mu_x$ at $x$; i.e. $f^*(x) = \mu_x$. Otherwise, it would be possible to reduce the MSE of $f^*$ by replacing its value at $x$ with $\mu_x$:

$$
\tilde{f}(x') =
\begin{cases}
\mu_x & \text{if } x' = x, \\
f^*(x') & \text{otherwise.}
\end{cases}
$$
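One way to make this replacement argument precise (a standard step, spelled out here for completeness) is the law of iterated expectations: the overall MSE averages the conditional MSE over $X$,

$$
E\big[(Y - f(X))^2\big] = E\big[g_f(X)\big],
\qquad
g_f(x) = E\big[(Y - f(x))^2 \mid X = x\big] = E\big[(Y_x - f(x))^2\big],
$$

where $g_f$ is a name introduced here for the conditional MSE. Choosing $f(x) = \mu_x$ minimizes $g_f(x)$ at every $x$ simultaneously, and hence minimizes the average.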

Second, we show that $\mu_x = E[Y_x]$ by solving the optimization problem $\min_{c \in \mathbb{R}} E[(Y_x - c)^2]$. The cost function seems complicated, but it is actually a quadratic function of $c$:

$$
E\big[(Y_x - c)^2\big] = E[Y_x^2] - 2cE[Y_x] + c^2.
$$

We differentiate the cost function and set the derivative to zero: $\frac{d}{dc}\big(E[Y_x^2] - 2cE[Y_x] + c^2\big) = -2E[Y_x] + 2c = 0$ has the root $c = E[Y_x]$, and since the quadratic opens upward, this root is the minimizer, so $\mu_x = E[Y_x]$. Recalling $f^*(x) = \mu_x$ from the first part, we conclude

$$
f^*(x) = E[Y_x] = E[Y \mid X = x].
$$
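As a quick numerical sanity check (not part of the original argument), here is a minimal simulation sketch, assuming for illustration that $X$ is uniform on $[0, 1]$ and $Y = X^2 + \varepsilon$ with Gaussian noise $\varepsilon$, so that $E[Y \mid X = x] = x^2$. The empirical MSE of the conditional mean should be the smallest among the candidate predictors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (an assumption, not from the post): X ~ Uniform(0, 1),
# Y = X^2 + Gaussian noise, so the conditional mean is E[Y | X = x] = x^2.
n = 100_000
X = rng.uniform(0.0, 1.0, size=n)
Y = X**2 + rng.normal(scale=0.1, size=n)

def mse(pred):
    """Empirical mean squared error of a prediction (vector or scalar)."""
    return np.mean((Y - pred) ** 2)

print("conditional mean f*(X) = X^2:", mse(X**2))    # ~ noise variance, 0.01
print("linear guess      f(X) = X:  ", mse(X))       # strictly larger
print("constant guess    mean(Y):   ", mse(Y.mean()))  # larger still
```

On a large sample, the first MSE should be close to the noise variance ($0.01$), while the competing prediction functions incur strictly larger MSE, matching the inequality proved above.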

Posted on August 30, 2021 from Ann Arbor, MI