This post accompanies the lecture video on subgradient descent. Please see the video for the problem setup and definitions.
We have
\[\def\cC{\mathcal{C}} \begin{aligned} \|x_{t+1} - x_*\|_2^2 &= \|P_{\cC}(x_t - \eta_t g_t) - P_{\cC}(x_*)\|_2^2 \\ &\le \|x_t - \eta_tg_t - x_*\|_2^2 \\ &= \|x_t - x_*\|_2^2 - 2\eta_t\langle g_t,x_t - x_*\rangle + \eta_t^2\|g_t\|_2^2 \\ &\le \|x_t - x_*\|_2^2 - 2\eta_t(f(x_t) - f(x_*)) + \eta_t^2\|g_t\|_2^2, \end{aligned}\]where we appealed to non-expansiveness of projection in the second step and the first-order characterization of convexity in the fourth step. We note that if \(f\) is strongly convex, then it is possible to strengthen the basic inequality by appealing to the first-order characterization of strong convexity in the fourth step:
\[\begin{aligned} \|x_{t+1} - x_*\|_2^2 &\le \|x_t - x_*\|_2^2 - 2\eta_t\langle g_t,x_t - x_*\rangle + \eta_t^2\|g_t\|_2^2 \\ &\le (1-\mu\eta_t)\|x_t - x_*\|_2^2 - 2\eta_t(f(x_t) - f(x_*)) + \eta_t^2\|g_t\|_2^2. \end{aligned}\]Recall that Polyak’s step size \(\eta_t = \frac{f(x_t) - f(x_*)}{\|g_t\|_2^2}\) minimizes the right side of the basic inequality with respect to \(\eta_t\). Thus Polyak’s step size ensures
\[\|x_{t+1} - x_*\|_2^2 \le \|x_t - x_*\|_2^2 - \frac{(f(x_t) - f(x_*))^2}{\|g_t\|_2^2}.\]Rearranging, we have
\[\begin{aligned} (f(x_t) - f(x_*))^2 &\le (\|x_t - x_*\|_2^2 - \|x_{t+1} - x_*\|_2^2)\|g_t\|_2^2 \\ &\le L^2(\|x_t - x_*\|_2^2 - \|x_{t+1} - x_*\|_2^2). \end{aligned}\]Summing over iterations up to \(T\), we obtain
\[\sum_{t=0}^T(f(x_t) - f(x_*))^2 \le L^2(\|x_0 - x_*\|_2^2 - \|x_{T+1} - x_*\|_2^2).\]We lower bound the left side with \(T\min_{t\in[T]}(f(x_t) - f(x_*))^2\) and rearrange to obtain the stated result.
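To make this concrete, here is a minimal numerical sketch (my own illustration, not from the lecture) of projected subgradient descent with Polyak’s step size on a toy problem: \(f(x) = \|x\|_1\) over the Euclidean unit ball, so \(x_* = 0\) and \(f(x_*) = 0\). The loop also checks the per-iteration inequality above at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_ball(x, r=1.0):
    """Euclidean projection onto the ball of radius r (non-expansive)."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

# Toy problem for illustration: minimize f(x) = ||x||_1 over the unit ball,
# so x_* = 0 and f(x_*) = 0.  sign(x) is a subgradient of the l1 norm.
f = lambda x: np.abs(x).sum()

x = proj_ball(rng.normal(size=10))
best_gap = f(x)
for t in range(200):
    g = np.sign(x)
    eta = f(x) / max(g @ g, 1e-12)       # Polyak's step size (f(x_*) = 0 here)
    x_next = proj_ball(x - eta * g)
    # per-iteration guarantee of the Polyak step size
    assert x_next @ x_next <= x @ x - f(x) ** 2 / max(g @ g, 1e-12) + 1e-9
    x = x_next
    best_gap = min(best_gap, f(x))
```

Since \(f(x_*)\) is known exactly on this example, the per-iteration decrease never fails and the best gap drops quickly.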
Iterating the basic inequality leads to
\[\|x_{T+1} - x_*\|_2^2 \le \|x_0 - x_*\|_2^2 - 2\sum_{t=0}^T\eta_t(f(x_t) - f(x_*)) + \sum_{t=0}^T\eta_t^2\|g_t\|_2^2.\]Rearranging, we have
\[\begin{aligned} 2\sum_{t=0}^T\eta_t(f(x_t) - f(x_*)) &\le \|x_0 - x_*\|_2^2 - \|x_{T+1} - x_*\|_2^2 + \sum_{t=0}^T\eta_t^2\|g_t\|_2^2 \\ &\le \|x_0 - x_*\|_2^2 + L^2\sum_{t=0}^T\eta_t^2. \end{aligned}\]Dividing both sides by \(\sum_{t=0}^T\eta_t\) (and rearranging), we obtain
\[\frac{\sum_{t=0}^T\eta_t(f(x_t) - f(x_*))}{\sum_{t=0}^T\eta_t} \le \frac{\|x_0 - x_*\|_2^2 + L^2\sum_{t=0}^T\eta_t^2}{2\sum_{t=0}^T\eta_t}.\]We recognize the left side is lower bounded by \(f(\bar{x}_T) - f(x_*)\) (because \(f\) is convex) to obtain the stated result.
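The averaged-iterate guarantee can be checked on the same toy problem (my own illustration, not from the lecture): \(f(x) = \|x\|_1\) over the unit ball, with the square-summable-but-not-summable step sizes \(\eta_t = c/\sqrt{t+1}\) and \(\bar{x}_T = \sum_t\eta_tx_t/\sum_t\eta_t\).

```python
import numpy as np

rng = np.random.default_rng(1)

def proj_ball(x, r=1.0):
    """Euclidean projection onto the ball of radius r."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

# Toy problem for illustration: f(x) = ||x||_1 over the unit ball, f(x_*) = 0.
f = lambda x: np.abs(x).sum()

x = proj_ball(rng.normal(size=10))
weighted_sum, eta_sum = np.zeros_like(x), 0.0
for t in range(2000):
    eta = 0.5 / np.sqrt(t + 1)           # eta_t = c / sqrt(t + 1)
    weighted_sum += eta * x              # accumulate eta_t * x_t
    eta_sum += eta                       # accumulate eta_t
    x = proj_ball(x - eta * np.sign(x))
x_bar = weighted_sum / eta_sum           # weighted average iterate
```

The theoretical bound above works out to roughly \(0.25\) for these constants, and the averaged iterate comfortably satisfies it.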
Rearranging the basic inequality for strongly convex cost functions, we have
\[f(x_t) - f(x_*) \le \frac{1-\mu\eta_t}{2\eta_t}\|x_t - x_*\|_2^2 - \frac{1}{2\eta_t}\|x_{t+1} - x_*\|_2^2 + \frac{\eta_t}{2}\|g_t\|_2^2.\]We plug in \(\eta_t = \frac{2}{\mu(t+1)}\) to obtain
\[f(x_t) - f(x_*) \le \frac{\mu(t-1)}{4}\|x_t - x_*\|_2^2 - \frac{\mu(t+1)}{4}\|x_{t+1} - x_*\|_2^2 + \frac{1}{\mu(t+1)}\|g_t\|_2^2.\]Multiplying both sides by \(t\) and bounding \(\frac{t}{t+1} \le 1\) in the last term, we have
\[t(f(x_t) - f(x_*)) \le \frac{\mu t(t-1)}{4}\|x_t - x_*\|_2^2 - \frac{\mu t(t+1)}{4}\|x_{t+1} - x_*\|_2^2 + \frac{1}{\mu}\|g_t\|_2^2.\]We sum over iterations up to \(T\) to obtain
\[\sum_{t=0}^Tt(f(x_t) - f(x_*)) \le -\frac{\mu T(T+1)}{4}\|x_{T+1} - x_*\|_2^2 + \frac{1}{\mu}\sum_{t=1}^T\|g_t\|_2^2 \le \frac{L^2T}{\mu}.\]This implies
\[f(\bar{x}_T) - f(x_*) \le \frac{\sum_{t=0}^Tt(f(x_t) - f(x_*))}{\sum_{t=0}^Tt} \le \frac{L^2T}{\mu\sum_{t=0}^Tt} \le \frac{2L^2}{\mu(T+1)},\]where \(\bar{x}_T = \sum_{t=0}^Tt\,x_t/\sum_{t=0}^Tt\) is the weighted average iterate and the first inequality follows from the convexity of \(f\). The convexity-concavity of \(f\) implies
\[\def\cX{\mathcal{X}} \def\cY{\mathcal{Y}} \begin{aligned} f(x_t,y_t) - f(x,y_t) &\le \langle g_t,x_t - x\rangle\text{ for any }x\in\cX,\\ f(x_t,y) - f(x_t,y_t) &\le \langle h_t,y - y_t\rangle\text{ for any }y\in\cY. \end{aligned}\]Adding the two inequalities leads to
\[f(x_t,y) - f(x,y_t) \le \langle g_t,x_t-x\rangle - \langle h_t,y_t-y\rangle\]for any \(x\in\cX\), \(y\in\cY\). We appeal to the convexity-concavity of \(f\) again to obtain
\[\def\cX{\mathcal{X}} \def\cY{\mathcal{Y}} \begin{aligned} \epsilon(\bar{x}_T,\bar{y}_T) &= \max_{y\in\cY}f(\bar{x}_T,y) - \min_{x\in\cX}f(x,\bar{y}_T) \\ &\le \frac{1}{\sum_{t=0}^T\eta_t}\left(\max_{y\in\cY}\sum_{t=0}^T\eta_tf(x_t,y) - \min_{x\in\cX}\sum_{t=0}^T\eta_tf(x,y_t)\right) \\ &\le \frac{1}{\sum_{t=0}^T\eta_t}\max_{x\in\cX,y\in\cY}\sum_{t=0}^T\eta_t(\langle g_t,x_t - x\rangle - \langle h_t,y_t-y\rangle). \end{aligned}\]To complete the proof, it is enough to show
\[\max_{x\in\cX,y\in\cY}\sum_{t=0}^T\eta_t(\langle g_t,x_t - x\rangle - \langle h_t,y_t-y\rangle) \le \frac{D_{\cX}^2 + D_{\cY}^2 + L^2\sum_{t=0}^T\eta_t^2}{2}.\]For any \(x\in\cX\), we have
\[\begin{aligned} \|x_{t+1} - x\|_2^2 &= \|P_{\cX}(x_t - \eta_tg_t) - P_{\cX}(x)\|_2^2 \\ &\le \|x_t - \eta_tg_t - x\|_2^2 \\ &= \|x_t - x\|_2^2 - 2\eta_t\langle g_t,x_t - x\rangle + \eta_t^2\|g_t\|_2^2, \end{aligned}\]where we appealed to non-expansiveness of projection in the second step. Rearranging, we obtain
\[2\eta_t\langle g_t,x_t - x\rangle \le \|x_t - x\|_2^2 - \|x_{t+1} - x\|_2^2 + \eta_t^2\|g_t\|_2^2\]for any \(x\in\cX\). Similarly, it is possible to show
\[-2\eta_t\langle h_t,y_t - y\rangle \le \|y_t - y\|_2^2 - \|y_{t+1} - y\|_2^2 + \eta_t^2\|h_t\|_2^2\]for any \(y\in\cY\). We combine the two inequalities to obtain
\[\begin{aligned} &2\eta_t(\langle g_t,x_t - x\rangle - \langle h_t,y_t - y\rangle) \\ &\quad\le \|x_t - x\|_2^2 + \|y_t - y\|_2^2 - \|x_{t+1} - x\|_2^2 - \|y_{t+1} - y\|_2^2 + \eta_t^2\|g_t\|_2^2 + \eta_t^2\|h_t\|_2^2 \\ &\quad\le \|x_t - x\|_2^2 + \|y_t - y\|_2^2 - \|x_{t+1} - x\|_2^2 - \|y_{t+1} - y\|_2^2 + \eta_t^2 L^2, \end{aligned}\]where we appealed to the Lipschitz continuity of \(f\) in the second step. Summing over iterations up to \(T\), we arrive at the desired bound:
\[\begin{aligned} &2\sum_{t=0}^T\eta_t(\langle g_t,x_t - x\rangle - \langle h_t,y_t - y\rangle) \\ &\quad\le \|x_0 - x\|_2^2 + \|y_0 - y\|_2^2 - \|x_{T+1} - x\|_2^2 - \|y_{T+1} - y\|_2^2 + L^2\sum_{t=0}^T\eta_t^2 \\ &\quad\le D_{\cX}^2 + D_{\cY}^2 + L^2\sum_{t=0}^T\eta_t^2. \end{aligned}\]Posted on February 02, 2021 from San Francisco, CA.
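As a postscript, the projected gradient descent–ascent scheme analyzed above can be sketched on a toy bilinear game (my own example, not from the lecture): \(f(x,y) = xy\) on \([-1,1]^2\), whose saddle point is \((0,0)\). For this game the duality gap of the averaged iterates has the closed form \(|\bar{x}_T| + |\bar{y}_T|\).

```python
import numpy as np

# Toy convex-concave example for illustration: f(x, y) = x * y on [-1, 1]^2,
# with unique saddle point (0, 0).
proj = lambda v: float(np.clip(v, -1.0, 1.0))   # projection onto [-1, 1]

x, y = 0.8, -0.5
sum_x = sum_y = eta_sum = 0.0
for t in range(5000):
    eta = 0.5 / np.sqrt(t + 1)
    sum_x += eta * x                   # accumulate eta_t * x_t
    sum_y += eta * y                   # accumulate eta_t * y_t
    eta_sum += eta
    gx, gy = y, x                      # grad_x f = y, grad_y f = x
    # simultaneous update: descent in x, ascent in y
    x, y = proj(x - eta * gx), proj(y + eta * gy)
x_bar, y_bar = sum_x / eta_sum, sum_y / eta_sum

# duality gap: max_y f(x_bar, y) - min_x f(x, y_bar) = |x_bar| + |y_bar|
gap = abs(x_bar) + abs(y_bar)
```

The last iterate of descent–ascent spirals on bilinear games, but the averaged pair \((\bar{x}_T,\bar{y}_T)\) drives the gap down, consistent with the bound \((D_{\mathcal{X}}^2 + D_{\mathcal{Y}}^2 + L^2\sum_t\eta_t^2)/(2\sum_t\eta_t)\).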