# Convex optimization problems **STATS 606:** Computation and Optimization Methods in Statistics, University of Michigan
including slides by Stephen Boyd and Lieven Vandenberghe
## Ex: basis pursuit Given $X\in\reals^{n\times p}$ and $y\in\reals^n$ $(p > n)$, find the *sparsest* $\beta\in\reals^p$ such that $X\beta=y$: $$ \begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\textstyle\\|\beta\\|\_0\triangleq\sum_{j=1}^p\ones\\{\beta_j\ne 0\\}\\\\ &\subjectto && X\beta = y \end{aligned}. $$ The $\ell_0$ "norm" is hard to minimize (it's not even continuous), so we *relax* the problem by replacing the $\ell_0$ "norm" with the $\ell_1$ norm: $$ \begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\\|\beta\\|\_{\color{red}1}\\\\ &\subjectto && X\beta = y \end{aligned}. $$
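The relaxed problem is straightforward to pose in a convex modeling language. A minimal sketch in cvxpy (not from the slides; the synthetic $X$, $y$, and the 3-sparse ground truth are illustrative assumptions):

```python
# Basis pursuit via the l1 relaxation (a sketch with synthetic data).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -2.0, 0.5]          # a 3-sparse ground truth (assumption)
y = X @ beta_true

beta = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm(beta, 1)), [X @ beta == y])
prob.solve()
print(np.round(beta.value, 3))            # typically recovers the sparse solution
```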
## Ex: basis pursuit Basis pursuit can be reformulated as an LP: $$ \begin{aligned} &\left\\{\begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\\|\beta\\|\_1\\\\ &\subjectto && X\beta = y \end{aligned}\right\\} \\\\ &\quad\equiv \left\\{\begin{aligned} &\min\nolimits_{\beta_+,\beta_-\in\reals^p} &&1_p^\top(\beta_+ + \beta_-)\\\\ &\subjectto && X(\beta_+ - \beta_-) = y \\\\ & && \beta_+,\beta_-\in\reals_+^p \end{aligned}\right\\}. \end{aligned} $$
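The split $\beta = \beta_+ - \beta_-$ also makes the problem solvable with any off-the-shelf LP solver. A sketch using scipy's `linprog` (the helper name `basis_pursuit_lp` is an illustrative assumption):

```python
# Basis pursuit as an explicit LP in the variables (beta_plus, beta_minus).
import numpy as np
from scipy.optimize import linprog

def basis_pursuit_lp(X, y):
    n, p = X.shape
    c = np.ones(2 * p)                     # objective 1^T (beta_plus + beta_minus)
    A_eq = np.hstack([X, -X])              # X (beta_plus - beta_minus) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:p] - res.x[p:]           # recover beta = beta_plus - beta_minus
```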
## Ex: Dantzig selector Given $X\in\reals^{n\times p}$ and $y\in\reals^n$ $(p > n)$, find the *sparsest* $\beta\in\reals^p$ such that $X\beta {\color{red}\approx} y$: $$ \begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\textstyle\\|\beta\\|\_1\\\\ &\subjectto && \\|X^\top(X\beta-y)\\|_\infty \le \lambda \end{aligned}. $$ * $\lambda\ge 0$ is a tuning parameter
## Ex: Dantzig selector The Dantzig selector can be formulated as an LP: $$ \begin{aligned} &\left\\{\begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\\|\beta\\|\_1\\\\ &\subjectto && \\|X^\top(X\beta - y)\\|\_\infty \le \lambda \end{aligned}\right\\} \\\\ &\quad\equiv \left\\{\begin{aligned} &\min\nolimits_{\beta_+,\beta_-\in\reals^p} &&1_p^\top(\beta_+ + \beta_-)\\\\ &\subjectto && X^\top(X(\beta_+ - \beta_-) - y) \preceq \lambda\ones_p \\\\ & && X^\top(X(\beta_+ - \beta_-) - y) \succeq -\lambda\ones_p \\\\ & && \beta_+,\beta_-\in\reals_+^p \end{aligned}\right\\}. \end{aligned} $$
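A sketch of this LP with the same variable split, again via scipy's `linprog` (the helper `dantzig_selector_lp` is illustrative; `lam` is the tuning parameter $\lambda$):

```python
# Dantzig selector as an LP in the variables (beta_plus, beta_minus).
import numpy as np
from scipy.optimize import linprog

def dantzig_selector_lp(X, y, lam):
    n, p = X.shape
    c = np.ones(2 * p)
    G = X.T @ X
    # X^T(X(b+ - b-) - y) <= lam*1  and  >= -lam*1, stacked as A_ub z <= b_ub
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([lam + X.T @ y, lam - X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x[:p] - res.x[p:]
```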
## Ex: support vector machines Given $(X_1,Y_1),\dots,(X_n,Y_n)\in\reals^p\times\\{-1,1\\}$, find the **max margin** (linear) classifier: $$ \begin{aligned} &\min\nolimits_{\beta_0\in\reals,\beta\in\reals^p,\xi\in\reals_+^n} &&\textstyle\frac12\\|\beta\\|\_2^2 + C\sum_{i=1}^n\xi_i \\\\ &\subjectto && \\{Y_i(\beta_0 + \beta^\top X_i) \ge 1 - \xi_i\\}_{i=1}^n \end{aligned}. $$ Equivalently, $$ \begin{aligned} \textstyle\min_{\beta_0\in\reals,\beta\in\reals^p}\frac12\\|\beta\\|\_2^2 + C\sum_{i=1}^n\ell(Y_i(\beta_0 + \beta^\top X_i)) \\\\ \ell(z) \triangleq \max\\{0,1-z\\} \end{aligned}. $$ * $C\ge 0$ is a tuning parameter
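A minimal sketch of the penalized (hinge-loss) form in cvxpy (the helper `svm_fit` and its default `C` are illustrative assumptions, not from the slides):

```python
# Soft-margin SVM in its penalized (hinge-loss) form.
import cvxpy as cp

def svm_fit(X, Y, C=1.0):
    """X is n x p with rows X_i; Y is a vector of +/-1 labels."""
    n, p = X.shape
    beta0, beta = cp.Variable(), cp.Variable(p)
    margins = cp.multiply(Y, X @ beta + beta0)   # Y_i (beta0 + beta^T X_i)
    loss = cp.sum(cp.pos(1 - margins))           # hinge loss max{0, 1 - z}
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(beta) + C * loss))
    prob.solve()
    return beta0.value, beta.value
```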
## Ex: Basis pursuit denoising/LASSO Given $X\in\reals^{n\times p}$, $y\in\reals^n$, find a sparse $\beta\in\reals^p$ such that $X\beta \approx y$. **Basis pursuit denoising (BPDN):** $$ \begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\\|\beta\\|_1 \\\\ &\subjectto &&\textstyle\frac12\\|y - X\beta\\|_2^2\le \sigma^2 \end{aligned}. $$ **LASSO:** $$ \begin{aligned} &\min\nolimits_{\beta\in\reals^p} &&\textstyle\frac12\\|y-X\beta\\|\_2^2 \\\\ &\subjectto && \\|\beta\\|_1\le\rho \end{aligned}. $$ Lagrangian BPDN/LASSO: $$ \textstyle\min_{\beta\in\reals^p} \frac12\\|y-X\beta\\|_2^2 + \lambda\\|\beta\\|_1 $$
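A sketch of the Lagrangian form in cvxpy (illustrative; note that `sklearn.linear_model.Lasso` solves the same problem with the quadratic term scaled by $1/n$):

```python
# Lagrangian-form BPDN/LASSO.
import cvxpy as cp

def lasso(X, y, lam):
    p = X.shape[1]
    beta = cp.Variable(p)
    obj = 0.5 * cp.sum_squares(y - X @ beta) + lam * cp.norm(beta, 1)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value
```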
## Ex: Principal component analysis Given $\\{x_i\\}\_{i=1}^n$, find the $k$-dim subspace that best approximates $\\{x_i\\}_{i=1}^n$: $$ \begin{aligned} &\min\nolimits_{P\in\symm^n}&&\textstyle\frac12\\|X - XP\\|\_F^2 = \frac12\sum_{i=1}^n\\|x_i^\top - x_i^\top P\\|_2^2\\\\ &\subjectto &&P\text{ is a projector} \\\\ & &&\rank(P) = k \end{aligned} $$ This problem (despite its non-convexity) has a closed-form solution in terms of the singular value decomposition (SVD) of $X$: $$P_* = V_kV_k^\top,$$ where $X = U\Sigma V^\top$ is the SVD of $X$ and $V_k$ consists of the first $k$ columns of $V$ (the top $k$ right singular vectors).
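A sketch of the closed-form solution (assuming the rows of `X` are the centered observations $x_i^\top$):

```python
# PCA projector from the SVD: P_* = V_k V_k^T.
import numpy as np

def pca_projector(X, k):
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T                          # top k right singular vectors
    return Vk @ Vk.T                       # rank-k projector P_*
```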
## Ex: Principal component analysis $$ \begin{aligned} &\left\\{\begin{aligned} &\min\nolimits_{P\in\symm^n}&&\textstyle\frac12\\|X - XP\\|\_F^2 \\\\ &\subjectto &&P\text{ is a projector} \\\\ & &&\rank(P) = k \end{aligned}\right\\} \\\\ &\quad\equiv \left\\{\begin{aligned} &{\color{red}\max}\nolimits_{P\in\symm^n}&&\Tr(X^\top XP) \\\\ &\subjectto &&P\text{ is a projector} \\\\ & &&\rank(P) = k \end{aligned}\right\\} \end{aligned} $$ because $\\|X - XP\\|\_F^2 = \Tr(X^\top X) - \Tr(X^\top XP)$ whenever $P$ is a projector. The feasible set is the (non-convex) set $$\\{P\in\symm^n\mid \lambda_i(P)\in\\{0,1\\},\Tr(P) = k\\}.$$
## Ex: Principal component analysis We *relax* the PCA problem by replacing the feasible set with its convex hull: $$ \begin{aligned} \cF_k &\triangleq \\{P\in\symm^n\mid \lambda_i(P)\in{\color{red}[0,1]},\Tr(P) = k\\} \\\\ &= \\{P\in\symm^n\mid 0\preceq P\preceq I_n,\Tr(P) = k\\}. \end{aligned} $$ This set is called the $k$-th order **Fantope**. Remarkably, the relaxed problem has the same optimal solution as the original PCA problem!
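The relaxed problem is an SDP. A sketch in cvxpy (illustrative; here $P$ lives on the column dimension of `X`, and the optimum matches $V_kV_k^\top$ when the $k$-th eigengap of $X^\top X$ is positive):

```python
# Fantope relaxation of PCA: maximize Tr(X^T X P) over the k-th order Fantope.
import cvxpy as cp
import numpy as np

def fantope_pca(X, k):
    d = X.shape[1]
    S = X.T @ X
    P = cp.Variable((d, d), symmetric=True)
    constraints = [P >> 0, np.eye(d) - P >> 0,   # 0 <= P <= I (Loewner order)
                   cp.trace(P) == k]
    cp.Problem(cp.Maximize(cp.trace(S @ P)), constraints).solve()
    return P.value
```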
## Ex: Fastest mixing random walk Given a graph $\cG\triangleq\\{\cV,\cE\\}$, find symmetric (edge) weights $W_{i,j}\in[0,1]$ so that the weighted random walk $(X_t)_{t=1}^\infty$ $$ \Pr\\{X_{t+1} = v_j\mid X_t = v_i\\} = W_{i,j} $$ mixes as quickly as possible. The matrix $W\in[0,1]^{n\times n}$ ($n\triangleq|\cV|$) satisfies $$ \begin{aligned} 1_n^\top W = 1_n^\top\text{ (it is stochastic)}, \\\\ W = W^\top\text{ (it is doubly stochastic)}, \\\\ W_{i,j} = 0\text{ whenever }(v_i,v_j)\notin\cE. \end{aligned} $$
## Ex: Fastest mixing random walk The mixing time of $X_t$ depends on the second largest eigenvalue modulus (SLEM) of $W$: $$\mu(W) \triangleq \max\nolimits_{i = 2,\dots,n}|\lambda_i(W)|.$$ Let $\pi_t\in\Delta^{n-1}$ be the distribution of $X_t$ (i.e. $\Pr\\{X_t = v_i\\} = [\pi_t]_i$), then $\pi_t$ satisfies the recursion $$\pi_t^\top = \pi_{t-1}^\top W = \dots = \pi_0^\top W^t$$ The smaller the SLEM, the faster the random walk mixes: $$\textstyle\frac12\\|\pi_T - \frac1n1_n\\|_1 \le \frac12\sqrt{n}\mu(W)^T.$$
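The bound is easy to check numerically. A small sketch (the lazy walk on a 4-cycle and the point-mass start are assumptions for illustration):

```python
# Verify 0.5 * ||pi_T - (1/n) 1||_1 <= 0.5 * sqrt(n) * mu(W)^T on a toy chain.
import numpy as np

n = 4
W = 0.5 * np.eye(n)
for i in range(n):                         # lazy random walk on the 4-cycle
    W[i, (i + 1) % n] += 0.25
    W[i, (i - 1) % n] += 0.25

eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
slem = eigs[1]                             # second largest eigenvalue modulus
pi0 = np.zeros(n); pi0[0] = 1.0            # start at vertex v_0
for T in (1, 5, 10):
    piT = np.linalg.matrix_power(W.T, T) @ pi0
    tv = 0.5 * np.abs(piT - 1 / n).sum()
    print(T, tv, 0.5 * np.sqrt(n) * slem**T)   # bound holds at each T
```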
## Ex: Fastest mixing random walk Fastest mixing Markov chain (FMMC) problem [[Boyd et al](https://epubs.siam.org/doi/10.1137/S0036144503423264)]: $$ \begin{aligned} &\min\nolimits_{W\in[0,1]^{n\times n}} &&\mu(W) \\\\ &\subjectto && W1_n = 1_n,\quad W = W^\top \\\\ & &&W_{i,j} = 0\text{ for any }(v_i,v_j)\notin\cE. \end{aligned} $$ $\mu$ is a convex function because $\textstyle\mu(W) = \\|W - \frac1n1_n1_n^\top\\|\_2$. SDP form of FMMC problem: $$ \begin{aligned} &\min\nolimits_{W\in[0,1]^{n\times n},\,t\in\reals} &&t \\\\ &\subjectto &&\textstyle-tI_n \preceq W - \frac1n1_n1_n^\top \preceq tI_n \\\\ & && W1_n = 1_n,\quad W = W^\top \\\\ & &&W_{i,j} = 0\text{ for any }(v_i,v_j)\notin\cE. \end{aligned} $$
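A sketch of the FMMC SDP in cvxpy (illustrative; following Boyd et al, self-loop weights $W_{ii}$ are kept as holding probabilities):

```python
# FMMC as an SDP: minimize t s.t. -t I <= W - (1/n) 1 1^T <= t I.
import cvxpy as cp
import numpy as np

def fmmc(n, edges):
    """edges is a list of pairs (i, j) of adjacent vertices."""
    W = cp.Variable((n, n), symmetric=True)
    t = cp.Variable()
    J = np.ones((n, n)) / n
    edge_set = {frozenset(e) for e in edges}
    constraints = [W >= 0, W @ np.ones(n) == np.ones(n),
                   W - J >> -t * np.eye(n), W - J << t * np.eye(n)]
    # zero the weights on non-edges (diagonal holding probabilities allowed)
    constraints += [W[i, j] == 0 for i in range(n) for j in range(n)
                    if i != j and frozenset((i, j)) not in edge_set]
    cp.Problem(cp.Minimize(t), constraints).solve()
    return W.value
```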
## Ex: Experiment design Recall the variance of the ordinary least squares estimator $$\textstyle \widehat{\beta} \triangleq (\sum_{i=1}^nX_iX_i^\top)^{-1}(\sum_{i=1}^nX_iY_i) $$ (under linearity and homoscedasticity, and up to the noise variance $\sigma^2$) is $$\textstyle \widehat{\Sigma} \triangleq (\sum_{i=1}^nX_iX_i^\top)^{-1} $$ **Q:** How to pick $n$ $X_i$'s from a set of $N \gg n$ possible design points $X_1,\dots,X_N$ so that $\widehat{\Sigma}$ is as small as possible?
## Ex: Experiment design $$ \begin{aligned} &\min_{w_1,\dots,w_N\in\bZ_+}(\text{WRT }\symm_+^n) &&\textstyle\widehat{\Sigma}(w) \triangleq (\sum_{i=1}^N w_iX_iX_i^\top)^{-1} \\\\ &\subjectto &&w_1 + \dots + w_N = n \end{aligned} $$ * $w_i$: number of selected design pts equal to $X_i$ This problem is generally hard to solve because of the integer constraint (the search space is not even connected).
## Ex: Experiment design **Idea:** relax the integer constraint to obtain a continuous problem: $$ \begin{aligned} &\min_{w\in\reals_+^N}(\text{WRT }\symm_+^n) &&\textstyle\widehat{\Sigma}(w) \triangleq (\sum_{i=1}^N w_iX_iX_i^\top)^{-1} \\\\ &\subjectto &&w_1 + \dots + w_N = n \end{aligned} $$ * common scalarizations: $\log\det\widehat{\Sigma}(w)$, $\Tr(\widehat{\Sigma}(w))$, $\lambda_{\max}(\widehat{\Sigma}(w))$, ... * add other convex constraints (e.g. budget/cost constraint $c^\top w \le b$)
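A sketch of the relaxed problem under the $\log\det$ (D-optimal) scalarization in cvxpy (illustrative; `Xpts` stacks the $N$ candidate design points as rows):

```python
# Relaxed D-optimal design: minimizing log det of the covariance is the same
# as maximizing log det of the information matrix sum_i w_i X_i X_i^T.
import cvxpy as cp
import numpy as np

def d_optimal_design(Xpts, n):
    N, d = Xpts.shape
    w = cp.Variable(N, nonneg=True)
    info = sum(w[i] * np.outer(Xpts[i], Xpts[i]) for i in range(N))  # affine in w
    prob = cp.Problem(cp.Maximize(cp.log_det(info)), [cp.sum(w) == n])
    prob.solve()
    return w.value   # fractional weights; round to recover an integer design
```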