# Stochastic optimization problems **STATS 606:** Computation and Optimization Methods in Statistics University of Michigan
## Stochastic program (SP) $$ \begin{aligned} &\min\nolimits_{x\in\reals^n}&&F_0(x) \triangleq \Ex\big[f_0(x,\omega)\big] \\\\ &\subjectto && \\{F_i(x) \triangleq \Ex\big[f_i(x,\omega)\big] \le 0\\}_{i=1}^m \end{aligned} $$ * cost and constraint functions depend on decision/optimization variable $x\in\reals^n$ *and* data $\omega\in\Omega$ * evaluating the $F_i$'s exactly by integrating WRT $\omega$ is often intractable, but we can approximate the $F_i$'s empirically * $F_i$ is convex as long as $f_i$ is convex WRT $x$ for all $\omega\in\Omega$
## Certainty equivalent (CE) problem **Idea:** ignore data variation $$ \begin{aligned} &\min\nolimits_{x\in\reals^n}&&f_0(x,\Ex[\omega]) \\\\ &\subjectto && \\{f_i(x,\Ex[\omega]) \le 0\\}_{i=1}^m \end{aligned} $$ If $f_i$ is convex WRT $\omega$ for all $x$, then $f_i(x,\Ex[\omega]) \le \Ex\big[f_i(x,\omega)\big]$ (Jensen's inequality). Thus the optimal value of the CE problem is a lower bound on the optimal value of the SP.
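A quick numeric sanity check of the Jensen bound, using the illustrative choice $f(x,\omega) = (x-\omega)^2$ (convex in $\omega$ for every $x$) and an exponential $\omega$; the cost function, distribution, and values here are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.exponential(scale=2.0, size=100_000)  # E[omega] = 2

x = 1.0
f = lambda x, w: (x - w) ** 2  # convex in omega for every x

ce = f(x, omega.mean())  # certainty-equivalent value f(x, E[omega])
sp = f(x, omega).mean()  # stochastic-program value E[f(x, omega)]

assert ce <= sp  # Jensen: f(x, E[omega]) <= E[f(x, omega)]
print(ce, sp)
```

Here $\Ex[f(x,\omega)] = \mathrm{Var}(\omega) + (x-\Ex[\omega])^2$, so the gap between the two values is exactly the variance the CE problem ignores.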
## Expected violation constraints **Idea:** replace $\Ex\big[f_i(x,\omega)\big]$ with * $\Ex\big[f_i(x,\omega)_+\big] \le \eps$ (LHS is expected constraint violation) * $\Ex\big[\max_{i\in[m]}f_i(x,\omega)_+\big] \le \eps$ (LHS is expected worst violation) variation: minimize expected violations: $$\textstyle\min\nolimits_{x\in\reals^n}\Ex\big[f_0(x,\omega) + \sum_{i=1}^m\lambda_if_i(x,\omega)_+\big]$$ * $\lambda_i > 0$ are penalty parameters for (expected) constraint violations Expected violation constraints/penalties lead to convex optimization problems (as long as $f_i$'s are convex in $x$ for all $\omega$).
## Sample average approximation (SAA) 1. generate $N$ realizations of $\omega$: $\omega_1,\dots,\omega_N$ 2. solve SAA problem: $$ \begin{aligned} &\min\nolimits_{x\in\reals^n}&&\textstyle\widehat{F}\_0(x) \triangleq \frac1N\sum_{j=1}^Nf_0(x,\omega_j) \\\\ &\subjectto &&\textstyle\\{\widehat{F}\_i(x) \triangleq \frac1N\sum_{j=1}^Nf_i(x,\omega_j) \le 0\\}_{i=1}^m \end{aligned} $$ Statistical theory guarantees 1. $\widehat{x}\_\SAA\pto x_*$ 2. $\widehat{F}\_0(\widehat{x}\_\SAA)\pto F_0(x_*)$ as $N\nearrow\infty$ (under some technical conditions).
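The SAA guarantee can be illustrated on a toy problem whose minimizer is known in closed form: for $\min_x \Ex[(x-\omega)^2]$ the exact solution is $x_* = \Ex[\omega]$, and the SAA minimizer is the sample mean. A minimal sketch (the Gaussian data model is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# min_x E[(x - omega)^2]; the exact minimizer is x_* = E[omega]
# SAA: min_x (1/N) sum_j (x - omega_j)^2, minimized by the sample mean
x_star = 3.0  # omega ~ N(3, 1), so E[omega] = 3
for N in (10, 1_000, 100_000):
    omega = rng.normal(loc=x_star, scale=1.0, size=N)
    x_saa = omega.mean()  # closed-form SAA minimizer
    print(N, x_saa, abs(x_saa - x_star))
```

The error $|\widehat{x}_\SAA - x_*|$ shrinks at the usual $O(N^{-1/2})$ Monte Carlo rate, consistent with $\widehat{x}_\SAA\pto x_*$.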
## Ex: empirical risk minimization **Goal:** find $w\in\reals^n$ such that $w^\top x$ is a good prediction of $y$ * $(x,y)\in\reals^n\times\reals$ have joint distribution $\bP$ * measure discrepancy between $w^\top x$ and $y$ with a (convex) **loss function** $\ell:\reals\times\reals\to\reals$ **risk minimization:** $$\textstyle\min_{w\in\reals^n}R(w) \triangleq \Ex\big[\ell(w^\top x,y)\big]$$ * $R(w)$ is called the **risk** of $w$. * $w_*\in\argmin_{w\in\reals^n}R(w)$ is called the **risk minimizer**.
## Ex: empirical risk minimization In practice, we cannot evaluate $R$ because $\bP$ is unknown, but we have $(x_1,y_1),\dots,(x_N,y_N)\overset{\ind}{\sim}\bP$ **empirical risk minimization (ERM):** $$\textstyle\min_{w\in\reals^n}\widehat{R}(w)\triangleq\frac1N\sum_{i=1}^N\ell(w^\top x_i,y_i).$$ * $\widehat{R}$ is the SAA of $R$ * $\widehat{w}\in\argmin_{w\in\reals^n}\widehat{R}(w)$ is the **empirical risk minimizer**
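For squared loss $\ell(u,y) = (u-y)^2$, the empirical risk minimizer is ordinary least squares, so ERM can be computed in closed form. A minimal sketch with simulated data (the linear-Gaussian model and `w_true` are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# squared loss ell(u, y) = (u - y)^2; the ERM is ordinary least squares
n, N = 3, 50_000
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, n))
y = X @ w_true + rng.normal(size=N)  # y = w_true^T x + noise

# empirical risk minimizer: w_hat = argmin_w (1/N) sum_i (w^T x_i - y_i)^2
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_hat)  # close to w_true for large N
```

In this well-specified model the risk minimizer is `w_true`, so the ERM output converging to it illustrates the SAA consistency guarantee from the previous slide.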
## Chance constraints $$ \begin{aligned} &\Pr\\{f_i(x,\omega) \le 0\\} \ge 1-\alpha \\\\ &\quad\equiv \Pr\\{f_i(x,\omega) > 0\\} \le \alpha \end{aligned} $$ * convex in some cases * common $\alpha$ values: 0.1, 0.05, 0.01 * smaller $\alpha$ values (e.g. 0.001) are meaningless because the tails of the distribution of $\omega$ are generally unknown
## Value-at-Risk (VaR) VaR of a random scalar $z$ at level $\alpha$: $$ \begin{aligned} \VaR(z;\alpha) &\triangleq\inf\\{\gamma\mid\Pr\\{z \le \gamma\\} \ge 1-\alpha\\} \\\\ &=\inf\\{\gamma\mid\Pr\\{z> \gamma\\} \le \alpha\\} \end{aligned} $$ * the $(1-\alpha)$-quantile of $z$ * $\VaR(z;\alpha)$ is the worst possible outcome excluding the worst outcomes with total probability at most $\alpha$ * chance constraints are VaR constraints: $$\Pr\\{f_i(x,\omega) \le 0\\} \ge 1-\alpha\equiv\VaR(f_i(x,\omega);\alpha) \le 0$$
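Since $\VaR(z;\alpha)$ is the $(1-\alpha)$-quantile of $z$, it can be estimated by an empirical quantile. A Monte Carlo sketch for a standard normal $z$ (the distribution is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=1_000_000)
alpha = 0.05

# VaR(z; alpha) = inf{gamma : P(z <= gamma) >= 1 - alpha} = (1 - alpha)-quantile
var_mc = np.quantile(z, 1 - alpha)
print(var_mc)  # ~ Phi^{-1}(0.95) ~ 1.645 for a standard normal
```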
## Chance constraints for log-concave distributions $$\Pr\\{f(x,\omega) \le 0\\} = \int\ones\\{f(x,\omega)\le 0\\}p(\omega)d\omega$$ $\Pr\\{f(x,\omega)\le 0\\}$ is log-concave as long as * $p$ is log-concave * the set $\\{(x,\omega)\mid f(x,\omega)\le 0\\}$ is convex so $\Pr\\{f(x,\omega)>0\\} \le \alpha$ is equivalent to the convex constraint $$\log\Pr\\{f(x,\omega) \le 0\\} \ge \log(1-\alpha)$$
## Ex: portfolio optimization $$ \begin{aligned} &\max\nolimits_{x\in\cC} &&\Ex\big[p^\top x\big] \\\\ &\subjectto &&\Pr\\{p^\top x \le 0\\} \le \alpha \\\\ & && 1_n^\top x = 1. \end{aligned} $$ * $x\in\reals^n$: portfolio allocations; $x_i$ is (fractional) position in $i$-th asset * $x$ is normalized so total investment is 1: $1_n^\top x = 1$ * $\cC$: convex portfolio constraints * (stochastic) portfolio return: $p^\top x$, $p\sim N(\bar{p},\Sigma)$ $$ \begin{aligned} &\Pr\\{p^\top x \le 0\\} \le \alpha \\\\ &\quad\equiv\textstyle\Phi(\frac{0 - \bar{p}^\top x}{\\|\Sigma^{1/2}x\\|_2}) \le \alpha \\\\ &\quad\equiv \bar{p}^\top x\ge \Phi^{-1}(1-\alpha)\\|\Sigma^{\frac12}x\\|_2 \end{aligned} $$
## Ex: portfolio optimization $$ \begin{aligned} &\max\nolimits_{x\in\cC} &&\Ex\big[p^\top x\big] \\\\ &\subjectto &&\Pr\\{p^\top x \le 0\\} \le \alpha \\\\ & && 1_n^\top x = 1 \end{aligned} $$ can be formulated as a convex problem (as long as $\alpha\le\frac12$, so that $\Phi^{-1}(1-\alpha)\ge 0$) $$ \begin{aligned} &\max\nolimits_{x\in\cC} &&\bar{p}^\top x \\\\ &\subjectto &&\bar{p}^\top x \ge\Phi^{-1}(1-\alpha)\\|\Sigma^\frac12x\\|_2 \\\\ & && 1_n^\top x = 1 \end{aligned} $$ (an SOCP when $\cC$ is a polytope)
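The Gaussian reformulation rests on the identity $\Pr\\{p^\top x \le 0\\} = \Phi\big(\frac{-\bar{p}^\top x}{\\|\Sigma^{1/2}x\\|_2}\big)$, which can be checked by Monte Carlo; the values of $\bar{p}$, $\Sigma$, and $x$ below are made up for illustration:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

# p ~ N(p_bar, Sigma); check Pr{p^T x <= 0} = Phi(-p_bar^T x / ||Sigma^{1/2} x||_2)
p_bar = np.array([0.05, 0.03])
Sigma = np.array([[0.04, 0.01], [0.01, 0.02]])
x = np.array([0.6, 0.4])  # a fixed allocation with 1^T x = 1

mu, sigma = p_bar @ x, sqrt(x @ Sigma @ x)
loss_prob_exact = 0.5 * (1 + erf((-mu / sigma) / sqrt(2)))  # Phi(-mu/sigma)

p = rng.multivariate_normal(p_bar, Sigma, size=1_000_000)
loss_prob_mc = (p @ x <= 0).mean()
print(loss_prob_exact, loss_prob_mc)
```

Because a linear function of a Gaussian vector is a scalar Gaussian, the chance constraint reduces to a single CDF evaluation; the two estimates agree to Monte Carlo accuracy.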
## Convex approximation of chance constraints **Idea:** find convex $g_i:\reals^n\to\reals$ such that $$g_i(x) \le 0 \Rightarrow \Pr\\{f_i(x,\omega) > 0\\} \le \alpha$$ **Obs:** Let $\varphi:\reals\to\reals$ be non-negative, convex, non-decreasing and $\varphi(0) = 1$. For any $\beta_i > 0$, $$\textstyle\Ex\big[\varphi(\frac{f_i(x,\omega)}{\beta_i})\big] \ge \Pr\\{f_i(x,\omega) > 0\\}$$ (because $\varphi(\frac{z}{\beta_i}) \ge 1\\{z > 0\\}$ for all $z$). Thus enforcing $\Ex\big[\varphi(\frac{f_i(x,\omega)}{\beta_i})\big] \le \alpha$ ensures $\Pr\\{f_i(x,\omega) > 0\\} \le \alpha$.
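The key inequality $\Ex\big[\varphi(\frac{f_i(x,\omega)}{\beta_i})\big] \ge \Pr\\{f_i(x,\omega) > 0\\}$ can be checked numerically for, e.g., $\varphi(u) = (u+1)_+$; the Gaussian samples below stand in for $f_i(x,\omega)$ and are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
f = rng.normal(loc=-1.0, scale=1.0, size=500_000)  # samples of f_i(x, omega)
beta = 0.5

phi = lambda u: np.maximum(u + 1.0, 0.0)  # nonneg, convex, nondecreasing, phi(0) = 1
bound = phi(f / beta).mean()              # E[phi(f / beta)]
prob = (f > 0).mean()                     # Pr{f > 0}

assert bound >= prob  # phi(z / beta) >= 1{z > 0} pointwise, so this holds surely
print(bound, prob)
```

The bound holds for every $\beta > 0$; the slack depends on $\beta$, which is why the next slide optimizes over $\beta_i$.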
## Convex approximation of chance constraints Multiply by $\beta_i$ to obtain a constraint that is (jointly) convex in $(x,\beta_i)$: $$\textstyle \beta_i\Ex\big[\varphi(\frac{f_i(x,\omega)}{\beta_i})\big] \le \alpha\beta_i $$ * $v\varphi(\frac{u}{v})$ is the perspective of $\varphi$; it is (jointly) convex in $(u,v)$ for $v > 0$ * $\beta_i\Ex\big[\varphi(\frac{f_i(x,\omega)}{\beta_i})\big]$ is (jointly) convex in $(x,\beta_i)$ for $\beta_i > 0$ We optimize WRT $\beta_i$ to obtain the best approximation.
## Markov approximation Let $\varphi(u) = (u+1)_+$; this leads to $$ \begin{aligned} &\textstyle\beta_i\Ex(\frac{1}{\beta_i}f_i(x,\omega) + 1)\_+ - \alpha\beta_i\le 0 \\\\ &\quad\equiv\Ex(f_i(x,\omega) + \beta_i)_+ - \alpha\beta_i \le 0 \\\\ &\textstyle\quad\equiv\frac1\alpha\Ex(f_i(x,\omega) + \beta_i)\_+ - \beta_i \le 0. \end{aligned} $$ We minimize WRT $\beta_i$ to obtain the best Markov approximation: $$ \begin{aligned} &\textstyle\inf_{\beta_i\ge 0}\frac1\alpha\Ex(f_i(x,\omega) + \beta_i)\_+ - \beta_i \le 0 \\\\ &\textstyle\quad\equiv\inf_{\beta_i\le 0}\frac1\alpha\Ex(f_i(x,\omega) - \beta_i)\_+ + \beta_i \le 0 \\\\ &\textstyle\quad\equiv\inf_{\beta_i}\frac1\alpha\Ex(f_i(x,\omega) - \beta_i)\_+ + \beta_i \le 0. \end{aligned} $$ (The last step drops the constraint $\beta_i \le 0$: the unconstrained minimizer is $\beta_i = \VaR(f_i(x,\omega);\alpha)$, which is nonpositive whenever the infimum is $\le 0$.)
## Conditional Value-at-Risk (CVaR) $$\textstyle\CVaR(z;\alpha)\triangleq\inf\nolimits_\beta\frac1\alpha\Ex(z - \beta)_+ + \beta$$ The optimal $\beta$ in the CVaR definition is $\VaR(z;\alpha)$ (for continuous $z$): $$ \begin{aligned} 0 &=\textstyle \frac{d}{d\beta}\big(\frac1\alpha\Ex(z - \beta)\_+ + \beta\big)\big|\_{\beta=\beta_\*} \\\\ &=\textstyle -\frac1\alpha\Ex\big[\ones\\{z \ge \beta_\*\\}\big] + 1 \end{aligned} $$ so $\beta_\*$ satisfies $\Pr\\{z\ge\beta_*\\} = \alpha$. Thus CVaR is an upper bound on VaR: $$\textstyle \CVaR(z;\alpha) = \VaR(z;\alpha) + \underbrace{\frac1\alpha\Ex(z - \beta_*)\_+}_{\ge 0} $$
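A numeric check that the infimum formula, evaluated at $\beta_* = \VaR(z;\alpha)$, matches the average of the worst $\alpha$-fraction of outcomes; the standard normal $z$ is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(size=1_000_000)
alpha = 0.05

beta = np.quantile(z, 1 - alpha)                          # VaR(z; alpha)
cvar_min = np.maximum(z - beta, 0).mean() / alpha + beta  # inf formula at beta_* = VaR
cvar_tail = z[z >= beta].mean()                           # average of the worst alpha-tail
print(cvar_min, cvar_tail)  # both ~ 2.06 for a standard normal at alpha = 0.05
```

For a standard normal the exact value is $\varphi(\Phi^{-1}(1-\alpha))/\alpha \approx 2.063$ at $\alpha = 0.05$, and both estimators agree with it to Monte Carlo accuracy.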
## Conditional Value-at-Risk (CVaR) CVaR is also called **expected shortfall**: it is the expected outcome among the worst outcomes with total probability at most $\alpha$. $$ \begin{aligned} \CVaR(z;\alpha) &=\textstyle \frac1\alpha\Ex(z - \beta_\*)\_+ + \VaR(z;\alpha) \\\\ &= \frac{\Ex\big[(z - \beta_*)\ones\\{z\ge\beta_*\\}\big]}{\Pr\\{z\ge\beta_*\\}} + \VaR(z;\alpha) \\\\ &= \Ex\big[z - \beta_*\mid z\ge\beta_*\big] + \VaR(z;\alpha) \\\\ &= \Ex\big[z\mid z\ge\VaR(z;\alpha)\big] \end{aligned} $$
## Chebyshev approximation Let $\varphi(u) = (u + 1)_+^2$; this leads to $$ \begin{aligned} &\textstyle\beta_i\Ex(\frac{1}{\beta_i}f_i(x,\omega) + 1)^2\_+ \le \alpha\beta_i \\\\ &\textstyle\quad\equiv\frac{1}{\beta_i}\Ex(f_i(x,\omega) + \beta_i)_+^2 \le \alpha\beta_i \end{aligned} $$ Drop $(\cdot)_+$ to obtain the classical Chebyshev approximation: $$ \begin{aligned} &\textstyle\frac{1}{\beta_i}\Ex(f_i(x,\omega) + \beta_i)^2\le \alpha\beta_i \\\\ &\textstyle\quad\equiv 2\Ex\big[f_i(x,\omega)\big] + \frac{1}{\beta_i}\Ex\big[f_i(x,\omega)^2\big] + (1-\alpha)\beta_i \le 0 \end{aligned} $$ We minimize WRT $\beta_i > 0$ to obtain $$\Ex\big[f_i(x,\omega)\big] + \big((1-\alpha)\Ex\big[f_i(x,\omega)^2\big]\big)^{\frac12}\le 0,$$ which only depends on the first two moments of $f_i(x,\omega)$.
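A quick check that the moment condition is a conservative approximation of the chance constraint: if $\Ex\big[f\big] + \big((1-\alpha)\Ex\big[f^2\big]\big)^{\frac12} \le 0$, then $\Pr\\{f > 0\\} \le \alpha$. Sketch with Gaussian samples standing in for $f_i(x,\omega)$ (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 0.1

# any f with E[f] + ((1 - alpha) E[f^2])^{1/2} <= 0 must satisfy Pr{f > 0} <= alpha
f = rng.normal(loc=-2.0, scale=0.5, size=500_000)

m1, m2 = f.mean(), (f ** 2).mean()     # first two moments of f
lhs = m1 + np.sqrt((1 - alpha) * m2)   # Chebyshev moment condition
prob = (f > 0).mean()                  # chance-constraint violation probability
print(lhs, prob)
```

Here the moment condition holds (barely), while the actual violation probability is far below $\alpha$, illustrating how conservative the two-moment bound can be.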
## Chebyshev approximation If $f_i(x,\omega) = \omega^\top x - b$, then the classical Chebyshev approximation is $$\bar{\omega}^\top x - b + (1-\alpha)^\frac12(x^\top\Sigma x - 2b\bar{\omega}^\top x + b^2)^\frac12 \le 0$$ * $\bar{\omega} \triangleq \Ex\big[\omega\big]$ * $\Sigma \triangleq \Ex\big[\omega\omega^\top\big]$: (uncentered) second moment of $\omega$ * CE constraint with extra margin This is a second-order cone constraint: $$ \begin{aligned} (1-\alpha)^\frac12\left\\|\begin{bmatrix}\Sigma^\frac12x - b\Sigma^{-\frac12}\bar{\omega}\\\\b(1-\bar{\omega}^\top\Sigma^{-1}\bar{\omega})^\frac12\end{bmatrix}\right\\|_2 \le b - \bar{\omega}^\top x, \\\\ b - \bar{\omega}^\top x \ge 0. \end{aligned} $$
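The second-order cone form follows from the algebraic identity $\\|\Sigma^\frac12 x - b\Sigma^{-\frac12}\bar{\omega}\\|_2^2 + b^2(1-\bar{\omega}^\top\Sigma^{-1}\bar{\omega}) = x^\top\Sigma x - 2b\bar{\omega}^\top x + b^2$, which can be verified numerically; the random $\Sigma$, $\bar{\omega}$, $b$, $x$ below are illustrative (a Cholesky factor plays the role of $\Sigma^{1/2}$):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)   # positive definite second-moment matrix
w_bar = 0.1 * rng.normal(size=n)  # small mean so 1 - w_bar^T Sigma^{-1} w_bar > 0
b, x = 1.5, rng.normal(size=n)

L = np.linalg.cholesky(Sigma)     # Sigma = L L^T; use L^T as "Sigma^{1/2}"
Sinv = np.linalg.inv(Sigma)

u = L.T @ x - b * np.linalg.solve(L, w_bar)      # Sigma^{1/2} x - b Sigma^{-1/2} w_bar
t = b * np.sqrt(1 - w_bar @ Sinv @ w_bar)        # second component of the SOC vector
lhs = u @ u + t ** 2                             # squared norm of the stacked vector
rhs = x @ Sigma @ x - 2 * b * (w_bar @ x) + b ** 2
print(lhs, rhs)
```

Note $1-\bar{\omega}^\top\Sigma^{-1}\bar{\omega} \ge 0$ holds automatically, since $\bar{\omega}\bar{\omega}^\top \preceq \Sigma$ whenever $\Sigma$ is an uncentered second moment.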