The Karush-Kuhn-Tucker (KKT) conditions

At a high-level, the KKT conditions are an instance of the (necessary) optimality condition for the general optimization problem to non-linear optimization problems in standard form. To keep things simple, we consider a non-linear optimization with only inequality constraints:

\[\begin{aligned} & \min\nolimits_{x\in\reals^d} && f_0(x) \\ & \subjectto && \{f_i(x) \le 0\}_{i=1}^m \end{aligned}.\]

Characterizing the tangent vectors

Define $\cC$ as the feasible set of the preceding optimization problem:

\[\cC \triangleq \{x\in\reals^d\mid\{f_i(x) \le 0\}_{i=1}^m\}.\]

At first blush, we guess that the tangent cone at $x\in\cC$ is

\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\},\]

where $\cA \triangleq \{i\in[m]\mid f_i(x) = 0\}$ is the set of indicies of the active constraints at $x$. This guess is motivated by the observation that we must have $\langle\partial f_i(x),v\rangle \le 0\rangle$ for all $i\in\cA$ or moving in the direction $v$ will violate an active inequality constraint. Unforunately, this guess is not restrictive enough: there are pathological cases in which there are $v$’s that satisfy the preceding guess, but are not tangent vectors.

Consider the set $\cC \triangleq \{x\in\reals^2\mid\frac12\|x + e_1\|_2^2 \le \frac12, \frac12\|x - e_1\|_2^2 \le \frac12\}$, where $e_1 \triangleq (1,0)$. This is the intersection of two disks of radius 1: one centered at $e_1$ and another centered at $-e_1$. The two disks only intersect at the origin. Thus $\cC = \{0\}$, and $T_{\cC}(0) = \{0\}$. On the other hand, the preceding guess is

\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^2\mid\langle e_1,v\rangle = 0\}.\]

To rule out such pathological cases, we impose Slater’s constraint qualification (CQ).

Let $L_{\cC}(x)$ \triangleq \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\}$. It is not hard to check that $T_{\cC}(x) \subset L_{\cC}(x)$. We leave the details as an exercise and focus on establishing

To see that Slater’s CQ

The KKT conditions

Recall the optimality condition of the general optimization problem: if $x_*$ is a local minimum, then $\langle\partial f_0(x_*),v\rangle \ge 0$ for all $v\in T_{\cC}(x_*)$. In light of the preceding characterization of $T_{\cC}(x_*)$, this is equivalent to there is no $v\in\reals^d$ such that

\[\begin{aligned} \{\langle\partial f_i(x_*),v\rangle &\le 0\}_{i\in\cA}, \\ -\langle\partial f_0(x_*),v\rangle &< 0. \end{aligned}\]

Consider the $v$ in the preceding system of inequalities as (the normal vector) of a hyperplane thru the origin. The optimality condition has a geometric interpretation: there is no hyperplane (thru the origin) that separates $-\partial f_0(x_*)$ from $\{\partial f_i(x_*)\}_{i\in\cA}$. This implies $\partial f_0(x_*)$ is in the conic hull of $\{\partial f_i(x_*)\}_{i\in\cA}$: i.e. there is $\lambda\in\reals_+^m$ such that

\[\textstyle-\partial f_0(x_*) = \sum_{i\in\cA}\lambda_i\partial f_i(x_*).\]

This is almost the stationarity condition in the KKT conditions. We pad $\lambda$ with zero entries for the constraints that are inactive to get

\[\textstyle-\partial f_0(x_*) = \sum_{i=1}^m\lambda_i\partial f_i(x_*)\]

to get the stationarity condition. The complementary slackness and dual feasibility conditions follows from the construction of $\lambda$. Finally, the primal feasibility condition is a consequence of the fact that $x_*$ is a local minimum (hence it’s feasible).

Posted on March 18, 2021 from San Francisco, CA.