At a high-level, the KKT conditions are an instance of the (necessary) optimality condition for the general optimization problem to non-linear optimization problems in standard form. To keep things simple, we consider a non-linear optimization with only inequality constraints:
\[\begin{aligned} & \min\nolimits_{x\in\reals^d} && f_0(x) \\ & \subjectto && \{f_i(x) \le 0\}_{i=1}^m \end{aligned}.\]Define \(\cC\) as the feasible set of the preceding optimization problem:
\[\cC \triangleq \{x\in\reals^d\mid\{f_i(x) \le 0\}_{i=1}^m\}.\]At first blush, we guess that the tangent cone at \(x\in\cC\) is
\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\},\]where \(\cA \triangleq \{i\in[m]\mid f_i(x) = 0\}\) is the set of indicies of the active constraints at \(x\). This guess is motivated by the observation that we must have \(\langle\partial f_i(x),v\rangle \le 0\rangle\) for all \(i\in\cA\) or moving in the direction \(v\) will violate an active inequality constraint. Unforunately, this guess is not restrictive enough: there are pathological cases in which there are \(v\)’s that satisfy the preceding guess, but are not tangent vectors.
Consider the set \(\cC \triangleq \{x\in\reals^2\mid\frac12\|x + e_1\|_2^2 \le \frac12, \frac12\|x - e_1\|_2^2 \le \frac12\}\), where \(e_1 \triangleq (1,0)\). This is the intersection of two disks of radius 1: one centered at \(e_1\) and another centered at \(-e_1\). The two disks only intersect at the origin. Thus \(\cC = \{0\}\), and \(T_{\cC}(0) = \{0\}\). On the other hand, the preceding guess is
\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^2\mid\langle e_1,v\rangle = 0\}.\]To rule out such pathological cases, we impose Slater’s constraint qualification (CQ).
Let \(L_{\cC}(x)$ \triangleq \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\}\). It is not hard to check that \(T_{\cC}(x) \subset L_{\cC}(x)\). We leave the details as an exercise and focus on establishing
To see that Slater’s CQ
Recall the optimality condition of the general optimization problem: if \(x_*\) is a local minimum, then \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all \(v\in T_{\cC}(x_*)\). In light of the preceding characterization of \(T_{\cC}(x_*)\), this is equivalent to there is no \(v\in\reals^d\) such that
\[\begin{aligned} \{\langle\partial f_i(x_*),v\rangle &\le 0\}_{i\in\cA}, \\ -\langle\partial f_0(x_*),v\rangle &< 0. \end{aligned}\]Consider the \(v\) in the preceding system of inequalities as (the normal vector) of a hyperplane thru the origin. The optimality condition has a geometric interpretation: there is no hyperplane (thru the origin) that separates \(-\partial f_0(x_*)\) from \(\{\partial f_i(x_*)\}_{i\in\cA}\). This implies \(\partial f_0(x_*)\) is in the conic hull of \(\{\partial f_i(x_*)\}_{i\in\cA}\): i.e. there is \(\lambda\in\reals_+^m\) such that
\[\textstyle-\partial f_0(x_*) = \sum_{i\in\cA}\lambda_i\partial f_i(x_*).\]This is almost the stationarity condition in the KKT conditions. We pad \(\lambda\) with zero entries for the constraints that are inactive to get
\[\textstyle-\partial f_0(x_*) = \sum_{i=1}^m\lambda_i\partial f_i(x_*)\]to get the stationarity condition. The complementary slackness and dual feasibility conditions follows from the construction of \(\lambda\). Finally, the primal feasibility condition is a consequence of the fact that \(x_*\) is a local minimum (hence it’s feasible).
Posted on March 18, 2021 from San Francisco, CA.