STATS 606

Optimality conditions

Let $C \subseteq \mathbb{R}^d$ be closed, and $f_0 : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem

$$\min_{x \in \mathbb{R}^d} f_0(x) \quad \text{subject to} \quad x \in C.$$

Intuitively, if a point $x$ is a (local) minimum, then

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ in all “feasible directions” } v.$$

If $x$ is in the interior of $C$, then the set of “feasible directions” is $\mathbb{R}^d$, and we recover the usual zero-gradient (necessary) optimality condition:

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ for all } v \in \mathbb{R}^d \iff \nabla f_0(x) = 0.$$
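
For a concrete instance of the interior case: if $f_0(x) = \frac{1}{2}\|x - a\|_2^2$ and $a$ lies in the interior of $C$, then $x = a$ is a (global) minimum, and indeed

$$\nabla f_0(a) = a - a = 0.$$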

If $x$ is on the boundary of $C$, then the set of “feasible directions” is intuitively the set of all directions that point “inward” from $x$. As we shall see, this is the set of tangent vectors.

Tangent vectors

At first blush, we consider defining tangent vectors at $x \in C$ as directions $v$ such that $x + \eta v \in C$ for all small enough step sizes $\eta$. Although this definition works well for $C$ defined by linear constraints, it is too restrictive for $C$ with curved boundaries. For example, if $C$ is a curve in $\mathbb{R}^2$, then it may not be possible to move in any straight-line direction and remain in $C$.

Tangent vectors: A vector $v \in \mathbb{R}^d$ is tangent to $C$ at $x \in C$ iff there are sequences $(x_n) \subset C$, $x_n \to x$ and $\eta_n \downarrow 0$ such that

$$\frac{1}{\eta_n}(x_n - x) \to v \quad \text{or} \quad x_n - x = \eta_n v + o(\eta_n).$$

Intuitively, $(x_n)$ traces out a curve passing thru $x$, and the line segment $x + \eta v$ is tangent to this curve. Compared to the initial (overly restrictive) definition of tangent vector, this definition adds an $o(\eta_n)$ fudge factor.
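
To make the definition concrete, here is a small numerical sketch (an illustration, with the unit circle standing in for a curved $C$): the difference quotients $\frac{1}{\eta_n}(x_n - x)$ along the circle converge to the tangent direction $(0, 1)$ at $x = (1, 0)$.

```python
import numpy as np

# Tangent vector on the unit circle C = {x in R^2 : ||x||_2 = 1}.
# Take x = (1, 0), x_n = (cos(eta_n), sin(eta_n)) in C, and eta_n = 1/n.
# The difference quotients (x_n - x)/eta_n converge to v = (0, 1).
x = np.array([1.0, 0.0])
for n in [10, 100, 1000, 10000]:
    eta_n = 1.0 / n
    x_n = np.array([np.cos(eta_n), np.sin(eta_n)])
    print(n, (x_n - x) / eta_n)  # approaches [0, 1]
```

Note that no straight-line direction stays on the circle, so the $o(\eta_n)$ fudge factor is doing real work here.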

We note that any direction $v$ such that $x + \eta v \in C$ for all small enough $\eta$ is a tangent vector. Indeed, let $\eta_n$ be a sequence of small enough step sizes that converges to zero, and set $x_n \triangleq x + \eta_n v$. The assumption that $x + \eta v \in C$ for all small enough $\eta$ ensures $x_n$ is in $C$, and we have $x_n - x = \eta_n v$ exactly.
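
For example, if $C = \{x \mid \langle a, x \rangle \le b\}$ is a halfspace and $x$ is on its boundary, then any $v$ with $\langle a, v \rangle \le 0$ satisfies $\langle a, x + \eta v \rangle \le b$ for all $\eta \ge 0$, so every such $v$ is a tangent vector. This is the sense in which the naive definition already suffices for linear constraints.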

We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall the tangent space of a smooth manifold $M$ at $x \in M$ consists of the derivatives (at $x$) of all smooth curves in $M$ passing thru $x$. Let $\gamma : [-\delta, \delta] \to C$ be a curve in $C$ such that $x = \gamma(0)$. To see that $\dot{\gamma}(0)$ is a tangent vector, let $\eta_n \downarrow 0$ and $x_n \triangleq \gamma(\eta_n)$. We have

$$x_n - x = \gamma(\eta_n) - \gamma(0) = \eta_n \dot{\gamma}(0) + o(\eta_n),$$

where we used the definition of the derivative in the second step.
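
For example, on the unit circle, the curve $\gamma(t) = (\cos t, \sin t)$ satisfies $\gamma(0) = (1, 0)$ and $\dot{\gamma}(0) = (0, 1)$, recovering the tangent direction from the numerical sketch above.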

Finally, we claim that the set of all tangent vectors at $x \in C$ is a closed cone called the tangent cone $T_C(x)$. Recall a subset $K$ of $\mathbb{R}^d$ is a cone iff it is closed under non-negative scalar multiplication: if $x \in K$, then $\alpha x \in K$ for any $\alpha \ge 0$. The proof of this claim is elementary, and we leave the details as an exercise to the reader.
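
One way to start the exercise: if $(x_n)$ and $(\eta_n)$ witness that $v$ is a tangent vector, then $(x_n)$ and $(\eta_n / \alpha)$ witness that $\alpha v$ is one for any $\alpha > 0$, since $\frac{\alpha}{\eta_n}(x_n - x) \to \alpha v$; the constant sequence $x_n \equiv x$ witnesses $0 \in T_C(x)$.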

Optimality conditions for the general optimization problem

The normal vectors at a point $x \in C$ are the vectors that make an obtuse angle with all tangent vectors: $\langle u, v \rangle \le 0$ for all $v \in T_C(x)$. It is not hard to check that the set of all normal vectors at a point is a closed convex cone, called the normal cone $N_C(x)$. We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
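
For example, if $C = \mathbb{R}_+^2$ and $x = 0$, then $T_C(0) = \mathbb{R}_+^2$ and $N_C(0) = \mathbb{R}_-^2$: the vectors making an obtuse angle with the whole non-negative orthant are exactly those with non-positive coordinates. For a non-convex example, if $C$ is the union of the two coordinate axes in $\mathbb{R}^2$, then $T_C(0) = C$ is non-convex, while $N_C(0) = \{0\}$ is (trivially) convex.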

Optimality conditions for the general optimization problem. If $x$ is a local minimum, then $-\nabla f_0(x) \in N_C(x)$. This is equivalent to

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ for all } v \in T_C(x).$$

To see this, let $v \in T_C(x)$ be an arbitrary tangent vector. There are $(x_n) \subset C$, $x_n \to x$ and $\eta_n \downarrow 0$ such that $\frac{1}{\eta_n}(x_n - x) \to v$. We have

$$0 \le f_0(x_n) - f_0(x) = f_0(x + \eta_n v) - f_0(x) + O(\|x_n - (x + \eta_n v)\|_2) = f_0(x + \eta_n v) - f_0(x) + o(\eta_n),$$

where the first step is the (local) optimality of $x$, the second step is the smoothness of continuously differentiable functions, and the third step is the definition of tangent vector. We divide both sides by $\eta_n$ and take limits to obtain

$$0 \le \frac{1}{\eta_n}\big(f_0(x + \eta_n v) - f_0(x)\big) \to \langle \nabla f_0(x), v \rangle.$$

This optimality condition is intuitive: if $x$ is a local minimum, then there is no direction $v$ that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
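
As a sanity check, here is a small numerical sketch (using the standard fact that the tangent cone of the unit disk at a boundary point $x$ is $\{v : \langle x, v \rangle \le 0\}$; the problem instance is my own choice): minimize $\langle c, x \rangle$ over the unit disk. The minimizer is $x^\star = -c/\|c\|_2$, and no tangent direction at $x^\star$ is a descent direction.

```python
import numpy as np

# Minimize <c, x> over the unit disk; the minimizer is x* = -c/||c||.
# At a boundary point x of the disk, T_C(x) = {v : <x, v> <= 0}, so we
# sample random directions, keep the tangent ones, and confirm that
# <grad f0(x*), v> = <c, v> >= 0 for every sampled tangent direction.
rng = np.random.default_rng(0)
c = np.array([1.0, 0.0])
x_star = -c / np.linalg.norm(c)
V = rng.standard_normal((1000, 2))
tangent = V[V @ x_star <= 0]           # sampled directions in T_C(x*)
print((tangent @ c >= -1e-12).all())   # True: no tangent descent direction
```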

Optimality conditions for convex problems. The main results here are

  1. $T_C(x) = \overline{\mathrm{cone}(C - x)}$, where $\mathrm{cone}$ denotes the conic hull and the bar denotes closure;
  2. $N_C(x) = \{u \in \mathbb{R}^d \mid \langle u, x' - x \rangle \le 0 \text{ for all } x' \in C\}$;
  3. The optimality condition $-\nabla f_0(x) \in N_C(x)$ is necessary and sufficient.
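
To illustrate the second and third items, here is a numerical sketch (using the standard fact that $N_C(x) = \{\alpha x : \alpha \ge 0\}$ for the unit ball at a boundary point $x$; the instance is my own choice): project a point $y$ outside the unit ball onto it. The minimizer of $f_0(x) = \|x - y\|_2^2$ is $x^\star = y/\|y\|_2$, and $-\nabla f_0(x^\star)$ is a non-negative multiple of $x^\star$.

```python
import numpy as np

# Projection onto the unit ball C: minimize ||x - y||^2 over C with y
# outside C.  The minimizer is x* = y/||y||, and the optimality condition
# -grad f0(x*) in N_C(x*) = {a * x* : a >= 0} holds with a = 2(||y|| - 1).
y = np.array([3.0, 4.0])
x_star = y / np.linalg.norm(y)
neg_grad = 2.0 * (y - x_star)                 # -grad f0 at x*
alpha = neg_grad @ x_star                     # component along x* (||x*|| = 1)
print(np.allclose(neg_grad, alpha * x_star))  # parallel to x*
print(alpha >= 0)                             # a non-negative multiple
```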

At a high level, the KKT conditions are the (necessary) optimality condition for the general optimization problem, specialized to non-linear optimization problems in standard form.

Tangent vectors of optimization problems in “standard” form

To keep things simple, we consider a non-linear optimization problem with only inequality constraints:

$$\min_{x \in \mathbb{R}^d} f_0(x) \quad \text{subject to} \quad \{f_i(x) \le 0\}_{i=1}^m.$$

Let $C$ be the feasible set of the preceding optimization problem:

$$C \triangleq \{x \in \mathbb{R}^d \mid \{f_i(x) \le 0\}_{i=1}^m\}.$$

At first blush, we guess that the tangent cone at $x \in C$ is

$$T_C(x) = \{v \in \mathbb{R}^d : \{\langle \nabla f_i(x), v \rangle \le 0\}_{i \in A}\},$$

where $A \triangleq \{i \in [m] \mid f_i(x) = 0\}$ is the set of indices of the active constraints at $x$. This guess is motivated by the observation that we must have $\langle \nabla f_i(x), v \rangle \le 0$ for all $i \in A$, or moving in the direction $v$ will violate an active inequality constraint. Unfortunately, this guess is not restrictive enough: there are pathological cases in which there are $v$’s that satisfy the preceding guess but are not tangent vectors.

Consider the set $C \triangleq \{x \in \mathbb{R}^2 \mid \frac{1}{2}\|x + e_1\|_2^2 \le \frac{1}{2},\ \frac{1}{2}\|x - e_1\|_2^2 \le \frac{1}{2}\}$, where $e_1 \triangleq (1, 0)$. This is the intersection of two disks of radius 1: one centered at $-e_1$ and another centered at $e_1$. The two disks intersect only at the origin. Thus $C = \{0\}$, and $T_C(0) = \{0\}$. On the other hand, the preceding guess is

$$\{v \in \mathbb{R}^2 \mid \langle e_1, v \rangle = 0\},$$

which is the entire vertical axis, not $\{0\}$.
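
The pathology is easy to see numerically (a small sketch of the example above): the direction $v = (0, 1)$ passes the linearized test, but every point $\eta v$ with $\eta > 0$ is infeasible.

```python
import numpy as np

# The feasible set is the intersection of unit disks centered at -e1 and
# e1, which is just {0}.  Both constraints are active at 0, with gradients
# e1 and -e1.  The direction v = (0, 1) satisfies the linearized test
# <grad f_i(0), v> <= 0, yet eta * v is infeasible for every eta > 0.
e1 = np.array([1.0, 0.0])
f1 = lambda x: 0.5 * np.sum((x + e1) ** 2) - 0.5  # disk centered at -e1
f2 = lambda x: 0.5 * np.sum((x - e1) ** 2) - 0.5  # disk centered at  e1
v = np.array([0.0, 1.0])
print(e1 @ v <= 0, (-e1) @ v <= 0)                # True True: passes guess
for eta in [0.1, 0.01, 0.001]:
    print(eta, f1(eta * v) <= 0 and f2(eta * v) <= 0)  # always False
```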

To rule out such pathological cases, we impose a constraint qualification (CQ). The standard CQ that we impose in this class is Slater’s CQ, but there are many alternatives.

The KKT conditions

Recall the optimality condition of the general optimization problem: if $x$ is a local minimum, then $\langle \nabla f_0(x), v \rangle \ge 0$ for all $v \in T_C(x)$. In light of the preceding characterization of $T_C(x)$, this is equivalent to: there is no $v \in \mathbb{R}^d$ such that

$$\{\langle \nabla f_i(x), v \rangle \le 0\}_{i \in A}, \quad \langle \nabla f_0(x), v \rangle < 0.$$

Consider the $v$ in the preceding system of inequalities as (the normal vector of) a hyperplane thru the origin. The optimality condition has a geometric interpretation: there is no hyperplane (thru the origin) that separates $-\nabla f_0(x)$ from $\{\nabla f_i(x)\}_{i \in A}$. This implies $-\nabla f_0(x)$ is in the conic hull of $\{\nabla f_i(x)\}_{i \in A}$: i.e. there is $\lambda \in \mathbb{R}_+^m$ such that

$$-\nabla f_0(x) = \sum_{i \in A} \lambda_i \nabla f_i(x).$$

This is almost the stationarity condition in the KKT conditions. We pad $\lambda$ with zero entries for the constraints that are inactive to get

$$-\nabla f_0(x) = \sum_{i=1}^m \lambda_i \nabla f_i(x),$$

which is the stationarity condition. The complementary slackness and dual feasibility conditions follow from the construction of $\lambda$. Finally, the primal feasibility condition is a consequence of the fact that $x$ is a local minimum (hence it is feasible).
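
Here is a small numerical check of the KKT conditions on a concrete problem (an illustration with my own choice of $f_0$ and $f_1$): minimize $\|x - a\|_2^2$ subject to $\|x\|_2^2 - 1 \le 0$ with $a$ outside the unit ball. The minimizer is the projection $x^\star = a/\|a\|_2$, and stationarity pins down the multiplier $\lambda = \|a\|_2 - 1 \ge 0$.

```python
import numpy as np

# KKT check for: min ||x - a||^2  s.t.  ||x||^2 - 1 <= 0, with a outside
# the unit ball.  Stationarity reads -grad f0(x*) = lam * grad f1(x*):
# 2(a - x*) = lam * 2 x*, which gives lam = ||a|| - 1.
a = np.array([2.0, 2.0])
x_star = a / np.linalg.norm(a)
lam = np.linalg.norm(a) - 1.0
grad_f0 = 2.0 * (x_star - a)                   # objective gradient at x*
grad_f1 = 2.0 * x_star                         # constraint gradient at x*
print(np.allclose(-grad_f0, lam * grad_f1))    # stationarity
print(lam >= 0.0)                              # dual feasibility
print(np.isclose(x_star @ x_star - 1.0, 0.0))  # constraint active (compl. slack.)
```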

Posted on February 17, 2025 from Ann Arbor, MI.