Optimality conditions
Let $\cC \subseteq \mathbb{R}^d$ be closed, and let $f : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem
$$\min_{x \in \cC} f(x).$$
Intuitively, if a point $x_\star$ is a (local) minimum, then $f$ cannot decrease (to first order) as we move away from $x_\star$ in any feasible direction $v$:
$$\langle \nabla f(x_\star), v \rangle \ge 0.$$
If $x_\star$ is in the interior of $\cC$, then the set of “feasible directions” is all of $\mathbb{R}^d$, and we recover the usual zero-gradient (necessary) optimality condition: $\nabla f(x_\star) = 0$.
If $x_\star$ is on the boundary of $\cC$, then the set of “feasible directions” is intuitively the set of all directions that point “inward” from $x_\star$. As we shall see, this is the set of tangent vectors.
Tangent vectors
At first blush, we consider defining tangent vectors at $x \in \cC$ as directions $v$ such that $x + \tau v \in \cC$ for all small enough step sizes $\tau > 0$. Although this definition works well for $\cC$ defined by linear constraints, it is too restrictive for $\cC$ with curved boundaries. For example, if $\cC$ is a curve in $\mathbb{R}^2$, then it may not be possible to move in any (straight-line) direction and remain in $\cC$.
Tangent vectors: A vector $v$ is tangent to $\cC$ at $x \in \cC$ iff there are sequences $x_k \in \cC$ with $x_k \to x$ and $\tau_k \downarrow 0$ such that
$$v = \lim_{k\to\infty} \frac{x_k - x}{\tau_k},$$
or, equivalently, $x_k = x + \tau_k v + o(\tau_k)$.
Intuitively, $\{x_k\}$ traces out a curve passing through $x$, and the ray $\{x + \tau v : \tau \ge 0\}$ is tangent to this curve. Compared to the initial (overly restrictive) definition of tangent vector, this definition adds an $o(\tau_k)$ fudge factor.
We note that any direction $v$ such that $x + \tau v \in \cC$ for all small enough $\tau > 0$ is a tangent vector. Indeed, let $\{\tau_k\}$ be a sequence of small enough step sizes that converges to zero, and let $x_k = x + \tau_k v$. The assumption that $x + \tau v \in \cC$ for all small enough $\tau > 0$ ensures $x_k$ is in $\cC$. We have $\frac{x_k - x}{\tau_k} = v \to v$.
We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall the tangent space of a smooth manifold $\cM$ at $x \in \cM$ consists of the derivatives (at $x$) of all smooth curves in $\cM$ passing through $x$. Let $\gamma : (-1, 1) \to \cM$ be a curve in $\cM$ such that $\gamma(0) = x$. To see that $\gamma'(0)$ is a tangent vector, let $\tau_k \downarrow 0$ and $x_k = \gamma(\tau_k)$. We have
$$\lim_{k\to\infty} \frac{x_k - x}{\tau_k} = \lim_{k\to\infty} \frac{\gamma(\tau_k) - \gamma(0)}{\tau_k} = \gamma'(0),$$
where we used the definition of the derivative in the second step.
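To make this concrete, here is a small numerical sanity check (the circle and the curve are our own illustrative choices, not from the text): take the unit circle in $\mathbb{R}^2$ with the curve $\gamma(t) = (\cos t, \sin t)$ through $x = \gamma(0) = (1, 0)$. The difference quotients $(\gamma(\tau_k) - \gamma(0)) / \tau_k$ should approach $\gamma'(0) = (0, 1)$ as $\tau_k \downarrow 0$.

```python
import numpy as np

def gamma(t):
    """A smooth curve on the unit circle through x = (1, 0)."""
    return np.array([np.cos(t), np.sin(t)])

x = gamma(0.0)                     # base point (1, 0)
derivative = np.array([0.0, 1.0])  # gamma'(0)

# Difference quotients (x_k - x) / tau_k for a sequence tau_k -> 0.
taus = [0.1, 0.01, 0.001, 0.0001]
quotients = [(gamma(tau) - x) / tau for tau in taus]

# The approximation error should shrink as tau_k shrinks.
errors = [float(np.linalg.norm(q - derivative)) for q in quotients]
```

The errors shrink roughly linearly in $\tau_k$, consistent with the $o(\tau_k)$ fudge factor in the definition.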
Finally, we claim that the set of all tangent vectors at $x$ is a closed cone called the tangent cone $T_{\cC}(x)$. Recall a subset $K$ of $\mathbb{R}^d$ is a cone iff it is closed under non-negative scalar multiplication: if $v \in K$, then $\alpha v \in K$ for any $\alpha \ge 0$. The proof of this claim is elementary, and we leave the details as an exercise to the reader.
Optimality conditions for the general optimization problem
The normal vectors at a point $x \in \cC$ are the vectors that make an obtuse angle with all tangent vectors:
$$N_{\cC}(x) = \{w : \langle w, v \rangle \le 0 \text{ for all } v \in T_{\cC}(x)\}.$$
It is not hard to check that the set of all normal vectors at a point is a closed convex cone. We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
Optimality conditions for the general optimization problem. If $x_\star \in \cC$ is a local minimum, then $\langle \nabla f(x_\star), v \rangle \ge 0$ for all $v \in T_{\cC}(x_\star)$. This is equivalent to
$$-\nabla f(x_\star) \in N_{\cC}(x_\star).$$
To see this, let $v \in T_{\cC}(x_\star)$ be an arbitrary tangent vector. There are $x_k \in \cC$, $x_k \to x_\star$, and $\tau_k \downarrow 0$ such that $v = \lim_{k\to\infty} \frac{x_k - x_\star}{\tau_k}$. We have
$$0 \le f(x_k) - f(x_\star) = \langle \nabla f(x_\star), x_k - x_\star \rangle + o(\|x_k - x_\star\|) = \tau_k \langle \nabla f(x_\star), v \rangle + o(\tau_k),$$
where the first step is the (local) optimality of $x_\star$, the second step is the smoothness of continuously differentiable functions, and the third step is the definition of tangent vector. We divide both sides by $\tau_k$ and take limits to obtain
$$\langle \nabla f(x_\star), v \rangle \ge 0.$$
This optimality condition is intuitive: if $x_\star$ is a local minimum, then there is no direction $v$ that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
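Here is a minimal numerical sketch of this condition (our own example, using the standard fact that the tangent cone of the unit disk at a boundary point $x$ is $\{v : \langle x, v \rangle \le 0\}$): minimize $f(x) = \langle c, x \rangle$ with $c = (1, 0)$ over the unit disk, whose minimizer is $x_\star = (-1, 0)$.

```python
import numpy as np

rng = np.random.default_rng(0)

c = np.array([1.0, 0.0])        # f(x) = <c, x>, so grad f(x) = c everywhere
x_star = np.array([-1.0, 0.0])  # minimizer of f over the unit disk

# Tangent cone of the unit disk at the boundary point x_star:
# {v : <x_star, v> <= 0}; keep only the sampled directions inside it.
samples = rng.standard_normal((1000, 2))
tangent = samples[samples @ x_star <= 0]

# Optimality condition: <grad f(x_star), v> >= 0 for every tangent v.
min_inner = float((tangent @ c).min())
```

Every sampled tangent direction has $\langle \nabla f(x_\star), v \rangle = v_1 \ge 0$: there is no tangent descent direction at $x_\star$.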
Optimality conditions for convex problems. The main results here are

- $T_{\cC}(x) = \mathrm{cl}\,\mathrm{cone}(\cC - x)$, where $\mathrm{cone}$ is the conic hull;
- $N_{\cC}(x) = \{w : \langle w, y - x \rangle \le 0 \text{ for all } y \in \cC\}$;
- The optimality condition $-\nabla f(x_\star) \in N_{\cC}(x_\star)$ is necessary and sufficient.
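As a sketch of the convex case (our own example), recall that the Euclidean projection $x_\star$ of a point $z$ onto a closed convex set $\cC$ minimizes $f(x) = \tfrac{1}{2}\|x - z\|^2$ over $\cC$, so the (necessary and sufficient) optimality condition says $-\nabla f(x_\star) = z - x_\star$ is a normal vector: $\langle z - x_\star, y - x_\star \rangle \le 0$ for all $y \in \cC$. We can check this numerically for the unit disk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Project z onto the unit disk; x_star minimizes 0.5 * ||x - z||^2.
z = np.array([3.0, 4.0])
x_star = z / np.linalg.norm(z)  # projection of an exterior point

# Normal-cone check: <z - x_star, y - x_star> <= 0 for all y in the disk.
w = z - x_star  # this is -grad f(x_star)
samples = rng.standard_normal((1000, 2))
norms = np.linalg.norm(samples, axis=1, keepdims=True)
ys = samples / np.maximum(1.0, norms)  # pull the samples into the disk
max_inner = float(np.max(ys @ w - w @ x_star))
```

No point of the disk makes an acute angle with $z - x_\star$ at $x_\star$, so `max_inner` stays non-positive.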
At a high level, the KKT conditions are an instance of the (necessary) optimality condition for the general optimization problem, applied to non-linear optimization problems in standard form. To keep things simple, we consider a non-linear optimization problem with only inequality constraints:
$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, \dots, m.$$
Let $\cC$ be the feasible set of the preceding optimization problem:
$$\cC = \{x : g_i(x) \le 0,\ i = 1, \dots, m\}.$$
At first blush, we guess that the tangent cone at $x \in \cC$ is
$$T_{\cC}(x) = \{v : \langle \nabla g_i(x), v \rangle \le 0 \text{ for all } i \in \cA(x)\},$$
where $\cA(x) = \{i : g_i(x) = 0\}$ is the set of indices of the active constraints at $x$. This guess is motivated by the observation that we must have $\langle \nabla g_i(x), v \rangle \le 0$ for all $i \in \cA(x)$, or moving in the direction $v$ will violate an active inequality constraint. Unfortunately, this guess is not restrictive enough: there are pathological cases in which there are $v$’s that satisfy the preceding guess, but are not tangent vectors.
Consider the set $\cC = \{x \in \mathbb{R}^2 : g_1(x) \le 0,\ g_2(x) \le 0\}$, where $g_1(x) = x_1^2 + (x_2 - 1)^2 - 1$ and $g_2(x) = x_1^2 + (x_2 + 1)^2 - 1$. This is the intersection of two disks of radius 1: one centered at $(0, 1)$ and another centered at $(0, -1)$. The two disks only intersect at the origin. Thus $\cC = \{0\}$, and $T_{\cC}(0) = \{0\}$. On the other hand, the preceding guess is
$$\{v : \langle \nabla g_1(0), v \rangle \le 0,\ \langle \nabla g_2(0), v \rangle \le 0\} = \{v : v_2 = 0\},$$
which is the entire horizontal axis.
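This pathology is easy to verify numerically. In the sketch below we take $g_1(x) = x_1^2 + (x_2 - 1)^2 - 1$ and $g_2(x) = x_1^2 + (x_2 + 1)^2 - 1$ (the two unit disks centered at $(0, \pm 1)$): the direction $v = (1, 0)$ satisfies the linearized constraints at the origin, yet every step along $v$ leaves the feasible set.

```python
import numpy as np

def g1(x):  # unit disk centered at (0, 1)
    return x[0]**2 + (x[1] - 1.0)**2 - 1.0

def g2(x):  # unit disk centered at (0, -1)
    return x[0]**2 + (x[1] + 1.0)**2 - 1.0

# Both constraints are active at the origin; their gradients there:
grad_g1 = np.array([0.0, -2.0])
grad_g2 = np.array([0.0, 2.0])

# v = (1, 0) satisfies the linearized conditions <grad g_i, v> <= 0 ...
v = np.array([1.0, 0.0])
linearized_ok = bool(grad_g1 @ v <= 0 and grad_g2 @ v <= 0)

# ... but every small step t * v is infeasible (g1(t * v) = t^2 > 0),
# so v is not a tangent vector: the guessed cone is too big.
infeasible = all(max(g1(t * v), g2(t * v)) > 0 for t in (0.1, 0.01, 0.001))
```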
To rule out such pathological cases, we impose a constraint qualification (CQ). The standard CQ that we impose in this class is Slater’s CQ, but there are many alternatives.
The KKT conditions
Recall the optimality condition of the general optimization problem: if $x_\star$ is a local minimum, then $\langle \nabla f(x_\star), v \rangle \ge 0$ for all $v \in T_{\cC}(x_\star)$. In light of the preceding characterization of $T_{\cC}(x_\star)$, this is equivalent to: there is no $v$ such that
$$\langle \nabla f(x_\star), v \rangle < 0 \quad \text{and} \quad \langle \nabla g_i(x_\star), v \rangle \le 0 \text{ for all } i \in \cA(x_\star).$$
Consider the $v$ in the preceding system of inequalities as (the normal vector of) a hyperplane through the origin. The optimality condition has a geometric interpretation: there is no hyperplane (through the origin) that separates $-\nabla f(x_\star)$ from the gradients $\{\nabla g_i(x_\star)\}_{i \in \cA(x_\star)}$. This implies $-\nabla f(x_\star)$ is in the conic hull of $\{\nabla g_i(x_\star)\}_{i \in \cA(x_\star)}$: i.e. there are $\lambda_i \ge 0$, $i \in \cA(x_\star)$, such that
$$-\nabla f(x_\star) = \sum_{i \in \cA(x_\star)} \lambda_i \nabla g_i(x_\star).$$
This is almost the stationarity condition in the KKT conditions. We pad $\lambda$ with zero entries for the constraints that are inactive, i.e. $\lambda_i = 0$ for $i \notin \cA(x_\star)$, to get the stationarity condition
$$\nabla f(x_\star) + \sum_{i=1}^m \lambda_i \nabla g_i(x_\star) = 0.$$
The complementary slackness and dual feasibility conditions follow from the construction of $\lambda$. Finally, the primal feasibility condition is a consequence of the fact that $x_\star$ is a local minimum (hence it’s feasible).
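To tie the pieces together, here is a worked example of our own (not from the text): minimize $f(x) = \|x - (2, 2)\|^2$ subject to $g(x) = \|x\|^2 - 1 \le 0$. The minimizer is the projection $x_\star = (2, 2) / \sqrt{8}$ onto the unit disk; we recover the multiplier $\lambda$ from stationarity and check the remaining KKT conditions numerically.

```python
import numpy as np

# Example problem: min ||x - (2, 2)||^2  s.t.  g(x) = ||x||^2 - 1 <= 0.
target = np.array([2.0, 2.0])
x_star = target / np.linalg.norm(target)  # projection onto the unit disk

grad_f = 2.0 * (x_star - target)
grad_g = 2.0 * x_star
g_val = float(x_star @ x_star - 1.0)

# Stationarity: grad f + lam * grad g = 0; solve for lam by least squares.
lam = float(-(grad_f @ grad_g) / (grad_g @ grad_g))
stationarity_residual = float(np.linalg.norm(grad_f + lam * grad_g))

# Remaining KKT conditions at x_star:
#   dual feasibility:         lam >= 0
#   primal feasibility:       g_val <= 0
#   complementary slackness:  lam * g_val = 0
comp_slack = lam * g_val
```

Here the constraint is active ($g(x_\star) = 0$) and $\lambda = 2\sqrt{2} - 1 > 0$, so all four KKT conditions hold.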
Posted on February 17, 2025 from Ann Arbor, MI.