STATS 606

Optimality conditions

Let $C \subseteq \mathbb{R}^d$ be closed, and $f_0 : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem

$$\min_{x \in \mathbb{R}^d} f_0(x) \quad \text{subject to} \quad x \in C.$$

Intuitively, if a point $x$ is a (local) minimum, then

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ in all “feasible directions” } v.$$

If $x$ is in the interior of $C$, then the set of “feasible directions” is $\mathbb{R}^d$, and we recover the usual zero-gradient (necessary) optimality condition:

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ for all } v \in \mathbb{R}^d \iff \nabla f_0(x) = 0.$$
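
For a concrete instance of the interior case: if $f_0(x) = \frac{1}{2}\|x - a\|_2^2$ and $a$ lies in the interior of $C$, then $x = a$ is a (global) minimum, and indeed

$$\nabla f_0(a) = a - a = 0.$$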

If $x$ is on the boundary of $C$, then the set of “feasible directions” is intuitively the set of all directions that point “inward” from $x$. As we shall see, this is the set of tangent vectors.

Tangent vectors

At first blush, we consider defining tangent vectors at $x \in C$ as directions $v$ such that $x + \eta v \in C$ for all small enough step sizes $\eta$. Although this definition works well for $C$ defined by linear constraints, it is too restrictive for $C$ with curved boundaries. For example, if $C$ is a curve in $\mathbb{R}^2$, then it may not be possible to move in any straight-line direction and remain in $C$.

Tangent vectors: A vector $v \in \mathbb{R}^d$ is tangent to $C$ at $x \in C$ iff there are sequences $(x_n) \subset C$, $x_n \to x$ and $\eta_n \downarrow 0$ such that

$$\frac{1}{\eta_n}(x_n - x) \to v \quad \text{or} \quad x_n - x = \eta_n v + o(\eta_n).$$

Intuitively, $(x_n)$ traces out a curve passing thru $x$, and the line segment $x + \eta v$ is tangent to this curve. Compared to the initial (overly restrictive) definition of tangent vector, this definition adds an $o(\eta_n)$ fudge factor.
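
To make the definition concrete, here is a small numerical sketch (an illustration, with the unit circle standing in for a curved $C$): the difference quotients $\frac{1}{\eta_n}(x_n - x)$ along the circle converge to the tangent direction $(0, 1)$ at $x = (1, 0)$.

```python
import numpy as np

# Tangent vector on the unit circle C = {x in R^2 : ||x||_2 = 1}.
# Take x = (1, 0), x_n = (cos(eta_n), sin(eta_n)) in C, and eta_n = 1/n.
# The difference quotients (x_n - x)/eta_n converge to v = (0, 1).
x = np.array([1.0, 0.0])
for n in [10, 100, 1000, 10000]:
    eta_n = 1.0 / n
    x_n = np.array([np.cos(eta_n), np.sin(eta_n)])
    print(n, (x_n - x) / eta_n)  # approaches [0, 1]
```

Note that no straight-line direction stays on the circle, so the $o(\eta_n)$ fudge factor is doing real work here.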

We note that any direction $v$ such that $x + \eta v \in C$ for all small enough $\eta$ is a tangent vector. Indeed, let $\eta_n$ be a sequence of small enough step sizes that converges to zero, and set $x_n \triangleq x + \eta_n v$. The assumption that $x + \eta v \in C$ for all small enough $\eta$ ensures $x_n$ is in $C$, and we have $x_n - x = \eta_n v$ exactly.
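
For example, if $C = \{x \mid \langle a, x \rangle \le b\}$ is a halfspace and $x$ is on its boundary, then any $v$ with $\langle a, v \rangle \le 0$ satisfies $\langle a, x + \eta v \rangle \le b$ for all $\eta \ge 0$, so every such $v$ is a tangent vector. This is the sense in which the naive definition already suffices for linear constraints.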

We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall the tangent space of a smooth manifold $M$ at $x \in M$ consists of the derivatives (at $x$) of all smooth curves in $M$ passing thru $x$. Let $\gamma : [-\delta, \delta] \to C$ be a curve in $C$ such that $x = \gamma(0)$. To see that $\dot{\gamma}(0)$ is a tangent vector, let $\eta_n \downarrow 0$ and $x_n \triangleq \gamma(\eta_n)$. We have

$$x_n - x = \gamma(\eta_n) - \gamma(0) = \eta_n \dot{\gamma}(0) + o(\eta_n),$$

where we used the definition of the derivative in the second step.
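
For example, on the unit circle, the curve $\gamma(t) = (\cos t, \sin t)$ satisfies $\gamma(0) = (1, 0)$ and $\dot{\gamma}(0) = (0, 1)$, recovering the tangent direction from the numerical sketch above.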

Finally, we claim that the set of all tangent vectors at $x \in C$ is a closed cone called the tangent cone $T_C(x)$. Recall a subset $K$ of $\mathbb{R}^d$ is a cone iff it is closed under non-negative scalar multiplication: if $x \in K$, then $\alpha x \in K$ for any $\alpha \ge 0$. The proof of this claim is elementary, and we leave the details as an exercise to the reader.
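
One way to start the exercise: if $(x_n)$ and $(\eta_n)$ witness that $v$ is a tangent vector, then $(x_n)$ and $(\eta_n / \alpha)$ witness that $\alpha v$ is one for any $\alpha > 0$, since $\frac{\alpha}{\eta_n}(x_n - x) \to \alpha v$; the constant sequence $x_n \equiv x$ witnesses $0 \in T_C(x)$.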

Optimality conditions for the general optimization problem

The normal vectors at a point $x \in C$ are the vectors that make an obtuse angle with all tangent vectors: $\langle u, v \rangle \le 0$ for all $v \in T_C(x)$. It is not hard to check that the set of all normal vectors at a point is a closed convex cone, called the normal cone $N_C(x)$. We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
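
For example, if $C = \mathbb{R}_+^2$ and $x = 0$, then $T_C(0) = \mathbb{R}_+^2$ and $N_C(0) = \mathbb{R}_-^2$: the vectors making an obtuse angle with the whole non-negative orthant are exactly those with non-positive coordinates. For a non-convex example, if $C$ is the union of the two coordinate axes in $\mathbb{R}^2$, then $T_C(0) = C$ is non-convex, while $N_C(0) = \{0\}$ is (trivially) convex.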

Optimality conditions for the general optimization problem. If $x$ is a local minimum, then $-\nabla f_0(x) \in N_C(x)$. This is equivalent to

$$\langle \nabla f_0(x), v \rangle \ge 0 \text{ for all } v \in T_C(x).$$

To see this, let $v \in T_C(x)$ be an arbitrary tangent vector. There are $(x_n) \subset C$, $x_n \to x$ and $\eta_n \downarrow 0$ such that $\frac{1}{\eta_n}(x_n - x) \to v$. We have

$$0 \le f_0(x_n) - f_0(x) = f_0(x + \eta_n v) - f_0(x) + O(\|x_n - (x + \eta_n v)\|_2) = f_0(x + \eta_n v) - f_0(x) + o(\eta_n),$$

where the first step is the (local) optimality of $x$, the second step is the smoothness of continuously differentiable functions, and the third step is the definition of tangent vector. We divide both sides by $\eta_n$ and take limits to obtain

$$0 \le \frac{1}{\eta_n}\big(f_0(x + \eta_n v) - f_0(x)\big) \to \langle \nabla f_0(x), v \rangle.$$

This optimality condition is intuitive: if $x$ is a local minimum, then there is no direction $v$ that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
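
As a sanity check, here is a small numerical sketch (using the standard fact that the tangent cone of the unit disk at a boundary point $x$ is $\{v : \langle x, v \rangle \le 0\}$; the problem instance is my own choice): minimize $\langle c, x \rangle$ over the unit disk. The minimizer is $x^\star = -c/\|c\|_2$, and no tangent direction at $x^\star$ is a descent direction.

```python
import numpy as np

# Minimize <c, x> over the unit disk; the minimizer is x* = -c/||c||.
# At a boundary point x of the disk, T_C(x) = {v : <x, v> <= 0}, so we
# sample random directions, keep the tangent ones, and confirm that
# <grad f0(x*), v> = <c, v> >= 0 for every sampled tangent direction.
rng = np.random.default_rng(0)
c = np.array([1.0, 0.0])
x_star = -c / np.linalg.norm(c)
V = rng.standard_normal((1000, 2))
tangent = V[V @ x_star <= 0]           # sampled directions in T_C(x*)
print((tangent @ c >= -1e-12).all())   # True: no tangent descent direction
```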

Optimality conditions for convex problems. The main results here are

  1. $T_C(x) = \overline{\mathrm{cone}(C - x)}$, where $\mathrm{cone}$ denotes the conic hull and the bar denotes closure;
  2. $N_C(x) = \{u \in \mathbb{R}^d \mid \langle u, x' - x \rangle \le 0 \text{ for all } x' \in C\}$;
  3. The optimality condition $-\nabla f_0(x) \in N_C(x)$ is necessary and sufficient.
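
To illustrate the second and third items, here is a numerical sketch (using the standard fact that $N_C(x) = \{\alpha x : \alpha \ge 0\}$ for the unit ball at a boundary point $x$; the instance is my own choice): project a point $y$ outside the unit ball onto it. The minimizer of $f_0(x) = \|x - y\|_2^2$ is $x^\star = y/\|y\|_2$, and $-\nabla f_0(x^\star)$ is a non-negative multiple of $x^\star$.

```python
import numpy as np

# Projection onto the unit ball C: minimize ||x - y||^2 over C with y
# outside C.  The minimizer is x* = y/||y||, and the optimality condition
# -grad f0(x*) in N_C(x*) = {a * x* : a >= 0} holds with a = 2(||y|| - 1).
y = np.array([3.0, 4.0])
x_star = y / np.linalg.norm(y)
neg_grad = 2.0 * (y - x_star)                 # -grad f0 at x*
alpha = neg_grad @ x_star                     # component along x* (||x*|| = 1)
print(np.allclose(neg_grad, alpha * x_star))  # parallel to x*
print(alpha >= 0)                             # a non-negative multiple
```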

At a high level, the KKT conditions are the (necessary) optimality condition for the general optimization problem, specialized to non-linear optimization problems in standard form.

Tangent vectors of optimization problems in “standard” form

To keep things simple, we consider a non-linear optimization problem with only inequality constraints:

$$\min_{x \in \mathbb{R}^d} f_0(x) \quad \text{subject to} \quad \{f_i(x) \le 0\}_{i=1}^m.$$

Let $C$ be the feasible set of the preceding optimization problem:

$$C \triangleq \{x \in \mathbb{R}^d \mid \{f_i(x) \le 0\}_{i=1}^m\}.$$

At first blush, we guess that the tangent cone at $x \in C$ is

$$T_C(x) = \{v \in \mathbb{R}^d : \{\langle \nabla f_i(x), v \rangle \le 0\}_{i \in A}\},$$

where $A \triangleq \{i \in [m] \mid f_i(x) = 0\}$ is the set of indices of the active constraints at $x$. This guess is motivated by the observation that we must have $\langle \nabla f_i(x), v \rangle \le 0$ for all $i \in A$, or moving in the direction $v$ will violate an active inequality constraint. Unfortunately, this guess is not restrictive enough: there are pathological cases in which there are $v$’s that satisfy the preceding guess but are not tangent vectors.

Consider the set $C \triangleq \{x \in \mathbb{R}^2 \mid \frac{1}{2}\|x + e_1\|_2^2 \le \frac{1}{2},\ \frac{1}{2}\|x - e_1\|_2^2 \le \frac{1}{2}\}$, where $e_1 \triangleq (1, 0)$. This is the intersection of two disks of radius 1: one centered at $-e_1$ and another centered at $e_1$. The two disks intersect only at the origin. Thus $C = \{0\}$, and $T_C(0) = \{0\}$. On the other hand, the preceding guess is

$$\{v \in \mathbb{R}^2 \mid \langle e_1, v \rangle = 0\},$$

which is the entire vertical axis, not $\{0\}$.
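
The pathology is easy to see numerically (a small sketch of the example above): the direction $v = (0, 1)$ passes the linearized test, but every point $\eta v$ with $\eta > 0$ is infeasible.

```python
import numpy as np

# The feasible set is the intersection of unit disks centered at -e1 and
# e1, which is just {0}.  Both constraints are active at 0, with gradients
# e1 and -e1.  The direction v = (0, 1) satisfies the linearized test
# <grad f_i(0), v> <= 0, yet eta * v is infeasible for every eta > 0.
e1 = np.array([1.0, 0.0])
f1 = lambda x: 0.5 * np.sum((x + e1) ** 2) - 0.5  # disk centered at -e1
f2 = lambda x: 0.5 * np.sum((x - e1) ** 2) - 0.5  # disk centered at  e1
v = np.array([0.0, 1.0])
print(e1 @ v <= 0, (-e1) @ v <= 0)                # True True: passes guess
for eta in [0.1, 0.01, 0.001]:
    print(eta, f1(eta * v) <= 0 and f2(eta * v) <= 0)  # always False
```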

To rule out such pathological cases, we impose a constraint qualification (CQ). The standard CQ that we impose in this class is Slater’s CQ, but there are many alternatives.

The KKT conditions

Recall the optimality condition of the general optimization problem: if $x$ is a local minimum, then $\langle \nabla f_0(x), v \rangle \ge 0$ for all $v \in T_C(x)$. In light of the preceding characterization of $T_C(x)$, this is equivalent to: there is no $v \in \mathbb{R}^d$ such that

$$\{\langle \nabla f_i(x), v \rangle \le 0\}_{i \in A}, \quad \langle \nabla f_0(x), v \rangle < 0.$$

Consider the $v$ in the preceding system of inequalities as (the normal vector of) a hyperplane thru the origin. The optimality condition has a geometric interpretation: there is no hyperplane (thru the origin) that separates $-\nabla f_0(x)$ from $\{\nabla f_i(x)\}_{i \in A}$. This implies $-\nabla f_0(x)$ is in the conic hull of $\{\nabla f_i(x)\}_{i \in A}$: i.e. there is $\lambda \in \mathbb{R}_+^m$ such that

$$-\nabla f_0(x) = \sum_{i \in A} \lambda_i \nabla f_i(x).$$

This is almost the stationarity condition in the KKT conditions. We pad $\lambda$ with zero entries for the constraints that are inactive to get

$$-\nabla f_0(x) = \sum_{i=1}^m \lambda_i \nabla f_i(x),$$

which is the stationarity condition. The complementary slackness and dual feasibility conditions follow from the construction of $\lambda$. Finally, the primal feasibility condition is a consequence of the fact that $x$ is a local minimum (hence it is feasible).
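
Here is a small numerical check of the KKT conditions on a concrete problem (an illustration with my own choice of $f_0$ and $f_1$): minimize $\|x - a\|_2^2$ subject to $\|x\|_2^2 - 1 \le 0$ with $a$ outside the unit ball. The minimizer is the projection $x^\star = a/\|a\|_2$, and stationarity pins down the multiplier $\lambda = \|a\|_2 - 1 \ge 0$.

```python
import numpy as np

# KKT check for: min ||x - a||^2  s.t.  ||x||^2 - 1 <= 0, with a outside
# the unit ball.  Stationarity reads -grad f0(x*) = lam * grad f1(x*):
# 2(a - x*) = lam * 2 x*, which gives lam = ||a|| - 1.
a = np.array([2.0, 2.0])
x_star = a / np.linalg.norm(a)
lam = np.linalg.norm(a) - 1.0
grad_f0 = 2.0 * (x_star - a)                   # objective gradient at x*
grad_f1 = 2.0 * x_star                         # constraint gradient at x*
print(np.allclose(-grad_f0, lam * grad_f1))    # stationarity
print(lam >= 0.0)                              # dual feasibility
print(np.isclose(x_star @ x_star - 1.0, 0.0))  # constraint active (compl. slack.)
```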

Posted on February 17, 2025 from Ann Arbor, MI.