Let \(\cC\subset\reals^d\) be closed, and \(f_0:\reals^d\to\reals\) be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem
\[\begin{aligned} &\min\nolimits_{x\in\reals^d} & & f_0(x) \\ &\subjectto & & x\in\cC \end{aligned}.\]Intuitively, if a point \(x_*\) is a (local) minimum, then
\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ in all "feasible directions" }v.\]If \(x_*\) is in the interior of \(\cC\), then the set of “feasible directions” is \(\reals^d\), and we recover the usual zero-gradient (necessary) optimality condition:
\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in\reals^d\iff\partial f_0(x_*) = 0.\]If \(x_*\) is on the boundary of \(\cC\), then the set of “feasible directions” is intuitively the set of all directions that point “inward” from \(x_*\). As we shall see, this is the set of tangent vectors.
At first blush, we might define tangent vectors at \(x\in\cC\) as directions \(v\) such that \(x + \eta v\in\cC\) for all small enough step sizes \(\eta > 0\). Although this definition works well when $\cC$ is defined by linear constraints, it is too restrictive for $\cC$ with curved boundaries. For example, if \(\cC\) is a curve in \(\reals^2\), then it may not be possible to move in any straight-line direction and remain in \(\cC\).
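To make this concrete, here is a numerical sketch (my own illustration, not from the text beyond the "curve in \(\reals^2\)" remark): take \(\cC\) to be the unit circle and \(x = (1, 0)\). For every nonzero direction \(v\), the point \(x + \eta v\) leaves the circle for all small \(\eta > 0\), so the naive definition yields no tangent directions at all.

```python
import numpy as np

x = np.array([1.0, 0.0])  # a point on the unit circle

def on_circle(p, tol=1e-12):
    """Check membership in C = {p : ||p|| = 1} up to numerical tolerance."""
    return abs(np.linalg.norm(p) - 1.0) < tol

# Every candidate direction fails: x + eta*v steps off the circle.
for v in [np.array([0.0, 1.0]), np.array([-1.0, 0.0]), np.array([-1.0, 1.0])]:
    for eta in [1e-1, 1e-2, 1e-4]:
        assert not on_circle(x + eta * v)
```

Even \(v = (0, 1)\), the direction tangent to the circle in the classical sense, fails here: \(\|x + \eta v\| = \sqrt{1 + \eta^2} > 1\) for every \(\eta > 0\).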
Tangent vectors: A vector \(v\in\reals^d\) is tangent to \(\cC\) at \(x\in\cC\) iff there are sequences \((x_n)\subset\cC\) with \(x_n \to x\) and \((\eta_n)\searrow 0\) such that
\(\frac{1}{\eta_n}(x_n - x) \to v\) or, equivalently, \(x_n - x = \eta_nv + o(\eta_n)\).
Intuitively, \((x_n)\) traces out a curve passing through \(x\), and the line segment \(x + \eta v\) is tangent to this curve. Compared to the initial (overly restrictive) definition of a tangent vector, this definition adds an \(o(\eta_n)\) fudge factor.
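The relaxed definition rescues the unit-circle example (again my own illustration): at \(x = (1, 0)\), the sequence \(x_n = (\cos\eta_n, \sin\eta_n)\) lies on the circle, and its difference quotients converge to \(v = (0, 1)\), so \(v\) is tangent even though \(x + \eta v\) never lies on the circle.

```python
import numpy as np

x = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])  # candidate tangent vector

for eta in [10.0 ** (-k) for k in range(1, 7)]:
    x_n = np.array([np.cos(eta), np.sin(eta)])  # a point of C near x
    diff = (x_n - x) / eta                      # difference quotient
    # The o(eta_n) fudge factor shrinks: ||diff - v|| = O(eta).
    assert np.linalg.norm(diff - v) < eta
```

Here \(x_n - x = \eta_n v + O(\eta_n^2)\), so the fudge factor is genuinely \(o(\eta_n)\).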
We note that any direction \(v\) such that \(x + \eta v\in\cC\) for all small enough \(\eta\) is a tangent vector. Indeed, let \(\eta_n\) be a sequence of small enough step sizes that converges to zero and \(x_n \triangleq x + \eta_nv\). The assumption \(x + \eta v\in\cC\) for all small enough \(\eta\) ensures \(x_n\) is in \(\cC\). We have \(x_n - x = \eta_nv\).
We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall that the tangent space of a smooth manifold \(\cM\) at \(x\in\cM\) consists of the derivatives (at \(x\)) of all smooth curves in \(\cM\) passing through \(x\). Let \(\gamma:[-\delta,\delta]\to\cC\) be a smooth curve in \(\cC\) such that \(x = \gamma(0)\). To see that \(\dot{\gamma}(0)\) is a tangent vector, let \(\eta_n\searrow 0\) and \(x_n\triangleq\gamma(\eta_n)\). We have
\[x_n - x = \gamma(\eta_n) - \gamma(0) = \eta_n\dot{\gamma}(0) + o(\eta_n),\]where we used the definition of the derivative in the second step.
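A quick numerical check of this expansion on an assumed example: take \(\cC\) to be the parabola \(\{(t, t^2)\}\) with \(\gamma(t) = (t, t^2)\), so \(\gamma(0) = (0,0)\) and \(\dot\gamma(0) = (1, 0)\).

```python
import numpy as np

def gamma(t):
    """A smooth curve in C = {(t, t^2)} through the origin."""
    return np.array([t, t * t])

x = gamma(0.0)
dgamma0 = np.array([1.0, 0.0])  # derivative of gamma at 0

for eta in [1e-1, 1e-2, 1e-3]:
    # gamma(eta) - gamma(0) = eta * gamma'(0) + o(eta);
    # here the remainder is exactly (0, eta^2).
    remainder = gamma(eta) - x - eta * dgamma0
    assert np.linalg.norm(remainder) <= eta**2 + 1e-15
```

The remainder shrinks like \(\eta_n^2 = o(\eta_n)\), matching the definition of a tangent vector.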
Finally, we claim that the set of all tangent vectors at \(x\in\cC\) is a closed cone called the tangent cone \(T_{\cC}(x)\). Recall a subset \(K\subseteq\reals^d\) is a cone iff it is closed under non-negative scalar multiplication: if \(v\in K\), then \(\alpha v\in K\) for any \(\alpha \ge 0\). The proof of this claim is elementary, and we leave the details as an exercise to the reader.
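To sketch one half of the exercise: suppose \(v\) is tangent at \(x\), witnessed by sequences \((x_n)\) and \((\eta_n)\), and let \(\alpha > 0\). The same sequence \((x_n)\) with rescaled step sizes \(\eta_n/\alpha\searrow 0\) witnesses that \(\alpha v\) is tangent, since
\[\frac{1}{\eta_n/\alpha}(x_n - x) = \alpha\cdot\frac{1}{\eta_n}(x_n - x) \to \alpha v.\]The case \(\alpha = 0\) is witnessed by the constant sequence \(x_n = x\), which shows \(0\) is always a tangent vector. Closedness follows from a diagonal argument over approximating sequences.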
The normal vectors at a point \(x\in\cC\) are the vectors \(u\) that make an obtuse angle with all tangent vectors: \(\langle u,v\rangle \le 0\) for all \(v\in T_{\cC}(x)\). It is not hard to check that the set of all normal vectors at a point is a closed convex cone, called the normal cone \(N_{\cC}(x)\). We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
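A hypothetical example to fix intuition: let \(\cC\) be the closed unit disk and \(x = (1, 0)\). The tangent cone there is the halfspace \(\{v : v_1 \le 0\}\), and the normal cone is the outward ray \(\{\alpha (1, 0) : \alpha \ge 0\}\); every normal makes an obtuse (or right) angle with every tangent vector.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])  # boundary point of the unit disk

for _ in range(1000):
    v = rng.standard_normal(2)
    v[0] = -abs(v[0])                    # tangent vector: v_1 <= 0
    u = abs(rng.standard_normal()) * x   # normal vector: alpha * (1, 0)
    assert u @ v <= 1e-12                # <u, v> <= 0
```

Note the tangent cone (a halfspace) happens to be convex here because the disk is convex; for a non-convex set such as a union of two crossing lines, it need not be.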
Optimality conditions for the general optimization problem: If \(x_*\) is a local minimum, then \(-\partial f_0(x_*)\in N_{\cC}(x_*)\). This is equivalent to
\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in T_{\cC}(x_*).\]To see this, let \(v\in T_{\cC}(x_*)\) be an arbitrary tangent vector. There are \((x_n)\subset\cC\) with \(x_n\to x_*\) and \(\eta_n\searrow 0\) such that \(\frac{1}{\eta_n}(x_n - x_*) \to v\). We have
\[\begin{aligned} 0 &\le f_0(x_n) - f_0(x_*) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + O(\|x_n - (x_* + \eta_nv)\|) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + o(\eta_n), \end{aligned}\]where the first step is the (local) optimality of \(x_*\), the second step uses the local Lipschitz continuity of \(f_0\) (a consequence of continuous differentiability), and the third step uses the definition of a tangent vector, which gives \(x_n - (x_* + \eta_nv) = o(\eta_n)\). We divide both sides by \(\eta_n\) and take limits to obtain
\[0 \le \frac{1}{\eta_n}(f_0(x_* + \eta_n v) - f_0(x_*)) + o(1) \to \langle\partial f_0(x_*),v\rangle.\]This optimality condition is intuitive: if \(x_*\) is a local minimum, then there is no direction \(v\) that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
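A hypothetical instance of the condition: minimize \(f_0(x) = x_1\) over the closed unit disk. The minimizer is \(x_* = (-1, 0)\) with \(\partial f_0(x_*) = (1, 0)\). The tangent cone at \(x_*\) is \(\{v : v_1 \ge 0\}\), so \(\langle\partial f_0(x_*), v\rangle \ge 0\) for all tangent \(v\), and \(-\partial f_0(x_*) = (-1, 0)\) lies in the normal cone \(\{\alpha x_* : \alpha \ge 0\}\).

```python
import numpy as np

rng = np.random.default_rng(1)
x_star = np.array([-1.0, 0.0])  # minimizer of f_0(x) = x_1 over the unit disk
grad = np.array([1.0, 0.0])     # gradient of f_0 (constant)

for _ in range(1000):
    v = rng.standard_normal(2)
    v[0] = abs(v[0])             # tangent vector at x_*: v_1 >= 0
    assert grad @ v >= 0.0       # no tangent direction is a descent direction

# -grad is a non-negative multiple of x_*, hence lies in the normal cone.
assert np.allclose(-grad, 1.0 * x_star)
```

Every feasible direction at \(x_*\) points back into the disk, and along each of them \(f_0\) is non-decreasing to first order, exactly as the condition predicts.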
The main results here are the definition of the tangent cone \(T_{\cC}(x)\) and the first-order optimality condition \(-\partial f_0(x_*)\in N_{\cC}(x_*)\) for the general optimization problem.
Posted on March 18, 2021 from San Francisco, CA.