SDSC6015 - Assignment 1

#assignment #sdsc6015

Problem 1: Jensen’s inequality

Let $f$ be convex, $x_{1},\ldots,x_{m}\in\operatorname{dom}(f)$, and $\lambda_{1},\ldots,\lambda_{m}\in\mathbb{R}_{+}$ such that $\sum_{i=1}^{m}\lambda_{i}=1$. Show that

$$f\left( \sum_{i=1}^m \lambda_i x_i \right) \leq \sum_{i=1}^m \lambda_i f(x_i)$$

Proof.

  1. Base case $m=2$:
    For a convex function $f$, $\forall x, y \in \operatorname{dom}(f)$ and $\theta \in [0,1]$, the definition of convexity gives:

    $$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y)$$

    Since $\lambda_1 + \lambda_2 = 1$, taking $\theta = \lambda_1$, $x = x_1$, $y = x_2$:

    $$f(\lambda_1 x_1 + \lambda_2 x_2) = f(\lambda_1 x_1 + (1 - \lambda_1) x_2) \leq \lambda_1 f(x_1) + (1 - \lambda_1) f(x_2) = \lambda_1 f(x_1) + \lambda_2 f(x_2)$$

    So the inequality holds for $m = 2$.

  2. Assume it holds for $m=k$.
    That is, for any $\lambda_{1},\ldots,\lambda_{k}\in\mathbb{R}_{+}$ such that $\sum_{i=1}^{k}\lambda_{i}=1$, we have:

    $$f\left( \sum_{i=1}^k \lambda_i x_i \right) \leq \sum_{i=1}^k \lambda_i f(x_i)$$

  3. For $m=k+1$, consider two cases:

    a. If $\lambda_{k+1}=1$:

    Then $\lambda_1 = \cdots = \lambda_k = 0$, and the inequality reduces to:

    $$f(x_{k+1}) \leq f(x_{k+1})$$

    This always holds.

    b. If $\lambda_{k+1}<1$:

    Applying the $m=2$ case with weights $1-\lambda_{k+1}$ and $\lambda_{k+1}$:

    $$\begin{align*} f\left( \sum_{i=1}^{k+1} \lambda_i x_i \right) & = f\left( \sum_{i=1}^{k} \lambda_i x_i + \lambda_{k+1} x_{k+1} \right)\\ & = f\left( (1 - \lambda_{k+1})\sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} x_i + \lambda_{k+1} x_{k+1} \right) \\ & \leq (1 - \lambda_{k+1}) f\left(\sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} x_i\right) + \lambda_{k+1} f(x_{k+1}) \end{align*}$$

    The weights $\frac{\lambda_i}{1 - \lambda_{k+1}}$ are nonnegative and sum to $1$, so the inductive hypothesis gives:

    $$(1 - \lambda_{k+1}) f\left(\sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} x_i\right) \leq (1 - \lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} f(x_i) = \sum_{i=1}^{k} \lambda_i f(x_i)$$

    Therefore:

    $$f\left( \sum_{i=1}^{k+1} \lambda_i x_i \right) \leq \sum_{i=1}^{k}\lambda_i f(x_i) + \lambda_{k+1} f(x_{k+1}) = \sum_{i=1}^{k+1}\lambda_i f(x_i)$$

    The inequality holds for $m=k+1$.

By mathematical induction, Jensen’s inequality is proven.
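As a numerical sanity check (not part of the proof), the inequality can be verified for a sample convex function; the choice $f(x) = x^2$ and the particular points and weights below are illustrative assumptions:

```python
def jensen_gap(f, xs, lams):
    """RHS - LHS of Jensen's inequality; nonnegative when f is convex
    and the weights lams are nonnegative and sum to 1."""
    lhs = f(sum(l * x for l, x in zip(lams, xs)))
    rhs = sum(l * f(x) for l, x in zip(lams, xs))
    return rhs - lhs

# f(x) = x**2 is convex, so the gap should be >= 0 for any valid weights.
gap = jensen_gap(lambda x: x * x, [1.0, -2.0, 3.0], [0.2, 0.3, 0.5])
print(gap >= 0.0)  # → True
```

For an affine $f$ the gap is exactly zero, matching the fact that affine functions attain equality in Jensen's inequality.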

Problem 2

(i) Let $f_{1}, f_{2},\ldots, f_{m}$ be convex functions, $\lambda_{1},\lambda_{2},\ldots,\lambda_{m}\in\mathbb{R}_{+}$. Show that $f:=\sum_{i=1}^{m}\lambda_{i} f_{i}$ is convex on $\operatorname{dom}(f):=\bigcap_{i=1}^{m}\operatorname{dom}(f_{i})$.

Since each $f_i$ is convex, for any $x, y \in \operatorname{dom}(f)$, $\theta \in [0,1]$, and each $i$, we have:

$$f_i(\theta x + (1-\theta)y) \leq \theta f_i(x) + (1-\theta) f_i(y)$$

Multiplying by $\lambda_i \geq 0$ preserves the inequality, and summing over $i$:

$$\begin{align*} \lambda_i f_i(\theta x + (1-\theta)y) &\leq \lambda_i \theta f_i(x) + \lambda_i (1-\theta) f_i(y) \\ \sum_{i=1}^m \lambda_i f_i(\theta x + (1-\theta)y) &\leq \theta \sum_{i=1}^m \lambda_i f_i(x) + (1-\theta) \sum_{i=1}^m \lambda_i f_i(y) \\ \therefore f(\theta x + (1-\theta)y) &\leq \theta f(x) + (1-\theta) f(y) \end{align*}$$

Therefore, ff is convex.
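A quick numerical illustration of (i): with the hypothetical choices $f_1(x) = |x|$, $f_2(x) = x^2$ and weights $\lambda = (0.5, 2.0)$, the convexity inequality for the weighted sum can be checked directly:

```python
def combo_gap(fis, lams, x, y, theta):
    """RHS - LHS of the convexity inequality for f = sum_i lams[i]*fis[i];
    nonnegative when every fis[i] is convex and every weight is >= 0."""
    f = lambda z: sum(l * fi(z) for l, fi in zip(lams, fis))
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return rhs - lhs

# f1(x) = |x|, f2(x) = x**2 are both convex; weights (0.5, 2.0) are >= 0.
print(combo_gap([abs, lambda x: x * x], [0.5, 2.0], -1.0, 3.0, 0.25) >= 0.0)  # → True
```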

(ii) Let $f$ be a convex function with $\operatorname{dom}(f)\subseteq\mathbb{R}^{d}$, and $g:\mathbb{R}^{m}\rightarrow\mathbb{R}^{d}$ an affine function, meaning that $g(x)=Ax+b$, for some matrix $A\in\mathbb{R}^{d\times m}$ and some vector $b\in\mathbb{R}^{d}$. Show that the function $f\circ g$ (that maps $x$ to $f(Ax+b)$) is convex on $\operatorname{dom}(f\circ g):=\left\{x\in\mathbb{R}^{m}:g(x)\in\operatorname{dom}(f)\right\}$.

$$(f \circ g)(\theta x + (1-\theta)y) = f(g(\theta x + (1-\theta)y))$$

Since $g$ is an affine function:

$$\begin{align*} g(\theta x + (1-\theta)y) &= A(\theta x + (1-\theta)y) + b \\ &= \theta (Ax + b) + (1-\theta)(Ay + b) \\ &= \theta g(x) + (1-\theta) g(y) \end{align*}$$

By the convexity of $f$, applied at the points $g(x), g(y) \in \operatorname{dom}(f)$:

$$\begin{align*} f(\theta g(x) + (1-\theta) g(y)) &\leq \theta f(g(x)) + (1-\theta) f(g(y)) \\ \therefore (f \circ g)(\theta x + (1-\theta)y) &\leq \theta (f \circ g)(x) + (1-\theta) (f \circ g)(y) \end{align*}$$

Therefore, $f \circ g$ is convex.
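Part (ii) can be illustrated in one dimension, where the affine map is $g(x) = ax + b$; the function $f(x) = x^2$ and the coefficients below are assumed for the example:

```python
def affine_composition_gap(f, a, b, x, y, theta):
    """RHS - LHS of the convexity inequality for h = f ∘ g with g(z) = a*z + b;
    nonnegative when f is convex, for any affine g."""
    h = lambda z: f(a * z + b)  # (f ∘ g)(z)
    lhs = h(theta * x + (1 - theta) * y)
    rhs = theta * h(x) + (1 - theta) * h(y)
    return rhs - lhs

# f(x) = x**2 composed with the affine map g(x) = -3x + 1 stays convex.
print(affine_composition_gap(lambda v: v * v, -3.0, 1.0, 0.0, 2.0, 0.4) >= 0.0)  # → True
```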

Problem 3

Show that the quadratic function $f(x)=x^{\top}Qx+b^{\top}x+c$, with symmetric matrix $Q$, is smooth with parameter $2\|Q\|$.

We need to show that the gradient is Lipschitz continuous with constant $2\|Q\|$:

$$\| \nabla f(x) - \nabla f(y) \| \leq 2 \|Q\| \| x - y \|$$

Compute the gradient (using the symmetry of $Q$):

$$\nabla f(x) = 2Qx + b$$

Therefore:

$$\begin{align*} \nabla f(x) - \nabla f(y) &= 2Q(x - y) \\ \| \nabla f(x) - \nabla f(y) \| &= \| 2Q(x - y) \| \leq 2 \|Q\| \| x - y \| \end{align*}$$

where the last step uses the definition of the operator norm, $\|Qv\| \leq \|Q\|\,\|v\|$.

Thus, $f$ is smooth with parameter $L = 2\|Q\|$.
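The bound can be checked numerically. A minimal pure-Python sketch for a hypothetical symmetric $2\times 2$ matrix $Q$, using the closed-form eigenvalues of a symmetric $2\times 2$ matrix for the spectral norm:

```python
import math

def grad_f(x, Q, b):
    """Gradient of f(x) = x^T Q x + b^T x + c for symmetric 2x2 Q: 2Qx + b."""
    return [2 * (Q[0][0] * x[0] + Q[0][1] * x[1]) + b[0],
            2 * (Q[1][0] * x[0] + Q[1][1] * x[1]) + b[1]]

def spectral_norm_sym2(Q):
    """Largest |eigenvalue| of a symmetric 2x2 matrix (closed form)."""
    half_tr = (Q[0][0] + Q[1][1]) / 2
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    disc = math.sqrt(half_tr ** 2 - det)  # always real for symmetric Q
    return max(abs(half_tr + disc), abs(half_tr - disc))

def grad_lipschitz_ratio(x, y, Q, b):
    """||grad f(x) - grad f(y)|| / ||x - y||; should never exceed 2*||Q||."""
    gx, gy = grad_f(x, Q, b), grad_f(y, Q, b)
    return math.hypot(gx[0] - gy[0], gx[1] - gy[1]) / math.hypot(x[0] - y[0], x[1] - y[1])

Q = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric example
b = [1.0, -1.0]
L = 2 * spectral_norm_sym2(Q)
print(grad_lipschitz_ratio([1.0, 0.0], [0.0, 1.0], Q, b) <= L)  # → True
```

The ratio attains $L$ exactly when $x - y$ is an eigenvector of the largest-magnitude eigenvalue, so the constant $2\|Q\|$ is tight.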

Problem 4

Consider the projected gradient descent algorithm with a convex differentiable function $f: X\rightarrow\mathbb{R}$. Suppose that for some $t\geq 0$, $x_{t+1}=x_{t}$. Prove that in this case, $x_{t}$ is a minimizer of $f$ over the closed and convex set $X$.

The projected gradient descent update is $x_{t+1} = \Pi_X(x_t - \eta \nabla f(x_t))$ with step size $\eta > 0$. Given $x_{t+1} = x_t$:

$$x_t = \Pi_X(x_t - \eta \nabla f(x_t))$$

The projection $y = \Pi_X(z)$ onto a closed convex set $X$ satisfies $\langle y - z, x - y \rangle \geq 0$ for all $x \in X$. Applying this with $z = x_t - \eta \nabla f(x_t)$ and $y = x_t$:

$$\begin{align*} \langle x_t - (x_t - \eta \nabla f(x_t)), x - x_t \rangle &\geq 0 \quad \forall x \in X \\ \langle \eta \nabla f(x_t), x - x_t \rangle &\geq 0 \quad \forall x \in X \\ \langle \nabla f(x_t), x - x_t \rangle &\geq 0 \quad \forall x \in X \end{align*}$$

where the last line uses $\eta > 0$. By the first-order characterization of convexity, for all $x \in X$:

$$f(x) \geq f(x_t) + \langle \nabla f(x_t), x - x_t \rangle \geq f(x_t)$$

Therefore, $x_t$ is a minimizer of $f$ over $X$.
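A small one-dimensional illustration of the fixed-point property, with the hypothetical problem of minimizing $f(x) = (x-3)^2$ over $X = [0, 1]$ (constrained minimizer $x^* = 1$):

```python
def project(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def pgd_step(x, eta, grad, lo, hi):
    """One projected gradient descent step on X = [lo, hi]."""
    return project(x - eta * grad(x), lo, hi)

grad = lambda x: 2 * (x - 3)  # gradient of f(x) = (x - 3)**2
eta, x = 0.1, 0.0
for _ in range(50):
    x = pgd_step(x, eta, grad, 0.0, 1.0)

# The iterates reach the constrained minimizer x* = 1 and then stop moving:
print(x)                                      # → 1.0
print(pgd_step(x, eta, grad, 0.0, 1.0) == x)  # → True
```

Note that $\nabla f(1) = -4 \neq 0$: at a boundary minimizer the gradient need not vanish, only $\langle \nabla f(x_t), x - x_t \rangle \geq 0$ for all feasible $x$, which is exactly what the fixed-point argument uses.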