#assignment #sdsc6015
Problem link: SDSC6015 - Question of Assignment 2
Problem 1 [10 marks]
Prove that if the function $f: \mathbb{R}^{d}\rightarrow \mathbb{R}$ has a subgradient at every point in its domain, then $f$ is convex.
Solution:
Let $x, y \in \mathbb{R}^d$, $\lambda \in [0,1]$, and define $z = \lambda x + (1-\lambda)y$.
Since a subgradient exists at every point, for any $g_z \in \partial f(z)$ we have:
$$f(x) \geq f(z) + \langle g_z, x - z \rangle, \qquad f(y) \geq f(z) + \langle g_z, y - z \rangle$$
Taking a weighted sum of the two inequalities:
$$\lambda f(x) + (1-\lambda) f(y) \geq f(z) + \langle g_z, \lambda(x - z) + (1-\lambda)(y - z) \rangle$$
Substitute $z = \lambda x + (1-\lambda)y$ and simplify the inner product term: since $x - z = (1-\lambda)(x - y)$ and $y - z = -\lambda(x - y)$,
$$\lambda(x - z) + (1-\lambda)(y - z) = \lambda(1-\lambda)(x - y) - \lambda(1-\lambda)(x - y) = 0$$
Thus:
$$\lambda f(x) + (1-\lambda) f(y) \geq f(z) = f(\lambda x + (1-\lambda)y)$$
Therefore, $f$ is convex.
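As a quick numerical sanity check (not part of the proof), the two subgradient inequalities and the resulting convexity combination can be verified on a nonsmooth example such as $f(x) = |x|$, where $\operatorname{sign}(x)$ is a valid subgradient; the sketch below assumes that example.

```python
import numpy as np

# Sanity check (illustrative): f(x) = |x| with subgradient g(x) = sign(x).
# The two subgradient inequalities at z = lam*x + (1-lam)*y force
# lam*f(x) + (1-lam)*f(y) >= f(z), i.e. the convexity combination.
f, g = np.abs, np.sign  # sign(.) is a valid subgradient of |.| (0 works at 0)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y, lam = rng.normal(), rng.normal(), rng.uniform()
    z = lam * x + (1 - lam) * y
    gz = g(z)
    assert f(x) >= f(z) + gz * (x - z) - 1e-12  # subgradient inequality at z
    assert f(y) >= f(z) + gz * (y - z) - 1e-12
    assert lam * f(x) + (1 - lam) * f(y) >= f(z) - 1e-12  # convexity at z
print("all checks passed")
```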
Problem 2 [20 marks]
Assume the function $f: \mathbb{R}^{d}\rightarrow \mathbb{R}$ has the sum structure:
$$f(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$$
where each $f_{i}: \mathbb{R}^{d}\rightarrow \mathbb{R}$ is $L$-smooth. Consider the over-parameterized setting, meaning there exists $x^{*}$ such that $\nabla f_{i}(x^{*})=0$ for all $i\in\{1,2,\ldots, n\}$.
We run standard SGD by uniformly sampling $i$ and updating with step-size $\eta>0$:
$$x_{t+1}=x_{t}-\eta\nabla f_{i}(x_{t})$$
(i) Given the over-parameterization of $f$, show that:
$$\mathbb{E}\left[\left\|\nabla f_{i}(x_{t})\right\|^{2} \mid x_{t}\right]\leqslant 2 L\left(f(x_{t})-f(x^{*})\right)$$
(Hint: use the $L$-Lipschitz continuity of $\nabla f_{i}$ and $\nabla f_{i}(x^{*})=0$.)
Solution:
Since each $f_i$ is $L$-smooth and $\nabla f_i(x^*) = 0$, the co-coercivity property of $L$-smooth convex functions gives, for any $x$ and $y$:
$$\|\nabla f_i(x) - \nabla f_i(y)\|^2 \leq 2L \left( f_i(x) - f_i(y) - \nabla f_i(y)^\top (x-y) \right)$$
Set $y = x^*$ and use $\nabla f_i(x^*) = 0$ to obtain:
$$\|\nabla f_i(x_t)\|^2 \leq 2L \left( f_i(x_t) - f_i(x^*) \right)$$
This holds for each $i$. Taking the expectation over $i$ (uniform sampling), and noting $\mathbb{E}_i [f_i(x_t)] = f(x_t)$ and $\mathbb{E}_i [f_i(x^*)] = f(x^*)$:
$$\mathbb{E}_i \left[ \|\nabla f_i(x_t)\|^2 \right] \leq 2L\, \mathbb{E}_i \left[ f_i(x_t) - f_i(x^*) \right] = 2L \left( f(x_t) - f(x^*) \right)$$
Thus:
$$\mathbb{E}\left[\|\nabla f_i(x_t)\|^2 \mid x_t\right] \leq 2L \left(f(x_t) - f(x^*)\right)$$
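As an illustration (not required by the problem), the bound can be checked numerically on an assumed over-parameterized least-squares model $f_i(x) = \frac{1}{2}(a_i^\top x - b_i)^2$ with $b = A x^*$, so that $\nabla f_i(x^*) = 0$ for every $i$ and each $f_i$ is $\|a_i\|^2$-smooth:

```python
import numpy as np

# Illustrative check of E_i[||grad f_i(x)||^2] <= 2L (f(x) - f(x*)) on an
# assumed over-parameterized least-squares model f_i(x) = 0.5*(a_i @ x - b_i)^2,
# with b = A @ x_star so that grad f_i(x_star) = 0 for every i.
rng = np.random.default_rng(1)
n, d = 20, 50                        # d > n: over-parameterized
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                       # interpolation: a_i @ x_star = b_i

L = max(np.linalg.norm(A[i]) ** 2 for i in range(n))  # each f_i is L-smooth

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)            # f = (1/n) sum_i f_i

x = rng.normal(size=d)
lhs = np.mean([np.linalg.norm((A[i] @ x - b[i]) * A[i]) ** 2 for i in range(n)])
rhs = 2 * L * (f(x) - f(x_star))     # f(x_star) = 0 in this setup
print(lhs <= rhs + 1e-9, lhs, rhs)
```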
(ii) Using the result from (i), prove that:
$$\mathbb{E}\left[f(x_{t+1})\mid x_{t}\right]\leqslant f(x_{t})-\eta\left\|\nabla f(x_{t})\right\|^{2}+\eta^{2} L^{2}\left(f(x_{t})-f(x^{*})\right)$$
(Hint: substitute the SGD update into the smoothness bound of $f$.)
Solution:
By the $L$-smoothness of $f$:
$$f(x_{t+1}) \leq f(x_t) + \nabla f(x_t)^\top (x_{t+1} - x_t) + \frac{L}{2} \|x_{t+1} - x_t\|^2$$
Substitute the SGD update $x_{t+1} - x_t = -\eta \nabla f_i(x_t)$:
$$f(x_{t+1}) \leq f(x_t) - \eta \nabla f(x_t)^\top \nabla f_i(x_t) + \frac{L \eta^2}{2} \|\nabla f_i(x_t)\|^2$$
Take the conditional expectation over $i$, using the unbiasedness $\mathbb{E}_i[\nabla f_i(x_t)] = \nabla f(x_t)$:
$$\mathbb{E}_i \left[ f(x_{t+1}) \mid x_t \right] \leq f(x_t) - \eta \|\nabla f(x_t)\|^2 + \frac{L \eta^2}{2}\, \mathbb{E}_i \left[ \|\nabla f_i(x_t)\|^2 \mid x_t \right]$$
Apply the result from (i), $\mathbb{E}_i \left[ \|\nabla f_i(x_t)\|^2 \mid x_t \right] \leq 2L \left(f(x_t) - f(x^*)\right)$:
$$\mathbb{E}\left[f(x_{t+1}) \mid x_t\right] \leqslant f(x_t) - \eta \|\nabla f(x_t)\|^2 + \eta^2 L^2 \left(f(x_t) - f(x^*)\right)$$
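Continuing the same assumed least-squares example, the one-step bound can be checked by computing the conditional expectation over $i$ exactly, i.e. by averaging $f(x_t - \eta \nabla f_i(x_t))$ over all $n$ components; this is a sketch, not part of the required proof:

```python
import numpy as np

# Illustrative check of the one-step bound
#   E[f(x_{t+1}) | x_t] <= f(x_t) - eta*||grad f(x_t)||^2
#                           + eta^2 * L^2 * (f(x_t) - f(x*))
# on the same assumed least-squares example; the conditional expectation over
# the sampled index i is computed exactly by averaging all n possible updates.
rng = np.random.default_rng(2)
n, d = 20, 50
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star
L = max(np.linalg.norm(A[i]) ** 2 for i in range(n))

f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b) / n
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]

eta = 1.0 / (2 * L)
x_t = rng.normal(size=d)
exp_next = np.mean([f(x_t - eta * grad_i(x_t, i)) for i in range(n)])
bound = (f(x_t) - eta * np.linalg.norm(grad_f(x_t)) ** 2
         + eta ** 2 * L ** 2 * (f(x_t) - f(x_star)))
print(exp_next <= bound + 1e-9, exp_next, bound)
```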
Problem 3 [10 marks]
Let $f: \mathbb{R}^{d}\rightarrow \mathbb{R}$ be convex, $L$-smooth, and differentiable, with $x^{*}$ the unique global minimum of $f$. Given an initial point $x_{1}\in \mathbb{R}^{d}$ and $T>0$, consider Scalar AdaGrad with $G_{t}=\sum_{j=1}^{t}\|\nabla f(x_{j})\|^{2}$ and update:
$$x_{t+1}=x_{t}-\frac{R}{\sqrt{G_{t}}}\nabla f(x_{t}),\quad t=1,2,\ldots,T$$
Prove that:
$$f\left(\frac{1}{T}\sum_{t=1}^{T} x_{t}\right)-f(x^{*})\leqslant\frac{2 R^{2} L}{T}$$
where $R=\max_{t=1,\ldots,T}\|x_{t}-x^{*}\|$.
Hint: use convexity and smoothness to show:
$$f\left(\frac{1}{T}\sum_{t=1}^{T}x_{t}\right)-f(x^{*})\leqslant\frac{1}{T}\left(\sum_{t=1}^{T}\left\langle\nabla f(x_{t}),x_{t}-x^{*}\right\rangle-\frac{1}{2L}\sum_{t=1}^{T}\|\nabla f(x_{t})\|^{2}\right)$$
where $\sum_{t=1}^{T}\left\langle\nabla f(x_{t}), x_{t}-x^{*}\right\rangle$ can be bounded as in the proof of Theorem 3 from Lecture 6.
Solution:
From the hint (convexity + L-smoothness):
$$f\left(\frac{1}{T}\sum_{t=1}^{T} x_t\right) - f(x^*) \leqslant \frac{1}{T} \left( \sum_{t=1}^{T} \langle \nabla f(x_t), x_t - x^* \rangle - \frac{1}{2L} \sum_{t=1}^{T} \|\nabla f(x_t)\|^2 \right)$$
For the AdaGrad update $x_{t+1} = x_t - \frac{R}{\sqrt{G_t}} \nabla f(x_t)$ with $G_t = \sum_{j=1}^{t} \|\nabla f(x_j)\|^2$, expand the squared distance to $x^*$ as in the proof of Theorem 3 in Lecture 6:
$$\|x_{t+1} - x^*\|^2 = \|x_t - x^*\|^2 - \frac{2R}{\sqrt{G_t}} \langle \nabla f(x_t), x_t - x^* \rangle + \frac{R^2}{G_t} \|\nabla f(x_t)\|^2$$
Rearranging for the inner product and summing over $t$:
$$\sum_{t=1}^{T} \langle \nabla f(x_t), x_t - x^* \rangle = \frac{1}{2R} \sum_{t=1}^{T} \sqrt{G_t}\left(\|x_t - x^*\|^2 - \|x_{t+1} - x^*\|^2\right) + \frac{R}{2} \sum_{t=1}^{T} \frac{\|\nabla f(x_t)\|^2}{\sqrt{G_t}}$$
Since $\sqrt{G_t}$ is nondecreasing and $\|x_t - x^*\| \leq R$, Abel summation (telescoping) bounds the first sum by $R^2 \sqrt{G_T}$, and the standard lemma $\sum_{t=1}^{T} a_t / \sqrt{\sum_{j \leq t} a_j} \leq 2\sqrt{\sum_{t=1}^{T} a_t}$ with $a_t = \|\nabla f(x_t)\|^2$ bounds the second sum by $2\sqrt{G_T}$. Hence:
$$\sum_{t=1}^{T} \langle \nabla f(x_t), x_t - x^* \rangle \leq \frac{R}{2}\sqrt{G_T} + R\sqrt{G_T} = \frac{3R}{2}\sqrt{G_T}$$
Combining with the first display and noting $\sum_{t=1}^{T}\|\nabla f(x_t)\|^2 = G_T$, write $u = \sqrt{G_T}$:
$$f\left(\frac{1}{T}\sum_{t=1}^{T} x_t\right) - f(x^*) \leqslant \frac{1}{T}\left(\frac{3R}{2}\,u - \frac{u^2}{2L}\right) \leqslant \frac{1}{T}\,\max_{u \geq 0}\left(\frac{3R}{2}\,u - \frac{u^2}{2L}\right) = \frac{9R^{2}L}{8T} \leqslant \frac{2R^{2} L}{T}$$
since the concave quadratic in $u$ is maximized at $u = \frac{3RL}{2}$.
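For intuition, a minimal Scalar AdaGrad run on an assumed convex quadratic $f(x) = \frac{1}{2}\|A(x - x^*)\|^2$ can be used to observe the $O(1/T)$ behaviour. Note the theorem's $R$ is defined from the iterates themselves, so this sketch seeds the step constant with $\|x_1 - x^*\|$ and measures $R$ post hoc; it is an informal check, not the proof:

```python
import numpy as np

# Informal check of the Scalar AdaGrad bound on an assumed convex quadratic
# f(x) = 0.5*||A (x - x_star)||^2, whose smoothness constant is ||A^T A||_2.
# The theorem's R = max_t ||x_t - x_star|| depends on the iterates, so this
# sketch seeds the step constant with ||x_1 - x_star|| and measures R post hoc.
rng = np.random.default_rng(3)
d, T = 10, 2000
A = rng.normal(size=(d, d))
L = np.linalg.norm(A.T @ A, 2)
x_star = rng.normal(size=d)

f = lambda x: 0.5 * np.linalg.norm(A @ (x - x_star)) ** 2
grad = lambda x: A.T @ (A @ (x - x_star))

x = rng.normal(size=d)
R0 = np.linalg.norm(x - x_star)      # stand-in for the (unknown a priori) R
G, iterates = 0.0, []
for t in range(T):
    iterates.append(x)
    g = grad(x)
    G += np.linalg.norm(g) ** 2      # G_t = sum_{j<=t} ||grad f(x_j)||^2
    x = x - (R0 / np.sqrt(G)) * g    # Scalar AdaGrad step

R = max(np.linalg.norm(xt - x_star) for xt in iterates)
gap = f(np.mean(iterates, axis=0)) - f(x_star)
print(gap, 2 * R ** 2 * L / T)       # observe gap <= 2 R^2 L / T (informally)
```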
Problem 4: Practical Implementation of Stochastic Gradient Descent [60 marks]
For implementation, it is recommended to use Google Colab.
Please open the file “HW2_Lab_SGD.ipynb” and insert your code following the provided instructions. All necessary datasets and helper functions can be found in the HW2 Lab folder.
For submission, please submit your completed “HW2_Lab_SGD.ipynb” file along with a PDF containing the output results.
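For orientation, a generic uniform-sampling SGD loop of the kind the notebook asks for is sketched below; the dataset, loss, and helper names are illustrative placeholders, not the ones defined in HW2_Lab_SGD.ipynb:

```python
import numpy as np

# For orientation only: a generic uniform-sampling SGD loop of the kind the
# notebook asks for. The dataset, loss, and helper names below are
# illustrative placeholders, not the ones defined in HW2_Lab_SGD.ipynb.
def sgd(grad_i, x0, n, eta=0.01, epochs=10, seed=0):
    """Run SGD: sample i uniformly, then set x <- x - eta * grad_i(x, i)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    history = [x.copy()]
    for _ in range(epochs * n):
        i = rng.integers(n)          # uniform component sampling
        x -= eta * grad_i(x, i)      # stochastic gradient step
        history.append(x.copy())
    return x, history

# Example on least squares with f_i(x) = 0.5*(a_i @ x - b_i)^2:
rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat, _ = sgd(grad_i, np.zeros(d), n, eta=0.01, epochs=50)
print(np.linalg.norm(A.T @ (A @ x_hat - b)) / n)  # full-gradient norm at x_hat
```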