Advances, Systems and Applications

# Table 2 Optimization algorithm

Algorithm: The optimization with the improved loss function
Input: A minibatch of m examples {x^{(1)},...,x^{(m)}} sampled from the training set, with corresponding targets y^{(i)}.
Initialization: Step size ε=0.001, exponential decay rates for moment estimates ρ1=0.9, ρ2=0.999, and small constant δ=10⁻⁸ for numerical stabilization.
Output: Network parameters θ.
1. Initialize: Network parameters θ, 1st and 2nd moment variables s=0, r=0 and time step t=0.
2. While stopping criterion not met do.
Compute gradient: $$g\leftarrow \frac {1}{m}\nabla _{\theta } \sum _{i} L(f(x^{(i)};\theta),y^{(i)})$$.
t=t+1.
Update biased first moment estimate:
$$s\leftarrow \rho _{1} s+(1-\rho _{1})g$$.
Update biased second moment estimate:
$$r\leftarrow \rho _{2} r+(1-\rho _{2})g\odot g$$.
Correct bias in first moment: $${\overline {s}}\leftarrow {\frac {s}{(1-\rho _{1}^{t})}}$$.
Correct bias in second moment: $${\overline {r}}\leftarrow {\frac {r}{(1-\rho _{2}^{t})}}$$.
Compute update: $$\Delta \theta =-\varepsilon \frac {\overline {s}}{\sqrt {\overline {r}}+\delta }$$.
Apply update: $$\theta \leftarrow \theta +\Delta \theta $$.
3. end while.
4. Return θ.
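The loop above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the gradient function `grad_fn`, the step count used as the stopping criterion, and the array shapes are assumptions; the default hyperparameters match the table's initialization (ε=0.001, ρ1=0.9, ρ2=0.999, δ=10⁻⁸).

```python
import numpy as np

def adam_update(theta, grad_fn, *, steps=1000, eps=1e-3,
                rho1=0.9, rho2=0.999, delta=1e-8):
    """One-to-one sketch of the table: biased moment estimates,
    bias correction, then the parameter step."""
    s = np.zeros_like(theta)  # 1st moment variable, s = 0
    r = np.zeros_like(theta)  # 2nd moment variable, r = 0
    for t in range(1, steps + 1):            # t = t + 1 each pass
        g = grad_fn(theta)                   # minibatch gradient
        s = rho1 * s + (1 - rho1) * g        # biased 1st moment
        r = rho2 * r + (1 - rho2) * (g * g)  # biased 2nd moment (elementwise g ⊙ g)
        s_hat = s / (1 - rho1 ** t)          # bias-corrected 1st moment
        r_hat = r / (1 - rho2 ** t)          # bias-corrected 2nd moment
        theta = theta - eps * s_hat / (np.sqrt(r_hat) + delta)  # apply Δθ
    return theta
```

For example, minimizing the toy objective f(θ)=θ² (gradient 2θ) with `adam_update(np.array([1.0]), lambda th: 2 * th, steps=3000)` drives θ close to zero.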