Advances, Systems and Applications
From: Lightweight image classifier using dilated and depthwise separable convolutions
Algorithm: The optimization with the improved loss function |
---|
Input: A minibatch of m examples \(\{x^{(1)},\ldots,x^{(m)}\}\) sampled from the training set, with corresponding targets \(y^{(i)}\). |
Initialization: Step size \(\varepsilon =0.001\), exponential decay rates for the moment estimates \(\rho _{1}=0.9\) and \(\rho _{2}=0.999\), and a small constant for numerical stabilization \(\delta =10^{-8}\). |
Output: Network parameters θ. |
1. Initialize: Network parameters θ, 1st and 2nd moment variables s=0, r=0 and time step t=0. |
2. While stopping criterion not met do. |
Compute gradient: \(g\leftarrow \frac {1}{m}\nabla _{\theta } \sum _{i} L(f(x^{(i)};\theta),y^{(i)})\). |
\(t\leftarrow t+1\). |
Update biased first moment estimate: |
\(s\leftarrow \rho _{1} s+(1-\rho _{1})g\). |
Update biased second moment estimate: |
\(r\leftarrow \rho _{2} r+(1-\rho _{2})\, g\odot g\). |
Correct bias in first moment: \({\overline {s}}\leftarrow {\frac {s}{(1-\rho _{1}^{t})}}\). |
Correct bias in second moment: \({\overline {r}}\leftarrow {\frac {r}{(1-\rho _{2}^{t})}}\). |
Compute update (operations applied element-wise): \(\Delta \theta =-\varepsilon \frac {\overline {s}}{\sqrt {\overline {r}}+\delta }\). |
Apply update: \(\theta \leftarrow \theta +\Delta \theta \). |
3. end while. |
4. Return θ. |
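
The update steps in the table coincide with the Adam update rule, so the loop can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the names `adam_minimize`, `grad_fn` and `num_steps` are hypothetical and do not come from the source, a fixed step count stands in for the unspecified stopping criterion, and a toy quadratic loss replaces the paper's improved loss function, which is not reproduced here.

```python
import math


def adam_minimize(grad_fn, theta, step_size=0.001, rho1=0.9, rho2=0.999,
                  delta=1e-8, num_steps=1000):
    """Apply the table's update loop to a list of float parameters.

    grad_fn (hypothetical) returns the minibatch-averaged gradient g
    for the current parameters. num_steps is an assumed stand-in for
    the stopping criterion, which the source does not specify.
    """
    theta = list(theta)
    s = [0.0] * len(theta)  # biased first moment estimate
    r = [0.0] * len(theta)  # biased second moment estimate
    t = 0
    while t < num_steps:
        g = grad_fn(theta)
        t += 1
        for j in range(len(theta)):
            s[j] = rho1 * s[j] + (1 - rho1) * g[j]
            r[j] = rho2 * r[j] + (1 - rho2) * g[j] * g[j]
            s_bar = s[j] / (1 - rho1 ** t)  # bias-corrected first moment
            r_bar = r[j] / (1 - rho2 ** t)  # bias-corrected second moment
            theta[j] += -step_size * s_bar / (math.sqrt(r_bar) + delta)
    return theta


# Toy loss (an assumption for illustration): sum((theta - target)^2),
# whose gradient is 2 * (theta - target).
target = [3.0, -1.0]


def quadratic_grad(theta):
    return [2.0 * (p - q) for p, q in zip(theta, target)]


one_step = adam_minimize(quadratic_grad, [0.0, 0.0], num_steps=1)
fitted = adam_minimize(quadratic_grad, [0.0, 0.0],
                       step_size=0.05, num_steps=2000)
```

At t=1 the bias-corrected moments reduce to g and g⊙g, so the first update moves every parameter by roughly the step size ε against the sign of its gradient, regardless of the gradient's magnitude. In this sketch `one_step` therefore lands near `[0.001, -0.001]`, while the longer run in `fitted` approaches `target`.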