Advances, Systems and Applications
Algorithm | Characteristics | Advantages | Disadvantages |
---|---|---|---|
Adagrad [33] | Scales each parameter's learning rate by the accumulated sum of its past squared gradients | Per-parameter adaptive learning rate; well suited to sparse data | Effective learning rate decays monotonically, so parameter updates can become vanishingly small late in training |
RMSprop [34] | Improves on Adagrad by replacing the accumulated sum of squared gradients with an exponential moving average | Keeps a per-parameter learning rate without Adagrad's monotonic decay | Requires tuning of hyperparameters (learning rate, decay rate); may converge slowly |
Adam [35] | Combines momentum with RMSprop-style adaptive learning rates | Efficient; suitable for a wide range of data and models | Requires tuning of additional hyperparameters; can be unstable in some settings |
Adadelta [36] | Extension of Adagrad, closely related to RMSprop; eliminates the learning rate decay issue | Adaptive learning rate; reduces reliance on hyperparameters | Sensitive to initialization; may still require tuning of an initial learning rate in some implementations |
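
The differences summarized in the table are easiest to see in the update rules themselves. The following is a minimal sketch of the standard textbook form of each rule for a single scalar parameter, not the implementation used in this work. The function names, the `state` dictionary, the `minimize` helper, and all hyperparameter defaults are illustrative assumptions chosen for this example.

```python
import math

def adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    # Adagrad: divide by the root of the SUM of all past squared gradients.
    # The sum never shrinks, so the effective step only ever gets smaller.
    state["G"] = state.get("G", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["G"]) + eps)

def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    # RMSprop: replace the sum with an exponential moving average,
    # so old gradients are gradually forgotten.
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: momentum (first moment m) plus RMSprop-style scaling
    # (second moment v), both corrected for their zero initialization.
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def adadelta_step(w, g, state, rho=0.95, eps=1e-6):
    # Adadelta: the step is scaled by the ratio of the running RMS of past
    # updates to the running RMS of gradients; no global learning rate appears.
    state["Eg"] = rho * state.get("Eg", 0.0) + (1 - rho) * g * g
    dx = -math.sqrt(state.get("Edx", 0.0) + eps) / math.sqrt(state["Eg"] + eps) * g
    state["Edx"] = rho * state.get("Edx", 0.0) + (1 - rho) * dx * dx
    return w + dx

def minimize(step, w0=5.0, n_steps=200):
    # Toy problem (an assumption for illustration): f(w) = w**2 / 2, gradient = w.
    w, state = w0, {}
    for _ in range(n_steps):
        w = step(w, w, state)
    return w

if __name__ == "__main__":
    for step in (adagrad_step, rmsprop_step, adam_step, adadelta_step):
        print(f"{step.__name__:14s} w after 200 steps: {minimize(step):.4f}")
```

With a constant gradient, the Adagrad step shrinks by a factor of 1/sqrt(t) at step t, which is the decay behavior listed as its disadvantage. RMSprop and Adadelta avoid it by averaging instead of summing.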