Advances, Systems and Applications

Table 2 Performance comparison of optimizers in different scenarios

From: Low-cost and high-performance abnormal trajectory detection based on the GRU model with deep spatiotemporal sequence analysis in cloud computing

| Algorithm | Characteristics | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Adagrad [33] | Scales each parameter's learning rate by its accumulated historical squared gradients | Adaptive learning rate; well suited to sparse data | The effective learning rate decays quickly, so later parameter updates may become very small |
| RMSprop [34] | Improves on Adagrad by replacing the accumulated sum with an exponential moving average of squared gradients | Maintains a separate adaptive learning rate for each parameter | Requires hyperparameter tuning; may converge slowly |
| Adam [35] | Combines momentum with RMSprop-style adaptive learning rates | Efficient; suitable for a wide range of data and models | Requires tuning of additional hyperparameters; can be unstable at times |
| Adadelta [36] | Closely related to RMSprop; extends Adagrad to eliminate its learning rate decay issue | Adaptive learning rate; reduces reliance on hyperparameters | Sensitive to initialization; requires tuning of the initial learning rate |
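
The update rules behind the four optimizers compared in Table 2 can be sketched for a single scalar parameter as follows. This is a minimal illustration based on the commonly published textbook forms of each rule, not the implementation used in the article. The function names, the `state` dictionary, the default hyperparameter values, and the toy objective f(w) = w² are all assumptions introduced here for illustration.

```python
import math


def adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    """Adagrad: divide by the root of the running SUM of squared gradients."""
    state["G"] = state.get("G", 0.0) + g * g
    # The sum only grows, so the effective step size shrinks over time.
    return w - lr * g / (math.sqrt(state["G"]) + eps)


def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    """RMSprop: replace the sum with an exponential moving average."""
    state["Eg2"] = rho * state.get("Eg2", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["Eg2"]) + eps)


def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum (first moment) plus RMSprop-style scaling (second moment)."""
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    v = state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def adadelta_step(w, g, state, rho=0.95, eps=1e-6):
    """Adadelta: scale by the ratio of RMS(previous updates) to RMS(gradients)."""
    eg2 = state["Eg2"] = rho * state.get("Eg2", 0.0) + (1 - rho) * g * g
    dx = -math.sqrt(state.get("Edx2", 0.0) + eps) / math.sqrt(eg2 + eps) * g
    state["Edx2"] = rho * state.get("Edx2", 0.0) + (1 - rho) * dx * dx
    return w + dx


def minimize(step, w0=5.0, n_steps=200, **kwargs):
    """Apply `step` repeatedly to the toy objective f(w) = w**2 (gradient 2*w)."""
    w, state = w0, {}
    for _ in range(n_steps):
        w = step(w, 2.0 * w, state, **kwargs)
    return w


if __name__ == "__main__":
    for step in (adagrad_step, rmsprop_step, adam_step, adadelta_step):
        print(f"{step.__name__}: w after 200 steps = {minimize(step):.4f}")
```

Under these assumptions, feeding Adagrad a constant gradient shows the decay noted in the table: the first step has size `lr`, the second `lr / sqrt(2)`, and so on. The textbook Adadelta form sketched here takes no explicit learning rate, and its first steps are governed by `eps`.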