Fig. 2 From: A bidirectional DNN partition mechanism for efficient pipeline parallel training in cloud

An example of three workers training under traditional MP. For simplicity, the time cost of BP is assumed to be twice that of FP; this assumption is consistent with the wall-clock time cost of model training in realistic scenarios. The BP process of a DNN layer consists of two steps: calculating the error/gradient of the current layer based on the error/gradient passed from the successive layer, and updating the model parameters of the current layer. By contrast, the FP process of a DNN layer only calculates the output value for the next layer, which generally has the same complexity as calculating the error/gradient. Therefore, the time consumption of BP is generally considered to be about twice that of FP.
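The timing assumption in the caption can be sketched numerically. The snippet below (a minimal sketch, not from the paper) computes the wall-clock time of one mini-batch under traditional MP, where the stages run strictly sequentially: FP flows through all workers in order, then BP flows back. The cost constants and the function name are illustrative assumptions.

```python
FP_COST = 1            # assumed time units per stage for the forward pass
BP_COST = 2 * FP_COST  # caption's assumption: BP costs twice FP

def mp_minibatch_time(num_workers: int) -> int:
    """Total time for one mini-batch under traditional MP:
    FP runs worker 0 -> n-1 sequentially, then BP runs n-1 -> 0,
    so every stage is idle while any other stage computes."""
    return num_workers * FP_COST + num_workers * BP_COST

print(mp_minibatch_time(3))  # three workers as in Fig. 2 -> 9 time units
```

With three workers this gives 3 x 1 + 3 x 2 = 9 time units per mini-batch, of which each worker computes for only 3 units, illustrating why sequential MP leaves stages idle most of the time.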