Notation | Description
---|---
\(\mathcal {S}\) | the state space |
\(\mathcal {A}\) | the action space |
\(R(s_t,a_t)\) | the reward function |
\(P(s_{t+1}|s_t,a_t)\) | the transition dynamics, reflecting the time-variant behavior of the cluster (\(0 \le P(s_{t+1}|s_t,a_t) \le 1\))
\(s_t\) | the node and task state information during a scheduling interval |
\(v^w\) | a waiting task |
\(v^r\) | a running task |
\(a_t\) | an action, i.e., one possible combination of cluster scheduler configurations
\(V^{allocate}\) | the tasks that obtain resource allocations |
\(V^{complete}\) | the completed tasks |
\(V^{arrive}\) | the newly arrived tasks
JTL | the job tail latency
J | the set of jobs completed within period \((t-1, t]\) |
TTL | the tail latency of a task
\(V^{run}\) | the set of tasks running within period \((t-1, t]\) |
\(V^{wait}\) | the set of tasks waiting within period \((t-1, t]\) |
\(r^{job}\) | the reward of the job set \(J\)
\(r^{run}\) | the reward of the set \(V^{run}\) |
\(r^{wait}\) | the reward of the set \(V^{wait}\) |
\(r_t\) | the reward at time-step \(t\) (see the reward sketch after this table)
\(\alpha _1,\alpha _2,\alpha _3\) | negative coefficients in the reward function
\(\beta _1,\beta _2,\beta _3\) | positive coefficients in the reward function
B(Actor) | the maximum size of the Actor's local buffer
\(T_s(Actor)\) | the number of sampling steps in the Actor
N(Learner) | the number of experiences required to start training in the Learner
L(Learner) | the maximum size of the Learner's local buffer
\(T_s(Learner)\) | the maximum number of training steps in the Learner (see the configuration sketch after this table)
\(t^s\) | the simulation time |
\(\Delta t\) | the duration of one simulation iteration
\(|N|\) | the number of cluster nodes
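
To make the reward entries concrete, below is a minimal sketch of how a per-step reward \(r_t\) could be assembled from \(r^{job}\), \(r^{run}\), and \(r^{wait}\). The linear form, the percentile used for the tail latency, and all names and default coefficients (`tail_latency`, `step_reward`, `alpha`, `beta`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tail_latency(latencies, q=0.99):
    """Tail latency as the q-th percentile of a latency sample.
    The 99th percentile is an assumed choice; the paper fixes its own."""
    if len(latencies) == 0:
        return 0.0
    return float(np.percentile(latencies, q * 100))

def step_reward(job_latencies, run_latencies, wait_times,
                alpha=(-1.0, -0.5, -0.5), beta=(1.0, 0.5, 0.5)):
    """Assumed per-step reward r_t = r_job + r_run + r_wait.

    Illustrative linear form (not taken from the paper):
      r_job  = alpha_1 * JTL(J)      + beta_1 * |J|
      r_run  = alpha_2 * TTL(V_run)  + beta_2 * |V_run|
      r_wait = alpha_3 * TTL(V_wait) + beta_3 * |V_wait|
    Negative alpha_i penalize high tail latency; positive beta_i
    reward completed and progressing work.
    """
    r_job = alpha[0] * tail_latency(job_latencies) + beta[0] * len(job_latencies)
    r_run = alpha[1] * tail_latency(run_latencies) + beta[1] * len(run_latencies)
    r_wait = alpha[2] * tail_latency(wait_times) + beta[2] * len(wait_times)
    return r_job + r_run + r_wait
```

A call such as `step_reward(job_latencies=[120.0, 300.0], run_latencies=[80.0], wait_times=[40.0, 60.0])` then yields the scalar reward handed to the DRL agent at time-step \(t\).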
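
The Actor/Learner entries parameterize a distributed actor-learner training loop: each Actor fills a local buffer of size B(Actor) while sampling for \(T_s(Actor)\) steps, and the Learner begins training once N(Learner) experiences have accumulated. As a reading aid only, the sketch below groups these hyperparameters into configuration records; every field name and default value is a placeholder assumption, not a setting reported in the paper.

```python
from dataclasses import dataclass

@dataclass
class ActorConfig:
    # B(Actor): maximum size of the Actor's local experience buffer
    buffer_size: int = 1_000
    # T_s(Actor): number of environment sampling steps per collection round
    sampling_steps: int = 50

@dataclass
class LearnerConfig:
    # N(Learner): experiences required before training starts
    min_experiences: int = 10_000
    # L(Learner): maximum size of the Learner's local buffer
    buffer_size: int = 100_000
    # T_s(Learner): maximum number of training steps
    max_train_steps: int = 1_000_000
```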