Low-power multi-cloud deployment of large distributed service applications with response-time constraints

Singh, Babneet; Kaur, Ravneet; Woodside, Murray; Chinneck, John W.

doi:10.1186/s13677-022-00363-w

Research
Open access
Published: 03 January 2023

Low-power multi-cloud deployment of large distributed service applications with response-time constraints

Babneet Singh¹,
Ravneet Kaur²,
Murray Woodside³ &
…
John W. Chinneck³

Journal of Cloud Computing volume 12, Article number: 1 (2023) Cite this article

1907 Accesses
15 Citations
Metrics details

Abstract

Distributed service applications make heavy use of clouds and multi-clouds, and must (i) meet service quality goals (e.g. response time) while (ii) satisfying cloud resource constraints and (iii) conserving power. Deployment algorithms must (iv) provide a solution meeting these requirements within a short time to be useful in practice. Very few existing deployment methods address the first three requirements, and those that do take too long to find a deployment. The Low-Power Multi-Cloud Application Deployment (LPD) algorithm fills this gap with a low-complexity heuristic combination of generalized graph partitioning between clouds, bin-packing within each cloud and queueing approximations to control the response time. LPD has no known competitor that quickly finds a solution that satisfies response time bounds. A host execution time approximation for contention is fundamental to achieving sufficient solution speed. LPD is intended for use by cloud managers who must simultaneously manage hosts and application deployments and plan capacity to offer services such as Serverless Computing.

On 104 test scenarios deploying up to 200 processes with up to 240 replicas (for scaling), LPD always produced a feasible solution within 100 s (within 20 seconds in over three-quarters of cases). Compared to the Mixed Integer Program solution by CPLEX (which took a lot longer and was sometimes not found) LPD solutions gave power consumption equal to MIP in a third of cases and within 6% of MIP in 95% of cases. In 93% of all 104 cases the power consumption is within 20% of an (unachievable) lower bound.

LPD is intended as a stand-alone heuristic to meet solution time restrictions, but could easily be adapted for use as a repair mechanism in a Genetic Algorithm.

Introduction

Applications with many services are increasingly common (e.g., microservice systems) and use multiple clouds or data centres to obtain suitable resources, to access regional data sets, or for reliability. Deployment must respect cloud resources, power consumption, and user response time, and must provide a fast solution for re-computation when adapting to changes. Network delays between clouds are large enough that they must be considered explicitly in a useful deployment solution. It is clear from the literature that solution by Mixed Integer Programming (MIP) or by metaheuristics such as genetic algorithms take too long for larger problems, so heuristics must be used.

The service applications considered here have deployable units called “tasks” which may be virtual machines or containers. A user request results in tasks calling each other, possibly many times, in diverse patterns modeled as random interactions with average numbers of calls. They use the internet for calls between clouds, which introduces important delays which may add up to violate response time requirements.

The heuristic Low Power Multi-Cloud Application Deployment (LPD) algorithm takes an application in the form of a directed Application Graph with tasks as nodes and their interactions as edges (Fig. 1a), with a maximum anticipated load level, and finds a deployment of tasks to hosts in multiple clouds (Fig. 1b), including scaled-out replicas as necessary. Under autoscaling the deployment is then feasible (although perhaps not optimal) at lower load levels as well, using fewer replica tasks and hosts.

LPD has four goals. It should:

Goal 1. have low total power use by the hosts (ideally, minimum power),
Goal 2. satisfy arbitrary constraints on computing and memory capacity (heterogeneous clouds),
Goal 3. satisfy constraints on the response time, including accounting for intercloud network delay (here, the mean response time is constrained),
Goal 4. provide a solution quickly enough to be useful.

A suitable target maximum solution time depends on how often a deployment must be calculated. It could be a few seconds or minutes; for the examples considered here it is taken to be 1 minute on the grounds that cloud adaptation (such as deployment of a new virtual machine) occurs on a timescale of minutes.

Metaheurstics such as genetic algorithms, particle swarm optimization or stochastic annealing address goals 1–3 by combining them in utility functions that include weighted penalties for delay and other constraints. Soft constraints do not prevent constraint violation without additional, possibly complex effort, so they are excluded here, and this excludes most metaheuristics. Additionally, their solution takes too long, as discussed below, and this applies also to versions which “repair” the solution to enforce actual constraints.

Excluding soft constraints the above combination of four goals is not resolved by existing methods, which motivates this work. Table 1 lists the combinations of the first three goals and scalable approaches that satisfy them, apart from MIP and metaheuristics. Some combinations have elementary solutions, indicated with italics.

Table 1 Deployment problem cases and scalable solution approaches (excluding metaheuristics and MIP)

Full size table

The LPD approach adds power reduction to both graph partitioning (to allocate tasks to clouds) and bin-packing (to allocate tasks to hosts). Using an approximate bound in the response time constraint reduces the effort to compute it and brings the solution time within the desired range.

Related work

Optimal deployment is a classic problem. There have been recent surveys of optimal deployment in clouds by Zhang et al. [4] (emphasizing efficient algorithms), Helali et al. [5] (emphasizing adaptation by consolidation), Masdari and Zangakani [6] (emphasizing predictive adaptation), and Skaltsis et al. [7], oriented to deploying multi-agent systems. They list a variety of approaches, including

Mixed Integer Programming (MIP),
Genetic Algorithms (GA) and other metaheuristics such as ant colonies, particle swarms, and simulated annealing, that apply varieties of random search,
Application Graph Partitioning (AGP), which focuses on the interactions between clouds,
other approaches including bin-packing, hill-climbing and custom heuristics.

The use of MIP is evaluated in several papers. Malek et al. [8] also gives additional references to MIP. Li et al. [9] combine MIP with bin-packing, to optimize power. Ciavotta et al. [10] use MIP to partition a software architecture and deploy it. The latter study minimized cost while constraining response time (without network delays) as computed by a simple queueing model, and then used a more detailed layered queueing model and a customized local search. MIP can address Goals 1–3 of LPD, and is formulated for this purpose below. However, all these studies found that MIP scaled too poorly for practical use.

GA evolves deployments by modifying a set of initial candidates by random mutations and by combining existing candidates, to optimize a fitness function. The fitness weights together the objectives and constraints, making them soft constraints. GA has been applied for Goals 1–3 in [3, 8, 11]. The use of other metaheuristics is referenced in these papers and in the surveys. A strength of the fitness function approach is that it can be extended to include additional goals such as reliability, and to deal with multiple fitness functions. It can find sets of Pareto-optimal (i.e. non-dominated) solutions for multiple objectives, which goes beyond our goals. In Frincu et al. [12] the objectives are response time, resource usage, availability, and fault-tolerance; in Guerrero et al. [13] they include failure rates and total network delay as a proxy for response time. Frey et al. [14] optimize deployment over multiple clouds and multiple cloud vendors with three objectives of cost, mean response time, and SLA violation rate. Fitness is evaluated by a simulator, and constraints are enforced by rejecting infeasible candidates. Ye et al. [15] consider four objectives including energy minimization in the cloud and the network.

The inability of unmodified GA to directly enforce constraints leads us to exclude it as an approach. A recent method used in [15] enforces constraints by “repairing” infeasibilities for each fitness evaluation. The repair changes the candidate solution to make it feasible. Similar changes are part of the partitioning algorithm in LPD. While it is not considered here, the potential use of part of LPD for GA repair is discussed later.

It is also clear from the experience in the literature that GA is too slow for our fourth goal, for large deployments. Malek et al. [8] compare MIP, GA and a heuristic and find that both MIP and GA are orders of magnitude too slow. Guerout et al. [16] compare MIP and GA for single-cloud deployment to optimize a utility function that combines four QoS objectives including energy, using a stagewise approach to improve MIP scalability, and find that MIP and GA are both too slow for practical use. Their calculated response times ignore network delays; including them would undoubtedly make both algorithms slower. Alizadeh and Nishi [17] carefully compare MIP solution times by CPLEX [18], and by GA, on a very large MIP unrelated to deployment. Their GA strictly enforces constraints by constraint repair. They find that the solution time of GA is at best an order of magnitude less than for MIP. Since MIP is several orders of magnitude too slow for Goal 4, this emphasizes that GA is unsuitable on the grounds of solution time.

Most papers consider only a single cloud or data centre. Multi-cloud systems have more serious response time concerns because of the network delay for messages that cross between clouds. Application Graph Partitioning (AGP) addresses these delays as costs on the arcs of an application graph as in Fig. 1(a). AGP algorithms such as in Fiduccia-Mattheyses [19] efficiently reduce or minimize the total cost, and can also include host constraints and costs. AGP is used for multi-cloud deployments in [20], and in [3] to minimize bandwidth (rather than delay), in [1] to minimize the sum of network and energy cost, and in [21] (via a metaheuristic) to minimize power in mobile clouds. The approach called Virtual Network Embedding [22] is also an adaptation of AGP. In summary, AGP deals efficiently with network delays but cannot resolve all the remaining goals; it needs to be augmented.

Bin-packing deals directly with capacity constraints (Goal 2) and is combined with performance models in [23] (ignoring power). It is adapted by Arroba et al. in [2] to reduce power (Goal 1) by applying CPU speed and voltage scaling, but they do not address response time.

Custom heuristics are described in [8] and other works; in [8] they are the only option that solved in a useful time, but this work does not address response time.

The response time includes delay due to processor contention and saturation, which are estimated by performance models. A survey of models for clouds is given by Ardagna et al. [24]. Some recent work on deployment that reflects resource contention (always on the mean response time): Aldhalaan and Menasce [25] use a queueing model to place VMs in a cloud, finding the number of replicas by a hill-climbing technique. Ciavotta et al. [10] use a simple queueing approximation in a first stage, and a detailed layered queueing model in a second stage of optimization. Molka and Casale [26] use queueing models with GA, bin-packing and non-linear optimization, to allocate resources for a set of in-memory databases. Wada et al. [27] use queuing models to predict multiple service level metrics and a multi-objective genetic algorithm to seek pareto-optimal deployments. Calheiros et al. [28] and Shu et al. [11] use a queueing model to scale a single task to meet response time requirements, and [11] also models response failures. None of these studies consider power.

Deployment to minimize power has been part of many studies, e.g. energy is included in the utility function in [8]. Wang and Xia [29] apply MIP to VM placement in the cloud to minimize power subject to processing and memory constraints (but ignoring response time and network delays). Tunc et al. [30] use a real-time control rule based on performance monitoring and a Value of Service metric that enforces the response time constraint while controlling for energy consumption. This is close to our stated problem, but they do not include network delays in the response time. Chen et al. [31] use a random search heuristic to minimize the money cost of power and network operations (but not delay).

For deployment tools, Arcangeli et al. [32] describe many tools from the viewpoint of a system operator, including MIP, GA, AGP and others.

To summarize, a heuristic solution for the minimum-power application deployment problem subject to constraints on response time and host capacities is needed because the solution times to find provable optima via methods such as MIP are too long. Metaheuristics like GA also take too long and have difficulty enforcing actual constraints. LPD, developed in the theses of Kaur [33] for a single cloud and Singh [34] for multi-clouds, uses instead a novel combination of graph partitioning and bin-packing to meet all the goals stated for LPD.

Background on techniques used in LPD

LPD uses customized algorithms derived from the following two families.

K-way graph partitioning (KGP) [19] divides a graph into K parts, where each part is a subset of the graph node set, the parts are non-intersecting, and cover the entire node set. A common objective is to minimize the cut-weight, which is the sum of the edge weights that connect nodes in different parts. KGP is NP-hard [35] and so does not scale well, but approximate solutions are provided more quickly using multi-level KGP schemes proposed by Karypis et al. [36, 37]. In LPD, the edge weights are defined so the cut-weight is the network delay and multi-level KGP is used to reduce the cut-weight until the response time constraint is satisfied, but not necessarily minimized. Graph partitioning algorithms often enforce a balance constraint to distribute the node weights equally among the partitions, but this is not required in the deployment problem.

2-D bin packing described in [1] is an NP-hard algorithm to pack a set of 2-dimensional items into a minimum number of 2-dimensional bins. In LPD, heuristic 2-D bin packing is incorporated into the KGP move evaluation, to reduce power consumption. After partitioning, a more rigorous bin packing is applied to each partition to further reduce power consumption.

The model for multi-cloud application deployments

We consider K heterogeneous clouds, with heterogeneous hosts, and different network latencies between clouds. Communication delays within the clouds are ignored. The set of users is treated as an additional cloud, which has only the users; there is no other deployment to the user cloud. Figure 2 shows the model, and Table 2 defines its parameters. The processing capacity of cloud hosts is defined in ssj_ops according to the SPECpower benchmark for processors [38].

Table 2 Cloud and Host Property Definitions

Full size table

A deployment D of tasks to individual hosts in the clouds must provide a user throughput of λ responses/sec, and have a latency (response time) of no more than R ^req msec.

The application model and application graph

The analysis begins from an Application Graph G of vertices (denoted as v_i) representing deployable components, and edges denoted as (v_i, v_j), representing their interactions and labelled by the number of messages c_ij exchanged, per user response. Optionally, G may be derived from an Application Model as in Fig. 3(a), which shows the components as “tasks” and the inter-task calls that invoke operations, labelled by resource demands. The Application Model uses the notation of layered queueing models [39, 40] in which the components are shown as rectangles (e.g. t0) and their operations as attached rectangles (e.g. e2601). Each invocation of an operation has a mean CPU demand and makes calls as shown by arrows, labeled by the mean calls. Derivation of the graph parameters from a layered queueing model is described in [41].

Service operations may have highly variable resource demands, depending on the nature of the user request, and the data that it accesses. This is modeled by random values for CPU demands and calls, with average values as parameters.

Figure 3(b) gives an example of an Application Graph derived from the Application Model of Fig. 3(a). Alternatively, the Application Graph could be derived directly from measurements on the deployment units. Table 3 defines the properties of the Application Graph.

Table 3 Application Model and Application Graph Property Definitions

Full size table

The task processing demand ${v}_i^{ops}$ is measured in operations/s in units of ssj_ops as defined in the SPECPower benchmark [38], executed on a reference host h_ref at a nominal user throughput of λ responses/s. To obtain the demand on any other host type it is scaled by the host speed factor to give ${v}_i^{ops}/{h}_{kl}^{sf}$.

Power model

Power consumption ${h}_{kl}^{pow}$ for each host h_kl is related to the work it is doing, modeled by a linear least-squares fit to its SPECpower [38] benchmark results, stated as:

$${h}_{kl}^{pow}={h}_{kl}^{base}+{h}_{kl}^{ops}\ast {h}_{kl}^{pRate}$$

(1)

where ${h}_{kl}^{base}$ is the base turn-on power in W, ${h}_{kl}^{pRate}$ in W/ssj_ops is the slope of the fitted line, and ${h}_{kl}^{ops}$ is the load.

Power Usage Effectiveness, defined as the ratio of cloud power consumption to total host power consumption [42], is assumed for simplicity to be unity for all clouds, so cloud power equals total host power. If it is less (but still the same for all clouds), power can be scaled.

Bound on Total execution delay

The response time is the network delay plus the execution delay. The total execution delay is the time (per user response) spent executing and waiting to be scheduled on some processor. LPD applies a capacity constraint on each host that restricts its CPU utilization below a nominal value U^nom. Since execution delay increases with utilization, the delay for utilization = U^nom is a bound.

For a given task v_i on host h_kl the maximum delay is calculated by the well-known processor-sharing approximation to host queueing (see e.g. [43]) as:

$$time\left({v}_i,{h}_{kl}\right)=\left({v}_i^{ops}/{h}_{kl}^{sf}\right)/\left(1-{U}^{nom}\right)\ ms.$$

(2)

The total delay bound D^Tproc is the sum of time(v_i, h_kl) over all tasks, which simplifies to

$$D^{Tproc}={\textstyle\sum_{k,\;l}}\left(h_{kl}^{ops}/h_{kl}^{sf}\right)/\left(1-U^{nom}\right)$$

(3)

Adding this value to the total network delay gives an upper bound on response time, defined as D^RT below.

Deployment model

Using the concepts and notation defined above, a deployment is described as follows:

d_ikl = 1 if task v_i is deployed to host h_kl (the lth host in cloud C_k), or 0 otherwise.
D: a deployment, a setting of all variables d_ikl. D_q is the qth deployment in a set of deployments.
D^pow, D^Tnet, D^Tproc, D^RT: respectively the total power (W), total network delay, and bounds on the total execution delay and the response time of deployment D in ms.
y_ik(D) = 1 if task i is assigned to a host on cloud C_k, or 0 otherwise.
z_kl(D) = 1 if host h_kl is used, or 0 otherwise.

A feasible deployment must not exceed the processing capacity (in ssj_ops) and memory capacity (in MB) of any host and must satisfy the overall response time requirement R^req (in msec).

Mixed-integer quadratic programming optimization

The deployment optimization gives a mixed-integer quadratic programming (MIQP) model (quadratic because the network delay depends on the deployment of pairs of tasks). MIQP solutions found using CPLEX [18] were used to evaluate the accuracy of the LPD heuristic on small examples, though MIQP does not scale up well.

In the formulation, a large positive constant M is introduced for calculating z_kl, of value $M={\mathit{\max}}_{kl}\left({h}_{kl}^{opsMax}\right)+ offset$, where offset is a large value, e.g. 1000.

The resulting deployment is designated D_MIQP with optimal power ${D}_{MIQP}^{pow}$ and response time ${D}_{MIQP}^{RT}$.

MIQP Constraints:

Host processing capacity:

$${h}_{kl}^{ops}=\sum\nolimits_i\left[{v}_i^{ops}/{h}_{kl}^{sf}\times {d}_{ikl}\right]\le \left[{h}_{kl}^{ops Max}\times {U}_{nom}\right]\forall k,l$$

(4a)

Host memory capacity:

$$\sum\nolimits_i\left[{v}_i^{mem}\times {d}_{ikl}\right]\le {h}_{kl}^{mem Max}\forall k,l$$

(4b)

Each task is assigned exactly once:

$${\textstyle\sum_{k,\;l}}d_{ikl}=1\forall i$$

(4c)

Indicator variable to show that task v_i is assigned to cloud C_k:

$$y_{ik}={\textstyle\sum_{l\;}}d_{ikl}\forall i,k$$

(4d)

Network delay for deployment D_MIQP (quadratic in y):

$${D}_{MIQP}^{Tnet}=\sum_{m,n\ in\ clouds}{c}_{ij}\times {y}_{im}\times {y}_{jn}\times {\delta}_{mn}\forall \left({v}_i,{v}_j\right)\ \textrm{in}\ E\ \textrm{and}\ \left(m\ne n\right)$$

(4e)

Total execution time:

$$D_{MIQP}^{Tproc}={\textstyle\sum_{k,l}}\;\left(h_{kl}^{ops}/h_{kl}^{sf}\right)/\left(1-U^{nom}\right)$$

(4f)

Response time constraint:

$${D}_{MIQP}^{RT}={D}_{MIQP}^{Tproc}+{D}_{MIQP}^{Tnet}\le {R}^{req}$$

(4g)

Indicator variable z_kl to show whether host h_kl is used:

$${h}_{kl}^{ops}-M\times {z}_{kl}\le 0\forall k,l$$

(4h)

Power consumption of host h_kl:

$$h_{kl}^{pow}={\textstyle\sum_i}\;\left(h_{kl}^{base}\times z_{kl}\right)+h_{kl}^{ops}\times h_{kl}^{pRate}\forall k,l$$

(4i)

MIQP Objective function (minimize total power in W):

$$\operatorname{minimize}\;D_{MIQP}^{pow}={\textstyle\sum_{k,l}}\;h_{kl}^{pow}$$

(4j)

CPLEX solutions and the time it took to obtain them are described in Section 6.1, in comparison with LPD.

The low-power multi-cloud application deployment (LPD) algorithm

LPD first scales the application by creating replicas of each task to provide sufficient processing power. Then it deploys tasks to clouds by a graph-partitioning algorithm with a first phase that iteratively reduces D^RT until D^RT ≤ R^req, and a second phase that minimizes D^pow while maintaining D^RT ≤ R^req. This is carried out for a set of initial partitions, including random partitions. Finally, the host deployments d_ikl within each cloud are determined by a power-aware bin-packing stage which satisfies the capacity constraints for each host and reduces power further.

There are six stages of LPD as described in the Algorithm summary and illustrated in Fig. 4.

LPD algorithm summary

INPUTS: clouds, hosts, users, application, control parameters, network delays.
Stage 1: Select a subset of clouds C to host the application.
Stage 2: Generate the application graph G from the application model.
Stage 3: Scale the application to the workload, generate a graph for the scaled application (G^scaled), and coarsen G^scaled to get G^coarse.
Stage 4: Generate a set of initial deployments.
Stage 5: For each initial deployment ${D}_q^{init}$ do:
- 5a ${D}_q^{partition}=$ Partition (${D}_q^{init}$, G^coarse, C, δ, R^req) (for the Partition algorithm, see below),
  - If ${D}_q^{partition}$ is null (no feasible deployment found), then go to next initial deployment.
- 5b Uncoarsen ${D}_q^{partition}$.
- 5c For each cloud, perform bin packing on its deployed tasks by both HBF-Proc and HBF-Mem and record the deployments as ${D}_q^{HBF- Proc}$ and ${D}_q^{HBF- Mem}$.
Stage 6: Set D* to the deployment with the smallest D^pow out of all ${D}_q^{partition}$, ${D}_q^{HBF- Proc}$, and ${D}_q^{HBF- Mem}$ or to null if there is no feasible deployment.
OUTPUT: Deployment D*.

A major part of LPD is the Partition sub-algorithm defined below along with the details of the six stages.

LPD stages 1–4: preparation and initial deployment

The first stage selects the clouds for deployment, and stage 2 creates the application graph G. Stage 3 adds replicas to provide sufficient capacity for each task, even if deployed on the slowest host in the system (call its speed factor s^min). For task v_i this gives:

$$replicas={v}_i^{ops}/\left({h}_{ref}^{ops}\times {s}^{min}\right)$$

(5)

Each replica receives an equal share of the calls to v_i and duplicates the calls made by v_i.

Stage 3 coarsens the resulting graph to G^coarse by repeatedly merging the nodes (tasks) joined by the edge with the largest weight, subject to maximum nodal processing and memory requirements (which ensure that the merged node will fit on any host). It terminates when the largest edge weight is less than the original average edge weight.

Experience showed that the partitioning sometimes found local optima. To improve the solution, Stage 4 creates a set of initial deployments using three strategies:

1.
Random: N^random initial deployments.
2.
Greedy on Power/Delay: with two variants: repeatedly deploy the task with the largest CPU demand on the feasible cloud that has space and (is most power-efficient)/(adds the least delay) (2 deployments),
3.
Bin-packing: all combinations of two bin-packing strategies, packing on memory or power, and sorting bins in increasing or decreasing order (8 deployments).

LPD stage 5: partitioning

Partitioning among clouds uses the sub-algorithm described below with a simplified set of constraints. The response time constraint was simplified by computing D^Tproc just once based on the slowest host, and then constraining D^Tnet by

$${D}^{Tnet}\le {R}^{req}-{D}^{Tproc}$$

(6)

which is the cut weight of the partition. Simplified capacity constraints for processing and memory in each cloud are computed based on the most restricted hosts in the cloud. These simplifications are conservative, in that any feasible solution satisfies the original constraints.

Partitioning uses the K-way variant of the Fiduccia-Mattheyses (FM) graph partitioning refinement heuristic [19, 35]. It recursively identifies clusters to move from one cloud to another, based on the improvement in a fitness function calculated for each potential move. In phase 1 it obtains feasibility by making moves that reduce D^Tnet until Eq. (6) is satisfied, and then in phase 2 it minimizes power while maintaining feasibility. The Partition sub-algorithm uses this additional notation:

move(v_i, h_{to, l}) moves task v_i to host l on cloud C_to,

∆_RT(move) = amount of response time reduction,
∆_pow(move) = amount of power reduction,

and partitions G^coarse among clouds to minimize power subject to response time and capacity constraints.

Partition sub-algorithm

INPUTS: D_init, G, C, δ, R^req

D←D_init. Calculate D^RT and D^pow and sort the tasks in G in decreasing order of ${v}_i^{ops}$.

Repeat:

Initialize best power reduction move: move_pow ← ϕ, ${\Delta }_{pow}^{max}\leftarrow 0$.
Initialize best latency reduction move: move_RT ← ϕ, ${\Delta }_{RT}^{max}\leftarrow 0.$
For each task v_i in the sorted task list:
- C_from = cloud containing v_i in deployment D.
- For each cloud C_to other than C₀ or C_from:
  - Find the most power efficient available host h_{to, l} in C_to that can accept v_i.
  - Calculate the power and latency reductions ∆_pow and ∆_RT for moving v_i to h_{to, l}.
  - If ∆_pow > ${\Delta }_{pow}^{max}$, then update ${\Delta }_{pow}^{max}$ to ∆_pow and move_pow ← move(v_i, h_{to, l}).
  - If ∆_RT > ${\Delta }_{RT}^{max}$, then update ${\Delta }_R^{max}$ to ∆_RT and set move_RT ← move(v_i, h_{to, l}).

If move_pow ≠ ϕ and does not violate R^req, then accept move_pow.
Else if R^req not met and move_RT ≠ ϕ, then accept move_RT.
If a move has been accepted, make it and update D, D^RT,and D^pow.
Else (no move is found): exit.

OUTPUT: D.

After partitioning among clouds, Stage 5 uncoarsens G and restores the original constraints. It applies a final 2-D bin-packing step within each cloud (tasks deployed on hosts) to reduce power consumption and satisfy the constraints of individual hosts. It uses two variants of Hybrid-Best-Fit (HBF) two-dimensional bin-packing [44] with dimensions of processing (${v}_i^{ops}\Big)$ and memory (${v}_i^{mem}$). HBF-Proc sorts the tasks in descending order by ${v}_i^{ops}$ and HBF-Mem sorts them by ${v}_i^{mem}$.

Stage 6: selecting the final deployment

Among all the deployments found from all the initial deployments, stage 6 returns the one with the lowest power consumption D^pow.

LPD design choices

Preliminary experiments described in [34] refined the choice of algorithm steps and parameters, showing that:

Most solutions arise from the random initial deployments. N^random = 60 assured a solution.
Coarsening using the stated termination condition reduced runtime and gave better quality solutions for both power and delay.
The final bin-packing using HBF improved the solution in 15% of cases, mostly due to HBF-Mem. Other approaches to bin-packing did not improve on this.

Time complexity of LPD

There are n tasks and ∣E∣ edges in G^scaled, K clouds each with O(h) hosts, and d initial deployments. The important operations have these time complexities (see [37] for related complexity analyses of the partitioning):

(i)
sorting hosts in each cloud by power efficiency is O(Kh log(h))
(ii)
application scaling is O(∣V∣) (tasks before scaling).
(iii)
coarsening and uncoarsening are both O(d ∣E∣),
(iv)
partitioning is O(dK ∣E∣) for the K-way Fiduccia Mattheyses algorithm,
(v)
bin packing for K clouds is O(dKn log(n)),
(vi)
selecting the best deployment is O(d).

Assuming that the number of interactions of each task is bounded, then ∣E∣ is O(n), operation (v) is dominant and the time complexity is O(dKn log(n)).

LPD for genetic algorithm solution repair

Stage 4 of the LPD generates numerous infeasible initial deployments. The partitioning in stage 5a first works to render the initial solution feasible, and then to improve the objective function value. It is simple to terminate the partitioning at the first feasible solution. This truncated stage 5a partitioning step can be used as a quick repair mechanism for infeasible intermediate solutions generated by a Genetic Algorithm. This possibility will be investigated in future research.

Evaluation experiments: setup

The quality of LPD deployments was compared to (i) MIQP solutions when they could be obtained, and (ii) a loose power bound. Response time feasibility was verified against the results from the LQNS solver, which gives a more refined calculation of delay.

Comparisons of measure m found by two methods A and B were made using the relative error measure ∆_A : B defined as:

$${\Delta }^m_{A:B}=\left(\frac{A-B}{B}\right)\times 100\%$$

(7)

Application test cases

Random test models were generated, structured to provide many variations of the layered architecture typical of cloud service applications such as e-commerce and social media, and of microservices generally. Application models consisted of 4, 8, 18, 24, or 30 tasks and had a wide range of workload characteristics. One hundred four test models were generated by the random model generator lqngen [45]. Model properties ranged widely, to represent a wide range of applications and of deployment difficulty, giving these properties:

Number of application components, including scaled-out replicas: 20–240,
Total processing requirement: 5–84% of the available total multi-cloud capacity,
Total memory requirement: 5–87% of the total multi-cloud capacity,
Total number of calls per user request: 3–11,050.

Six comparisons were made:

1.
Power: LPD vs MIQP (Section 6.1), measured by ${\Delta }_{LPD: MIQP}^{pow}$.
2.
Power: LPD vs the bound, measured by ${\Delta }_{LPD: bnd}^{pow}$.
3.
Response time: LPD vs the requirement, measured by ${\Delta }_{LPD: req}^{RT}$. Should be zero or negative.
4.
Response time: LPD vs more accurate LQNS calculation, measured by ${\Delta }_{LPD: LQNS}^{RT}$.
5.
Response time: LQNS on LPD deployment vs the requirement, measured by ${\Delta }_{LQNS: req}^{RT}$.
6.
LPD runtime, denoted by T_LPD.

Cloud environment

The test cases considered four clouds as shown in Fig. 2, with parameters in Table 4 below. Table 5 shows the network delays, and Table 6 describes the hosts. The experiments were run with U^nom = 0.8, which is a typical target value for maximum utilization, and with 60 initial deployments.

Table 4 Cloud parameters (Definitions in Tables 2 and 3)

Full size table

Table 5 Network delays δ_mn (msec)

Full size table

Table 6 Host parameters, in decreasing order of speed factor (Definitions in Tables 2 and 3)

Full size table

Software and hardware

The algorithm was implemented in Java, using the Java Universal Graph Framework (JUNG) v2.0.1 and Guava v19.0, with Google Core Libraries for Java for the Multimap data structure that represents the deployment. LPD was run on an Intel i5 3570 PC with 3.4 GHz processor, 16 GB memory and with Ubuntu v16.04 Linux, while the MIQP solutions were run on a faster Intel Core i9-9980HK running at 2.40GHz.

Experimental results

The experiments evaluate the effectiveness of LPD and explore its properties on applications of different sizes and requirements. The principal concerns are (1) closeness of the power to the bound on the optimal value, (2) meeting the response time constraint, and (3) algorithm solution time versus the target of 1 minute for practical use in deploying models in this size range. Also examined were (4) the effectiveness of the strategy used by LPD versus two simple alternatives, and (5) the impact of tighter response time constraints on solution effort and quality.

Power consumption: LPD vs. MIQP

MIQP solutions by the CPLEX optimizer [18] give the minimum power, which was compared with LPD. Forty four cases were solved by CPLEX within a time limit of 300 seconds per solution. Fig. 5 shows a histogram of ${\Delta }_{LPD: MIQP}^{pow}$. LPD power is equal to MIQP power (within a very small tolerance) in 15 of the 44 cases (34%) and is within 6% in 42 of 44 cases (95%). The largest difference is 10.8%. The overall success rate of LPD in finding near-optimum low-power solutions for these 44 cases is excellent. For the remaining 60 larger cases a comparison was made to a bound.

The 300-second MIQP limit is well beyond the assumed useful time limit of 60 s and the limited solvable cases demonstrate the poor scalability of the MIQP approach.

Power consumption: LPD vs. a lower bound

A simple lower bound P^bound on total power consumption is calculated by assigning fractions of the pooled total workload to the most power-efficient processors across all clouds and taking the sum of their power consumptions. The construction of P^bound ignores the granularity of processing by the tasks, and the response time constraints, which will make it optimistic and unattainable, but it provides a comparison for the cases in which the MIQP solver exceeds its time limit.

The histogram of the difference ${\Delta }_{LPD: bnd}^{pow}$ in Fig. 6 shows that the power found by LPD is within 10% of P^bound for 22 of the 104 cases and within 10–20% in a further 75 cases. Of the 7 cases with a larger difference, 3 had MIQP solutions with identical power to LPD, leaving just four cases with (possibly) unsatisfactory power solutions.

For the 60 cases where the MIQP timed out, the LPD power averages 14.3% greater than P^bound, with 56 of the 60 cases (93%) closer than 20%.

Verification of the response time constraint

In the 104 LPD solutions, the simplified response time approximation ${D}_{LPD}^{RT}$ always satisfies the response time limit R^req. If there are many calls between processes then network delay reduction may take precedence over power reduction and the response time may be close to the constraint. If the response time constraint is too tight a feasible solution may not be found (or even be possible). Figure 7 shows a histogram of the relative amount by which LPD met the response constraint; the solution was always feasible and in some cases the structure of the application messaging, combined with power optimization, gave response times much smaller than the constraint.

A more accurate estimate of the response time of the LPD deployment, denoted ${D}_{LQNS}^{RT}$, can be found using the Layered Queueing Network Solver LQNS [39]. In the LQNS solution the host utilizations depend on the deployment (rather than assuming the value U^nom) and the solver exploits more sophisticated queueing approximations. Because the actual utilizations can never be higher, the LQNS values are always feasible for an LPD solution. This was verified in the 84 cases for which LQNS could find a solution; in the remaining cases the model was too large for the LQNS solver. The histogram in Fig. 8 shows the relative margin by which RT found by LQNS was smaller than by LPD.

Variants on LPD

Two variants on LPD were created to consider alternative strategies:

Variant1: Stop at the first solution that satisfies the response time constraint. This pays most attention to reducing the latency until the response time constraint is met.
Variant2: Continue from the Variant1 solution to a final solution that minimizes response time. This gives absolute priority to a short response time.

The two variants were applied to 16 test cases. All the solutions met the response time constraint, with Variant2 providing the lowest values. However LPD gave the lowest power consumption in every case, exceeding P^bound by just 6.5% on average (vs 18.4% for Variant1 and 21.1% for Variant2), confirming that it provides a suitable balance between power and response time.

Algorithm runtime vs constraint tightness

Algorithm runtime is influenced by the difficulty of the deployment due to the tightness of the constraints. To examine this relationship, the response-time constraint of one model was gradually tightened. The response time includes an execution delay of ${R}_D^{pt}=608\ msec$. Table 7 shows the impact of the reduced values of R^req on the LPD runtime T_LPD with 52 starting points. At first almost all trials find feasible solutions with increasing T_LPD, then T_LPD levels off and there are increasing failures. At some point the capacity and runtime constraints together make the problem infeasible, shown by the final row.

Table 7 LPD Runtime Dependence on the Response Time Requirement

Full size table

Scalability of CPLEX MIQP vs LPD

Besides constraint tightness, the runtime is affected by the size of the task graph, the number of clouds and hosts, the capacities of the hosts, and network latencies. For a simple evaluation of algorithm scalability, runtime was related to problem “size” expressed as the number of tasks in the graph. Figure 9 summarizes runtime vs. problem size for LPD and the CPLEX MIQP solution on the 104 test cases. For technical reasons, CPLEX was run on a faster machine with an Intel Core i9-9980HK running at 2.40GHz (average CPU mark 15,089), or roughly 3 times faster. Thus the runtime difference between CPLEX and LPD is three times as great as the figure indicates.

CPLEX converges to a solution in less than 300 seconds in just 44 cases. For half of these it has a runtime of less than 10 seconds and is faster than LPD; for the other 22 cases LPD is faster than CPLEX. CPLEX is able to provide a first feasible solution (not optimal) within 60 sec of scaled time (time multiplied by three) for 85 of 104 models (81.7%), but with more power consumption than LPD in 64 (75%) of them.

LPD always runs the complete set of 60 initial deployments, taking as long as needed. If that takes less than 10 seconds, then it runs additional random initial deployments until 10 seconds have elapsed. The runtime breaks out from the minimum of 10 seconds at about 120 tasks. LPD succeeds in all cases, with a largest runtime of 100 seconds, and a runtime greater than 60 seconds in just 6 cases. The time of the first feasible solution was not recorded, but in all 6 cases, multiple feasible solutions were reached by 100 seconds, so at least one feasible solution was likely available at the 60 second mark and could be used for deployment.

Overall, Fig. 9 shows that LPD scales well with problem size as compared to the MIQP.

Conclusions

The deployments found for the 104 test cases with four clouds and up to 240 deployable units show that:

1.
LPD finds low power deployments: it matches the minimum found by MIQP for 34% of the comparable cases (where the MIQP solution succeeded within 5 minutes) and is within 5% in another 55% of the cases. Compared to a lower bound which is too low to ever be feasible, it is within 20% in 93% of all the cases.
2.
Solution times are practical: all are under 100 sec, and less than 20 sec in 79% of the cases. Only 6 cases exceeded 1 minute of solution time. This meets the goal to be fast enough for practical use.
3.
The algorithm scales well: solution times increase roughly linearly with problem size (in terms of deployable units, including replicas introduced by scaling out to handle high system loads), as shown in Fig. 9.

It was also found that (1) coarsening/uncoarsening of the application graph considerably improves the quality of the LPD solutions and reduces the time to obtain them; (2) random initial deployments provide almost all the final solutions; (3) the execution-time bound of Section 2.4 is essential to obtaining small solution times. Since the bound overestimates the response time, it may eliminate some feasible deployments.

LPD finds a deployment for the maximum expected level of the workload. When the load is lighter, simple autoscaling (which adapts the number of replicas of a task for varying loads) will always be able to meet the SLA requirement, although power optimality is not addressed.

Overall, LPD finds a balance between problem detail, solution accuracy, and run-time.

Threats, issues and extensions

Some aspects of LPD and its evaluation may be, or appear to be, limitations to its usefulness.

An evaluation on an industrial case study might be preferred for evaluation, over the set of randomly created examples used here. However, actual service systems take on a huge variety of sizes, structures and patterns of interaction. The difficulty of obtaining data on real cases would limit their number severely, and a large number of randomly created models spans the range of possibilities better, and gives more confidence that LPD can handle anything that is thrown at it. An even larger number of cases would be even better.

An LQN Application Model is assumed here and could be a burden to create. However it is not actually required, since the input to LPD is the Application Graph, which could be constructed from data obtained by monitoring (that is, from the mean total execution time for each deployable unit, and the mean number of interactions between pairs of units, per User response).

The Users are modeled as a single group. However to model different kinds of users, this group can make a variety of request types to the system.

The response time constraint applies to predicted mean delays, while SLAs often constrain percentiles of delay. To accommodate a target percentile, the measured shape of the response distribution can provide the ratio of the percentile value to the mean, and the requirement can be translated into a requirement on the mean.

The Power Usage Effectiveness of the clouds could easily be included in the power calculation, to give total power rather than just power used by the application.

Cloud internal network delays are ignored, and may be significant. They can be included in two ways: either as a fixed mean allowance for every interaction, which introduces a constant additional delay for the application, or by modeling the internal cloud network and performing the Partition algorithm on it for the final deployment, with bin-packing only on individual hosts. This would give one additional partitioning per cloud.

Storage operations are not mentioned, however they can be modeled as services and deployed by LPD.

The choice of 1 minute as a maximum execution time is arbitrary and could be larger (say, 10 minutes) without changing the determination of the non-scalability of MIQP or meta-heuristics. One minute is an attractive target time for practical use, given that scaling-out operations may take only a few minutes.

Extensions

Extensions that could improve LPD are (1) improved heuristics for the initial deployments, and (2) concurrent execution of Stage 5 of LPD. Also the LPD Partition algorithm could be exploited for feasibility repair in a Genetic Algorithm. The Partition algorithm determines a sequence of moves that respect capacity constraints and decrease response time and power, with initial priority to response time. It would be terminated at the first solution that is feasible for response time.

Limitations of the approach that would require substantial extensions are: (1) the Users all have the same network delays to access any cloud, and the same response requirement; (2) execution of each operation is sequential within a single response (parallelism is ignored); (3) limited concurrency within a task is not modeled; (4) host and network failures are ignored.

Availability of data and materials

The models, code and analysis are found in https://github.com/JChinneck/LPD.

References

Christensen HI, Khan A, Pokutta S, Tetali P (2017) Approximation and online algorithms for multidimensional bin packing: a survey. Comp Sci Rev 24:63–79
Article MathSciNet MATH Google Scholar
Arroba P, Moya J, Ayala J, Buyya R (2017) Dynamic voltage and frequency scaling-aware dynamic consolidation of virtual machines for energy efficient cloud data centers. Concur Comput Pract Exper 29 (10), https://doi.org/10.1002/cpe.4067
Verba N (2019) Application deployment framework for large-scale fog computing environments, PhD Thesis. Coventry University, Coventry, England
Google Scholar
Zhang J, Huang H, Wang X (2016) Resource provision algorithms in cloud computing: a survey. J Netw Comput Appl 64:23–42
Article Google Scholar
Helali L, Omri MN (2021) A survey of data center consolidation in cloud computing systems. Comp Sci Rev 39, 100366
Masdari M, Zangakani M (2020) Green cloud computing using proactive virtual machine placement: challenges and issues. Springer J. Grid Computing
Google Scholar
Skaltsis GM, Shin H-S, Tsourdos A (2021) A survey of task allocation techniques in MAS, pp 488–497 Int. Conf. on Unmanned Aircraft Systems (ICUAS)
Google Scholar
Malek S, Medvidovic N, Mikic-Rakic M (2012) An extensible framework for improving a distributed software System's deployment architecture. IEEE Trans Softw Eng 38(1):73–100
Article Google Scholar
Li J, Woodside M, Chinneck J, Litiou M (2017) Adaptive cloud deployment using persistence strategies and application awareness. IEEE Trans Cloud Comput 5(2):277–290
Article Google Scholar
Ciavotta M, Ardagna D, Gibilisco GP (2017) A mixed integer linear programming optimization approach for multi-cloud capacity allocation. J Syst Softw 123:64–78
Article Google Scholar
Shu W, Cai K, Xiong NN (2021) Research on strong agile response task scheduling optimization enhancement with optimal resource usage in green cloud computing. Elsevier Future Generation Computer Systems
Book Google Scholar
Frincu ME, Craciun C (2011) Multi-objective meta-heuristics for scheduling applications with high availability requirements and cost constraints in multi-cloud environments, pp 267–274 4th IEEE Int Conf on Utility and Cloud Computing
Google Scholar
Guerrero C, Lera I, Juiz C (2018) Genetic algorithm for multi-objective optimization of container allocation in cloud architecture. J Grid Computing 16:113–135
Article Google Scholar
Frey S, Fittkau F, Hasselbring W (2013) Search based genetic optimization for deployment and reconfiguration of software in the cloud, pp 512–521 Proc Int. Conf. on Software Engineering (ICSE '13)
Google Scholar
Ye X, Yin Y, Lan L (2017) Energy-efficient many-objective virtual machine placement optimization in a cloud computing environment. IEEE Access 5:16006–16020
Article Google Scholar
Geurout T, Gaoua Y, Artigues C, Da Costa G (2017) Mixed integer linear programming for quality of service optimization in clouds. Futur Gener Comput Syst 71:1–17
Article Google Scholar
Alizadeh R, Nishi T (2019) A genetic algorithm for multi-period location problem with modular emergency facilities and backup services. Trans Inst Syst Contr Inform Eng 32(10):370–377
Google Scholar
IBM, “IBM CPLEX Optimizer”, https://www.ibm.com/analytics/cplex-optimizer, (Accessed 11 May 2021)
Fiduccia CM, Mattheyses RM (1982) A linear-time heuristic for improving network partitions, pp 175–181 19th Design Automation Conference
Google Scholar
Alicherry M, Lakshman T (2012) Network aware resource allocation in distributed clouds, pp 963–971 Proc. IEEE INFOCOM
Google Scholar
Sachdeva S, Kaur K (2015) ACO based graph partitioning algorithm for optimistic deployment of software in MCC. Proc. Int. Conf. Innovations in information, pp 1–5 Embedded and Communication Systems (ICIIECS 2015)
Google Scholar
Leivadeas A, Papagianni C, Papavassiliou S (2013) Efficient resource mapping framework over networked clouds via iterated local search-based request partitioning. IEEE Trans Parallel Distrib Syst: vol. 24, no. 6, pp. 1077–1086.
Machida F, Kawato M, Maeno Y (2010) Redundant virtual machine placement for fault-tolerant consolidated server clusters, pp 32–39 Network Operations and Management Symposium (NOMS 2010)
Google Scholar
Ardagna D, Barbierato E, Evangelinou A, Gianniti E, Gribaudo M, Pinto TB, Guimarães A, Couto da Silva AP, Almeida JM (2018) Performance prediction of cloud-based big data applications, pp 192–199 Proc ACM/SPEC Int Conf on Performance Engineering
Google Scholar
Aldhalaan A, Menascé DA (2015) Near-optimal allocation of VMs from IaaS providers by SaaS providers. Proc. Int. Conf. on Cloud and Autonomic Computing
Book Google Scholar
Molka K, Casale G (2017) Energy-efficient resource allocation and provisioning for in-memory database clusters, IFIP/IEEE Symp on integrated network and service management (IM), pp 19–27
Google Scholar
Wada H, Suzuki J, Oba K, Theoretic Q (2009) Evolutionary deployment optimization with probabilistic SLAs for service oriented clouds, pp 661–669 Proc 2009 Congress on Services
Google Scholar
Calheiros RN, Ranjan R, Buyya R (2011) Virtual machine provisioning based on analytical performance and QoS in cloud computing environments, pp 295–304 Int. Conf. on Parallel processing (ICPP 2011)
Google Scholar
Wang Y, Xia Y (2016) Energy optimal VM placement in the cloud, pp 84–91 IEEE 9th Int Conf on Cloud Computing
Google Scholar
Tunc C, Kumbhare N, Akoglu A, Hariri S, Machovec D, Siegel HJ (2016) Value of service based task scheduling for cloud computing systems, pp 1–11 Int Conf on Cloud and Autonomic Computing (ICCAC)
Google Scholar
Chen K-Y, Xu Y, Xi K, Chao HJ (2013) Intelligent virtual machine placement for cost efficiency in geo-distributed cloud systems. IEEE Int Conf Commun:3498–3503
Arcangeli J-P, Boujbel R, S. (2015) Leriche automatic deployment of distributed software systems. J Syst Softw 103(C):198–218
Article Google Scholar
Kaur R (2015) Lightweight robust optimizer for distributed application deployment in multi-clouds, MASc thesis. Carleton Unversity, Ottawa, Canada
Google Scholar
Singh B (2019) Multi-cloud application deployment, MASc thesis. Carleton University, Ottawa, Canada
Google Scholar
Menegola B (2012) A study of the k-way graph partitioning problem. Federal University of Rio Grande do Sul, Rio Grande do Sul
Google Scholar
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392
Article MathSciNet MATH Google Scholar
Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distributed Comput 48(1):96–129
Article MATH Google Scholar
Standard Performance Evaluation Corporation, SPECpower_ssj2008 Result File Fields, https://www.spec.org/power/docs/SPECpower_ssj2008-Result_File_Fields.html. (Accessed 6 Dec 2018)
Franks G, Al-Omari T, Woodside M (2009) Enhanced modeling and solution of layered Queueing networks. IEEE Trans Software Eng 35(2):148–161
Article Google Scholar
Woodside M, Tutorial Introduction to Layered Modeling of Software Performance, http://www.sce.carleton.ca/rads/lqns/lqn-documentation/tutorialh.pdf (Accessed Sept 2021)
Islam F, Petriu D, Woodside M (2015) Simplifying layered queuing network models. In: Beltrán M, Knottenbelt W, Bradley J (eds) Computer performance engineering (EPEW 2015), vol 9272. Springer LNCS
Google Scholar
Yuventi J, Mehdizadeh R (2013) A critical analysis of power usage effectiveness and its use in communicating data center energy consumption. Energy Buildings 64:90–94
Article Google Scholar
Bondi AB (2014) Foundations of software and system performance engineering. Addison-Wesley
Google Scholar
Han G, Que W, Jia G, Zhang W (2018) Resource-utilization-aware energy efficient server consolidation algorithm for green computing in IIOT. J Netw Comput Appl 103:205–214
Article Google Scholar
Franks G, lqngen − generate layered queueing network models. https://github.com/layeredqueuing/V5 (Accessed 10 Feb 2022)
Verbelen T, Stevens T, De Turck F, Dhoedt B (2013) Graph partitioning algorithms for optimizing software deployment in mobile cloud computing. Futur Gener Comput Syst 29(2):451–459
Article Google Scholar

Download references

Acknowledgments

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Grant numbers (RGPIN): 06274-2016 and 2014-04056.

Funding

Natural Sciences and Engineering Research Council of Canada (NSERC) grant numbers (RGPIN): 06274–2016 and 2014–04056.

Author information

Authors and Affiliations

IBM, Ottawa, Canada
Babneet Singh
Phreesia, Inc., Brampton, Canada
Ravneet Kaur
Department of Systems and Computer Engineering, Carleton University, Ottawa, K1S 5B6, Canada
Murray Woodside & John W. Chinneck

Authors

Babneet Singh
View author publications
You can also search for this author in PubMed Google Scholar
Ravneet Kaur
View author publications
You can also search for this author in PubMed Google Scholar
Murray Woodside
View author publications
You can also search for this author in PubMed Google Scholar
John W. Chinneck
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Babneet Singh: LPD design and evaluation experiments. Ravneet Kaur: design of an earlier version of LPD called Hasrut. Murray Woodside: co-supervision and discussion. John Chinneck: co-supervision and discussion. All authors: writing and reviewing the manuscript. The authors read and approved the final manuscript.

Authors’ information

Babneet Singh received his MASc in Computer Engineering from Carleton University in 2019. Currently, he is employed by IBM and works on Java runtime development and cloud optimizations. He has contributed to open-source projects such as Eclipse OpenJ9 (Java Virtual Machine) and Eclipse OMR (language runtime framework).

Ravneet Kaur received her M.A.Sc. from Carleton University in 2016. Currently, she is employed by Phreesia Inc. and leads the development teams working on API based integration of electronic medical records(EMR) systems and Phreesia’s SAAS based in-house patient intake management system.

Murray Woodside received his PhD in Control Systems from Cambridge University in 1964 and has done research on control systems, system optimization, queueing theory and performance modeling at the National Research Council of Canada and at Carleton University. He is currently active in retirement at Carleton with research interests in cloud system performance model extraction, estimation, and optimization.

John Chinneck received his PhD in Systems Design from the University of Waterloo in 1983. His research focus is optimization, including improved algorithms and analysis of issues such as infeasibility. His algorithms are in use in most commercial and open-source optimization solvers. He continues to research the application of optimization techniques in a variety of areas including data classification, data compression, task assignment in cloud computing, etc.

Corresponding author

Correspondence to Murray Woodside.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Singh, B., Kaur, R., Woodside, M. et al. Low-power multi-cloud deployment of large distributed service applications with response-time constraints. J Cloud Comp 12, 1 (2023). https://doi.org/10.1186/s13677-022-00363-w

Download citation

Received: 18 February 2022
Accepted: 01 October 2022
Published: 03 January 2023
DOI: https://doi.org/10.1186/s13677-022-00363-w

Low-power multi-cloud deployment of large distributed service applications with response-time constraints

Abstract

Introduction

Related work

Background on techniques used in LPD

The model for multi-cloud application deployments

The application model and application graph

Power model

Bound on Total execution delay

Deployment model

Mixed-integer quadratic programming optimization

The low-power multi-cloud application deployment (LPD) algorithm

LPD algorithm summary

LPD stages 1–4: preparation and initial deployment

LPD stage 5: partitioning

Partition sub-algorithm

Stage 6: selecting the final deployment

LPD design choices

Time complexity of LPD

LPD for genetic algorithm solution repair

Evaluation experiments: setup

Application test cases

Cloud environment

Software and hardware

Experimental results

Power consumption: LPD vs. MIQP

Power consumption: LPD vs. a lower bound

Verification of the response time constraint

Variants on LPD

Algorithm runtime vs constraint tightness

Scalability of CPLEX MIQP vs LPD

Conclusions

Threats, issues and extensions

Extensions

Availability of data and materials

References

Acknowledgments

Funding

Author information

Authors and Affiliations

Contributions

Authors’ information

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords