Optimal deployment is a classic problem. Recent surveys of optimal deployment in clouds include Zhang et al.  (emphasizing efficient algorithms), Helali et al.  (emphasizing adaptation by consolidation), Masdari and Zangakani  (emphasizing predictive adaptation), and Skaltsis et al.  (oriented to deploying multi-agent systems). They list a variety of approaches, including
Mixed Integer Programming (MIP),
Genetic Algorithms (GA) and other metaheuristics such as ant colonies, particle swarms, and simulated annealing, which apply varieties of random search,
Application Graph Partitioning (AGP), which focuses on the interactions between clouds,
other approaches including bin-packing, hill-climbing and custom heuristics.
The use of MIP is evaluated in several papers. Malek et al.  also give additional references to MIP. Li et al.  combine MIP with bin-packing, to optimize power. Ciavotta et al.  use MIP to partition a software architecture and deploy it. The latter study minimized cost while constraining response time (without network delays) as computed by a simple queueing model, and then used a more detailed layered queueing model and a customized local search. MIP can address Goals 1–3 of LPD, and is formulated for this purpose below. However, all these studies found that MIP scaled too poorly for practical use.
GA evolves deployments by applying random mutations to a set of initial candidates and by combining existing candidates, to optimize a fitness function. The fitness function weights together the objectives and the constraints, which makes the constraints soft. GA has been applied for Goals 1–3 in [3, 8, 11]. The use of other metaheuristics is referenced in these papers and in the surveys. A strength of the fitness function approach is that it can be extended to include additional goals such as reliability, and to deal with multiple fitness functions. It can find sets of Pareto-optimal (i.e. non-dominated) solutions for multiple objectives, which goes beyond our goals. In Frincu et al.  the objectives are response time, resource usage, availability, and fault-tolerance; in Guerrero et al.  they include failure rates and total network delay as a proxy for response time. Frey et al.  optimize deployment over multiple clouds and multiple cloud vendors with three objectives: cost, mean response time, and SLA violation rate. Fitness is evaluated by a simulator, and constraints are enforced by rejecting infeasible candidates. Ye et al.  consider four objectives including energy minimization in the cloud and the network.
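The weighting of objectives and constraints into a single fitness value can be illustrated with a minimal sketch. The function name, the penalty weight, and the numbers are hypothetical and are not taken from any of the cited studies; the sketch assumes a single objective (power) and a single response time limit.

```python
def soft_fitness(power, response_time, rt_limit, penalty_weight=10.0):
    """Weighted fitness (lower is better); all names are illustrative.

    The response time constraint enters only as a penalty term,
    so it is a soft constraint rather than an enforced one.
    """
    violation = max(0.0, response_time - rt_limit)
    return power + penalty_weight * violation


# A feasible candidate that uses more power ...
feasible = soft_fitness(power=120.0, response_time=0.8, rt_limit=1.0)
# ... and an infeasible candidate that uses less power.
infeasible = soft_fitness(power=100.0, response_time=1.1, rt_limit=1.0)

# With this weight the infeasible candidate has the better (lower) fitness.
print(feasible, infeasible, infeasible < feasible)
```

With this penalty weight the candidate that violates the response time limit is ranked ahead of the feasible one, which shows how a soft constraint can be traded away against the objective unless the weight is made large or infeasible candidates are rejected or repaired.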
The inability of unmodified GA to directly enforce constraints leads us to exclude it as an approach. A recent method used in  enforces constraints by “repairing” infeasibilities for each fitness evaluation. The repair changes the candidate solution to make it feasible. Similar changes are part of the partitioning algorithm in LPD. While it is not considered here, the potential use of part of LPD for GA repair is discussed later.
Experience reported in the literature also shows that GA is too slow for our fourth goal on large deployments. Malek et al.  compare MIP, GA and a heuristic and find that both MIP and GA are orders of magnitude too slow. Guerout et al.  compare MIP and GA for single-cloud deployment to optimize a utility function that combines four QoS objectives including energy, using a stagewise approach to improve MIP scalability, and find that MIP and GA are both too slow for practical use. Their calculated response times ignore network delays; including them would undoubtedly make both algorithms slower. Alizadeh and Nishi  carefully compare the solution times of MIP solved by CPLEX  and of GA, on a very large MIP unrelated to deployment. Their GA strictly enforces constraints by constraint repair. They find that the solution time of GA is at best an order of magnitude less than for MIP. Since MIP is several orders of magnitude too slow for Goal 4, this emphasizes that GA is unsuitable on the grounds of solution time.
Most papers consider only a single cloud or data centre. Multi-cloud systems have more serious response time concerns because of the network delay for messages that cross between clouds. Application Graph Partitioning (AGP) addresses these delays as costs on the arcs of an application graph as in Fig. 1(a). AGP algorithms such as that of Fiduccia and Mattheyses  efficiently reduce or minimize the total cost, and can also include host constraints and costs. AGP is used for multi-cloud deployments in , and in  to minimize bandwidth (rather than delay), in  to minimize the sum of network and energy cost, and in  (via a metaheuristic) to minimize power in mobile clouds. The approach called Virtual Network Embedding  is also an adaptation of AGP. In summary, AGP deals efficiently with network delays but cannot resolve all the remaining goals; it needs to be augmented.
Bin-packing deals directly with capacity constraints (Goal 2) and is combined with performance models in  (ignoring power). It is adapted by Arroba et al. in  to reduce power (Goal 1) by applying CPU speed and voltage scaling, but they do not address response time.
Custom heuristics are described in  and other works; in  they are the only option that produced a solution in a useful time, but this work does not address response time.
The response time includes delay due to processor contention and saturation, which is estimated by performance models. A survey of models for clouds is given by Ardagna et al. . Recent work on deployment that reflects resource contention (always in terms of the mean response time) includes the following. Aldhalaan and Menasce  use a queueing model to place VMs in a cloud, finding the number of replicas by a hill-climbing technique. Ciavotta et al.  use a simple queueing approximation in a first stage, and a detailed layered queueing model in a second stage of optimization. Molka and Casale  use queueing models with GA, bin-packing and non-linear optimization, to allocate resources for a set of in-memory databases. Wada et al.  use queueing models to predict multiple service level metrics and a multi-objective genetic algorithm to seek Pareto-optimal deployments. Calheiros et al.  and Shu et al.  use a queueing model to scale a single task to meet response time requirements, and  also models response failures. None of these studies considers power.
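To illustrate how contention inflates the mean response time as a processor approaches saturation, the following minimal sketch uses the textbook M/M/1 formula R = S / (1 - U), with utilization U = arrival rate × service time. This is the simplest single-queue model and is an assumption made here for illustration; it is not the layered or multi-class model of any study cited above, and the function name is hypothetical.

```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue (illustrative model only).

    service_time: mean service demand per request, in seconds.
    arrival_rate: mean request arrival rate, in requests per second.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("processor is saturated; no steady state exists")
    return service_time / (1.0 - utilization)


# The same 10 ms service demand at increasing load levels.
for rate in (10.0, 50.0, 90.0):
    print(rate, mm1_response_time(0.01, rate))
```

At 50% utilization the mean response time is twice the service time, and at 90% it is ten times the service time, which is why deployments that pack hosts tightly need a performance model to check the response time constraint.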
Deployment to minimize power has been part of many studies; for example, energy is included in the utility function in . Wang and Xia  apply MIP to VM placement in the cloud to minimize power subject to processing and memory constraints (but ignoring response time and network delays). Tunc et al.  use a real-time control rule based on performance monitoring and a Value of Service metric that enforces the response time constraint while controlling for energy consumption. This is close to our stated problem, but they do not include network delays in the response time. Chen et al.  use a random search heuristic to minimize the monetary cost of power and network operations (but not delay).
For deployment tools, Arcangeli et al.  describe many tools from the viewpoint of a system operator, including MIP, GA, AGP and others.
To summarize, a heuristic solution for the minimum-power application deployment problem subject to constraints on response time and host capacities is needed because the solution times to find provable optima via methods such as MIP are too long. Metaheuristics like GA also take too long and have difficulty enforcing actual constraints. LPD, developed in the theses of Kaur  for a single cloud and Singh  for multi-clouds, uses instead a novel combination of graph partitioning and bin-packing to meet all the goals stated for LPD.
Background on techniques used in LPD
LPD uses customized algorithms derived from the following two families.
K-way graph partitioning (KGP)  divides a graph into K parts, where each part is a subset of the graph node set and the parts are disjoint and together cover the entire node set. A common objective is to minimize the cut-weight, which is the sum of the weights of the edges that connect nodes in different parts. KGP is NP-hard  and so does not scale well, but approximate solutions are found more quickly using the multi-level KGP schemes proposed by Karypis et al. [36, 37]. In LPD, the edge weights are defined so that the cut-weight is the network delay, and multi-level KGP is used to reduce the cut-weight until the response time constraint is satisfied; the cut-weight is not necessarily minimized. Graph partitioning algorithms often enforce a balance constraint to distribute the node weights equally among the parts, but this is not required in the deployment problem.
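The cut-weight definition above can be made concrete with a short sketch. The graph, the edge weights, and the names `cut_weight` and `part_of` are hypothetical examples, not data or code from LPD.

```python
def cut_weight(edges, part_of):
    """Sum the weights of edges whose endpoints lie in different parts.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    part_of: dict mapping each node to the index of its part.
    """
    return sum(w for u, v, w in edges if part_of[u] != part_of[v])


# Illustrative 4-node application graph split into K = 2 parts.
edges = [("a", "b", 5.0), ("b", "c", 1.0), ("c", "d", 4.0), ("a", "d", 2.0)]
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}

# Edges b-c and a-d cross between the parts.
total = cut_weight(edges, part_of)
print(total)

# Moving node "b" to part 1 changes which edges are cut (now a-b and a-d).
moved = dict(part_of, b=1)
print(cut_weight(edges, moved))
```

Evaluating a candidate move by recomputing or incrementally updating this sum is the basic step that partitioning algorithms repeat; in this example the move of node "b" would be rejected because it raises the cut-weight.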
2-D bin packing described in  is the NP-hard problem of packing a set of 2-dimensional items into a minimum number of 2-dimensional bins. In LPD, heuristic 2-D bin packing is incorporated into the KGP move evaluation, to reduce power consumption. After partitioning, a more rigorous bin packing is applied to each partition to further reduce power consumption.
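As an illustration of heuristic 2-D bin packing, the following sketch applies a first-fit decreasing rule. It is a generic textbook heuristic and not the packing procedure used in LPD. It assumes that the two dimensions are independent resource demands (labelled CPU and memory here purely for illustration) and that all bins have the same capacity; the function name and the example data are hypothetical.

```python
def first_fit_decreasing(items, capacity):
    """Pack 2-D items into identical bins with a first-fit decreasing rule.

    items: list of (cpu, mem) demands (assumed dimensions, for illustration).
    capacity: (cpu, mem) capacity shared by every bin.
    Returns a list of bins, each a list of item indices.
    """
    cap_cpu, cap_mem = capacity

    # Order items by their largest normalized demand, biggest first.
    def size(i):
        return max(items[i][0] / cap_cpu, items[i][1] / cap_mem)

    order = sorted(range(len(items)), key=size, reverse=True)

    bins = []  # each entry: [used_cpu, used_mem, [item indices]]
    for i in order:
        cpu, mem = items[i]
        if cpu > cap_cpu or mem > cap_mem:
            raise ValueError(f"item {i} does not fit in an empty bin")
        for b in bins:
            # Place the item in the first bin with room in both dimensions.
            if b[0] + cpu <= cap_cpu and b[1] + mem <= cap_mem:
                b[0] += cpu
                b[1] += mem
                b[2].append(i)
                break
        else:
            # No open bin fits, so open a new one.
            bins.append([cpu, mem, [i]])
    return [b[2] for b in bins]


items = [(4, 2), (2, 6), (3, 3), (1, 1), (4, 4)]
packing = first_fit_decreasing(items, capacity=(8, 8))
print(len(packing), packing)
```

First-fit decreasing gives no optimality guarantee, but it runs in time quadratic in the number of items at worst, which is the kind of cost that makes a packing heuristic cheap enough to call inside a move evaluation.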