Implementation and performance analysis of various VM placement strategies in CloudSim
 Mohammed Rashid Chowdhury^{1},
 Mohammad Raihan Mahmud^{1} and
 Rashedur M. Rahman^{1}
https://doi.org/10.1186/s1367701500455
© Chowdhury et al. 2015
Received: 30 May 2015
Accepted: 29 October 2015
Published: 11 November 2015
Abstract
Infrastructure as a Service (IaaS) has become one of the most dominant features that cloud computing offers today. IaaS virtualizes datacenter hardware, which allows Cloud providers to create multiple Virtual Machine (VM) instances on a single physical machine, thus improving resource utilization and increasing the Return on Investment (ROI). VM consolidation involves choosing an appropriate algorithm for selecting the VMs to migrate and for placing those VMs on suitable hosts. VMs need to be migrated from overutilized hosts to guarantee that the demand for computing resources and the performance requirements are met. In addition, they need to be migrated from underutilized hosts so that those hosts can be deactivated to save power. To address this energy and performance problem, efficient dynamic VM consolidation approaches have been introduced in the literature. In this work, we propose multiple redesigned VM placement algorithms and introduce a technique that clusters the VMs to be migrated by taking into account both CPU utilization and allocated RAM. We implement and study the performance of our algorithms on a cloud computing simulation toolkit known as CloudSim using PlanetLab workload data. Simulation results demonstrate that our proposed techniques outperform the default VM placement algorithm designed in CloudSim.
Introduction
We are living in a world of data, where data pervades and controls almost every aspect of our lives. To keep up with growing data demands, there is a never-ending need to establish quality resources. Maintaining the quality of these resources requires high processing power and high-end equipment that is sometimes expensive or unavailable. To meet these requirements, most end users and organizations have turned to Cloud Computing, which offers affordability, mobility, agility, and effective use of high-priced infrastructure resources at reduced cost.
Cloud computing technology has led to large-scale datacenters consisting of thousands of computing nodes that consume large amounts of electrical energy. According to a report of the NRDC (Natural Resources Defense Council), data centers nationwide consumed 91 billion kWh (kilowatt-hours) of energy in 2013, and consumption is estimated to reach around 139 billion kWh by 2020, which is a 53 % increase compared to today’s consumption [http://www.nrdc.org/energy/datacenterefficiencyassessment.asp]. Another report stated that in many data centers only 10 % to 15 % of the supplied electricity is used to power the servers [http://www.datacenterjournal.com/it/industryoutlookdatacenterenergyefficiency/]. One of the main reasons for this high consumption is the inefficient usage of these resources. Because of the narrow dynamic power range of servers, even idle servers consume about 70 % of their peak power [1]. So, from a power consumption perspective, keeping servers unutilized is highly inefficient.
To address this problem, a technology called virtualization has been adopted. Through virtualization, a physical server can host multiple virtual machine instances, where each virtual machine defines a virtual hardware and software package on behalf of a physical server.
In the IaaS model, infrastructure requests are mainly served by allocating VMs to cloud users [2]. Successful live migration of VMs from host to host without significant interruption of service enables dynamic consolidation of VMs. However, highly variable workloads can cause performance degradation when an application demands more resources. Besides power consumption, we therefore need to consider performance, since it determines the Quality of Service (QoS), which is defined via a Service Level Agreement (SLA).
It is clear that maintaining a cloud computing environment involves an energy-performance trade-off: we have to minimize energy consumption while meeting the QoS. To address this problem, in this work multiple VM placement algorithms are proposed based on solutions of the bin packing problem. Previously, Beloglazov and Buyya [3] proposed adaptive heuristics for energy- and performance-efficient dynamic VM consolidation. Their work includes several methods for host underload and overload detection and for choosing the VMs to migrate from those underloaded and overloaded hosts. They proposed a modified version of BFD (best fit decreasing) as the VM placement solution.
In our work we have followed the heuristics that Beloglazov and Buyya [3] stated in their work for dynamic VM consolidation, but instead of their modified best fit decreasing algorithm for VM placement, we proposed our algorithms based on other bin packing solutions for VM placement with custom modification. We have also introduced a new technique that forms clusters of VMs to migrate by taking into account both CPU utilization and allocated RAM (Random Access Memory). We implement and study the performance of our algorithms against the default VM placement algorithm designed in CloudSim to see whether our proposed algorithm can achieve an improved performance compared to the existing algorithm.
Adaptive heuristics for dynamic VM consolidation
For VM placement, the typical approach adopted by many real datacenters is based on solutions of the bin packing problem. The First Fit algorithm is one of the popular solutions used to consolidate VMs in these datacenters. To minimize the number of servers and prepare computational resources, Ajiro [4] implemented a load-balancing, least loaded (LL) algorithm and compared it with the classical FFD (First Fit Decreasing) algorithm. Later in that work, improved versions of the FFD and LL algorithms were developed and evaluated. The results showed that LL was more suitable for packing underutilized servers, but it performed poorly on servers that were highly utilized.
Basmadjian et al. [5] presented different prediction models for power consumption in servers, storage devices and network equipment. For power saving they provided a three-step model consisting of optimization, reconfiguration and monitoring. The authors claimed that if the energy optimization policy could be guided by power consumption prediction models, then about 20 % of energy consumption could be saved for typical single-site private cloud datacenters.
Gebai et al. [6] studied the cause of task preemption across virtual machines. The authors used kernel tracing to investigate the latency problem. However, because the traces come from different virtual machines and are generated with different time references, a synchronization method is required. The authors proposed a trace synchronization method to merge the traces. The merged trace was then explored in a graphical interface to study the interactions among virtual machine CPUs. Finally, the interactions of threads among different systems were analyzed. This analysis could detect the execution flow centered on the threads and thus discover the cause of latency.
Dong et al. [7] proposed the most-efficient-server-first (MESF) task-scheduling algorithm for cloud computing data centers. They reduced energy consumption by limiting the number of active servers and the response time. They used integer programming for the optimization solution and showed a trade-off between the number of active servers and response time. Their simulation results demonstrated that MESF could save 70 % of energy consumption compared to a random task-scheduling scheme.
Panigrahy et al. [8] reordered the virtual machine request queue and proposed geometric heuristics that run nearly as fast as FFD. Kangkang et al. [9] emphasized an approach to VM placement based on the multiple multidimensional knapsack problem instead of a bin packing solution, where the main concern was to minimize the total job completion time of the input VM requests on the same physical machines through a reasonable VM placement schedule.
Khanna et al. [10] proposed a dynamic management algorithm that is activated when a physical server becomes underloaded or overloaded. In that work, the authors reduced SLA violations, minimized the migration cost and the number of physical servers used, and optimized residual capacity. Jung et al. [11, 12] also tried to solve the dynamic VM consolidation problem while meeting SLA requirements, for virtual machines running a multitier web application, using live migration. The VM placement was obtained using gradient search and bin packing; however, this approach can only be applied to a single web application setup and therefore cannot be used in an IaaS environment.
Speitkamp and Bichler [13, 14] used linear programming formulations for the static and dynamic server consolidation problems, mapping virtual machines to certain physical servers that had unique attributes and limiting the number of VM migrations between physical machines. They proposed an LP-relaxation based heuristic to reduce the cost of solving the linear programming formulation. In another experiment, to minimize unnecessary migrations due to unpredictable workload, Tiago [15] proposed another LP formulation and heuristics that control VM migration by prioritizing VMs with steady capacity, an approach named dynamic consolidation with migration control.
Beloglazov and Buyya [3] analyzed historical data on the resource usage of VMs and proposed a dynamic VM consolidation approach that can be split into four parts. First, they check whether a host is overloaded. If it is, a decision is made to migrate some VMs from this host to another. Second, VM selection decides the list of VMs that should be migrated from the overloaded host. Third, a check decides whether a host is underloaded, in which case all of its VMs need to be migrated to other hosts. Fourth, hosts are selected to receive the VMs migrated from overloaded and underloaded hosts. For VM placement optimization, they propose an algorithm that scans through the list of hosts and tries to detect the hosts that are overloaded. If overloaded hosts are found, the algorithm picks the VMs that need to be migrated from those hosts by applying one of the VM selection policies. Once the list of VMs is created, the VM placement algorithm is executed to find a new placement for the migrated VMs. VM placement for underloaded hosts works in a similar fashion. After a suitable host has been found for all VMs of an underloaded host, that host is shut down or put into sleep mode. The algorithm then returns the migration map, which combines the new placement information for the VMs to be migrated from both overloaded and underloaded hosts. They proposed a modified version of BFD (best fit decreasing) as the VM placement solution.
In our work we also followed the heuristics that Beloglazov and Buyya [3] stated in their work for dynamic VM consolidation, but instead of their modified best fit decreasing algorithm for VM placement, we proposed our algorithms based on other bin packing solutions for VM placement with our custom modification.
 A.
Detection of overloaded host

Median Absolute Deviation (MAD): To adjust the upper utilization threshold, a measure of statistical dispersion such as MAD is used. MAD is chosen over the standard deviation because it is not heavily influenced by outliers, so the magnitude of the distances of outliers is irrelevant.

Interquartile Range (IQR): This is the second method for setting an adaptive upper threshold. For a symmetric distribution, half of the IQR is equal to the MAD.

Local Regression (LR): LR builds a curve that approximates the original data by fitting simple models to localized subsets of the data.

Robust Local Regression (LRR): Local regression is vulnerable to outliers that can be caused by heavy-tailed distributions. To make the solution robust, a modification was proposed that adds the robust estimation method called bisquare, which transforms LR into an iterative method.
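The MAD and IQR methods above can be illustrated with a minimal Python sketch. It assumes the threshold form used by Beloglazov and Buyya [3], namely an upper threshold of 1 minus a safety parameter times the dispersion of the host's CPU utilization history. The function names, the sample history and the safety value 2.5 are illustrative assumptions, not part of the original work.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

def iqr(values):
    """Interquartile range Q3 - Q1 (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def upper_threshold(history, dispersion, safety):
    """Adaptive upper utilization threshold: 1 - safety * dispersion(history)."""
    return 1.0 - safety * dispersion(history)

def is_overloaded(history, dispersion=mad, safety=2.5):
    """Assumed rule: overloaded if the latest utilization exceeds the threshold."""
    return history[-1] > upper_threshold(history, dispersion, safety)

history = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical CPU utilization samples
print(mad(history), iqr(history) / 2)  # both are about 0.1 for this symmetric sample
```

A larger safety value lowers the threshold when utilization fluctuates strongly, which makes the detector more conservative for unstable hosts.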
 B.
VM selection

Minimum migration time (MMT): This policy selects a VM to migrate that requires minimum amount of time to finish migrating, compared to other VMs allocated to the host.

Random Choice Policy (RC): This policy selects a VM to be migrated according to a uniformly distributed discrete random variable \( X \overset{d}{=} U(0, |V_j|) \), whose values index the set of VMs \( V_j \) allocated to host \( j \). More details about RC are given in [3].

Maximum Correlation policy (MC): According to the proposal of Verma et al. [17], the higher the correlation between the resource usage of applications running on an oversubscribed server, the higher the probability of the server being overloaded. Based on this idea, Beloglazov and Buyya [3] selected the VMs to be migrated in such a way that the VMs with the highest correlation of CPU utilization with other VMs are considered first. To estimate the correlation, the multiple correlation coefficient was applied.
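As a rough illustration of the first two selection policies, the sketch below picks one VM from an overloaded host. It assumes that migration time can be estimated as the RAM used by the VM divided by the available bandwidth, which is the estimate described in [3]. The `Vm` class, its fields and the bandwidth value are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Vm:
    name: str
    ram_mb: float  # RAM currently used by the VM (hypothetical field)

def select_mmt(vms, bandwidth_mb_per_s):
    """Minimum migration time: the VM whose RAM can be transferred fastest."""
    return min(vms, key=lambda vm: vm.ram_mb / bandwidth_mb_per_s)

def select_rc(vms, rng=random):
    """Random choice: a VM index drawn uniformly from the VMs on the host."""
    return vms[rng.randrange(len(vms))]

vms = [Vm("a", 2048), Vm("b", 512), Vm("c", 1024)]
print(select_mmt(vms, bandwidth_mb_per_s=100).name)  # the VM with the least RAM
```

With a single shared bandwidth value, MMT reduces to choosing the VM with the least RAM; the division is kept so that per-host bandwidth could be substituted.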
 C.
VM placement
The VM placement problem can be modeled as a bin packing problem with variable bin sizes and prices. The physical nodes are the bins, the VMs to be allocated are the items, the bin sizes are the available CPU capacities, and the prices are the power consumption of the nodes. Among the many solutions of the bin packing problem, Beloglazov and Buyya [3] proposed a modification of the popular Best Fit Decreasing (BFD) algorithm, which was shown to use no more than \( \frac{11}{9}\,\mathrm{OPT} + 1 \) bins (where OPT is the number of bins in the optimal solution) [3]. The modified BFD was named PABFD (power aware best fit decreasing). The algorithm first sorts the VMs by CPU utilization in decreasing order, and then for each VM it checks all the hosts and finds the suitable host where the increase in power consumption is minimum. As a final step, it allocates the VM to that host. The algorithm is given as Algorithm 1.
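The PABFD procedure described above can be sketched as follows. This is only an illustrative reading of the description and not the CloudSim implementation: the `Host` class, the linear power model and all numeric values are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    mips_capacity: float
    idle_power: float   # watts at 0 % utilization (hypothetical linear model)
    max_power: float    # watts at 100 % utilization
    used_mips: float = 0.0

    def power(self, used_mips):
        """Power draw for a given load, interpolated linearly (assumption)."""
        utilization = used_mips / self.mips_capacity
        return self.idle_power + (self.max_power - self.idle_power) * utilization

def pabfd(vm_demands, hosts):
    """Return {vm: host name}; VMs that fit on no host are left out."""
    allocation = {}
    # Sort VMs by CPU demand in decreasing order.
    for vm, mips in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        best_host, min_increase = None, float("inf")
        for host in hosts:
            if host.used_mips + mips > host.mips_capacity:
                continue  # host cannot hold this VM
            increase = host.power(host.used_mips + mips) - host.power(host.used_mips)
            if increase < min_increase:
                best_host, min_increase = host, increase
        if best_host is not None:
            best_host.used_mips += mips
            allocation[vm] = best_host.name
    return allocation

hosts = [Host("A", 2000, 80, 120), Host("B", 3000, 90, 135)]
print(pabfd({"v1": 2500, "v2": 1000, "v3": 500}, hosts))
```

In this example the largest VM only fits on host B, the second only on host A, and the smallest goes to host B because the power increase there is smaller.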
Proposed work
 A.
Bin packing problem:

First Fit (FF): FF starts with the most active bin and tries to pack every item into it before moving on to the next bin. If no existing bin can hold the item, a new bin is opened and the item is placed in it.

First Fit Decreasing (FFD): In FFD the items are sorted in non-increasing order and are then processed as in the First Fit algorithm. It is simply the First Fit algorithm applied to decreasingly sorted items. Brenda S. Baker proved that FFD uses no more than \( \frac{11}{9}\,\mathrm{OPT} + 3 \) bins [18], where OPT is the number of bins in the optimal solution. Later, György Dósa proved the tighter bound \( \mathrm{FFD}(I) \le \frac{11}{9}\,\mathrm{OPT}(I) + \frac{6}{9} \) [19].

Best Fit Decreasing (BFD): Like FFD, BFD also sorts items in nonincreasing order. It then chooses a bin such that minimum empty space will be left after the item is packed. In most of the cases BFD could find an optimal solution while FFD gives a nonoptimal solution as reported in [20].

Worst Fit Decreasing (WFD): It works exactly the same as BFD, except that instead of choosing the bin with minimum empty space it chooses the bin in which the maximum empty space is left after the item is allocated.

Second Worst Fit Decreasing (SWFD): The same as worst fit, except that it chooses the bin with the second-largest empty space. It is also known as almost worst fit decreasing (AWFD).
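The heuristics listed above differ only in how they pick a bin among those that can still hold the current item. A compact sketch, assuming equal bin capacities and purely illustrative item sizes:

```python
def pack(items, capacity, choose):
    """Pack items, sorted in non-increasing order, into bins of equal capacity.

    `choose(candidates, free)` receives the indices of bins that can hold the
    item plus the free space of every bin, and returns the index to use.
    """
    bins = []  # each bin is a list of item sizes
    for item in sorted(items, reverse=True):
        free = [capacity - sum(b) for b in bins]
        candidates = [i for i, f in enumerate(free) if f >= item]
        if candidates:
            bins[choose(candidates, free)].append(item)
        else:
            bins.append([item])  # no bin fits: open a new one
    return bins

def first_fit(candidates, free):
    return candidates[0]

def best_fit(candidates, free):
    return min(candidates, key=lambda i: free[i])

def worst_fit(candidates, free):
    return max(candidates, key=lambda i: free[i])

def almost_worst_fit(candidates, free):
    ranked = sorted(candidates, key=lambda i: free[i], reverse=True)
    return ranked[1] if len(ranked) > 1 else ranked[0]

items = [6, 5, 4, 3, 2]
for rule in (first_fit, best_fit, worst_fit, almost_worst_fit):
    print(rule.__name__, pack(items, 10, rule))
```

For these items, worst fit spreads the load and ends up opening a third bin, while the other three rules fit everything into two bins.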
 B.
Proposed work for new VM placement algorithms:
 1.
If we place VMs from the decreasingly sorted VM migration list (based on CPU utilization) on the host from the host list where the increase in power consumption is maximum, then the VMs will occupy the hosts where the available power is minimum and will turn on as many hosts as they can at the very beginning. The rate at which hosts are turned on will then decrease at each successive step.
 2.
Almost the same as point 1, except that we place VMs from the VM migration list on the host that has the second minimum available power. For example, suppose we have one VM and three hosts, where host 1, host 2 and host 3 have a maximum power of 40 kW, 50 kW and 60 kW, respectively. The current power consumption of host 1, host 2 and host 3 is 20 kW, 30 kW and 40 kW, respectively. Suppose we allocate the VM to host 1; the power consumption after allocation would be 22 kW. If we instead allocate the VM to host 2, the power consumption after allocation would be 33 kW, and for host 3 it would be 41 kW. The power increase in the first case is 22 kW − 20 kW = 2 kW, in the second case it is 3 kW, and in the third case it is 1 kW. The remaining power is 18 kW in the first case, 17 kW in the second case and 19 kW in the third case. So from these hosts we choose host 1 for the VM, as it has the second minimum available power, whereas under point 1 we would have chosen host 2, as it has the minimum available power.
 3.
Apart from points 1 and 2, we can place VMs from the decreasingly sorted VM migration list (based on CPU utilization) on hosts without checking whether the increase in power consumption is maximum or minimum. We first select a host, and if the power after allocating the VM to that host is less than the maximum power of the host, we place the VM on that host. The hosts are therefore occupied in the same order in which they are arranged in the host list, so the first host in the host list is selected first. It accommodates as many VMs as it can until its maximum power is reached.
 4.
Continuing from point 3 we will decreasingly sort hosts in hostlist in terms of available power. The host with maximum available power becomes the first candidate to receive the VM from decreasingly sorted VM list.
If we follow the above discussion carefully, the first point can be interpreted as the Worst Fit Decreasing technique. The reason is straightforward: choosing the host where the increase in power consumption is maximum is the exact opposite of what BFD does, and the WFD (worst fit decreasing) algorithm is the exact opposite of BFD. So at this point we propose a modified VM placement algorithm, which we name MWFDVP (modified worst fit decreasing VM placement). The pseudocode for the algorithm is presented in Algorithm 2.
Now let us turn to the second point. It is a slight modification of the first point: instead of choosing the host with the minimum available power, it chooses the host with the second minimum available power, so it can be interpreted as the second worst fit or almost worst fit technique. AWF (almost worst fit) tries to fill the second largest gap first and otherwise behaves just like worst fit decreasing [23]. So we propose another VM placement algorithm based on the almost worst fit decreasing technique, which we named SWFDVP (second worst fit decreasing VM placement) and whose pseudocode is given below as Algorithm 3.
The observation in point 3 can be modeled as the First Fit Decreasing algorithm. We choose a host from the host list (starting from the very first position) and check whether the host is suitable for the VM. If the host is suitable, we place the VM on that host and move on to the next VM in the VM list. If the host is not suitable, then we move on to the next host in the host list. Based on this, we propose a VM placement algorithm that we name MFFDVP (modified first fit decreasing VM placement). As this algorithm is almost identical to our last algorithm, we skip describing its pseudocode and move on to the next point. Let us consider point 4. It is a representation of a modified first fit technique in which the hosts are decreasingly sorted with respect to their available power. In this case, each VM is first allocated to the host that has the maximum available power; after the VM is allocated, the host list is decreasingly sorted again (when the next VM is called), and allocation continues in this manner until all the VMs in the VM migration list have been allocated. When one host comes close to its maximum power, the next host in the host list is called. We named this algorithm FFDHDVP (first fit decreasing with decreasing host VM placement). The pseudocode for the algorithm is presented in Algorithm 4.
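To make the difference between the four host selection rules concrete, the sketch below applies them to the three-host example from point 2. It is one possible reading of the descriptions, in which "available power" is taken as the headroom left after the VM is allocated. The `HostState` class and the helper names are hypothetical and do not reproduce Algorithms 2 to 4.

```python
from dataclasses import dataclass

@dataclass
class HostState:
    name: str
    max_power: float    # kW, maximum power of the host
    power_after: float  # kW, power consumption if the VM were placed here

    @property
    def available_after(self):
        """Power headroom that would remain after the allocation."""
        return self.max_power - self.power_after

def suitable(hosts):
    """Hosts whose power after allocation stays below their maximum power."""
    return [h for h in hosts if h.power_after < h.max_power]

def pick(ranked, position=0):
    """Host at `position`, the last one if fewer exist, or None if empty."""
    return ranked[min(position, len(ranked) - 1)] if ranked else None

def mwfdvp(hosts):
    """Point 1: minimum available power after allocation."""
    return pick(sorted(suitable(hosts), key=lambda h: h.available_after))

def swfdvp(hosts):
    """Point 2: second minimum available power after allocation."""
    return pick(sorted(suitable(hosts), key=lambda h: h.available_after), 1)

def mffdvp(hosts):
    """Point 3: first suitable host in host-list order."""
    return pick(suitable(hosts))

def ffdhdvp(hosts):
    """Point 4: hosts sorted by decreasing available power, then first fit."""
    return pick(sorted(suitable(hosts), key=lambda h: h.available_after, reverse=True))

hosts = [HostState("host 1", 40, 22), HostState("host 2", 50, 33), HostState("host 3", 60, 41)]
print(mwfdvp(hosts).name, swfdvp(hosts).name, mffdvp(hosts).name, ffdhdvp(hosts).name)
```

With the numbers from the example, the remaining power is 18 kW, 17 kW and 19 kW, so the first rule picks host 2 and the second picks host 1, in agreement with the text.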
 C.
Clustering technique
In CloudSim 3.0, the virtual machines that need to be migrated from one host to another are first sorted by CPU utilization in decreasing order, and then a suitable host for each VM is found using the PABFD algorithm. The optimization algorithm works most efficiently when it allocates as many VMs as it can to a single host, so that it reduces host utilization as well as the number of VM migrations and SLA violations. In this work, instead of decreasingly sorting the virtual machine list, we explored a clustering technique that forms clusters of VMs based on their CPU utilization and currently allocated RAM. After forming the clusters, we first try to find hosts for the VMs that come from the highest density cluster, which means that we give the highest priority to the virtual machines that are members of the densest cluster. In this way, the largest group of VMs that are close to each other with respect to CPU utilization and currently allocated RAM is placed on hosts first, and then the VMs from the second densest cluster are placed on suitable hosts. Each cluster is basically a VM list, and for VM placement the densest clusters (the VM lists with the maximum number of VMs) are considered first. Below we discuss how we form the clusters, our choice of clustering algorithm and how we implement it.
By the term clustering we mean grouping of objects in such a way that objects with identical attribute values reside together. There are many cluster models that include connectivity model, centroid based models, distribution models, density models, graph based models, etc. Among those models we choose to start working with centroid based model.
In centroid based clustering, a cluster is usually represented by a central vector, which is not necessarily a member of the data set. The most popular centroid based clustering algorithm is the k-means algorithm, a prototype based, partitional clustering technique that attempts to find a user-specified number of clusters (k), which are represented by their centroids [24].
 1.
Finding optimal number of clusters.
 2.
Selecting k points as initial centroids.
 3.
Repeatedly building k clusters by assigning each virtual machine to its closest centroid, based on CPU utilization and currently allocated RAM, and recomputing the centroid of each cluster until the centroids do not change.
In this research, we introduced a new technique to find the number of clusters, k. It is described below.
It is easy to see that if a host has a CPU capacity of \( C_h \) and the maximum CPU capacity allocated to a VM is \( C_v \), then the maximum number of VMs that can be allocated to the host is \( \mathrm{MaxV} = C_h / C_v \). Following this, we choose the number of virtual machines in each cluster so that all the virtual machines in a cluster can be allocated to a suitable host as a group. Suppose we have a list of hosts \( H_1, H_2, H_3, \ldots, H_n \) and a list of virtual machines \( V_1, V_2, \ldots, V_n \). If we build a set of hosts according to their available CPU Million Instructions Per Second (MIPS) from the host list, we get \( H_{(x)}^{\text{available cpu mips}} = \{ H_{(1)}, H_{(2)}, \ldots, H_{(n)} \} \), where \( x \in \{1, \ldots, n\} \). We denote alpha as \( \alpha = \max(\{ H_{(x)}^{\text{available cpu mips}} : x \in \{1, \ldots, n\} \}) \) and beta as \( \beta = \min(\{ H_{(x)}^{\text{available cpu mips}} : x \in \{1, \ldots, n\} \}) \). Similarly, if we build a set of virtual machines according to their currently allocated MIPS from the virtual machine list, we get \( V_{(x)}^{\text{current allocated cpu mips}} = \{ V_{(1)}, V_{(2)}, \ldots, V_{(n)} \} \), where \( x \in \{1, \ldots, n\} \). We denote gamma as \( \gamma = \max(\{ V_{(x)}^{\text{current allocated cpu mips}} : x \in \{1, \ldots, n\} \}) \) and delta as \( \delta = \min(\{ V_{(x)}^{\text{current allocated cpu mips}} : x \in \{1, \ldots, n\} \}) \). Now, if we need to migrate the virtual machines to suitable hosts from the host list, the maximum number of VMs that can be allocated to a host is \( \mathrm{maxpoint} = \alpha / \delta \), and the minimum number of VMs in a cluster is \( \mathrm{minpoint} = \beta / \gamma \). We therefore compute the optimal number of clusters k as the average of these maximum and minimum numbers of VMs: \( k = (\mathrm{maxpoint} + \mathrm{minpoint}) / 2 \).
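The computation of k can be written out in a few lines. The sketch below follows the definitions of alpha, beta, gamma and delta given above; rounding the average to the nearest integer and enforcing a minimum of one cluster are assumptions, since the text does not say how a fractional value is handled. The MIPS values are made up for the example.

```python
def optimal_k(host_available_mips, vm_allocated_mips):
    """Number of clusters k = (maxpoint + minpoint) / 2, rounded (assumption)."""
    alpha = max(host_available_mips)  # largest available host capacity
    beta = min(host_available_mips)   # smallest available host capacity
    gamma = max(vm_allocated_mips)    # largest VM allocation
    delta = min(vm_allocated_mips)    # smallest VM allocation
    maxpoint = alpha / delta
    minpoint = beta / gamma
    return max(1, round((maxpoint + minpoint) / 2))

# Hypothetical values: maxpoint = 4000 / 500 = 8, minpoint = 3000 / 1500 = 2
print(optimal_k([4000, 3000], [500, 1000, 1500]))
```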
 1.
From the n objects, calculate a point whose attribute values are the averages of the attribute values of the n objects. The first initial centroid is therefore the average of the n objects.
 2.
Select the next initial centroid from the n objects such that its Euclidean distance from the already selected initial centroids is maximum.
 3.
Repeat step 2 until we get k initial centroids.
After we have found the value of k (the optimal number of clusters) and the initial centroids, we repeatedly build clusters by assigning each virtual machine to its closest centroid, based on CPU utilization and currently allocated RAM, and recomputing the centroid of each cluster until the centroids no longer change. Together, these steps form our version of the k-means algorithm, which we named the modified k-means algorithm (MK) and which is presented in Algorithm 5.
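The initialization and iteration steps described above can be sketched as follows. Each VM is represented as a (CPU utilization, allocated RAM) tuple. Reading "maximum distance from the other selected centroids" as maximizing the distance to the nearest already chosen centroid is an assumption, as is the iteration cap. The sample points are hypothetical and assumed to be scaled to comparable ranges. This is not the authors' Algorithm 5.

```python
import math

def mean_point(points):
    """Component-wise average of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def initial_centroids(points, k):
    """First centroid is the overall mean; the rest are far-away data points."""
    centroids = [mean_point(points)]
    while len(centroids) < k:
        candidates = [p for p in points if p not in centroids]
        if not candidates:
            break
        # Assumption: pick the point farthest from its nearest chosen centroid.
        centroids.append(
            max(candidates, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def modified_kmeans(points, k, max_iter=100):
    """Assign points to the nearest centroid until the centroids stop changing."""
    centroids = initial_centroids(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters

vms = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.2), (0.9, 0.9), (0.8, 0.95)]
print(modified_kmeans(vms, 2))
```

Because CPU utilization and RAM are usually on very different scales, some normalization before computing Euclidean distances would likely be needed in practice.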
After we obtain the desired clusters, we start allocating VMs from the cluster that contains the maximum number of VMs. Allocation is repeated, from denser clusters to less dense clusters, until all the VMs are allocated. We apply the bin packing solutions to design our custom VM placement algorithms for placing the VMs on suitable hosts. We considered best fit, first fit, a modified version of first fit with hosts decreasingly sorted with respect to their available power, worst fit and almost worst fit to design our VM placement algorithms. As we have already seen, Beloglazov and Buyya implemented PABFD in their work [3], so to use it we need to make some adjustments to PABFD to make it work with our clustering approach. In PABFD, instead of sorting the VMs in decreasing order, we call our MK algorithm to form clusters of VMs, and then we rebuild the VM migration list so that VMs from the densest cluster come first, followed by VMs from less dense clusters. This rebuilding operation is done by a function that we named arrangeByHighDensityCluster, which takes the clusters returned by MK and the VM list as parameters. In this way all the VMs are arranged in the VM migration list from denser to less dense clusters, and then we follow the rest of the PABFD methodology unchanged. We name this adjusted version of PABFD PABF_C.
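A possible shape for the arrangeByHighDensityCluster step is shown below as a Python sketch. Only the function's purpose and its two parameters come from the text; the representation of clusters as lists of VM identifiers and the handling of VMs that appear in no cluster are assumptions.

```python
def arrange_by_high_density_cluster(clusters, vm_list):
    """Rebuild the migration list with VMs from the largest cluster first."""
    wanted = set(vm_list)
    ordered = []
    # sorted() is stable, so clusters of equal size keep their original order.
    for cluster in sorted(clusters, key=len, reverse=True):
        ordered.extend(vm for vm in cluster if vm in wanted)
    # Assumption: VMs not covered by any cluster are appended at the end.
    placed = set(ordered)
    ordered.extend(vm for vm in vm_list if vm not in placed)
    return ordered

clusters = [["vm3"], ["vm1", "vm4", "vm5"], ["vm2", "vm6"]]
vm_list = ["vm1", "vm2", "vm3", "vm4", "vm5", "vm6"]
print(arrange_by_high_density_cluster(clusters, vm_list))
```

The resulting list can then be handed to any of the placement rules in place of the decreasingly sorted VM list.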
Now, if we place each VM from the redesigned VM migration list (after clustering) on the host from the host list where the increase in power consumption is maximum, then the VMs will occupy the hosts where the available power is minimum and will turn on as many hosts as they can at the very beginning. The rate at which hosts are turned on will then decrease at each successive step. Looking carefully, we can see that this scenario can be interpreted as the Worst Fit technique. Based on this, we propose a modified VM placement algorithm that we name MWFVP_C (modified worst fit VM placement for clustering). The pseudocode for the algorithm is presented in Algorithm 6.
In a similar manner we have designed SWFVP_C (second worst fit VM placement for clustering), MFFVP_C (modified first fit VM placement for clustering) and FFHDVP_C (first fit with decreasing host VM placement). Due to space limitations we do not provide their pseudocode here. We have integrated all of the proposed algorithms, alongside PABF_C, into the CloudSim toolkit, and later we verify whether these algorithms produce satisfactory results.
 D.
IaaS Simulation model:
As our objective is to improve the quality of the IaaS system model, we need to run our methods in a simulation that provides an IaaS environment. The environment should be represented by data centers consisting of a specific number of physical servers, where each server has a multicore architecture and an adequate amount of RAM, bandwidth and storage. In general, CPU performance is defined in Million Instructions Per Second (MIPS), and instead of local disks the storage is defined as Network Attached Storage (NAS) or a Storage Area Network (SAN). We also need a power model to estimate power consumption based on CPU utilization. To design the power model we followed the steps of Beloglazov and Buyya’s work [3]. The authors reported that building a precise analytical power model is quite difficult. Therefore, instead of using an analytical model of the power consumption of a server, they used real power consumption data provided by the SPECpower benchmark results. Following their approach, we selected the same two server configurations that they chose to work with: HP ProLiant ML110 G4 (Intel Xeon 3040, 2 cores * 1860 MHz, 4 GB) and HP ProLiant ML110 G5 (Intel Xeon 3075, 2 cores * 2660 MHz, 4 GB). The power consumption data of the HP ProLiant ML110 G4 and HP ProLiant ML110 G5 servers was taken from the SPECpower benchmark results on their website [https://www.spec.org/power_ssj2008/results/res2011q1/power_ssj20082011012400338.html] [https://www.spec.org/power_ssj2008/results/res2011q1/power_ssj20082011012400339.html].
We also used their metrics for calculating SLA violation. The first is SLATH (SLA violation per active host), which is the percentage of time during which active hosts experience 100 % CPU utilization, and the second is PDM (performance degradation due to migrations).
The mathematical formula for SLATH is \( \mathrm{SLATH} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{s_i}}{T_{a_i}} \), where \( N \) is the number of hosts, \( T_{s_i} \) is the total time during which host \( i \) has experienced 100 % CPU utilization leading to an SLA violation, and \( T_{a_i} \) is the total time during which host \( i \) has been active serving virtual machines.
The mathematical formula for PDM is \( \mathrm{PDM} = \frac{1}{M} \sum_{j=1}^{M} \frac{C_{d_j}}{C_{r_j}} \), where \( M \) is the number of virtual machines, \( C_{d_j} \) is the estimate of the performance degradation of VM \( j \) due to migrations (by default it was set to 10 % of the CPU utilization in MIPS during all migrations of VM \( j \)), and \( C_{r_j} \) is the total CPU capacity requested by VM \( j \) during its lifetime.
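Both metrics are plain averages of per-host or per-VM ratios, which the following sketch makes explicit. The input format (pairs of numbers) and the sample values are assumptions made for the example; every host is assumed to have a non-zero active time and every VM a non-zero requested capacity.

```python
def slath(hosts):
    """SLATH: mean over hosts of (time at 100 % CPU) / (active time).

    `hosts` is a list of (t_s, t_a) pairs with t_a > 0.
    """
    return sum(t_s / t_a for t_s, t_a in hosts) / len(hosts)

def pdm(vms):
    """PDM: mean over VMs of (degradation due to migration) / (requested capacity).

    `vms` is a list of (c_d, c_r) pairs with c_r > 0.
    """
    return sum(c_d / c_r for c_d, c_r in vms) / len(vms)

# Hypothetical measurements for two hosts and two VMs
print(slath([(10, 100), (0, 50)]))  # average of 0.10 and 0.00
print(pdm([(5, 100), (15, 100)]))   # average of 0.05 and 0.15
```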
Performance evaluation
 A.
Experimental setup:
 B.
Workload data:
 C.
Performance metrics:
 D.
Results and analysis for non-clustering approach:
We ran the simulations on randomly chosen days of the PlanetLab workload data for the proposed algorithms discussed in the earlier section. At first we ran the simulation according to the default mechanism of CloudSim. By default, the VM migration list was decreasingly sorted with respect to CPU utilization and the VM placement algorithm was PABFD. We then used the decreasingly sorted VM migration list, but instead of PABFD we used our proposed MWFDVP, SWFDVP, MFFDVP and FFDHDVP algorithms for VM placement, one at a time. To compare the performance of these algorithms, we take into account the power consumption, SLA violation, SLA violation per active host, performance degradation due to migration and number of VM migrations from the generated output.
E. Result and analysis for clustering approach:
After running all simulations for a randomly chosen day, we can identify which policies produced satisfactory results. For power consumption, SLA violation and performance degradation due to migration, both lrmmt 1.2 and lrrmmt 1.2 produced very good results compared with the rest of the policies; for SLA violation time per active host, iqrmmt 1.5 produced good results, and for number of VM migrations, lrrrs 1.2 did. Considering all metrics together, lrmmt 1.2 and lrrmmt 1.2 gave quite satisfactory results, lagging behind only negligibly in a few cases. For VM placement, all our proposed algorithms produced better results in every case than PABFD, the default VM placement technique in the CloudSim toolkit. Among these algorithms, FFHDVP_C showed the strongest results for power consumption, SLA violation and SLA violation time per active host, while MWFVP_C (in close contest with FFHDVP_C) showed the strongest results for performance degradation due to migration and number of VM migrations.
Performance of different VM placement techniques
Algorithm  A1  A2  A3  A4  B1  B2  B3  B4  C1  C2  C3  C4  D1  D2  D3  D4  E1  E2  E3  E4 
PC (power consumption)  
P1  46.86  47.85  49.32  46.78  34.35  35.37  35.38  34.42  34.35  35.37  35.38  34.66  44.99  45.61  47.36  44.8  40.85  41.81  44.08  41.51 
MF1  41.63  41.33  41.16  42.07  32.71  32.29  32.44  32.88  32.71  32.29  32.44  32.85  39  39.06  38.78  38.86  33.94  34.37  33.31  34.28 
MW1  39.06  38.78  40.2  39.33  31.37  31.44  34.2  31.68  31.37  31.44  34.2  31.23  36.62  35.97  38.04  36.55  32.6  32.57  34.21  32.86 
F1  40.08  39.66  39.39  39.36  31.28  31.14  31.39  31.5  31.28  31.14  31.39  31.47  37.07  37.02  36.71  37.27  32.24  32.34  31.78  32.17 
S1  42.23  41.23  41.5  41.74  32.58  32.51  32.69  33.04  32.58  32.51  32.69  32.66  39.31  39.18  39.08  39.51  34.31  34.44  33.56  34.83 
P1_C  47.08  47.64  49.12  47.05  33.93  35.02  35.48  34.49  33.93  35.02  35.48  34.52  44.94  45.72  47.03  45.06  40.96  42.08  44.39  41.22 
MF1_C  42.43  41.7  41.87  41.73  32.44  32.49  32.94  32.84  32.44  32.49  32.94  32.85  39.88  39.35  39.04  39.65  34.53  34.16  33.43  34.14 
F1_C  40.74  40.75  40.06  40.33  31.45  31.27  31.21  31.33  31.45  31.27  31.21  31.35  37.23  37.71  37.26  37.47  32.44  32.58  32.09  32.42 
MW1_C  39.52  39.09  41.9  39.36  32.14  31.54  34.19  32.11  32.14  31.54  34.19  31.91  37.2  36.86  39.17  37.36  33.3  32.73  34.43  32.88 
S1_C  42.65  41.63  42.25  42.23  33.28  32.64  32.87  32.89  33.28  32.64  32.87  33.36  39.43  39.54  39.36  39.41  34.94  34.34  34.02  34.74 
SLV (SLA violation)  
P2  0.02113  0.0177  0.0215  0.0213  0.0212  0.0191  0.0205  0.024  0.021  0.01912  0.0205  0.024  0.025  0.02  0.0253  0.025  0.0373  0.03  0.0355  0.036 
MF2  0.03248  0.0266  0.0462  0.0303  0.0246  0.0206  0.0291  0.0265  0.025  0.02057  0.0291  0.026  0.0374  0.033  0.0639  0.037  0.0699  0.058  0.1207  0.07 
MW2  0.02128  0.0178  0.0425  0.0244  0.0198  0.0163  0.0322  0.0212  0.02  0.01633  0.0322  0.021  0.025  0.022  0.0593  0.025  0.0541  0.045  0.1421  0.055 
F2  0.01889  0.0175  0.0249  0.0216  0.0154  0.0144  0.0155  0.0163  0.015  0.01439  0.0155  0.018  0.0245  0.023  0.032  0.024  0.051  0.042  0.0752  0.052 
S2  0.02977  0.0255  0.0496  0.0275  0.0261  0.0218  0.0284  0.0256  0.026  0.02182  0.0284  0.025  0.0382  0.03  0.059  0.038  0.0722  0.056  0.1196  0.062 
P2_C  0.02107  0.0175  0.0222  0.0202  0.0225  0.0202  0.0238  0.0232  0.022  0.02022  0.0238  0.022  0.0249  0.02  0.0252  0.024  0.0402  0.029  0.033  0.039 
MF2_C  0.03194  0.0278  0.049  0.0316  0.0246  0.0212  0.0297  0.0257  0.025  0.02124  0.0297  0.026  0.0345  0.031  0.064  0.036  0.0694  0.057  0.138  0.07 
F2_C  0.01978  0.0189  0.0312  0.0218  0.0157  0.0136  0.0162  0.0177  0.016  0.01362  0.0162  0.016  0.03  0.02  0.0427  0.029  0.0497  0.046  0.0894  0.055 
MW2_C  0.02137  0.0159  0.0463  0.0201  0.0199  0.0155  0.0335  0.0204  0.02  0.01546  0.0335  0.019  0.0258  0.019  0.0595  0.025  0.0489  0.04  0.1397  0.051 
S2_C  0.02974  0.0232  0.0474  0.0293  0.0235  0.0199  0.0268  0.0241  0.023  0.0199  0.0268  0.029  0.0363  0.029  0.0591  0.034  0.0646  0.052  0.1225  0.066 
PDVM (performance degradation due to VM migration)  
P3  0.26  0.23  0.26  0.25  0.14  0.13  0.13  0.15  0.14  0.13  0.13  0.15  0.26  0.23  0.26  0.26  0.27  0.23  0.28  0.27 
MF3  0.24  0.21  0.29  0.23  0.12  0.09  0.11  0.13  0.12  0.09  0.11  0.12  0.24  0.22  0.32  0.25  0.3  0.25  0.41  0.32 
MW3  0.21  0.18  0.28  0.21  0.1  0.08  0.12  0.11  0.1  0.08  0.12  0.11  0.2  0.19  0.31  0.21  0.26  0.22  0.45  0.27 
F3  0.24  0.22  0.27  0.26  0.11  0.09  0.1  0.12  0.11  0.09  0.1  0.12  0.25  0.24  0.29  0.26  0.31  0.28  0.39  0.32 
S3  0.24  0.2  0.3  0.23  0.12  0.1  0.12  0.11  0.12  0.1  0.12  0.11  0.24  0.21  0.31  0.25  0.29  0.25  0.4  0.29 
P3_C  0.26  0.23  0.26  0.25  0.14  0.13  0.15  0.15  0.14  0.13  0.15  0.14  0.26  0.23  0.26  0.26  0.27  0.24  0.26  0.28 
MF3_C  0.24  0.22  0.31  0.24  0.12  0.1  0.12  0.12  0.12  0.1  0.12  0.12  0.24  0.21  0.33  0.24  0.3  0.26  0.46  0.31 
F3_C  0.26  0.26  0.32  0.27  0.11  0.09  0.11  0.12  0.11  0.09  0.11  0.11  0.28  0.23  0.34  0.29  0.31  0.3  0.46  0.34 
MW3_C  0.21  0.16  0.29  0.2  0.11  0.08  0.12  0.11  0.11  0.08  0.12  0.1  0.2  0.17  0.32  0.2  0.25  0.21  0.49  0.26 
S3_C  0.23  0.2  0.3  0.24  0.12  0.1  0.11  0.12  0.12  0.1  0.11  0.13  0.24  0.2  0.32  0.23  0.29  0.24  0.44  0.29 
SVTAH (SLA violation time per active host)  
P4  8.14  7.82  8.24  8.41  15.63  14.31  15.21  15.97  15.63  14.31  15.21  15.74  9.81  8.61  9.73  9.48  13.79  12.99  12.69  13.26 
MF4  13.46  12.83  16.02  13.09  20.68  22.48  25.8  20.96  20.68  22.48  25.8  21.23  15.67  15.5  20  14.97  23.45  22.86  29.32  22.11 
MW4  10.11  9.75  15.08  11.58  19.7  19.34  27.23  19.06  19.7  19.34  27.23  18.58  12.34  11.57  19.16  12.05  20.56  19.99  31.32  20.56 
F4  7.77  7.79  9.07  8.39  14.16  15.15  14.99  14.09  14.16  15.15  14.99  15.39  9.68  9.47  10.89  9.39  16.54  14.84  19.42  16.42 
S4  12.57  12.99  16.3  12.08  22.4  22.25  24.49  22.47  22.4  22.25  24.49  22.31  15.75  14.25  19.13  15.37  24.72  22.53  29.83  21.61 
P4_C  8.24  7.59  8.61  7.98  16.48  15.84  16  15.4  16.48  15.84  16  15.6  9.57  8.95  9.79  9.33  14.62  12.45  12.69  13.78 
MF4_C  13.16  12.64  15.89  12.92  20.91  21.85  25.24  21.38  20.91  21.85  25.24  22.44  14.46  14.47  19.46  15.32  23.24  22.41  30.16  22.74 
F4_C  7.73  7.38  9.86  8.14  14.11  14.62  15.18  14.69  14.11  14.62  15.18  14.71  10.71  8.82  12.46  10.09  16.02  15.53  19.6  16.07 
MW4_C  10.29  9.69  16  9.97  18.38  18.48  27.11  18.76  18.38  18.48  27.11  19.16  12.59  11.23  18.61  12.27  19.18  18.69  28.61  19.91 
S4_C  12.66  11.69  15.7  12.01  19.39  20.81  24.55  20.93  19.39  20.81  24.55  22.12  15.4  14.35  18.74  14.31  21.91  21.64  28.15  22.37 
NVM (number of VM migrations)  
P5  5085  5502  5789  5048  2203  2872  2808  2285  2203  2872  2808  2440  4778  5265  5628  4849  4392  4839  5404  4523 
MF5  3841  4287  4919  3736  1567  1867  1826  1679  1567  1867  1826  1626  3432  4206  4914  3455  3247  4014  4965  3533 
MW5  3296  3492  4557  3246  1375  1633  1915  1474  1375  1633  1915  1411  2947  3536  4491  2928  3119  3663  5338  3198 
F5  4144  4325  4990  4242  1629  1824  1929  1739  1629  1824  1929  1744  3998  4411  4904  4044  3910  4211  5161  3998 
S5  3743  4118  5098  3615  1637  1859  1883  1559  1637  1859  1883  1569  3524  4082  4950  3585  3329  4063  4847  3516 
P5_C  5086  5447  5864  4967  2214  2737  3058  2349  2214  2737  3058  2333  4840  5278  5616  4773  4419  4790  5193  5193 
MF5_C  3998  4250  5607  3929  1657  1837  1816  1597  1657  1837  1816  1552  3523  4033  5345  3728  3571  4098  5453  5453 
F5_C  4316  4736  5583  4383  1640  1867  1901  1763  1640  1867  1901  1637  4216  4382  5482  4362  3849  4208  5766  4134 
MW5_C  3002  3371  4675  2955  1417  1574  1798  1452  1417  1574  1798  1324  2920  3415  4601  2979  3192  3705  5548  5548 
S5_C  3723  4108  5440  3794  1590  1888  1810  1669  1590  1888  1810  1666  3697  3993  5222  3673  3488  4095  5304  3522 
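To read the table as relative improvements, each row can be compared with the baseline row in the same column. The sketch below does this for the power consumption values in column A1, copied from the table above. It assumes that row P1 is the PABFD baseline and that lower values are better; the helper name is ours.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which candidate is lower than baseline (negative if it is higher)."""
    return (baseline - candidate) / baseline * 100.0


# Power consumption under policy column A1, copied from the table above.
pc_a1 = {"P1": 46.86, "MF1": 41.63, "MW1": 39.06, "F1": 40.08, "S1": 42.23}

for row, value in pc_a1.items():
    if row != "P1":
        print(f"{row}: {relative_reduction(pc_a1['P1'], value):.1f} % lower than P1")
```

The same helper applies to any other column or metric group in the table.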
F. Comparison among clustering and nonclustering approach
In the previous sections we showed the results of our VM placement algorithms separately for the nonclustered and clustered approaches with different VM selection and overload detection policies. In both cases we ran the simulation on one day of data from the ten days of PlanetLab workload traces. Our proposed VM placement algorithms did very well compared with the PABFD algorithm on all performance metrics.
Day-wise PlanetLab data
Date  Number of VMs 

3 March  1052 
6 March  898 
9 March  1061 
22 March  1516 
25 March  1078 
3 April  1463 
9 April  1358 
11 April  1233 
12 April  1054 
20 April  1033 
However, if we consider every VM placement, VM selection and overload detection technique, the number of combinations is huge: 10 VM placement strategies (5 clustered and 5 nonclustered), 3 VM selection strategies and 5 overload detection strategies give 10 × 3 × 5 = 150 combinations, and comparing all of them would require a considerable amount of simulation time. As we had already found FFDHDVP and MWFDVP to be our two best VM placement techniques, we considered only them for further analysis. We therefore took FFDHDVP and MWFDVP as the nonclustered algorithms, FFHDVP_C and MWFVP_C as the clustered algorithms, and the policies lrrmmt 1.2, lrrmc 1.2 and iqrmc 1.5. We skipped lrmmt 1.2 and lrmc 1.2 because they scored exactly the same as their robust versions.
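The size of the full experiment grid, and of the reduced one, can be checked by enumeration. In this sketch the placement labels are the row prefixes of the results table, the selection labels are the policy suffixes used in the text, and the detection labels "thr" and "mad" are assumed names for the two overload detection strategies not named in this section.

```python
from itertools import product

placements = ["P", "MF", "MW", "F", "S"]         # row prefixes of the results table
placements += [p + "_C" for p in placements]     # clustered variants
selections = ["mmt", "mc", "rs"]                 # VM selection policies named in the text
detections = ["thr", "iqr", "mad", "lr", "lrr"]  # "thr" and "mad" are assumed labels

all_runs = list(product(placements, selections, detections))
print(len(all_runs))  # 150

# Reduced grid: the four proposed algorithms kept for further analysis,
# each paired with the three retained policies.
kept_placements = ["FFDHDVP", "MWFDVP", "FFHDVP_C", "MWFVP_C"]
kept_policies = ["iqrmc 1.5", "lrrmmt 1.2", "lrrmc 1.2"]
reduced_runs = list(product(kept_placements, kept_policies))
print(len(reduced_runs))  # 12
```

Adding the PABFD baseline with the same three policies brings the reduced grid to 15 runs.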
The results of this descriptive statistical analysis are discussed below. In the figures we use numbers in place of policy names so that the labels fit into the graphs: #1, #2 and #3 denote iqrmc 1.5, lrrmmt 1.2 and lrrmc 1.2, respectively. For example, PABFD#1 means that PABFD is the VM placement algorithm and iqrmc 1.5 is the VM selection and overload detection policy, i.e., iqr (interquartile range) overload detection with threshold 1.5 and mc (maximum correlation) VM selection.
When lrrmmt 1.2 is chosen as the VM selection and overload detection policy, FFDHDVP gave outstanding results for power consumption, our clustered FFHDVP_C for SLA violation, SLA violation time per active host and number of VM migrations, and the clustered MWFVP_C for performance degradation due to migration. We can therefore say that all of our proposed algorithms outperformed PABFD, the default VM placement algorithm in the CloudSim toolkit.
p-values for different performance metrics
Metric  p-value 

Power consumption  0.004118825 
SLA violation  6.93*10^{-6} 
Performance degradation due to migration  1.83*10^{-13} 
SLA violation time per active host  3.27*10^{-7} 
Number of VM migrations  2.32*10^{-6} 
If the p-value is greater than 0.05, we fail to reject the null hypothesis; otherwise we reject it. From Table 3 we find that the p-value is well below 0.05 for every performance metric. We therefore reject the null hypothesis and conclude that there is a significant difference between the performance of the built-in Power Aware Best Fit Decreasing (PABFD) algorithm in CloudSim and that of the best algorithm, FFHDVP_C, which we developed and implemented in this research. The box plots likewise show that our proposed algorithms perform significantly better than the built-in PABFD VM placement algorithm.
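The decision rule can be applied mechanically to the values in Table 3. The sketch below assumes that the exponents in the table are negative, since a p-value cannot exceed 1 and the text states that every value lies below 0.05; the function name is ours, and the statistical test that produced the p-values is not reproduced here.

```python
ALPHA = 0.05  # significance level used in the text

# p-values from Table 3; negative exponents are an assumption (see lead-in).
p_values = {
    "Power consumption": 0.004118825,
    "SLA violation": 6.93e-6,
    "Performance degradation due to migration": 1.83e-13,
    "SLA violation time per active host": 3.27e-7,
    "Number of VM migrations": 2.32e-6,
}


def decide(p: float, alpha: float = ALPHA) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


for metric, p in p_values.items():
    print(f"{metric}: p = {p:.3g} -> {decide(p)}")
```

Under this reading the null hypothesis is rejected for all five metrics.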
Conclusion
Using different solutions to the bin packing problem, our proposed VM placement algorithms achieved remarkable improvements over the existing solution. Our proposed techniques achieved lower power consumption, fewer SLA violations and less performance degradation than the existing PABFD VM placement algorithm. We also showed, by adopting a clustering method, that VM placement is favored by higher virtual machine density. Our results further show that the local regression based algorithm equipped with the minimum migration time VM selection policy significantly outperforms the other dynamic VM consolidation algorithms. As future work we plan to introduce a fuzzy algorithm that could take advantage of different selection criteria and form a rule base for VM selection. We also suggest building more eco-friendly IT infrastructures with a reasonable on-demand operating cost to improve the quality of IaaS in cloud computing.
Declarations
Acknowledgement
There is no acknowledgement from authors’ side.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Fan X, Weber WD, Barroso LA (2007) Power provisioning for a warehouse-sized computer. Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA 2007), ACM New York, NY, USA, pp 13–23
 Hyukho K, Woongsup K, Yangwoo K (2010) Experimental study to improve resource utilization and performance of cloud systems based on grid middleware. J Commun Comput 7(12):32–43
 Beloglazov A, Buyya R (2012) Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in Cloud data centers. Concurrency Computat Pract Exper 24:1397–1420. doi:10.1002/cpe.1867
 Ajiro Y, Tanaka A (2007) Improving packing algorithms for server consolidation. In: Proceedings of the International Conference for the Computer Measurement Group (CMG)
 Robert B, Hermann De M, Ricardo L, Giovanni G (2012) Cloud computing and its interest in saving energy: the use case of a private cloud. J Cloud Comput Adv Syst Appl 1:5
 Mohamad G, Francis G, Dagenais MR (2014) Fine-grained preemption analysis for latency investigation across virtual machines. J Cloud Comput Adv Syst Appl 3:23
 Ziqian D, Ning L, Roberto RC (2015) Greedy scheduling of tasks with time constraints for energy-efficient cloud-computing data centers. J Cloud Comput Adv Syst Appl 4:5
 Panigrahy R, Talwar K, Uyeda L, Wieder U (2011) Heuristics for Vector Bin Packing. Microsoft’s VMM Product Group, Microsoft Research Silicon Valley
 Kangkang L, Huanyang Z, Jie W (2013) Migration-based virtual machine placement in cloud systems. Cloud Networking (CloudNet), 2013 IEEE 2nd International Conference, 11–13 Nov, pp 83–90. doi:10.1109/CloudNet.2013.6710561
 Khanna G, Beaty K, Kar G, Kochut A (2006) Application performance management in virtualized server environments. In: Proceedings of the 10th IEEE/IFIP Network Operations and Management Symposium, NOMS’06
 Jung G, Joshi KR, Hiltunen MA, Schlichting RD, Pu C (2008) Generating adaptation policies for multi-tier applications in consolidated server environments. Proceedings of the 5th IEEE International Conference on Autonomic Computing (ICAC 2008), Chicago, IL, USA, pp 23–32
 Jung G, Joshi KR, Hiltunen MA, Schlichting RD, Pu C (2009) A cost-sensitive adaptation engine for server consolidation of multi-tier applications. Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware (Middleware 2009), Urbana Champaign, IL, USA, pp 1–20
 Bichler M, Setzer T, Speitkamp B (2006) Capacity planning for virtualized servers. In: Proceedings of the 16th Annual Workshop on Information Technologies and Systems, WITS’06
 Speitkamp B, Bichler M (2010) A mathematical programming approach for server consolidation problems in virtualized data centers. IEEE Transactions on Services Computing
 Ferreto TC, Netto MAS, Calheiros RN, De Rose CAF (2011) Server consolidation with migration control for virtualized data centers. Future Gener Comput Syst 27(8):1027–1034
 Beloglazov A, Abawajy J, Buyya R (2011) Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener Comput Syst. doi:10.1016/j.future.2011.04.017
 Verma A, Dasgupta G, Nayak TK, De P, Kothari R (2009) Server workload analysis for power minimization using consolidation. Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, USA, pp 28–28
 Baker BS (1985) A new proof for the first-fit decreasing bin-packing algorithm. J Algorithms 6(1):49–70
 Dósa G (2007) The Tight Bound of First Fit Decreasing Bin-Packing Algorithm Is FFD(I) ≤ 11/9 OPT(I) + 6/9. In: Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, 2007, vol 4614. Springer, Berlin Heidelberg, pp 1–11
 Venigella S (2010) Cloud storage and online bin packing. UNLV Theses/Dissertations/Professional Papers/Capstones. Paper 894. http://digitalscholarship.unlv.edu/thesesdissertations/894. Accessed 22 April 2015
 Fan X, Weber WD, Barroso LA (2007) Power provisioning for a warehouse-sized computer. Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA 2007), ACM New York, NY, USA, pp 13–23
 Kusic D, Kephart JO, Hanson JE, Kandasamy N, Jiang G (2009) Power and performance management of virtualized computing environments via lookahead control. Cluster Comput 12(1):1–15
 Ausiello G, Lucertini M, Serafini P (1984) Algorithm Design for Computer System Design. Springer, Wien
 Tan PN, Michael S, Vipin K (2005) Introduction to Data Mining. Pearson Addison Wesley, Boston
 Baswade AM, Nalwade PS (2013) Selection of initial centroids for k-means algorithm. IJCSMC 2(7):161–164