In 2011, V. Shrivastava et al. [3] proposed Application Aware (AppAware), a virtual machine migration strategy. AppAware defines the virtual machine migration problem as follows.
The virtual machine migration problem involves three factors:

1. The communication correlations of VMs and the network topologies of physical machines. The former includes the settings of VMs and the communication relationships between them. The latter includes the settings of physical machines and network distance between them.

2. The resource demands of VMs and resource capacities of physical machines. The former means the computing resource demands of VMs, such as CPU, memory and disk storage. The latter means the quantity of the computing resources.

3. The sizes of VMs, which represent the amount of network traffic created by the VM migration.
To state the virtual machine migration problem concisely, we use abbreviations in both the formulas and the text; they are listed in Tables 1 and 2. In a datacenter, the virtual machine set is defined as V={V_{1}, V_{2}, V_{3}, …, V_{n}}, of which a subset of o virtual machines is defined as O={V_{1}, V_{2}, V_{3}, …, V_{o}}. The physical machine set is S={S_{1}, S_{2}, S_{3}, …, S_{m}}. The communication correlations between virtual machines generate a network topological dependence graph, G=(V, E), where E is the set of edges representing those correlations: E={(V_{i}, V_{j}) ∣ V_{i} and V_{j} have a communication correlation}. W(V_{i}, V_{j}) represents the communication demand between virtual machines V_{i} and V_{j}.
Next, we define the resource demand of virtual machines and the resource capacity of physical machines. Load(V_{i}) is the resource demand of virtual machine V_{i}, and Capacity(S_{k}) represents the CPU, memory, and disk storage capacity of physical machine S_{k}. The communication cost between virtual machine V_{i} on physical machine S_{k} and virtual machine V_{j} on physical machine S_{l} is Cost(V_{i},S_{k},V_{j},S_{l})=Distance(S_{k},S_{l})×W(V_{i},V_{j}), where virtual machines V_{i} and V_{j} have communication demand W(V_{i},V_{j}) and Distance(S_{k},S_{l}) represents the delay or number of hops between S_{k} and S_{l}.
To describe the relationship between virtual machines and physical machines, X_{ik} indicates whether virtual machine V_{i} is allocated to physical machine S_{k}. If so, X_{ik} is equal to 1; otherwise, X_{ik} is 0. Similarly, \(X^{jl}_{ik} = X_{ik} \times X_{jl}\) indicates whether virtual machines V_{i} and V_{j} are allocated to physical machines S_{k} and S_{l}, respectively. If both allocations hold, \(X^{jl}_{ik} = 1\); otherwise, it is equal to 0.
On the basis of these definitions, the total communication cost of a datacenter is expressed as
$$ \sum\limits_{A} Cost\left(V_{i}, S_{k}, V_{j}, S_{l}\right) \times X^{jl}_{ik} $$
(1)
where A={(i,j,k,l) ∣ i<j, k<l, j≤|V|, l≤|S|}. The constraints of the problem are expressed as follows:
$$ \sum\limits_{k}^{\mid S\mid} X_{ik} = 1, \forall V_{i} \in O $$
(2)
$$ \sum\limits_{i}^{\mid V\mid}Load_{i} \times X_{ik} \leq Capacity_{k}, \forall S_{k} \in S $$
(3)
Condition (2) indicates that every virtual machine must be allocated to exactly one physical machine. Condition (3) indicates that the sum of the resource demands of the virtual machines on each physical machine must not exceed the amount of resources that the physical machine provides.
In addition, AppAware presumes that the communication correlations between virtual machines are fixed. Thus, AppAware’s optimization objective function is
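As a minimal sketch of how expression (1) and conditions (2) and (3) can be evaluated, the following Python code assumes a placement stored as a dictionary from VM to PM, which satisfies condition (2) by construction because each VM maps to exactly one PM. The function names, the dictionary layout, the single scalar resource per machine, and the sample values are all illustrative assumptions and are not taken from AppAware.

```python
def pm_distance(distance, a, b):
    """Distance(S_k, S_l): 0 for the same PM, otherwise a symmetric lookup."""
    if a == b:
        return 0
    return distance[(a, b)] if (a, b) in distance else distance[(b, a)]


def total_communication_cost(placement, demand, distance):
    """Expression (1): sum of Distance(S_k, S_l) * W(V_i, V_j) over the edges in E.

    placement: dict VM -> PM (X_ik = 1 exactly when placement[V_i] == S_k)
    demand:    dict (V_i, V_j) -> W(V_i, V_j), one entry per edge
    distance:  dict (S_k, S_l) -> Distance(S_k, S_l), each pair stored once
    """
    return sum(
        w * pm_distance(distance, placement[vi], placement[vj])
        for (vi, vj), w in demand.items()
    )


def satisfies_capacity(placement, load, capacity):
    """Condition (3): the summed load on each PM must not exceed its capacity.

    Assumes one scalar resource per machine (hypothetical simplification).
    """
    used = {pm: 0 for pm in capacity}
    for vm, pm in placement.items():
        used[pm] += load[vm]
    return all(used[pm] <= capacity[pm] for pm in capacity)


# Hypothetical toy datacenter: three VMs on two PMs.
placement = {"V1": "S1", "V2": "S2", "V3": "S2"}
demand = {("V1", "V2"): 10, ("V2", "V3"): 5}
distance = {("S1", "S2"): 2}
load = {"V1": 2, "V2": 3, "V3": 1}
capacity = {"S1": 4, "S2": 4}

print(total_communication_cost(placement, demand, distance))  # 20
print(satisfies_capacity(placement, load, capacity))          # True
```

Only the V1–V2 pair contributes here, because V2 and V3 share a PM and their distance is taken as 0.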
$$ Minimize \sum\limits_{A} Cost\left(V_{i}, S_{k}, V_{j}, S_{l}\right) * X_{ik}^{jl} $$
(4)
To simplify this problem, we presume that the allocation of underloaded virtual machines to physical machines is fixed. Thus, the allocation of underloaded virtual machine V_{i} to physical machine S_{k} is represented by x_{ik}, which is defined as
For convenience of description, we call the virtual machine migration problem VMIG. The virtual machine is called VM, and the physical machine is called PM.
Improvement of the problem model
Communication cost model
In our work, the communication cost still follows the definition in (1). Here, we use Cost_Com to represent the total communication cost in the datacenter.
$$ Cost\_{Com} = \sum\limits_{A} Cost\left(V_{i}, S_{k}, V_{j}, S_{l}\right) \times X^{jl}_{ik} $$
(6)
Because the communication cost and the migration cost are calculated in completely different ways, they are measured on different scales, which unbalances their influence on the total cost. To eliminate this imbalance, the two costs must be of the same order of magnitude in the objective function. Following Tiwari et al. [25] and Zhao et al. [26], and to increase the model’s extensibility and adaptability when the migration cost is included in the objective function, we normalize the communication cost to a value between 0 and 1. The normalized communication cost is
$$ Cost\_{Com}\_{std} = \frac{\sum\limits_{A} Cost\left(V_{i}, S_{k}, V_{j}, S_{l}\right) \times X^{jl}_{ik}}{e_{Topo\_{vm}} \times \max\limits_{i,j\in V}\left(W_{ij}\right) \times \max\limits_{k,l\in S}\left(D_{kl}\right)} $$
(7)
where \(e_{Topo\_{vm}}\) represents the number of VM pairs that communicate with each other in the datacenter, which is also the total number of edges in the network topological dependence graph. Given that each pair of communicating VMs corresponds to one Cost(V_{i},S_{k},V_{j},S_{l}) term, we divide the communication cost by the product of the number of communicating VM pairs, the maximum VM communication demand \(\max \limits _{i,j\in V}\left (W_{ij}\right)\), and the maximum distance \(\max \limits _{k,l\in S}\left (D_{kl}\right)\) between PMs, where D_{kl} is Distance(S_{k},S_{l}) and W_{ij} is W(V_{i},V_{j}).
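The normalization in (7) can be sketched as follows. This is an illustrative Python example under assumed data structures (a VM-to-PM dictionary, an edge-weight dictionary, and a PM-distance dictionary); the function names and sample values are hypothetical. Returning 0.0 when the denominator is zero is a choice made for this sketch only, since the text does not cover that case.

```python
def pm_distance(distance, a, b):
    """D_kl: 0 for the same PM, otherwise a symmetric lookup."""
    if a == b:
        return 0
    return distance[(a, b)] if (a, b) in distance else distance[(b, a)]


def normalized_communication_cost(placement, demand, distance):
    """Equation (7): raw communication cost divided by e_Topo_vm * max(W_ij) * max(D_kl)."""
    if not demand or not distance:
        return 0.0  # assumption: no edges or no inter-PM links means no cost
    raw = sum(
        w * pm_distance(distance, placement[vi], placement[vj])
        for (vi, vj), w in demand.items()
    )
    e_topo_vm = len(demand)          # number of communicating VM pairs
    w_max = max(demand.values())     # maximum communication demand
    d_max = max(distance.values())   # maximum PM-to-PM distance
    denominator = e_topo_vm * w_max * d_max
    return raw / denominator if denominator else 0.0


# Hypothetical toy datacenter: three VMs on two PMs.
placement = {"V1": "S1", "V2": "S2", "V3": "S2"}
demand = {("V1", "V2"): 10, ("V2", "V3"): 5}
distance = {("S1", "S2"): 2}

cost_com_std = normalized_communication_cost(placement, demand, distance)
print(cost_com_std)  # 0.5, i.e. 20 / (2 * 10 * 2)
```

Each term of the numerator is at most the maximum demand times the maximum distance, and there are \(e_{Topo\_{vm}}\) such terms, so the result stays within the range 0 to 1.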
Migration cost model
We define the migration cost as the product of the amount of VM migration data and the distance between the source and destination PMs. A VM is essentially a software container that packages a complete set of virtual hardware and software, including the operating system and all applications. Migrating a VM is therefore similar to transferring and duplicating other software files, so the VM migration cost is related to the VM size, which is called Size. The migration cost of moving VM V_{i} away from S_{l} is
$$ Cost\_{Mig}(V_{i}) = \sum\limits_{k \in S} Size_{i}\times D_{lk} \times X_{ik} $$
(8)
where S_{l} is the physical machine on which the overloaded VM V_{i} is initially located. For an overloaded VM that needs to be reallocated, X_{ik} = 1 means that VM V_{i} will migrate to S_{k}.
The function of the total migration cost of a datacenter is
$$ Cost\_{Mig} = \sum\limits_{i \in O}\sum\limits_{k \in S} Size_{i}\times D_{lk} \times X_{ik} $$
(9)
Because the communication cost is normalized, the migration cost also needs to be normalized to the same order of magnitude as the communication cost, which likewise increases the model’s extensibility and adaptability. The normalized value is used in the improved objective function. The normalized migration cost is
$$ Cost\_{Mig}\_{std} = \frac{\sum\limits_{i \in O}\sum\limits_{k \in S} Size_{i} \times D_{lk} \times X_{ik}}{N_{O} \times \max\limits_{i \in O}(Size_{i}) \times \max\limits_{l, k \in S}(D_{lk})} $$
(10)
where \(\max \limits _{i \in O}\)(Size_{i}) is the maximum size among the VMs in O, and N_{O}=|O| is the number of VMs in the overloaded PM. The migration strategy addresses the migration of these overloaded VMs, so the number of VMs actually migrated under the selected strategy is at most N_{O}. The total migration cost of redeploying these VMs is therefore on the order of N_{O} times the largest single-VM migration cost, which is why N_{O} appears in the denominator.
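Equations (9) and (10) can be sketched in Python as below. The representation is an assumption made for illustration: `source` maps each VM in O to its initial PM S_{l}, and `target` maps it to the PM S_{k} with X_{ik} = 1. The function names and the sample sizes and distances are hypothetical, and returning 0.0 for an empty or degenerate input is a choice of this sketch.

```python
def pm_distance(distance, a, b):
    """D_lk: 0 for the same PM, otherwise a symmetric lookup."""
    if a == b:
        return 0
    return distance[(a, b)] if (a, b) in distance else distance[(b, a)]


def migration_cost(source, target, size, distance):
    """Equation (9): sum of Size_i * D_lk over the VMs in O.

    source: dict VM -> initial PM S_l (the keys play the role of the set O)
    target: dict VM -> destination PM S_k (the k with X_ik = 1)
    """
    return sum(
        size[vm] * pm_distance(distance, src, target[vm])
        for vm, src in source.items()
    )


def normalized_migration_cost(source, target, size, distance):
    """Equation (10): raw migration cost divided by N_O * max(Size_i) * max(D_lk)."""
    if not source or not distance:
        return 0.0  # assumption: nothing to migrate means no cost
    n_o = len(source)
    size_max = max(size[vm] for vm in source)  # maximum over i in O
    d_max = max(distance.values())
    denominator = n_o * size_max * d_max
    if denominator == 0:
        return 0.0
    return migration_cost(source, target, size, distance) / denominator


# Hypothetical example: two VMs on overloaded PM S1; V1 moves to S2, V2 stays.
source = {"V1": "S1", "V2": "S1"}
target = {"V1": "S2", "V2": "S1"}
size = {"V1": 4, "V2": 8}
distance = {("S1", "S2"): 2, ("S1", "S3"): 3, ("S2", "S3"): 1}

print(migration_cost(source, target, size, distance))  # 8
cost_mig_std = normalized_migration_cost(source, target, size, distance)
```

In this example the raw cost is 4 × 2 = 8 and the denominator is 2 × 8 × 3 = 48, so the normalized value is 8/48. A VM that stays on its source PM contributes nothing because its distance is 0.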
Improvement of the objective function
Next is the definition of the new objective function for this problem. The total cost, Cost_Total, is the weighted sum of the communication cost and the migration cost. This both enables our algorithm to estimate the network cost close to reality and ensures the extensibility and adaptability of the algorithm. We use the weight coefficients, α and β (which must sum to 1) to adjust the effect of these two factors in the optimization objective function.
$$ Cost\_{Total} = \alpha \times Cost\_{Com}\_{std} + \beta \times Cost\_{Mig}\_{std} $$
(11)
These two coefficients balance the optimization of the communication cost against that of the migration cost. Datacenter administrators can adjust them according to network conditions to shift the emphasis between the two costs. In the following experiment, we set α=0.7 and β=0.3. With these coefficient values, our proposed algorithms performed best in our experiments on simulated datacenters with different numbers of virtual machines. The experiment that identifies the coefficient values giving the best optimization of the total cost is reported in the “Experiment” section.
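To illustrate how the weights in (11) shift the preferred placement, the following Python sketch evaluates two hypothetical candidate placements whose normalized costs are made-up numbers, not experimental results. The function names and the check that α and β sum to 1 within a small tolerance are assumptions of this sketch.

```python
def total_cost(cost_com_std, cost_mig_std, alpha=0.7, beta=0.3):
    """Equation (11): weighted sum of the normalized communication and migration costs."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    return alpha * cost_com_std + beta * cost_mig_std


def best_placement(candidates, alpha=0.7, beta=0.3):
    """Return the name of the candidate with the lowest total cost.

    candidates: dict name -> (Cost_Com_std, Cost_Mig_std)
    """
    return min(
        candidates,
        key=lambda name: total_cost(*candidates[name], alpha=alpha, beta=beta),
    )


# Hypothetical candidates: A migrates little but communicates over longer
# distances; B migrates more but brings communicating VMs closer together.
candidates = {"A": (0.5, 0.2), "B": (0.3, 0.6)}

print(best_placement(candidates))                       # B
print(best_placement(candidates, alpha=0.3, beta=0.7))  # A
```

With α=0.7 and β=0.3, candidate B costs 0.39 against 0.41 for A, so the lower communication cost wins. With the weights reversed, A costs 0.29 against 0.51 for B, so the cheaper migration wins.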