

An intelligent decision system for virtual machine migration based on specific Q-learning

Abstract

Due to the convenience of virtualization, live migration of virtual machines is widely used to fulfill optimization objectives in cloud/edge computing. However, live migration may lead to side effects and performance degradation when migration is overused or an unreasonable migration process is carried out. One pressing challenge is how to capture the best opportunity for virtual machine migration. Leveraging rough sets and AI, this paper provides an innovative strategy based on Q-learning that is designed for migration decisions. The highlight of our strategy is the harmonious mechanism for applying rough sets and Q-learning together. In the ABDS (adaptive boundary decision system) strategy proposed in this paper, the exploration space of Q-learning is confined by the boundary region of rough sets, while the thresholds of the boundary region are dynamically adjusted according to the feedback from the computing cluster. The structure and mechanism of the ABDS strategy are described in this paper. The corresponding experiments show a clear advantage of combining rough sets with reinforcement learning algorithms. Considering both energy consumption and application performance, the ABDS strategy in this paper outperforms the benchmark strategies in comprehensive performance.

Introduction

Due to the explosion of the Internet of Things, the amount of data handled by internet data centers (IDCs) is growing at an irreversible pace. Confronting the complexity and uncertainty of modern application requirements (high concurrency, high fault tolerance, large fluctuations), the traditional approach to resource optimization and distributed computing is no longer sustainable. Innovative algorithms and artificial intelligence (AI) will be necessary for the autonomic optimization and management of intelligent IDCs in the future.

Cloud/edge computing is a consolidated computing paradigm. Equipped with massive virtualized CPU, memory and storage resources, Infrastructure as a Service (IaaS) providers offer low-level computing resources on demand to end users. Computing instances based on virtual machines (VMs) can be provisioned flexibly to accommodate diverse applications. The foremost advantage of VMs is that they can be transparently migrated from one physical machine (PM) to another without interrupting online services. Live migration is convenient to perform with a virtual machine monitor (VMM) such as VMware, KVM or Xen. In large computing clusters, there are many scenarios in which a VM migration process should be launched.

As the optimization problem becomes more complex, AI technology is now used by advanced cloud providers such as Amazon AWS, Tencent Cloud, and MS Azure to provide reliable analysis and operation. For instance, reinforcement learning algorithms have been applied to address job scheduling and resource provisioning in cloud/edge computing scenarios. In [1], a genetic algorithm and deep reinforcement learning were combined for cost-aware scheduling. Study [2] proposed a DRL-based preemptive scheduling approach for cloud jobs. Specifically, the researchers in [2] focused on reducing the execution cost of VMs and guaranteeing users' expected response time with a deep Q-network.

Although VM migration is crucial and convenient, it also has some side effects. First, the performance of the application running on the migrating VM is affected, especially during the beginning and downtime of a migration process [3]. Performance degradation in VMs is accompanied by high resource utilization on PMs. Experiments by Xu et al. [4] showed that applications running on VMs endure serious performance degradation and variation; e.g., the loading time of the Doom 3 game increases by 25% to 110% when an Amazon EC2 instance endures resource contention with other instances. Inappropriate VM migration leads to performance variation or degradation. The corresponding SLAV (service level agreement violation) will be triggered, which causes loss of revenue and business reputation for the IaaS provider. Second, frequent VM migrations can also induce additional increases in energy consumption. Research [5, 6] has investigated the operating cost of live migration and shown inevitable delays in both the execution time and response time of applications during migration. Considering that thousands of servers run in clusters, the electricity cost is not trivial for IaaS providers [7, 8].

The general assumption in the literature on virtual resource optimization is that VM migration is rational and imperative, because the literature mainly focuses on how to schedule migration based on execution time, routing path and other migration costs. The missing fact is that some migrations may be irrational or erroneous, as discussed in our previous research [9]. VM migration is often overused, and the decision of whether to migrate is crucial but has not received sufficient attention. Although the application of artificial intelligence to enhancing the performance, design, monitoring and maintenance of time-critical computing systems has been well described in [10], the live VM migration decision addressed in this paper is a different research topic. Before selecting a migration destination, the crucial step is deciding whether a VM should migrate at all.

Reinforcement learning (RL) algorithms such as DQN (deep Q-network), actor-critic and DDPG (deep deterministic policy gradient) can achieve optimal actions by improving prediction accuracy. However, in our experiments, we found it difficult to apply these advanced RL algorithms to the VM migration decision problem. The environment of the VM pool in a computing cluster is constantly changing, and a deep neural network trained on previous data must be constantly readjusted. This leads to slow convergence, which makes it difficult to meet the needs of live VM migration. The aim of this paper is to find a harmonious combination of reinforcement learning and rough sets. As green computing is one of the optimization objectives, Q-learning, the elementary reinforcement learning algorithm, is applied in this paper. We modified the method of updating the value table, which increases the efficiency of Q-learning. In addition, the action selection method was redesigned to incorporate a rough set decision system. All these innovations have been patented by this team. These engineering innovations are expected to be valuable in real-time AI decision scenarios.

The remainder of this paper is organized as follows: Sect. "Motivation scenarios" presents the motivation of this paper. A thorough formal definition is given in Sect. "Problem formulation". In Sect. "Strategy interpretation", we describe the details of the innovative algorithm, and the corresponding performance evaluation is performed in Sect. "Performance Evaluation". We discuss related works in Sect. "Related works", while the conclusion and future work are included in Sect. "Conclusion remarks and future directions".

Motivation scenarios

Here, we address two scenarios based on our experiments. The motivation of this research is made clear by the following two typical examples. The workload benchmarks for the experiments are taken from the RUBiS [11] and DaCapo [12] suites. The mixed application types consist of mail service, matrix multiplication, database read/write operations, web page browsing, etc. Note that the workloads can be adjusted by specific parameters such as the size of the matrix and the number of connections in the Apache web server.

The first scenario involves overmigration with fluctuating resource demand. As shown in our previous research [9], the CPU utilization of a PM fluctuates between 49% and 87%. In the eighth sample (sampled once per second), the utilization is 83%. VM migration is then triggered because the CPU utilization exceeds the predetermined threshold. Due to performance degradation in this experiment, it took 205 s to complete the migration. Although 205 s is still normal considering network congestion across different network partitions, the problem is that the period of excessive CPU utilization lasts only 3 s. Due to resource contention, this 205 s migration results in extended job execution times. The corresponding SLAV is distinctly greater than what would be incurred by simply enduring the 3 s resource shortage (the shaded area in Fig. 1).

Fig. 1 Premature migration from PM1

The second scenario concerns resource shortage during migration. In this experiment, three VMs run on PM2, whose initial CPU utilization is less than 80%. Due to a storage failure at another node, a VM migration process is triggered, and PM2 is chosen as the destination. As migration is resource intensive, the CPU utilization of PM2 rises significantly (from 78% to more than 90%) in the next few seconds. The main problem in this scenario is that when the final synchronization phase of live migration coincides with the increase in resource demand on PM2, the resource shortage becomes more severe (CPU utilization reaches 100% in our experiment). This brings about significant performance degradation, and it takes more than 50 min (almost 100 times longer than a normal migration) for this abortive migration to complete. In addition, severe SLAV for applications in the VM and extra energy consumption of the PM are incurred during the prolonged migration.

Drawbacks exist for the static threshold approach, as it leads to overmigration, as shown in the first scenario. However, fixed thresholds are prevalent even in mature platforms (e.g., VMware and Xen), and this dichotomy remains common because it is easy to implement. Our early research [9] provided a three-way decision approach based on probability theory and rough sets. Although the migration decision issue can be transformed into a three-way classification problem by [9], we did not provide a judgment method for the boundary region. A specific performance probe from our patented approach is utilized in [9] to make further decisions about live migration. However, performance monitoring with probes requires elaborate parameter adjustments. An advanced strategy with automatic judgment from an AI perspective is expected, which is the direct motivation for this paper.

Problem formulation

In this paper, both rough sets and Q-learning are involved in the ABDS (adaptive boundary decision system) strategy for VM migration decisions. In this section, three regions of rough sets are provided to reduce the optimization space, and the corresponding Q-learning approach is interpreted in the following formulation.

Denote U as a finite set, while E represents an equivalence relation on U. The equivalence relation E induces the partition of U, which can be denoted as U/E. For an object \(x\in U\) , the equivalence class that contains x can be denoted as \(\left[x\right]\;=\;\left\{y\in U\;\left|\;xEy\right.\right\}\).

For migration decisions, the first critical step is to determine whether the related PM is overloaded. The element x is a two-dimensional vector that represents the resource status of a computing node. Therefore, a clear partition of U (the whole space of resource utilization states) is needed to classify the status of a PM. D is the subset of U that indicates the overload states of a PM, while \(D^c\) represents the states of PMs that are not oversubscribed. To replace the static dichotomy, the approximation space and rough sets are introduced here to solve this problem. The lower and upper approximations of subset D are defined as (1).

$$\begin{array}{*{20}{c}} {\underline {apr} (D) = \{ x \in U|[x] \subseteq D\} } \\ {\overline {apr} (D) = \{ x \in U|[x] \cap D \ne \emptyset \} } \end{array}$$
(1)

Leveraging rough sets [13, 14], space or sets can be approximately divided into positive, negative and boundary regions. According to this method, for subset D, space U can also be divided into three regions: positive region POS(D), negative region NEG(D) and boundary region BND(D). Following the definition of regions in rough sets, the corresponding three regions can be formulated as (2).

$$\begin{array}{*{20}{c}} {POS(D) = \{ x \in U|[x] \subseteq D\} ,} \\ {BND(D) = \{ x \in U|[x] \cap D \ne \emptyset \wedge [x] \not\subseteq D\} ,} \\ {NEG(D) = \{ x \in U|[x] \cap D = \emptyset \} } \end{array}$$
(2)

Combined with Formula (1), the three regions for the migration decision can also be represented as (3).

$$\begin{array}{*{20}{c}} {POS(D) = \underline {apr} (D),} \\ {BND(D) = \overline {apr} (D) - \underline {apr} (D),} \\ {NEG(D) = U - \overline {apr} (D)} \end{array}$$
(3)

For most engineering applications with discrete spaces, the above Pawlak rough set model may be too strict [15, 16]. As probability and statistics are always involved in artificial intelligence algorithms, a probabilistic rough set model is utilized here to enable a robust response to uncertainty. In complex environments, tolerating this uncertainty is effective in reducing the side effects of an arbitrary threshold. In this paper, uncertainty is quantified by the degree of overlap between the equivalence class and the approximated set, i.e., [x] and D in (2). Given an object in [x], Pr(D|[x]) is the conditional probability that the object belongs to D. In the cloud/edge computing scenario, Pr(D|[x]) is the conditional probability that the target computing node is overloaded.

$$\Pr (D|[x]) = \frac{{|D \cap [x]|}}{{|[x]|}}$$
(4)

The symbol |•| in (4) represents the cardinality of a set. According to the above definitions, three regions with rough sets in (2) can be equivalently represented as (5), where α and β are the parameters used in the partition of U. They are key thresholds of regions in the three-way classification [13, 17, 18].

$$\begin{array}{*{20}{c}} {PO{S_{(\alpha ,\beta )}}(D) = \{ x \in U|\Pr (D|[x]) \geqslant \alpha \} ,} \\ {BN{D_{(\alpha ,\beta )}}(D) = \{ x \in U|\beta < \Pr (D|[x]) < \alpha \} ,} \\ {NE{G_{(\alpha ,\beta )}}(D) = \{ x \in U|\Pr (D|[x]) \leqslant \beta \} } \end{array}$$
(5)

The principle of (5) is that x is accepted as a member of D if the conditional probability is at least α, and x is rejected as a member of D if the probability is at most β. We neither accept nor reject x if the probability of x belonging to D lies between α and β. Leveraging the rough set approach in (5), this flexible three-way decision can be applied more effectively than the arbitrary dichotomy. In this paper, vector x represents the resource utilization of a PM, while D represents the set of overloaded PMs. Therefore, if a PM is in the overload state (the certainty of being overloaded is at least α), it is assigned to the positive region. The corresponding instruction is that the PM is regarded as oversubscribed, and VM migration should be conducted immediately (otherwise, it may lead to the performance degradation discussed in Sect. "Motivation scenarios"). If a PM is far from the overload state (the probability of belonging to D is at most β), then any migration plan regarding this PM as a source should be denied.
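
To make the decision rule in (5) concrete, the following Python sketch (our own illustration; the function and variable names are not from the paper) classifies a PM state into one of the three regions given an estimated overload probability:

```python
# Minimal sketch of the three-way decision rule in (5); pr_overload stands for the
# estimated conditional probability Pr(D|[x]) that the PM is overloaded.
def classify_region(pr_overload: float, alpha: float, beta: float) -> str:
    if pr_overload >= alpha:
        return "POS"  # accept: regard the PM as oversubscribed, migrate immediately
    if pr_overload <= beta:
        return "NEG"  # reject: deny any migration plan using this PM as a source
    return "BND"      # defer: leave the decision to the Q-learning component
```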

The problem is in the boundary region. If the state of the PM is within the boundary region, a dedicated monitoring probe was utilized in our previous work [9] to support further judgment. In this paper, an innovative method is proposed based on Q-learning to decrease the overhead and enable automatic decision-making. The formulation for the underlying mechanism is provided here, while the details for the ABDS strategy are shown in the next section.

The Bellman equation is utilized here to model the decision process. With the discount factor γ in Q-learning, the expected return within an episode (where the reward of the current migration represents feedback from the computing cluster) can be presented as (6). According to the iterative relation, it can also be converted into Formula (7).

$${q_\pi }(s,a) = {E_\pi }[{r_{t + 1}} + \gamma {r_{t + 2}} + {\gamma ^2}{r_{t + 3}} + {\gamma ^3}{r_{t + 4}} + \cdots |{A_t} = a,{S_t} = s]$$
(6)
$${q_\pi }(s,a) = {E_\pi }[{R_{t + 1}} + \gamma {q_\pi }({S_{t + 1}},{A_{t + 1}})|{A_t} = a,{S_t} = s]$$
(7)

The principle of Q-learning is to learn the optimal action-value function (the action in this paper has only two choices: to migrate or not). Considering the state transition probabilities, the optimal value function within the boundary region can be formulated as Formula (8).

$${Q^*}(s,a) = \sum\limits_{s'} {P(s'|s,a)\left( {R(s,a,s') + \gamma \mathop {\max }\limits_{a'} {Q^*}(s',a')} \right)}$$
(8)

The state transition probabilities can be derived from previous data, and the optimal Q value can be derived from Formula (8). Then, the appropriate action is chosen based on the optimal Q value and a specific policy. Usually, mature policies such as ε-greedy, UCB (upper confidence bound) or gradient bandit are utilized for action selection. Note that we modified the action selection method in our experiment, which is interpreted in the following section. After the action is carried out, the Q table is updated dynamically according to Formula (9).

$$Q(s,a) = Q(s,a) + \alpha [r + \gamma {\max _{a'}}Q(s',a') - Q(s,a)]$$
(9)

In this paper, the Q-table update in Formula (9) is critical for adjusting the thresholds of the boundary region in Formula (5). Note that the learning rate α in (9) is a different parameter from the region threshold α in (5). The parameters and other details of the innovative method incorporating Q-learning and rough sets are presented in the next section.
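
As a minimal sketch of the tabular update in Formula (9) (using the learning rate 0.6 and discount factor 0.7 adopted later in this paper; the state encoding and reward source are placeholders), one update step can be written as:

```python
from collections import defaultdict

ACTIONS = ("migrate", "stay")
# Q(s, a) table keyed by discretized resource states; unseen states start at zero.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def q_update(state, action, reward, next_state, lr=0.6, gamma=0.7):
    """One-step Q-learning update following Formula (9)."""
    best_next = max(q_table[next_state].values())
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += lr * td_error
```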

Strategy interpretation

In this paper, the migration decision issue is solved from an AI perspective. As VM migration closely interacts with computing clusters, e.g., the source PM, the destination PM and the throughput of the routing path, a reinforcement learning algorithm is utilized here to address the interactions between individuals and the environment.

In our research, the ABDS was developed based on Q-learning, which is a classic RL approach. Compared with the original Q-learning algorithm, many of the details have been modified in the ABDS, as shown in Fig. 2. In Fig. 2, the computing cluster is recognized as the environment, while the PM is considered to be the individual in the RL algorithm.

Fig. 2 Structure of the decision system in our research

Given a computing node, a migration decision should be made based on the corresponding resource utilization. The foremost contribution of our ABDS is the harmonious combination of rough sets and the RL algorithm. As Fig. 2 shows, the migration choice will affect the cluster environment, while feedback (reward) will be collected to update the thresholds of rough sets. Meanwhile, the dynamically updating boundary region leads to flexible adjustment of the exploration space for the related Q-learning.

Macro configuration of the ABDS

Leveraging the Q-learning framework, some parameters need to be specified in the proposed ABDS strategy. First, the resource utilization of the PM is regarded as the state S. The real-time resource utilization can be derived using tools such as vmstat and htop. Although various resources can be monitored, only CPU and memory are considered. Therefore, state S is a two-dimensional space composed of CPU and memory sampling data. To minimize the state space and improve the convergence rate, this two-dimensional continuous space needs to be discretized to a certain extent; in our experiment, each sample is mapped to the nearest discrete point. Second, episodes are an important concept in Q-learning: both the solution exploration and the parameter updates depend on the iteration of episodes. Unfortunately, unlike in the classic maze problem, there is no natural episode because the computing cluster works continuously. In this paper, we set 20 min as an episode period, and the thresholds of the boundary region in the ABDS are adjusted once per episode. Third, the action of an individual (PM) has only two choices: migrate or not. In addition, other parameters should be considered. γ is the discount factor used to discount previous reward values. In this paper, the ABDS strategy applies the discount series in the opposite direction compared with classic Q-learning: \(R_t\) denotes the reward from the environment at time t, and an earlier reward \(R_{t-n}\) is discounted as \(\gamma^n R_{t-n}\) when the value is evaluated at time t. α in Formula (9) represents the learning rate, which is used to limit the extent of Q-table updates. Our experiments show that VMs may have abnormal migration experiences; thus, the learning rate is used to update the Q table in a soft way.
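
For illustration only, a state could be discretized as in the sketch below; the 10-point grid is an assumption of ours, as the paper does not specify the discretization granularity.

```python
def discretize_state(cpu_pct: float, mem_pct: float, step: int = 10) -> tuple:
    """Snap a (CPU, memory) utilization sample to the nearest discrete grid point."""
    snap = lambda v: int(round(v / step)) * step
    return (snap(cpu_pct), snap(mem_pct))

# Example: a sample of (83.4, 47.2) maps to the discrete state (80, 50).
```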

Calculation and updating of the Q table in ABDS

The Q table is crucial in our ABDS strategy. In this paper, the Q table has two dimensions: the rows represent different resource states (discrete resource utilization vectors), while the columns represent the action choices (migrating or not migrating). Each value in the Q table is the discounted summation of experienced rewards, and the current reward reflects the migration interaction with the computing cluster. In this paper, three components are considered in the calculation of the Q value. First, when live migration is executed, the performance of applications running on the migrating VM degrades [19]. Cvm is the cost of migration from the VM perspective, and the value of Cvm differs for various types of applications. The second component concerns the PM: if a migration occurs, resource contention issues may arise at the destination PM. Cpm denotes the cost of migration from the perspective of resource contention on the PM. Note that only PMs whose resource utilization exceeds 60% are involved in this calculation, as migration is usually triggered by high resource utilization. For the PM, the migration cost is the opposite of the reward: if the migration process relieves resource contention, the predicted reward value is positive, and vice versa. The third component is energy consumption, which is not negligible for solving the performance and energy trade-off. An experiment in [20] showed that the power of a PM mainly depends on CPU utilization. In our experiment, the power of the two types of servers is measured at every 10 percentage points of CPU utilization, and some key data points are shown in Table 1.

Table 1 Power (Watts) of the CPU at different utilization levels

Let Cec be the increment of energy consumption caused by migration in the computing cluster. Linear interpolation between the measured points in Table 1 is applied to obtain the power value, and hence Cec, at each CPU percentage point. From the perspective of power consumption, both the duration of migration and the power values on the source and destination PMs are related to the reward of the migration process. The prediction of resource utilization and migration duration has been well researched [21, 22]. For brevity, the details of the other components are not included; rather, we only focus on how to further calculate and update the reward value in the Q table.
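
A hedged sketch of this step is given below: the 10-point power values are placeholders standing in for the measurements in Table 1, and the way Cec is assembled from the source and destination power deltas is our reading of the text rather than the paper's exact procedure.

```python
# Placeholder power curve (Watts) at every 10% CPU utilization for one server type;
# the real values are the measurements reported in Table 1.
POWER_AT = {0: 60, 10: 72, 20: 83, 30: 92, 40: 100, 50: 108,
            60: 116, 70: 124, 80: 133, 90: 142, 100: 152}

def power(cpu_pct: float) -> float:
    """Piecewise-linear interpolation between the 10-point measurements."""
    lo = min(int(cpu_pct // 10) * 10, 90)
    hi = lo + 10
    frac = (cpu_pct - lo) / 10.0
    return POWER_AT[lo] + frac * (POWER_AT[hi] - POWER_AT[lo])

def c_ec(duration_s, src_before, src_during, dst_before, dst_during):
    """Rough estimate of the energy increment of a migration (Joules): extra power on the
    source and destination PMs during the migration, multiplied by its duration."""
    extra_power = (power(src_during) - power(src_before)) + (power(dst_during) - power(dst_before))
    return extra_power * duration_s
```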

Parameter settings and other factors are also provided here for better interpretation. For the discount factor γ introduced in Sect. "Problem formulation", we set the value to 0.7 so that the interactions of the latest five migrations can be traced back. Since previous rewards can be derived by looking up the Q table, the remaining implementation problem is the action selection method. As mentioned in Sect. "Problem formulation", this selection method differs from the general ε-greedy and UCB methods. In this paper, the action choice is related to the corresponding region of rough sets. If state S is in the positive region, the probability of selecting migration is 100%, which means that a migration is carried out immediately. If state S is in the negative region, this probability is 0. If state S is in the boundary region defined in (5), the probability depends on the reward sampling data within the boundary region. As discussed in this section, the predicted reward of the current migration is composed of Cvm, Cpm and Cec. The weight parameters of the three components can be adjusted according to the actual situation. Considering the nonlinear negative effect of SLAV, both the performance degradation in the VM and the resource contention on the PM contribute an additional penalty to the estimated reward, which is denoted as pt in (10).

$$reward = {q_1}{C_{vm}} + {q_2}{C_{pm}} + {q_3}{C_{ec}} + pt$$
(10)

The reward is normalized to a value between -1 and 0, and the cluster benefits more as the reward approaches 0. Since the reward feedback is fixed by Formula (10), the remaining parameter in Formula (9) is the learning rate α, which controls the extent of the Q-value update. It is set to 0.6 in our experiment; our underlying experiments verify that values between 0.55 and 0.67 are acceptable. If we reduce α further, the AI decision system responds inadequately to current changes in the computing cluster environment.
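
A minimal sketch of the reward in Formula (10) follows, assuming the three cost components have already been estimated as non-positive normalized values; the equal weights reflect the initial setting described in the evaluation section, and the clipping to [-1, 0] follows the normalization mentioned above.

```python
def reward(c_vm: float, c_pm: float, c_ec: float, penalty: float,
           q1: float = 1/3, q2: float = 1/3, q3: float = 1/3) -> float:
    """Weighted sum of the three migration cost components plus the SLAV penalty,
    clipped to the normalized range [-1, 0] used by the ABDS strategy."""
    raw = q1 * c_vm + q2 * c_pm + q3 * c_ec + penalty
    return max(-1.0, min(0.0, raw))
```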

Adaptive adjustment of the boundary thresholds

This paper provides a harmonious combination of rough sets and reinforcement learning. Three regions in the ABDS strategy provide a flexible decision system. Compared with predefined boundary thresholds in rough sets, a new method is introduced to update the related threshold.

Considering the actual migration environment, the threshold values α and β should be updated at a moderate frequency. In our ABDS strategy, threshold adjustment is triggered at the end of each episode, as shown in the following pseudocode. In general, the Q value is updated frequently at each action step, while the thresholds of the boundary region are adjusted once at the end of each episode. In this paper, the initial values of α and β are set to 100% and 60%, respectively, which means that performance degradation can be ignored when resource utilization is below 60%. After a certain number of migration samples are collected, the thresholds can be adjusted according to the distribution of the sampling data. The innovative method is that the lower bound of resource utilization over all positive samples in the Q table is regarded as the ideal value of threshold α, while the upper bound over all negative samples is the target value of threshold β. To reduce threshold oscillation, a half-step adjustment method is applied, in which the mean of the current value and the target value is adopted to enhance the stability of the decision system. The framework of our ABDS strategy is shown in the following pseudocode.
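
The half-step rule can be sketched as follows; how positive and negative samples are extracted from the Q table is our paraphrase of the description above, so the sample lists are simply passed in here.

```python
def adjust_thresholds(alpha: float, beta: float,
                      positive_utils: list, negative_utils: list) -> tuple:
    """Half-step adjustment at the end of an episode: move alpha toward the lower bound of
    the positive samples and beta toward the upper bound of the negative samples."""
    if positive_utils:
        alpha = (alpha + min(positive_utils)) / 2.0
    if negative_utils:
        beta = (beta + max(negative_utils)) / 2.0
    return alpha, beta

# Example with the paper's initial thresholds (alpha = 100%, beta = 60%):
# adjust_thresholds(1.0, 0.6, positive_utils=[0.88, 0.92], negative_utils=[0.70, 0.74])
# -> (0.94, 0.67)
```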

Algorithm 1 The framework of the ABDS strategy

As with other reinforcement learning algorithms, the input of the ABDS in Algorithm 1 is the status of a computing node. For example, when a polled computing node is taken into consideration, its resource utilization is recorded as the state of the node. The output of the ABDS is the action decision, i.e., whether to migrate or not (the action type). As mentioned in Sect. "Calculation and updating of the Q table in ABDS", the reward is a negative value between -1 and 0.
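
Because Algorithm 1 is presented as a figure, the outline below reconstructs the decision loop from the text of this section; it reuses the helper sketches given earlier, and the remaining helpers (sample_utilization, estimate_overload_probability, boundary_migration_probability, execute_migration, collect_reward, split_samples_by_outcome) are hypothetical names of ours, not functions from the paper.

```python
import random

def abds_episode(nodes, alpha, beta, steps):
    """One episode (about 20 min) of the ABDS loop, reconstructed from the text."""
    for _ in range(steps):
        for node in nodes:
            state = discretize_state(*sample_utilization(node))
            region = classify_region(estimate_overload_probability(state), alpha, beta)
            if region == "POS":
                p_migrate = 1.0
            elif region == "NEG":
                p_migrate = 0.0
            else:  # boundary region: probability driven by reward samples in the Q table
                p_migrate = boundary_migration_probability(q_table, state)
            action = "migrate" if random.random() < p_migrate else "stay"
            if action == "migrate":
                execute_migration(node)
            r = collect_reward(node, action)            # normalized reward in [-1, 0]
            next_state = discretize_state(*sample_utilization(node))
            q_update(state, action, r, next_state)      # Formula (9)
    # once per episode: split the collected samples and apply the half-step rule
    pos_utils, neg_utils = split_samples_by_outcome(q_table)
    return adjust_thresholds(alpha, beta, pos_utils, neg_utils)
```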

Performance evaluation

In this section, corresponding experiments are performed to evaluate the performance. First, we introduce the experimental settings and the metrics used in our evaluation. Then, we reveal the experimental results and present the analysis.

Experimental settings and performance metrics

Related experiments are carried out on two basic types of servers in our laboratory: server A (Intel i7-9750H, hexa-core at 2.6 GHz, 16 GB RAM) and server B (Intel i9-9900K, octa-core at 3.6 GHz, 32 GB RAM). We adopt OpenStack (4.0.2) to manage the virtual resources of clusters composed of these two types of servers. Nova (15.1.1) is deployed to undertake computing jobs in VMs across the control node and compute nodes of this cluster. In addition, the realization of the innovative strategy in this paper also depends on modifying related components (e.g., the Nova scheduler) in OpenStack. To fit the real scenario in the IaaS cloud, different types of VMs are created from different images. The small VM type has 1 CPU unit and 2 GB of RAM, the medium type has 2 CPU units and 4 GB of RAM, and the large type has 4 CPU units and 8 GB of RAM.

The workloads applied in this paper are mixed types of benchmarks from SPECCPU (http://www.spec.org/cpu2006/), Netperf, Hadoop instances, and SPECweb2005 (http://www.spec.org/web2005/). In addition, considering the importance of web applications, the ApacheBench test is also implemented for performance tests on response time. To increase the frequency of data sampling for migration, an extra workload is implemented to cause fluctuations in resource consumption. These workloads include database transaction, matrix transposition and a simple probe designed to test the execution time for CPU-intensive applications. In this paper, the resource situation of the cluster is monitored by Prometheus (2.8.1).

For the experimental metrics, both energy consumption and performance should be considered. For energy consumption, we measured the actual power of each PM model at different CPU utilization levels. Table 1 shows the power of the two PMs measured at 10-percentage-point intervals of CPU utilization. Linear regression is applied within each 10-point CPU power range to obtain the power value at each integer CPU percentage point. For the SLAV metric, the response time is taken into consideration because it is the most obvious indicator for web applications. The response time data come from the ApacheBench test, in which the client constantly sends requests to the Apache server for the homepage of the website in our experiment.

The values of the learning rate, discount factor and initial thresholds in our ABDS strategy follow the values discussed in Sect. "Strategy interpretation". The weights of the three components in the reward calculation are set to be equal in the initial setting. Note that we only focus on the optimization of overloaded PMs. In addition to the metrics discussed above, downtime is also an important metric for migration. In our experiment, we used nova migration-list to measure the duration of migration, while the downtime of live migration could be derived from the timestamp differences of VMs in the Nova log files. As the measured downtime is usually quite low, it is not included in the metrics, and the default values are kept for related parameters such as max_downtime, steps and delay in the evaluation experiment. In addition, the experiment is carried out in our LAN with high throughput; thus, the network bandwidth and topology have no direct effect on the performance evaluation.

Comparative algorithms and experimental results

In this subsection, we address the experimental results and present the relevant analysis. Five benchmark algorithms of four types are compared in the performance evaluation. The algorithm proposed in our previous research [9] is not included here because it requires an additional artificial intervention (a dedicated probe) within the boundary region. The first type is the empirical static threshold. For migration, 80 to 85 CPU percentage points are regarded as the proper threshold range for a source PM with high resource utilization [23, 24]. Here, 80TR and 85TR represent the triggering algorithms with thresholds of 80 and 85 CPU percentage points, respectively. The third algorithm is classic Q-learning, in which the action selection depends on the ε-greedy operator. The fourth algorithm is denoted as RS + R, which combines rough sets with random action selection. In RS + R, state S is also divided into three regions; the difference is that when the current state is in the boundary region, RS + R chooses the action completely at random. The fifth comparative algorithm is the combination of rough sets and Q-learning, denoted as RS + Q, in which classic Q-learning is only used in the phase of action selection in the boundary region. Our ABDS combines rough sets with the modified Q-learning described in Sect. "Strategy interpretation". The specific API for the environment in this paper is developed with OpenAI Gym.

First, we evaluate the power consumption of the comparative algorithms. Note that we only focus on the CPU power of active servers. The reference line is the initial power state of four servers (two of server A and two of server B) at 70% CPU utilization. Due to the constantly triggered workloads, all six power curves in Fig. 3 increase in different ways. Note that server underload detection and PM hibernation are not considered in this experiment; thus, the total power increases when a new PM is activated during migration.

Fig. 3 Performance evaluation on the CPU power

Figure 3 shows that the performance of the traditional strategy for the static threshold is acceptable. The power of both the 80 and 85 strategies is moderate compared with that of the other strategies. Basic Q-learning has the highest power value, and we found that it is difficult to achieve performance convergence during the entire 100-min duration. This shows that effectively applying the naïve AI algorithm is not trivial.

A deeper analysis shows that the values in the Q table do not converge in a constantly changing computing cluster environment. In addition, the exploration of ε-greedy leads to many poor migration decisions. The power of the RS-related strategies is quite competitive, which shows the benefit of the confined exploration space due to region partitioning. RS + Q and ABDS have the best power performance; the performance of the ABDS is in line with that of RS + Q, and these two strategies have the lowest power consumption. During the last 20 min, the power of the RS + Q strategy is nearly 20% less than that of the 80 strategy. The underlying reason for this large power difference is that more migrations are triggered by the 80 strategy, which leads to more active servers and a significant power increase.

The second metric for our experiment is the response time (RS time). This sub-experiment is launched leveraging the AB test on the Apache server in a VM, and the data in Fig. 4 are the mean values of the 10 experiments.

Fig. 4 Response time performance evaluation

Since 20 min is the predefined episode duration in the Q-learning related algorithms, 100 min are divided into 5 parts, as shown in Fig. 4.

As shown in Fig. 4, the RS time fluctuates over the different durations. In the initial 20 min, the 80 strategy has the worst performance, with a response time that is obviously higher than those of the other traditional strategies. The reason is that the 80 strategy has the lower triggering threshold; thus, more migrations are launched than with the other strategies during the first 20 min. The RS value is strongly related to the migration process, and interestingly, in our experiment, the RS time increased sharply during the downtime of migration (RS increased to more than 0.8 s, which is almost 500 times greater than that in the normal state). During the evolution (completion and creation) of workloads in the following durations, the RS time increases moderately. The RS time of the 85 strategy is the highest during the second 20 min; we noticed that the CPU utilizations of the four PMs approach the specific threshold, which leads to an obvious increase in migration. Q-learning has the highest RS value because it is still far from the optimal policy during the middle durations. The performance of RS + R is worse than that of RS + Q and ABDS: tracing the Nova log files, we found that about half of the migrations chosen by the purely random policy were unsuitable. RS + Q and ABDS dominate the other strategies during most of the study period, as they suffer minimal interference from inappropriate migration processes.

In addition to the RS time, the execution time is another important metric for the performance of applications running on VMs. For CPU-intensive or data-intensive jobs, makespan is an important QoS (quality of service) metric. The expected execution time of applications is extended when the VMs endure resource contention or unsuitable live migration. As discussed in Sect. "Experimental settings and performance metrics", a lightweight probe with a matrix calculation job is applied to quantify the ratio of the actual execution time to the expected running time. Note that the definition of SLAV can be flexible in different computing scenarios. In this paper, SLAV is defined as the situation in which the RS time of a request in the AB test is greater than 0.1 s or the actual execution time is extended by more than 20% of the expected duration.
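
As a small sketch of this definition (thresholds taken directly from the sentence above), a violation could be counted as follows:

```python
def is_slav(response_time_s=None, actual_exec_s=None, expected_exec_s=None) -> bool:
    """SLAV as defined in this paper: AB-test response time above 0.1 s, or execution time
    stretched to more than 120% of the expected duration."""
    rs_violation = response_time_s is not None and response_time_s > 0.1
    et_violation = (actual_exec_s is not None and expected_exec_s is not None
                    and actual_exec_s > 1.2 * expected_exec_s)
    return rs_violation or et_violation
```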

Figure 5 shows the mean number of SLAVs with six different algorithms during the entire experimental duration. In Fig. 5, RS and ET represent the number of SLAVs in the RS time and prolonged execution time, respectively. The number of SLAVs for both the 80 and 85 strategies is within a moderate interval between 20 and 25. For Q-learning, the value of ET is quite different from that of RS. Q-learning has the best ET performance because it always tries to carry out live migration before resource contention occurs.

Fig. 5 The number of SLAVs for application in VMs

The ABDS strategy is cautious within the boundary region of rough sets when determining whether a virtual machine should migrate. This modest delay in VM migration results in an increase in the ET value compared with classical Q-learning. However, classical Q-learning suffers severe performance degradation due to frequent VM migrations. The performance of RS + R is also relatively poor, and we speculate that the random choice of migration may lead to unreasonable migration decisions. Compared with the static threshold strategies, RS + Q shows no advantage for the SLAV metric. However, we can still observe the positive effect of rough sets on Q-learning in Fig. 5: by limiting the exploration space, the SLAV count of RS + Q decreased by 25.6% compared with that of naïve Q-learning. The ABDS has the minimum SLAV because the regions of rough sets are adaptively adjusted with the episode iterations.

Figure 6 shows the normalized reward value for different optimization strategies. From Figs. 3, 4 and 5, we can see that the performance of RS + Q dominates the naïve Q-learning, while the ABDS performs much better than the random choice in RS + R. Therefore, Q-learning and RS + R are excluded from further experiments. Figure 6 shows that the reward values of strategies 80 and 85 fluctuate steadily around the mean value. The underlying reason is that each migration process will be triggered under the same conditions. During the experimental duration, the reward values in the RS + Q and ABDS increased with great fluctuations. In the first 20 min, the reward for the RS + Q and ABDS strategies was lower than that for the 80 and 85 strategies. The underlying reason is that the AI agent cannot find the optimal policy during the exploration phase. During the remainder of the experiment, the rewards of both strategies increase along with the updates of the Q table. However, the reward value of the ABDS dominated that of the RS + Q during most of the experimental duration. It is inferred that the compression of the exploration space for reinforcement learning is conducive to earlier convergence, while the reward from the cluster can be applied to adjust the boundary region of rough sets, which leads to better migration decisions.

Fig. 6 Normalized reward value over 100 min

Related works

VM migration is the key operation for cluster management and has received sustained attention in recent years [5,6,7, 25]. Hybrid algorithms have been applied to solve the trade-off between energy and performance. Multiscale sliding windows were applied to estimate the load, and a specific relationship between resource utilization and SLA was proposed to address the performance and energy issues [26]. Salehi et al. [27] discussed the preemption technique to reduce overall energy consumption under SLA constraints. Factorization techniques were adopted to forecast workload fluctuations, and an integrated method was introduced to reduce energy consumption while maintaining the SLA. Karthikeyan et al. [28] devised a VM swap system in which energy conservation and the SLA can be guaranteed at the same time. Although these studies focus on the energy and performance issues of VM migration, their limitations are obvious: the threshold for triggering live migration is always a static value defined in advance. The greatest challenge for VM migration is how to use smart AI algorithms to meet the needs of live migration decisions. Considering reinforcement learning, two points hinder its application to live VM migration. The first is agent exploration of the optimization space. The improvement of a policy depends on effective exploration, but this exploration itself leads to two problems: the convergence rate is slow, which results in delayed decision-making, and an excessive number of searches increases the amount of computation, which violates the requirement of reducing energy consumption in green computing. The second point is the instability of the training model. Based on our underlying experiments, in a complex, time-varying distributed (cloud/edge) computing environment, it is impossible to guarantee the stability of the trained model; some parameters need to be constantly updated and retrained, which consumes considerable additional computing power. Therefore, in the last five years, only one study [29] has focused on the VM migration problem using a reinforcement learning algorithm. In [29], Hummaida utilized reinforcement learning to learn when it is rewarding to migrate a VM. The reinforcement learning reward function drives the policy toward high CPU utilization and attaches a penalty to overachieving SLAs. Similar to this paper, the CPU utilization and response time are also regarded as key indicators. However, [29] used a simulation method that is far from the real computing cluster environment used in this paper.

Compared with the above studies, this paper applies a special threshold for the boundary region that can be adjusted dynamically based on the modified Q-learning approach.

From the perspective of the optimization strategy itself, the main distinguishing characteristic is the category of algorithm used. Considering the complexity of the optimization problem, heuristics are the most common algorithms applied in this field. Torre et al. [30] introduced a heuristic algorithm based on an island population model to approximate Pareto-optimal VM placements. In some scenarios, it is difficult to design an effective heuristic algorithm for multiobjective optimization, and many bio-inspired algorithms have been applied to solve such complex problems. In [31], a hybrid IEFWA/BBO algorithm was used to construct an energy-efficient program for virtual resource management, while ant colony algorithms were used in [32, 33] to optimize the process of live VM migration. With the growth of AI technology in recent years, machine learning algorithms have become attractive solutions for complex problems in various research areas. An individualized machine learning algorithm was introduced in [34] to increase the accuracy of load prediction. Leveraging reinforcement learning, Peng et al. [35] designed a special approach to manage virtual resources in the IaaS cloud. Reinforcement learning is a suitable approach for modeling the complex interactions between VMs/PMs and computing clusters. However, few studies on VM migration decisions exist. Applying AI techniques in complex scenarios is not easy, and many factors (discretization, episodes, parameters, etc.) need to be well designed for real environments. In contrast to the above studies, this paper proposes a new decision system based on Q-learning with specific innovations. To the best of our knowledge, this is the first time rough sets have been combined with the action selection criterion of Q-learning.

Conclusion remarks and future directions

VM migration is widely used for its convenience. However, the side effects of live migration have not received enough attention. This paper describes specific scenarios in which VM migration is misused. Focusing on the problem of overmigration, a new formulation is introduced based on rough sets and Q-learning, and an innovative ABDS strategy is established to solve this complex optimization problem. In the ABDS strategy, Q-learning is modified and new methods are introduced. A series of experiments verifies that the boundary region of rough sets can be effectively applied in the action selection of reinforcement learning algorithms, and that our ABDS strategy outperforms the benchmark strategies in the energy and performance evaluations.

There are two directions for our future work. One is to introduce the network topology of a hybrid computing cluster; considering the topology, a hybrid network would greatly increase the complexity of the optimization problem, which is beyond the scope of this paper. The other direction is to explore the feasibility of using advanced AI algorithms to solve live VM migration problems. Among these advanced AI algorithms, federated learning has a special advantage in the live optimization of multiple objectives; for example, [36] obtained Pareto-optimal solutions balancing resource efficiency and test accuracy. We will attempt the federated learning method because it has the potential to solve the live VM migration problem in the future.

Availability of data and materials

No datasets were generated or analysed during the current study.

References

  1. Zhang J, Cheng L, Liu C, et al (2023) Cost-aware scheduling systems for real-time workflows in cloud: an approach based on genetic algorithm and deep reinforcement learning. Expert Syst Appl 234:120972

  2. Cheng L, Wang Y, Cheng F, et al (2023) A deep reinforcement learning-based preemptive approach for cost-aware cloud job scheduling. IEEE transactions on sustainable computing. p 422–432

  3. Duong-Ba TH, Nguyen T, Bose B (2021) A dynamic virtual machine placement and migration scheme for data centers. IEEE Trans Serv Comput 14(2):329–341

  4. Xu F, Liu F, Jin H, Vasilakos AV (2013) Managing performance overhead of virtual machines in cloud computing: a survey, state of the art, and future directions. Proc IEEE 102(1):11–31

  5. Aldossary M (2021) A review of dynamic resource management in cloud computing environments. CSSE 36(3):461–476

  6. Khaleel MI, Zhu MM (2021) Adaptive virtual machine migration based on performance-to-power ratio in fog-enabled cloud data centers. J Supercomput 11986–12025. https://doi.org/10.1007/s11227-021-03753-0

  7. Peng X, Zhenyu N, Dongbo L (2021) A power and thermal-aware virtual machine management framework based on machine learning. Cluster Comput 24(3):2231–2248. https://doi.org/10.1007/s10586-020-03228-6

  8. Alrajeh O, Forshaw M, Thomas N (2021) Using virtual machine live migration in trace-driven energy-aware simulation of high-throughput computing systems. Sustain Comput Inform Syst 29(Part B):100468. https://doi.org/10.1016/j.suscom.2020.100468

  9. Zhou H, Li Q, Zhu H, et al (2018) A new strategy for virtual machine migration based on decision-theoretic rough sets. IEICE Trans Commun, pp 2172–2185

  10. Cheng L, Chen X, Zhao Z (2024) Preface of special issue on artificial intelligence for time-critical computing systems. Future Gener Comput Syst 159:102–104

  11. Cecchet E (2013) RUBiS: Rice University bidding system. http://rubis.ow2.org

  12. Blackburn SM, Garner R, Hoffmann C, Khang AM, McKinley KS, Bentzur R, Diwan A, Feinberg D, Guyer SZ, et al (2006) The DaCapo benchmarks: Java benchmarking development and analysis. In: ACM SIGPLAN conference on object-oriented programming systems, languages, and applications. pp 169–190

  13. Yao Y (2008) Probabilistic rough set approximations. Int J Approx Reason 49(2):255–271

  14. Yao YY (1996) Two views of the theory of rough sets in finite universes. Int J Approx Reason 15(4):291–317

  15. Pawlak Z (1995) Rough sets. Commun ACM 38(11):88–95

  16. Pawlak Z (1992) Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Pub, Dordrecht

  17. Yao Y (2007) Decision-theoretic rough set models. In: International Conference on Rough Sets and Knowledge Technology. pp 1–12

  18. Yao YY, Wong SK (1992) A decision theoretic framework for approximating concepts. Int J Man Mach Stud 37(6):793–809

  19. He T, Toosi AN, Buyya R (2019) Performance evaluation of live virtual machine migration in SDN-enabled cloud data centers. J Parallel Distrib Comput 131:55–68

  20. Dayarathna M, Wen Y, Fan R (2016) Data center energy consumption modeling: a survey. IEEE Commun Surv Tutor 18(1):732–794. https://doi.org/10.1109/COMST.2015.2481183

  21. Chen Z, Zhu Y, Di Y, Feng S (2015) Self-adaptive prediction of cloud resource demands using ensemble model and subtractive fuzzy clustering based fuzzy neural network. Comput Intell Neurosci 2015(10a):17

  22. Xu DY, Yang SL, Liu RP (2013) A mixture of HMM, GA, and Elman network for load prediction in cloud-oriented data centers. J Zhejiang Univ Sci C 14(11):845–858

  23. Zhu X, Young D, Watson BJ, Wang Z, Rolia J, Singhal S, Mckee B, Hyser C, Gmach D, Gardner R (2009) 1000 islands: an integrated approach to resource management for virtualized data centers. Cluster Comput 12(1):45–57

  24. Gulati A, Holler A, Ji M, Shanmuganathan G, Waldspurger C, Zhu X (2012) VMware distributed resource management: design, implementation, and lessons learned. VMware Technical J 1(1):45–64

  25. He T, Toosi AN, Buyya R (2021) SLA-aware multiple migration planning and scheduling in SDN-NFV-enabled clouds. J Syst Softw 176:110943. https://doi.org/10.1016/j.jss.2021.110943

  26. Beloglazov A, Buyya R (2013) Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints. IEEE Trans Parallel Distrib Syst 24(7):1366–1379. https://doi.org/10.1109/TPDS.2012.240

  27. Salehi MA, Krishna PR, Deepak KS, Buyya R (2012) Preemption-aware energy management in virtualized data centers. IEEE Fifth Intern Conference Cloud Comput 2012:844–851. https://doi.org/10.1109/CLOUD.2012.147

  28. Karthikeyan R, Balamurugan V (2021) Energy-aware and SLA-guaranteed optimal virtual machine swap and migrate system in cloud-Internet of Things. Concurr Comput Pract Exp 33(10):e6171. https://doi.org/10.1002/cpe.6171

  29. Hummaida AR, Paton NW, Sakellariou R (2022) Dynamic threshold setting for VM migration. LNAI 13226:31–46. https://doi.org/10.1007/978-3-031-04718-3_2

  30. Torre E, Durillo JJ, de Maio V, Agrawal P, Benedict S, Saurabh N, Prodan R (2020) A dynamic evolutionary multi-objective virtual machine placement heuristic for cloud data centers. Inform Software Technol 128:106390. https://doi.org/10.1016/j.infsof.2020.106390

  31. Ali HM, Lee DC (2016) Optimizing the Energy Efficient VM Placement by IEFWA and Hybrid IEFWA/BBO Algorithms. Simul Series 48(8):61–68

  32. Sutar SG, Mali PJ, More AY (2020) Resource utilization enhancement through live virtual machine migration in cloud using ant colony optimization algorithm. Int J Speech Technol 23(1):79–85. https://doi.org/10.1007/s10772-020-09682-2

  33. Mahil M, Jayasree T (2021) Combined particle swarm optimization and ant colony system for energy efficient cloud data centers. Concurr Comput Pract Exp 33(10):e6195

  34. Moghaddam SM, O’Sullivan M, Walker C, Piraghaj SF, Unsworth CP (2020) Embedding individualized machine learning prediction models for energy efficient VM consolidation within cloud data centers. Future Gen Comput Syst 106:221–233

  35. Peng J, Wang C, Jiang F, Gu X, Mu Y, Liu W (2020) A fast deep Q-learning network edge cloud migration strategy for vehicular service. J Electron Inf Technol 42(1):58–64. https://doi.org/10.11999/JEIT190612

  36. Dang Q, Zhang G, Wang L, et al (2023) Hybrid IoT device selection with knowledge transfer for federated learning. IEEE Internet Things J

Acknowledgements

This work was funded in part by the National Natural Science Foundation of China under Grant 62172457, in part by key scientific research project of Henan Province under the grant 23A520047, in part by science and technology development plan of Zhoukou under the grant 2024093, in part by lifelong education special project in Henan Province under the grant 45746, in part by the research teaching series project in Henan Province under the grant 2022SYJXLX089, in part by the industry-university-research project in ministry of education under the grant 220600643282037.

Author information

Contributions

Xinying Zhu and Ran Xia provided the ideas and wrote the main manuscript. Algorithm design and software validation were completed by Shuo Zhou and Haoran Liu. Hang Zhou provided funding acquisition and proofreading work.

Corresponding author

Correspondence to Xinying Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhu, X., Xia, R., Zhou, H. et al. An intelligent decision system for virtual machine migration based on specific Q-learning. J Cloud Comp 13, 122 (2024). https://doi.org/10.1186/s13677-024-00684-y

