Using genetic algorithms to find optimal solution in a search space for a cloud predictive cost-driven decision maker

Nikravesh, Ali Yadav; Ajila, Samuel A.; Lung, Chung-Horng

doi:10.1186/s13677-018-0122-7

Research
Open access
Published: 07 November 2018

Journal of Cloud Computing volume 7, Article number: 20 (2018)

Abstract

In a cloud computing environment there are two types of cost associated with the auto-scaling systems: resource cost and Service Level Agreement (SLA) violation cost. The goal of an auto-scaling system is to find a balance between these costs and minimize the total auto-scaling cost. However, the existing auto-scaling systems neglect the cloud client’s cost preferences in minimizing the total auto-scaling cost. This paper presents a cost-driven decision maker which considers the cloud client’s cost preferences and uses the genetic algorithm to configure a rule-based system to minimize the total auto-scaling cost. The proposed cost-driven decision maker together with a prediction suite makes a predictive auto-scaling system which is up to 25% more accurate than the Amazon auto-scaling system. The proposed auto-scaling system is scoped to the business tier of the cloud services. Furthermore, a simulation package is built to simulate the effect of VM boot-up time, Smart Kill, and configuration parameters on the cost factors of a rule-based decision maker.

Introduction

The elastic nature of cloud computing enables cloud clients to benefit from the cloud’s pay-as-you-go pricing model, which reduces cloud clients’ capital expenses and their overall operational costs. However, maintaining Service Level Agreements (SLAs) with the end users obliges the cloud service provider to provide a certain level of Quality-of-Service (QoS) and the cloud service provider gets penalized if the cloud service fails to meet the desired SLAs.

Deciding the optimal amount of resources in a cloud computing environment is a double-edged sword that may lead to either under-provisioning or over-provisioning. Under-provisioning results from saturation of the resources and may cause SLA violations. In contrast, over-provisioning occurs when the provisioned resources are wasted, which results in excessive energy consumption and high operational cost [1]. Auto-scaling systems are developed to automatically balance the cost/performance trade-off and prevent both conditions.

Figure 1 illustrates the typical stakeholders and their relationships in an Infrastructure-as-a-Service (IaaS) environment.

The three stakeholders in the IaaS environment are [2]:

Cloud infrastructure provider: refers to the IaaS provider who offers logically unlimited virtual resources in the form of virtual machines (VMs), virtual networks, etc.
Cloud client: is the customer of the IaaS provider who uses the infrastructure for hosting the cloud service. The cloud client is also known as the cloud service provider.
End user: is the user that accesses the cloud service and generates the workload that drives the cloud service’s behavior.

There are two types of SLAs in a cloud computing environment: SLAs between the end user and the cloud client, and SLAs between the cloud client and the cloud infrastructure provider. This paper investigates the cost/performance trade-off from the cloud clients’ perspective. From the cloud client’s point-of-view the auto-scaling goal is to reduce resource cost (i.e., the cost of the leased resources from the IaaS provider) and the SLA violation cost (i.e., the cost that is associated with the SLA breaches), at the same time.

According to [3], rule-based systems are the most popular auto-scaling systems in commercial cloud computing environments. Rule-based systems reactively provision resources for the cloud service based on a set of scaling rules. However, they suffer from two main shortcomings [3]: a) their reactive nature, and b) the difficulty of selecting a correct set of configuration parameters. This paper investigates the impact of these shortcomings on the accuracy of rule-based systems and proposes an auto-scaling system to overcome them.

The reactive nature of the rule-based systems allows them to scale in or scale out a cloud service as soon as the performance of the cloud service reaches a predefined threshold. However, it takes between 5 and 15 min to boot up a new VM and scale out the cloud service [4,5,6]. During the VM boot-up time the cloud service is in the under-provisioning condition, which may cause SLA violations. Therefore, the main shortcoming of the reactive auto-scaling systems (including the rule-based systems) is that they neglect the VM boot-up time. The proposed auto-scaling system forecasts the future workload of the cloud service and generates the scaling requests ahead of time. This way, a new VM is ready before the workload surge arrives at the cloud service.

The second shortcoming of the rule-based systems is the difficulty of configuring them. A rule-based auto-scaling system has a set of configuration parameters which affect its accuracy. Therefore, selecting the correct values for the configuration parameters is crucial in achieving an accurate auto-scaling system. In addition, the configuration values affect the auto-scaling system’s decisions on how to balance the resource cost and the SLA violation cost. Since different cloud clients have different cost preferences, the auto-scaling system should be able to find a balance between the resource cost and the SLA violation cost based on the cloud clients’ preferences. The proposed auto-scaling system uses the genetic algorithm principle to automatically identify an optimum configuration of the rule-based system. The focus of this paper is on the configuration issue. The proposed genetic algorithm considers the cloud client’s cost preferences to find the optimum configuration set. Figure 2 shows the architecture of the proposed auto-scaling system, which consists of a “self-adaptive prediction suite” and a “cost-driven decision maker”.

In our previous work [6] we proposed a self-adaptive prediction suite which automatically chooses the most suitable prediction algorithm based on the incoming workload pattern to forecast the future workload of the cloud service. In this paper we propose a cost-driven decision maker that minimizes the auto-scaling cost according to the cloud client’s cost preferences. The research question here is: “How to configure a rule-based decision maker to minimize the total auto-scaling cost based on the cloud clients’ cost preferences?”

The main contributions of this paper are:

A novel cost driven decision maker to reduce the total auto-scaling cost based on the cloud client’s preferences.
An evaluation of our predictive auto-scaling system [6] against the Amazon auto-scaling system.
An investigation of the impact of the VM boot-up time on the accuracy of the rule-based auto-scaling systems.
An investigation of the impact of the configuration parameters on the accuracy of the rule-based auto-scaling systems.

The remainder of this paper is organized as follows: section 2.0 discusses the background, the related work, the auto-scaling accuracy, and the cost-driven decision maker. Section 3.0 presents experiments that show the impact of VM boot-up time, configuration parameters, and smart kill on auto-scaling accuracy and cost. Section 4.0 then proposes the use of a genetic algorithm to find an optimum configuration for the auto-scaling problem. The evaluation of the cost-driven decision maker and the predictive auto-scaling system is presented in section 5.0. The conclusion and possible future directions for the research are discussed in section 6.0.

Background and related work

In this section we present an overview of the existing auto-scaling systems, describe the rule-based auto-scaling technique, and introduce its configuration parameters. In addition, we summarize our previous work [6] on the self-adaptive prediction suite. This summary is necessary for understanding the research work in this paper. The authors in [3] group the existing auto-scaling approaches into five categories: rule-based technique, reinforcement learning, queuing theory, control theory, and time-series analysis. Among these categories, time-series analysis focuses on the prediction side of the resource provisioning task and is not a “decision making” technique per se. In contrast, the rule-based technique is a pure decision-making mechanism, while the rest of the auto-scaling categories play the predictor and the decision maker roles at the same time. The rule-based technique is the only approach which is widely used in the commercial auto-scaling systems [7,8,9].

Existing auto-scaling systems

Auto-scaling systems can be grouped into reactive and predictive categories. Reactive systems scale in or scale out a cloud service based on the current performance of the cloud service. Reactive systems use either rule-based or schedule-based techniques to carry out the auto-scaling task. Rule-based systems use a set of scaling rules to scale in or scale out a cloud service when its performance reaches a predefined threshold. Schedule-based mechanisms allow cloud clients to add or remove VMs at a given time and are suitable when the changes in the workload are known ahead of time [10]. However, not all cloud services have time-based workload patterns, and it is not straightforward for the cloud clients to correctly determine all the related scaling indicators or the thresholds based on the performance goals [10].

Predictive auto-scaling systems forecast the cloud service’s future workload and adjust the compute and the storage capacity in advance to meet the future needs. Predictive auto-scaling systems can be grouped into four categories [3]: reinforcement learning, queuing theory, control theory, and time-series analysis. Among these categories, time-series analysis focuses on the prediction side of the resource provisioning task and is not a “decision making” technique per se. Therefore, a time-series analysis technique should be bundled with a decision maker to create a predictive auto-scaling system. Queuing theory models each VM as a queue of requests and calculates the values of the performance metrics. The calculated values are used to generate a scale action. Reinforcement learning algorithms handle the auto-scaling task without any a priori knowledge or system model. However, the time for the reinforcement learning methods to converge to an optimal policy can be infeasibly long. Control theory creates a reactive or a predictive controller to automatically adjust the required resources to the cloud service’s demand. Readers are encouraged to see [3] for more details about the different decision-making approaches.

The proposed auto-scaling system (see Fig. 2) uses the predictive approach to carry out the auto-scaling task. Since time-series analysis is the dominant prediction technique in the cloud auto-scaling domain [3], the prediction suite uses time-series analysis to forecast the future workload of the cloud service. Moreover, our prediction suite applies a decision fusion technique [11] to increase the prediction accuracy (see [6] for more details on the prediction suite). The cost-driven decision maker uses the rule-based technique to generate the scaling decisions. Although the rule-based technique is easy to use, configuring a rule-based system is not a trivial task. The proposed cost-driven decision maker uses the genetic algorithm principle to overcome this problem.

Self-adaptive prediction suite

This subsection summarizes our previous work in [6], which serves as a foundation for the research work in this paper. Researchers have already used prediction methods to alleviate the reactive nature of the rule-based systems. However, the existing predictive auto-scaling systems use only one prediction method to forecast the future performance condition of the cloud service. To increase the prediction accuracy, our predictive auto-scaling system identifies the pattern of the incoming workload and chooses the prediction algorithm based on the detected pattern. Specifically, the self-adaptive suite automatically chooses:

The Multi-Layer Perceptron (MLP) prediction model to forecast the workload in the environments with the unpredictable workload pattern
The Multi-Layer Perceptron with Weight Decay (MLPWD) prediction model to forecast the workload in the environments with the periodic workload pattern
The Support Vector Machine (SVM) prediction model to forecast the workload in the environments with the growing workload pattern

Readers are encouraged to read the paper in [6] for more details about the reasons to choose the aforementioned prediction models and their corresponding environments.

The objective of a classical self-adaptive system is to make the system self-managed as a result of objects change or environmental influence on the inputs. A requirement in this context is that the system must be able to keep knowledge about its past, present, and future goals. In our case, self-adaptive prediction suite architecture is designed by adapting the classical autonomic system architecture to the cloud auto-scaling system (see Fig. 3). The cloud auto-scaling architecture consists of a cloud workload context element; a cloud auto-scaling system which includes the meta-autonomic elements (the workload pattern and the cloud auto-scaling); and a cloud computing scaling decisions element. In addition, an element for the autonomic manager, knowledge, and goals is added to the architecture. The cloud workload usage represents the real world usage context while the cloud computing scaling decisions represents the computing environment context. It is important to note that an autonomic system always operates and executes within a context. The context in general is defined by the environment as well as the runtime behavior of the system. The purpose of the autonomic manager is to apply the domain specific knowledge which is linked to the cloud workload pattern and apply the appropriate predictor algorithm (see Fig. 3) to predict the future workload. The cloud autonomic manager is constructed around the analyze/decide/act control loop.

The prediction suite identifies the pattern of the incoming workload and chooses the most accurate prediction algorithm based on the workload pattern. The cloud auto scaling autonomic elements (i.e., the workload patterns and the predictor component) are designed such that the architecture can be implemented using the strategy software design pattern (see Fig. 4). The strategy software design pattern consists of a strategy and a context. In the cloud auto-scaling domain, the predictor is the strategy and the workload pattern is the context. In general the strategy and the context interact to implement the chosen algorithm. A context passes all of the data (i.e., the workload pattern) that is required by the algorithm to the strategy.
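The strategy arrangement described above can be illustrated with a minimal Python sketch. It is not the prediction suite from [6]: the two predictor classes are deliberately trivial placeholders standing in for the real models (MLP, MLPWD, SVM), and every name here (`Predictor`, `WorkloadContext`, `STRATEGIES`, `forecast_for`) is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class Predictor(ABC):
    """Strategy interface: one forecasting algorithm."""

    @abstractmethod
    def predict(self, history: Sequence[float]) -> float:
        """Forecast the next workload value from past observations."""


class LastValuePredictor(Predictor):
    """Placeholder strategy: repeats the most recent observation."""

    def predict(self, history: Sequence[float]) -> float:
        return history[-1]


class LinearTrendPredictor(Predictor):
    """Placeholder strategy: extends the last observed step by one interval."""

    def predict(self, history: Sequence[float]) -> float:
        return history[-1] + (history[-1] - history[-2])


class WorkloadContext:
    """Context: passes the workload data to whichever strategy was chosen."""

    def __init__(self, predictor: Predictor) -> None:
        self.predictor = predictor

    def forecast(self, history: Sequence[float]) -> float:
        return self.predictor.predict(history)


# Assumed mapping from a detected workload pattern to a strategy.
STRATEGIES = {
    "unpredictable": LastValuePredictor(),
    "growing": LinearTrendPredictor(),
}


def forecast_for(pattern: str, history: Sequence[float]) -> float:
    """Pick the strategy for the detected pattern and run it."""
    return WorkloadContext(STRATEGIES[pattern]).forecast(history)
```

Because the context only depends on the `Predictor` interface, a new prediction model can be added by registering one more class, without touching the code that requests forecasts.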

Rule-based systems

In rule-based auto-scaling, the number of the leased VMs varies according to a set of scaling rules. A scaling rule has two parts: the condition and the action to be executed when the condition is met. The condition part of a scaling rule uses one or more performance indicators, such as the average response time or the average workload. A typical rule-based system has six configuration parameters: the upper threshold (thrU), the lower threshold (thrL), the upper scaling duration (durU), the lower scaling duration (durL), the upper cool-down duration (inU), and the lower cool-down duration (inL). A performance indicator has an upper (i.e., thrU) and a lower (i.e., thrL) threshold. If the scaling condition is met for a given duration (i.e., durU or durL) then the corresponding action is triggered. After executing a scale action, the decision maker stops itself for a cool-down period which is defined by inU or inL.
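To make the role of the six parameters concrete, here is a minimal Python sketch of a rule-based decision maker. It assumes that durations and cool-downs are counted in monitoring intervals and that a duration of 0 behaves like a duration of 1; the class and field names are hypothetical and the logic is an illustration, not the implementation used in this paper.

```python
from dataclasses import dataclass


@dataclass
class RuleConfig:
    thr_u: float  # upper threshold (thrU)
    thr_l: float  # lower threshold (thrL)
    dur_u: int    # intervals the indicator must stay above thrU (durU)
    dur_l: int    # intervals the indicator must stay below thrL (durL)
    in_u: int     # cool-down intervals after a scale-out (inU)
    in_l: int     # cool-down intervals after a scale-in (inL)


class RuleBasedDecisionMaker:
    def __init__(self, cfg: RuleConfig) -> None:
        self.cfg = cfg
        self.above = 0     # consecutive intervals above thrU
        self.below = 0     # consecutive intervals below thrL
        self.cooldown = 0  # remaining cool-down intervals

    def step(self, indicator: float) -> int:
        """Process one monitoring interval.

        Returns +1 for scale-out, -1 for scale-in, 0 for no action.
        """
        if self.cooldown > 0:
            # The decision maker is stopped during the cool-down period.
            self.cooldown -= 1
            self.above = self.below = 0
            return 0
        self.above = self.above + 1 if indicator > self.cfg.thr_u else 0
        self.below = self.below + 1 if indicator < self.cfg.thr_l else 0
        if self.above > 0 and self.above >= self.cfg.dur_u:
            self.above = 0
            self.cooldown = self.cfg.in_u
            return 1
        if self.below > 0 and self.below >= self.cfg.dur_l:
            self.below = 0
            self.cooldown = self.cfg.in_l
            return -1
        return 0


def run(cfg: RuleConfig, indicators: list[float]) -> list[int]:
    """Feed a series of indicator samples through a fresh decision maker."""
    dm = RuleBasedDecisionMaker(cfg)
    return [dm.step(x) for x in indicators]
```

For example, with `RuleConfig(80, 20, 2, 2, 1, 1)` a scale-out is only issued after the indicator has exceeded 80 for two consecutive intervals, and the following interval is skipped as cool-down.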

Some research works have proposed additional parameters to improve the auto-scaling accuracy. For instance, the proposed method in [12] uses two upper and two lower thresholds to determine the trend of the performance indicator. Considering the trend of the performance indicator helps to predict the future performance of the cloud service and generate the scale actions ahead of time. Although the proposed method in [12] generates the scale actions ahead of time, it does not have a better accuracy compared to the traditional rule-based systems [3]. This paper uses a typical rule-based system to scale-in (or -out) the cloud service.

Specification of the auto-scaling accuracy

The auto-scaling accuracy is closely related to the cost incurred by the cloud clients. The more accurate the auto-scaling system, the lower the cost incurred by the cloud clients. Therefore, cost is the main metric that measures the accuracy of the auto-scaling systems. From the cloud client’s perspective, there are two types of costs associated with the auto-scaling systems: resource cost (C_R) and SLA violation cost (C_SLA).

Resource cost refers to the cost of the leased VMs and can be measured by the number of the leased VMs and their hourly rental rate. This paper assumes that the IaaS provider supplies only one type of VM with a fixed hourly rate. Then, the resource cost can be measured by:

$$ {C}_R={\sum}_{t=0}^T{n}_t\times {c}_{vm} $$

(1)

where T is the total hours that the auto-scaling system is running, c_vm is the hourly rate of leasing a VM, and n_t is the number of the leased VMs between hour t and t + 1.
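As a small worked illustration of Eq. (1), the following Python sketch sums the hourly VM counts multiplied by a fixed hourly rate. The function name and the sample numbers are made up for the example.

```python
from typing import Sequence


def resource_cost(leased_vms_per_hour: Sequence[int], hourly_rate: float) -> float:
    """Eq. (1): sum over hours of n_t * c_vm.

    leased_vms_per_hour[t] is n_t, the number of VMs leased
    between hour t and t + 1; hourly_rate is c_vm.
    """
    return sum(n_t * hourly_rate for n_t in leased_vms_per_hour)


# Hypothetical run: 2, 3, 3, and 1 VMs over four hours at 0.5 per VM-hour.
example_cost = resource_cost([2, 3, 3, 1], 0.5)
```

Here nine VM-hours are leased in total, so the resource cost is 9 × 0.5 = 4.5.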

SLA violation cost is the cost associated with the SLA breaches. An SLA breach (i.e., SLA violation) refers to any act or behavior that does not comply with the SLAs document. In this paper, response time is considered to be the main Quality-of-Service factor, and any request with a response time greater than the maximum response time (which is defined in the SLAs document) is recognized as an SLA violation. Therefore, the total number of SLA violations v_t at time t is defined as:

$$ {v}_t={\sum}_{req=1}^N{v}_{t, req},\qquad {v}_{t, req}=\begin{cases}1 & \text{if } \left({r}_{req}-R\right)>0\\ 0 & \text{otherwise}\end{cases} $$

(2)

where req represents an incoming request, N is the total number of requests at time t, r_req is the response time of the request req, and R is the maximum response time defined in the SLAs document.

Measuring SLA violation cost depends on different factors, such as the downtime duration of the cloud service, the number of affected end users, and even the sociological aspects of the end users’ behaviors. In this paper a constant penalty c_b is assigned to each SLA violation. The value of c_b is defined by the cloud client who provides the cloud service. The SLA violation cost is:

$$ {C}_{SLA}={\sum}_{t=0}^T{v}_t\times {c}_b $$

(3)
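Equations (2) and (3) can be illustrated together with a short Python sketch. The function names and the sample response times are hypothetical; the comparison is strict, so a request that takes exactly R is not counted as a violation.

```python
from typing import Sequence


def count_violations(response_times: Sequence[float], max_response_time: float) -> int:
    """Eq. (2): v_t, the number of requests with r_req - R > 0."""
    return sum(1 for r_req in response_times if r_req - max_response_time > 0)


def sla_violation_cost(
    response_times_per_interval: Sequence[Sequence[float]],
    max_response_time: float,
    penalty: float,
) -> float:
    """Eq. (3): sum over time of v_t * c_b, with a constant penalty c_b."""
    return sum(
        count_violations(times, max_response_time) * penalty
        for times in response_times_per_interval
    )


# Hypothetical data: two intervals, R = 1.0 s, penalty c_b = 0.1 per violation.
intervals = [[0.5, 1.2, 2.0], [0.9, 1.0]]
example_sla_cost = sla_violation_cost(intervals, 1.0, 0.1)
```

In this example the first interval contains two violations and the second none, so the SLA violation cost is 2 × 0.1 = 0.2.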

A highly accurate auto-scaling system prevents SLA violations as well as reduces the resource cost. However, it is not possible to minimize the number of the SLA violations and the resource cost at the same time. Adding more infrastructural resources reduces the number of the SLA violations, but results in an excessive resource cost. On the other hand, releasing the infrastructural resources saves the resource cost, but increases the number of the SLA violations. Therefore, the auto-scaling system’s job is to find the balance between the SLA violations and the resource cost (i.e., the cost/performance trade-off). The optimum solution to this trade-off varies for the different cloud clients. The smaller businesses that do not have many end users, such as startup companies, usually prefer to reduce the resource cost, while the bigger businesses that have many end users, such as eBay or Netflix, prefer to minimize the SLA violations. Therefore, the cloud client’s cost preference is one of the factors that should be considered by the auto-scaling system to solve the cost/performance trade-off.

Specification of the cost-driven decision maker

Recall that a performance indicator has an upper (i.e., thrU) and a lower (i.e., thrL) threshold. If the scaling condition is met for a given duration (i.e., durU or durL) then the corresponding action is triggered. After executing a scale action, the decision maker stops itself for a short cool-down period which is defined by inU or inL. To have an accurate rule-based system, it is crucial to configure the system such that the resource cost and the SLA violation cost are kept low. However, the resource cost and the SLA violation cost cannot be minimized at the same time, and the balance point between them depends on the cloud client’s cost preference (see Section 2.4).

The objective here is to find the best value for each of the configuration parameters such that the configured rule-based decision maker minimizes the final auto-scaling cost. Since the domain of valid values for each of the parameters is known, the universal set of possible solutions can be created, where each solution is a valid combination of the parameters. To find the optimal solution, the search space (i.e., the universal set of the solutions) is traversed and the solution with the least auto-scaling cost is selected. To measure the auto-scaling cost of a given solution, a decision maker is configured with the parameters of that solution, and an auto-scaling simulation is run for a predefined duration to calculate the total auto-scaling cost of the decision maker. In this paper an in-house simulation package [13] is implemented and used to carry out the simulations. Based on the simulation results, measuring the auto-scaling cost of a given solution takes five minutes on average. Therefore, for a search space with 100 possible solutions, it takes 500 min (more than eight hours) to traverse the search space and find the optimum solution.

For example, assume that in an auto-scaling environment the CPU utilization is considered as the performance indicator. Since the CPU utilization value is always between 0 and 100, the upper threshold can take 100 different values. In addition, the lower threshold can take any value greater than zero and less than the upper threshold. Moreover, suppose that the inU, inL, durU, and durL take any values between 0 and 5 min. In this environment, the universal set includes 6,413,904 valid solutions. Given that measuring the total cost of a solution takes 5 min, traversing the whole search space takes 32,069,520 min (i.e., more than 61 years), which is infeasible to perform. Therefore, this paper proposes a genetic algorithm model to find an optimal solution within the search space in a shorter time.
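The arithmetic behind these figures can be checked with a few lines of Python. The sketch takes the search-space sizes stated in the text as given and only reproduces the time conversion; the function names are hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_SOLUTION = 5           # average simulation time per solution (from the text)
MINUTES_PER_YEAR = 60 * 24 * 365   # assumes a 365-day year


def exhaustive_search_minutes(num_solutions: int) -> int:
    """Time to simulate every solution in the search space, one after another."""
    return num_solutions * MINUTES_PER_SOLUTION


def minutes_to_years(minutes: int) -> float:
    return minutes / MINUTES_PER_YEAR


small_space = exhaustive_search_minutes(100)        # 500 min, a little over 8 h
full_space = exhaustive_search_minutes(6_413_904)   # 32,069,520 min
full_space_years = minutes_to_years(full_space)     # slightly more than 61 years
```

The cost grows linearly with the number of solutions, which is why an exhaustive traversal stops being practical long before the search space reaches millions of configurations.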

Impact of VM boot-up time, smart kill, and configuration parameters on auto-scaling accuracy and costs

In this section we present the results of three experiments: the impact of VM boot-up time on auto-scaling accuracy, the impact of the smart kill technique on the auto-scaling cost factors, and the impact of the configuration parameters on auto-scaling accuracy. The results for the configuration parameters show how difficult it is for a cloud client to handle them. In addition, the smart kill results show how important smart kill is in decreasing the resource cost or reducing the SLA violations.

VM boot-up time vis-à-vis the auto-scaling accuracy

An in-house simulation package [13] was developed and used to simulate the effect of VM boot-up time on the cost factors of a rule-based decision maker.

In the simulation, three instances of the cloud workload patterns are sent to a multi-layer cloud service which is deployed to an IaaS infrastructure. In each of the monitoring intervals, the decision maker compares the incoming workload with the capacity of the cloud service. The capacity of the cloud service refers to the number of requests that the cloud service can accommodate per second. If the incoming workload exceeds the cloud service’s capacity (i.e., the upper threshold) then the auto-scaling system scales out the infrastructure. If the cloud service’s upper threshold is not reached, the auto-scaling system verifies whether the cloud service is still able to accommodate the incoming workload after releasing one of the provisioned VMs (i.e., the lower threshold). If so, then the auto-scaling system scales in the cloud service. Table 1 shows the configuration parameters that are used in the simulation.
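The per-interval check described above can be sketched in Python as follows. This is an illustration only, not the simulation package [13]: it assumes that every VM contributes the same capacity (`vm_capacity`, in requests per second), that total capacity is linear in the number of VMs, and that at least one VM is always kept. All names are hypothetical.

```python
def scaling_decision(workload: float, vm_count: int, vm_capacity: float) -> int:
    """Decide the scale action for one monitoring interval.

    Returns +1 to add a VM, -1 to release a VM, 0 to do nothing.
    """
    capacity = vm_count * vm_capacity
    # Upper threshold: the workload exceeds what the service can accommodate.
    if workload > capacity:
        return 1
    # Lower threshold: the service would still cope with one VM fewer.
    # Keeping a minimum of one VM is an assumption of this sketch.
    if vm_count > 1 and workload <= (vm_count - 1) * vm_capacity:
        return -1
    return 0
```

With two VMs of 100 requests per second each, a workload of 250 triggers a scale-out, a workload of 90 triggers a scale-in, and a workload of 150 leaves the service unchanged.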

Table 1 Simulation parameters and values
