Two-level fuzzy-neural load distribution strategy in cloud-based web system

Cloud computing Web systems are today the most important part of the Web. Many companies transfer their services to the cloud to avoid infrastructure aging and the loss of computing efficiency that comes with it. Distribution of the load is a crucial problem in cloud computing systems. Due to the specifics of network traffic, providing an acceptable time of access to Web content is not trivial. Load distribution with adaptive, intelligent strategies can deliver a high quality of service and short service times, and can reduce costs. In this article, a new, two-level, intelligent HTTP request distribution strategy is presented. The architecture of the proposed solution was designed taking into account the results of earlier studies and experiments. The proposed decision system contains fuzzy-neural models used to minimize service times in the Web cloud. The article contains a description of the new solution and the test-bed. Finally, the results of the experiments are discussed and conclusions are presented.


Introduction
Today cloud computing is essential in running most online businesses and plays a very important role in IT. It has opened new opportunities for providing large-scale computing resources [1]. Cloud computing systems enable their clients to access a shared pool of computing resources such as servers, storage, applications and services that can be dynamically configured and delivered on demand [2,3]. This kind of organization of resources improves general performance, utilization of resources and energy consumption management, and helps to avoid SLA (Service Level Agreement) violations [4]. To achieve these goals, load balancing mechanisms are implemented. These techniques have been known and used for decades in distributed systems such as cluster-based and grid-based systems. However, the effective distribution of the load is still an open problem in cloud computing that needs new architectures and algorithms to meet new customer demands. Little comprehensive research on load balancing in the field of cloud computing has been done.
Among load balancing approaches, we can distinguish very simple techniques in which decisions are taken very fast but the quality of the decisions is low. More sophisticated algorithms, which take into account the state of the controlled system, and intelligent approaches offer a high quality of service.
The intelligent approaches can significantly improve the utilization of resources and make it possible to achieve the goals set by the administrators of the systems.
In previous work [5][6][7][8], research on intelligent fuzzy-neural distribution methods that minimize service time in cluster-based and cloud-based Web systems was presented. In the field of cloud systems, an effective HTTP request distribution system using a two-layer architecture was proposed, in which decisions were made on two independent levels by Web switches learning each other's behavior [9,10]. In this article, a novel HTTP request distribution method using a one-layer architecture is presented. The decision mechanism, the key element of the method, uses fuzzy-neural techniques to make decisions on two levels. On the first level, one algorithm chooses a group of Web servers, while on the second level, another algorithm selects the specific server from the chosen group to service the HTTP request. The presented solution is heuristic, uses intelligent and adaptive decision algorithms, and is designed to minimize the service time of HTTP requests in cloud-based Web systems.
The rest of the article is composed as follows. In section two the related work is presented with a description of the previous, selected works constituting the basis for a new solution. Section three contains a description of the new HTTP distribution method. In section four the test-bed and the results of experiments are discussed. Section five summarizes the article.

Related work
Cloud computing, like many other computer technologies, uses distributed systems to meet the very high demand for computational power. Load balancing methods are used to distribute tasks and load among the nodes in the system [11]. Those methods help to improve utilization of the resources, reduce response times and enable elastic scalability, which is an essential part of cloud computing [12,13].
Load balancing mechanisms used in Web clouds can be divided into three main categories [11][14][15][16][17]: static, dynamic and adaptive. In static load balancing, assignments are made with deterministic or probabilistic algorithms that do not take into account the current state of the system. Simple strategies have been popular since the beginning of cluster-based Web systems and are still being improved. An example of such a strategy is Round Robin, which assigns incoming HTTP requests to successive servers in turn. A slightly more intelligent version that takes into account the Web server load was created by Xu Zongyu and Wang Xingxuan [18]. Stochastic strategies are also still being developed and are becoming more and more sophisticated [4,19,20].
In a dynamic approach, the decisions are made on the basis of the current state of the system. The most popular dynamic load balancing algorithm in AWS (Amazon Web Services) [21] is Least Load, which assigns HTTP requests to the nodes with the lowest value of a chosen load measure.
The adaptive approach is the most complex one: the decisions are made on the basis of the state of the system, and in addition the strategy itself can change when the state of the system changes [22]. Most adaptive strategies are intelligent approaches. Many specialists claim that only this kind of strategy can effectively provide an acceptable time of access to Web content under typical Web traffic, which is characterized by self-similarity and burstiness [23][24][25][26]. Among the many artificial intelligence techniques used in intelligent strategies, we can distinguish the Ant Colony Optimization (ACO) algorithm proposed by Kumar Nishant et al. [27], in which generated ants traverse the width and length of the cloud network so that they learn the location of both the under-loaded and over-loaded nodes. Another interesting HTTP request distribution method based on natural phenomena is called Artificial Bee Colony (ABC) [25]. It uses a decision mechanism imitating the behavior of a bee colony. The Particle Swarm Optimization (PSO) strategy is another solution based on heuristic algorithms and is designed to schedule requests to individual components of the cloud [28].
Artificial neural networks have also been used in adaptive load distribution systems [29][30][31]. A good example of a solution that takes energy consumption into account is presented in [32].
Another group of intelligent adaptive approaches, using fuzzy-neural models, was proposed in the articles of the author and the research group. These include solutions enabling global distribution among server rooms in different geographical locations, e.g. GARD [6] and GARDIB [8], and local approaches such as the FARD [5] and FNRD [7] strategies. All of these systems work to minimize the response time of each HTTP request separately. In the latest work [9,10], a two-layer architecture of the Web cloud was proposed. In such a solution, devices called Web switches make independent decisions on two layers. On the first layer, a Web switch distributes HTTP requests among the availability zones, that is, groups of servers located in the same geographical location (a region) (Fig. 1.a). On the second layer, Web switches distribute requests inside the availability zones (simply, zones). The obtained results showed much better performance for the intelligent fuzzy-neural Web switches than for non-intelligent strategies. We also compared the results of our intelligent FNRD Web switch in the one-layer architecture (Fig. 1.b) and in the two-layer architecture. In the one-layer architecture, one Web switch distributed requests among all servers in the zones.
Surprisingly, the results for the two-layer architecture were much better than for the one-layer architecture.
Taking into account the results of the latest research, a new distribution method and the design of a new Web switch for the one-layer architecture are proposed in this article. In the new method, all of the Web servers are divided into groups; one mechanism in the Web switch chooses a group of servers, and another mechanism chooses the server within that group to service the request. The new strategy is called Two-Level Fuzzy-Neural Request Distribution, or simply TLFNRD.
The main difference between the methods and algorithms presented in [5][6][7][8][9][10] and the TLFNRD strategy is that the previously proposed Web switches and brokers make the distribution decision on only one level (Fig. 2a). Those switches and brokers estimate service times using the fuzzy-neural mechanism for all executors servicing HTTP requests (e.g. Web servers or availability zones). In the TLFNRD strategy, the Web switch makes decisions on two levels (Fig. 2b). On each level, a separate fuzzy-neural mechanism estimates service times: for logical groups of servers on the first level, and for the servers themselves on the second level. In the proposed solution, the fuzzy-neural mechanisms on different levels can, to some extent, cooperate and learn each other's behavior.
This article should help to answer whether the two-level fuzzy-neural distribution strategy TLFNRD is better than the simple one-level FNRD approach.
The detailed description of the new TLFNRD solution is presented in the next section.

Two-level fuzzy-neural web switch
The main aim of the proposed Web switch is to distribute HTTP requests so as to minimize the service time of each request. Service times are measured from the moment the Web switch sends a request to the chosen server to the moment the Web switch receives the response. The Web switch processes requests in the order in which they are received. Requests are not queued or scheduled.
In addition, all Web servers working in the cloud can service all of the HTTP requests. It should be noted that the design of the presented switch does not include all of the features of Web switches used in practical applications. For clarity of presentation, it omits solutions connected with security, resistance to cyber-attacks, and the failover system used when a Web server or the Web switch itself fails. However, the presented method of request distribution fully allows the indicated mechanisms to be implemented, and in particular, the design of the request distribution algorithm supports the failover mechanism.
As mentioned above, the Web switch makes its decisions on two logical levels. On the first level, a group of servers is chosen. On the second level, a single server is selected from the group.
The overall construction of the Web switch is presented in Fig. 3. The Web switch consists of the following modules: the request analysis module, the group module, the server group modules, the redirection module, and the measurement module.
In the presented Web switch, an HTTP request is redirected in the following way. First, the request is classified. Requests belonging to the same class have similar service times. The group module chooses the group of servers that can service the HTTP request in the shortest time. The server group module of the chosen group then selects the Web server for which the estimated service time is the shortest. The redirection module sends the HTTP request to the chosen server. After servicing the request, the Web server sends the HTTP response back to the Web switch, and the Web switch passes it to the client (for clarity, this process is not presented in Fig. 3). The measurement module measures the real service time and sends it to the group module and to the previously chosen server group module. Both modules update their information about processing time in the cloud.
In the following subsections, the decision-making process is described in detail.

Classification of HTTP requests
The incoming HTTP request $r_i$ (where $i$ is the index of the request and $i = 1, \ldots, I$) is first assigned to a class $k_i$ ($k_i \in \{1, \ldots, K\}$) in the request analysis module. The classification is designed so that requests with similar response times belong to the same class. Serviced HTTP requests should be constructed properly according to the HTTP protocol. Two types of HTTP requests are distinguished in the system: static and dynamic. Static requests have responses delivered from files placed on the Web server (such as HTML, jpg, png, and other picture files) and are classified by their sizes. Dynamic requests have content generated by the Web server after the request arrives (by executing scripts on the server in, e.g., PHP, Python, Java or .NET Core), and each is classified separately by its address. The effectiveness of this method of classification has been confirmed both in simulation experiments and in research on a real cluster system [7,33].
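The classification rule above can be illustrated with a minimal Python sketch. The size thresholds, the list of static file extensions, and the choice to ignore the query string when comparing addresses are all assumptions made for illustration; the article does not specify them, and `RequestClassifier` is a hypothetical name.

```python
import bisect
import posixpath
from urllib.parse import urlsplit

# Hypothetical size thresholds (bytes) separating static classes; not from the article.
STATIC_SIZE_BOUNDS = [1_000, 10_000, 100_000, 1_000_000]
NUM_STATIC_CLASSES = len(STATIC_SIZE_BOUNDS) + 1
# Hypothetical set of extensions treated as static content.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".jpg", ".jpeg", ".png", ".gif"}


class RequestClassifier:
    """Assigns a class k in 1..K: static requests by size, dynamic ones by address."""

    def __init__(self):
        self._dynamic = {}  # address (path) of a dynamic request -> class number

    def classify(self, url, size_bytes=None):
        path = urlsplit(url).path  # assumption: the query string is ignored
        extension = posixpath.splitext(path)[1].lower()
        if extension in STATIC_EXTENSIONS:
            if size_bytes is None:
                raise ValueError("static requests need the size of the requested file")
            # Static classes 1..NUM_STATIC_CLASSES, ordered by file size.
            return bisect.bisect_left(STATIC_SIZE_BOUNDS, size_bytes) + 1
        # Each dynamic address gets its own class, numbered after the static ones.
        if path not in self._dynamic:
            self._dynamic[path] = NUM_STATIC_CLASSES + 1 + len(self._dynamic)
        return self._dynamic[path]
```

In this sketch the file size is passed in by the caller; how a real Web switch learns the size of a static object is outside what the article describes.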

The first decision level
The group module makes the decision on the first level. It chooses the group of servers that will deliver the response. The division of the servers into groups is logical, not physical. The group module chooses the group of servers $g_i$ (where $g_i \in \{1, \ldots, G\}$) for which the estimated service time is the shortest. The decision is made by taking into account the load of the groups of servers $MD_i = [MG_i^1, \ldots, MG_i^g, \ldots, MG_i^G]$, where $MG_i^g = [e_i^g, f_i^g]$, and $e_i^g$ and $f_i^g$ are measures of the load on server group $g$ at the moment of arrival of the $i$th request. Those measures were chosen according to the experiments in [5,7,33]. $e_i^g$ is the overall number of requests currently being serviced by the group of servers, and $f_i^g$ is the number of dynamic requests being serviced. The group module also adapts to the changing environment by taking into account the service time $\tilde{s}_i$ measured after the request has been serviced.
The information $MD_i$ and $\tilde{s}_i$ is delivered by the measurement module.
The construction of the group module is complex and is similar to the construction of the server group module. A detailed description of both of the modules and the overall working functionality is presented in section 3.4.

The second decision level
There are as many server group modules as the number $G$ of groups of servers. Those modules constitute the second decision level. The server group module chooses the server $z_i$ to service the $i$th request. It is assumed that all servers are general-purpose Web servers using the same hardware and able to service each HTTP request within the Web service.
In the Web switch, only the $g_i$th server group module, the one chosen by the group module for the given request, makes the decision. The construction and operation of the server group module are exactly the same as those of the group module and are described in section 3.4. The inputs for the module are the loads of the servers in the group $ML_i^g = [MS_i^1, \ldots, MS_i^z, \ldots, MS_i^Z]$, where $MS_i^z = [e_i^z, f_i^z]$, $Z$ is the number of servers in the group, and $e_i^z$ and $f_i^z$ are measures of the load on server $z$ with the same meaning as for the group module. The module also adapts to the environment by taking into account the measured service time $\tilde{s}_i$, but only when one of the Web servers of the group serviced the $i$th request.
The redirection module, using the TCP/IP protocol stack of the operating system, redirects the request $r_i$ to the chosen Web server $z_i$. The Web switch also receives the response from the server and sends it to the client that sent the request (Fig. 1b). This process is not included in Fig. 3 because it is not important for our considerations.
The measurement module collects the information $MD_i$, $ML_i^1, \ldots, ML_i^g, \ldots, ML_i^G$ and $\tilde{s}_i$ necessary for the other modules to make decisions. Because the Web switch sends the HTTP requests to the Web servers and receives the HTTP responses, it can measure the service time $\tilde{s}_i$ and the number of static and dynamic HTTP requests being serviced by individual Web servers. Importantly, this module acquires information available on the Web switch and does not need to use other data sources.
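The bookkeeping done by the measurement module can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, request identifiers and timestamps are supplied by the caller, and the group load is taken to be the sum of the loads of the servers in the group.

```python
class LoadTracker:
    """Tracks, per server, e (all requests in service) and f (dynamic requests in service)."""

    def __init__(self, groups):
        # groups: list of lists of server identifiers, one inner list per logical group
        self.groups = [list(servers) for servers in groups]
        self.e = {z: 0 for servers in self.groups for z in servers}
        self.f = dict.fromkeys(self.e, 0)
        self._started = {}  # request id -> (server, is_dynamic, send time)

    def request_sent(self, request_id, server, dynamic, now):
        self.e[server] += 1
        if dynamic:
            self.f[server] += 1
        self._started[request_id] = (server, dynamic, now)

    def response_received(self, request_id, now):
        """Returns the server that serviced the request and the measured service time."""
        server, dynamic, start = self._started.pop(request_id)
        self.e[server] -= 1
        if dynamic:
            self.f[server] -= 1
        return server, now - start

    def server_load(self, server):
        return (self.e[server], self.f[server])

    def group_load(self, g):
        # Assumption: a group's load is the sum over its servers.
        servers = self.groups[g]
        return (sum(self.e[z] for z in servers), sum(self.f[z] for z in servers))
```

Everything here is derived from events visible on the Web switch itself (a request leaving and a response arriving), which matches the observation that no other data sources are needed.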

The process of choosing an executor to service the request
The group module and the server group module choose, respectively, a group of servers or a single server to service the HTTP request. Both decision elements act in a similar way and do not require information about the internal structure of the part of the system for which the decision is taken. For this reason, in this section both the group module and the server group module are called a selection module. Likewise, a group of servers or a single server is called an executor. Figure 4 presents the overall structure of the selection module. The selection module contains a decision module and as many executor models as there are executors (groups of servers or Web servers) to choose from.
Each of the executor models estimates the service time for the executor it corresponds to and for the $i$th request belonging to the class $k_i$. The estimation is done every time a new HTTP request arrives, before the distribution decision is made, on the basis of the load of the executor $M_i^w = [e_i^w, f_i^w]$ (which is equivalent to the loads $MG_i^g$ and $MS_i^z$ from the previous sections), where $w$ is the index of the executor and $w = 1, \ldots, W$. The executor model updates its information about the executor, by taking into account the measured service time $\tilde{s}_i$, only if the executor model corresponds to the element of the system that serviced the $i$th request.
The decision module chooses the executor $d_i$ for which the estimated service time $\hat{s}_i^w$ is the shortest, according to
$$d_i = \arg\min_{w \in \{1, \ldots, W\}} \hat{s}_i^w.$$
The key element of the system is the executor model, which estimates the service time for the given executor and can adapt to the changing environment. It owes its capabilities to the use of a fuzzy-neural mechanism. This kind of construction of the executor model was introduced and described in detail in [7].
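As an illustration of how the same selection step is applied on both levels, the following Python sketch treats each executor model as an opaque function from a request class and a load pair to an estimated service time. The function names and the toy linear estimator in the usage note are assumptions, not part of the article.

```python
from typing import Callable, List, Sequence, Tuple

Load = Tuple[int, int]                       # (e, f): all / dynamic requests in service
Estimator = Callable[[int, Load], float]     # (class k, load) -> estimated service time


def choose_executor(k: int, loads: Sequence[Load], models: Sequence[Estimator]) -> int:
    """Decision module: index of the executor with the shortest estimated service time."""
    estimates = [model(k, load) for model, load in zip(models, loads)]
    # On ties, min() keeps the lowest index.
    return min(range(len(estimates)), key=estimates.__getitem__)


def two_level_choice(
    k: int,
    group_loads: Sequence[Load],
    group_models: Sequence[Estimator],
    server_loads: Sequence[Sequence[Load]],
    server_models: Sequence[Sequence[Estimator]],
) -> Tuple[int, int]:
    """First level picks a group g; second level picks a server z inside that group."""
    g = choose_executor(k, group_loads, group_models)
    z = choose_executor(k, server_loads[g], server_models[g])
    return g, z
```

For example, with a toy estimator `lambda k, load: 0.01 * load[0] + 0.05 * load[1]` used for every executor, group loads `[(5, 2), (3, 0)]` and server loads `[[(3, 1), (2, 1)], [(2, 0), (1, 0)]]`, the sketch selects group 1 and then server 1 within it.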
The fuzzy structure of the system is based on the Mamdani [34] model, while the neural approach makes it possible to tune the parameters of the input and output fuzzy sets.
The overall structure of the fuzzy-neural network of the executor model is presented in Fig. 5 (Fig. 5.c).
The estimated service time $\hat{s}_i^w$ is calculated from the outputs of this network, as described in detail in [7]. The process of adaptation is conducted every time the executor corresponding to the executor model services a request. Both the input and output fuzzy set parameters are tuned with the Back Propagation Method [35], taking into account the measured service time $\tilde{s}_i$. The update rules use the adaptation ratios $\eta_s$, $\eta_c$, $\eta_d$ and the indices $\varphi = 1, \ldots, L - 1$, $\gamma = 1, \ldots, M - 1$ [7]. In the beginning, the values of the input parameters $c_{1ki}, \ldots, c_{lki}, \ldots, c_{Lki}$ and $d_{1ki}, \ldots, d_{mki}, \ldots, d_{Mki}$ are evenly distributed over the space of executor operation, while the output parameters are set to zero. In this way, the estimated service times are always close to zero for those executors that the decision system has not yet learned. Over time, the Web switch adapts, and the estimated service times become longer and closer to the real service times [7]. In preliminary experiments, the optimal number of input fuzzy sets was determined as $L = M = 10$, and the number of output fuzzy sets is equal to $J = L \cdot M$.
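The following Python sketch shows one simplified way such an adaptive estimator could behave; it is not the network of [7]. It assumes triangular input fuzzy sets evenly spread over a hypothetical load range, product inference, singleton output parameters initialised to zero, and a gradient step on the output parameters only (the input set parameters stay fixed here, whereas in the article they are tuned as well). All names and default values other than $L = M = 10$ are hypothetical.

```python
class ExecutorModel:
    """Simplified fuzzy-neural service time estimator for one executor."""

    def __init__(self, n_classes, max_e=50.0, max_f=50.0, L=10, M=10, eta=0.5):
        self.L, self.M, self.eta = L, M, eta
        self.max_e, self.max_f = max_e, max_f
        self.step_e = max_e / (L - 1)   # spacing of the input sets for e
        self.step_f = max_f / (M - 1)   # spacing of the input sets for f
        # J = L * M output parameters per request class, all starting at zero.
        self.y = [[0.0] * (L * M) for _ in range(n_classes)]

    @staticmethod
    def _memberships(x, step, count, x_max):
        x = min(max(x, 0.0), x_max)  # clamp the load to the modelled range
        return [max(0.0, 1.0 - abs(x - i * step) / step) for i in range(count)]

    def _firing(self, load):
        mu_e = self._memberships(load[0], self.step_e, self.L, self.max_e)
        mu_f = self._memberships(load[1], self.step_f, self.M, self.max_f)
        return [a * b for a in mu_e for b in mu_f]  # product inference, J values

    def estimate(self, k, load):
        r = self._firing(load)
        return sum(y * rj for y, rj in zip(self.y[k], r)) / sum(r)

    def adapt(self, k, load, measured):
        """One gradient step moving the estimate towards the measured service time."""
        r = self._firing(load)
        total = sum(r)
        error = measured - sum(y * rj for y, rj in zip(self.y[k], r)) / total
        for j, rj in enumerate(r):
            if rj:
                self.y[k][j] += self.eta * error * rj / total
```

Because the output parameters start at zero, an executor that has never serviced a given class is estimated at zero and is therefore tried early, which mirrors the behaviour described above; repeated calls to `adapt` pull the estimate towards the observed service times.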

Experiments and discussion
Research and experiments which were conducted for Web switches working in two-layer architecture showed that the cooperation of intelligent switches can significantly reduce the service time. In this section, it will be determined whether the use of a single-layer architecture with the two-level decision-making strategy TLFNRD is advantageous and better than the one-level intelligent FNRD decision strategy.
To evaluate the proposed system, simulation experiments have been conducted. The simulation program was written in the OMNeT++ environment. OMNeT++ provides appropriate libraries as well as an environment for conducting simulations, and is the most popular system for evaluating networking systems [36].
The simulation program was divided into independent modules that imitate the behavior of different parts of the real system, namely: the HTTP request generator, the Web switch, the Web servers, and the database server. The scheme of the simulator is presented in Fig. 6. To determine the values of the real system parameters used in the simulation, preliminary experiments were conducted in a manner similar to that in [37]. The experiments were conducted on a Web server running on a computer equipped with an Intel Core i7 7800X CPU, a Samsung SSD 850 EVO drive and 32 GB of RAM. The Apache Web server was running WordPress, the most popular CMS in the world, which is used on more than 25% of the world's websites [38]. Thanks to this, it was possible to simulate the behavior of many real, business-oriented Websites.
The request generator module in the simulator contained many client submodules, each of which behaved like a real Web browser. To download a Web page, a client first downloaded the document with the HTML content and then opened up to 6 TCP connections to fetch the other elements of the page, such as css, js, pictures and other files. The number of Web pages downloaded by a single client during one session followed human behavior and was modeled with the Inverse Gaussian distribution (μ = 3.86, λ = 9.46). The time between the opening of subsequent pages (the user think time) was modeled according to the Pareto distribution (α = 1.4, k = 1) [39]. Each client was deleted after finishing its session and a new one was created.
Each of the clients was downloading a simulated Web site whose parameters (type and size of the HTML and nested objects) were exactly the same as those of the very popular site https://www.sonymusic.com [40], which also runs on WordPress.
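The two distributions used to model client sessions can be sampled with the Python standard library alone, as in the sketch below. The inverse Gaussian sampler uses the well-known Michael, Schucany and Haas transformation method; rounding the sampled value to a whole number of pages (with a minimum of one) is an assumption, since the article does not say how the continuous sample is turned into a page count.

```python
import math
import random


def inverse_gaussian(rng, mu, lam):
    """Draws one sample from the Inverse Gaussian distribution IG(mu, lam)."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    x = mu + (mu * mu * y) / (2.0 * lam) \
        - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + mu * mu * y * y)
    # Accept x with probability mu / (mu + x), otherwise return the paired root.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def pages_in_session(rng, mu=3.86, lam=9.46):
    """Number of pages a client downloads in one session (rounding is an assumption)."""
    return max(1, round(inverse_gaussian(rng, mu, lam)))


def think_time(rng, alpha=1.4, k=1.0):
    """User think time between pages, Pareto distributed with shape alpha and scale k."""
    return k * rng.paretovariate(alpha)


rng = random.Random(2020)  # fixed seed only to make the sketch reproducible
session = [think_time(rng) for _ in range(pages_in_session(rng))]
```

With shape α = 1.4 the Pareto distribution is heavy-tailed, so occasional very long think times are expected.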
The Web switch in the simulator was able to distribute HTTP requests using strategies popular in Amazon AWS [21] Web switches as well as intelligent strategies, namely: Round Robin (RR); Least Load (LL), which assigns HTTP requests to the nodes with the lowest number of HTTP requests in service; Partitioning (P), which assigns requests to servers previously chosen for the given kind of request (in the experiments, a more adaptive modification of the P algorithm was used, behaving like the LARD algorithm [41], in which the service of a given type of request is moved to the least loaded server if the current server is overloaded); Fuzzy-Neural Request Distribution (FNRD), an intelligent strategy using the fuzzy-neural approach and a single-level decision algorithm [7]; and Two-Level Fuzzy-Neural Request Distribution (TLFNRD), the new approach.
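For reference, the two simplest baseline strategies can be written in a few lines of Python. This is a generic sketch of Round Robin and Least Load with hypothetical class names, not the simulator code used in the experiments.

```python
from itertools import cycle


class RoundRobin:
    """Assigns requests to servers 0, 1, ..., n-1 in turn, ignoring their load."""

    def __init__(self, n_servers):
        self._next = cycle(range(n_servers))

    def choose(self, in_service):
        return next(self._next)


class LeastLoad:
    """Assigns the request to the server with the fewest requests currently in service."""

    def choose(self, in_service):
        # On ties, the server with the lowest index is chosen.
        return min(range(len(in_service)), key=in_service.__getitem__)
```

Both classes expose the same `choose(in_service)` method, where `in_service` is the list of per-server counts of requests in service, so they can be swapped in a simulation loop.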
For a complete evaluation of the TLFNRD and FNRD strategies, the simulator would need to implement other intelligent strategies known from the literature. However, the implementation of those complex strategies can be very time-consuming. Moreover, in most cases the strategies are not described in detail in the articles, so their implementation would not fully reflect the way they work or the intentions of their authors. It is therefore almost impossible today to compare the best solutions directly. In this article, the fuzzy-neural strategies are compared with the dynamic LL strategy, which performs very well in practical solutions and serves as a reference line. The Web switch was modeled in the simulation as a single queue. Its service times were measured on a real server with an Intel Xeon E5-2640 v3 processor and were as follows: LL 0.0103 μs, RR 0.00625 μs, P 0.0101 μs, FNRD 0.2061 μs, TLFNRD 0.2033 μs.
Each of the Web server modules contained separate queues modeling the processor and the SSD drive. Service times were acquired for the server described above. The RAM acted as the cache memory for the file system.
The database server was modeled as a single queue. Its service times were measured on the same machine that ran the Web server.
The experiments were conducted for four different cloud systems containing 8, 10, 12 and 14 Web servers, and a single database server. These are not the smallest numbers of servers found in real solutions; because of the nature of the two-level decision system, larger Web clusters are recommended.
During the experiments, the mean service time was measured, and in each experiment a different load, measured as the number of generated clients, was used. In every experiment, 40 million HTTP requests were served: the warm-up phase took about 10 million requests, and the service time was measured for the remaining 30 million.
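The measurement procedure, in which a warm-up phase is discarded before the mean is computed, can be sketched as below. The function name is hypothetical, and the tiny numbers in the usage note stand in for the 10 million warm-up and 30 million measured requests of the experiments.

```python
def mean_service_time(service_times, warmup):
    """Mean of the service times observed after the first `warmup` requests."""
    total = 0.0
    measured = 0
    for index, s in enumerate(service_times):
        if index >= warmup:  # requests in the warm-up phase are not counted
            total += s
            measured += 1
    if measured == 0:
        raise ValueError("no requests were measured after the warm-up phase")
    return total / measured
```

For instance, `mean_service_time([9.0, 9.0, 1.0, 3.0], warmup=2)` ignores the first two values and averages the remaining two. Because the function consumes any iterable, it can be fed a generator and does not need to hold tens of millions of samples in memory.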
Before starting the experiments, it was necessary to decide how to logically divide the web servers into groups in the TLFNRD strategy. Research presented in [9] for two-layer architecture indicated that there should be two servers in each group. Therefore, experiments have been carried out for such settings.
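A logical division of the servers into groups of two could look like the following Python sketch. Grouping consecutive server indices is an assumption made for illustration; the article only states the group size, and `make_groups` is a hypothetical helper.

```python
def make_groups(n_servers, group_size=2):
    """Splits server indices 0..n_servers-1 into consecutive logical groups."""
    if n_servers <= 0 or group_size <= 0:
        raise ValueError("n_servers and group_size must be positive")
    servers = list(range(n_servers))
    return [servers[i:i + group_size] for i in range(0, n_servers, group_size)]
```

With this division, the four tested configurations of 8, 10, 12 and 14 Web servers give 4, 5, 6 and 7 groups respectively.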
The results of the experiment are presented in Fig. 7. Each of the four diagrams presents the results for the cloud system containing a different number of Web servers.
In the real world, end-users expect to get the content of a Web page as soon as possible. A significant part of the time needed to deliver the content to the user is the time of servicing HTTP requests. It is therefore crucial to service requests quickly to keep the quality of service high even when the load of the cloud system is high.
Among the non-intelligent solutions, the best results, with the shortest service times, were achieved by the LL strategy. This strategy is simple, adaptive and very effective, especially in practical applications. Service times for this strategy are significantly lower than for the RR and P approaches.
However, the results for both intelligent strategies were much better than for the non-intelligent ones. In all of the experiments, service times for the new TLFNRD strategy were shorter than for FNRD, and as the load increased, the gap between the results widened significantly.
In many of the experiments, service times for TLFNRD were half those of the LL strategy. Consequently, to keep the same quality of service as LL, the TLFNRD strategy needs fewer servers, and the maintenance costs are much lower.
It is also worth mentioning that the time of making a decision was taken into account in the simulation. Although the decision times for the FNRD and TLFNRD strategies are more than an order of magnitude higher than for the other strategies, the Web switch was not a bottleneck for the system in the experiments, and the overall service times obtained were much shorter for the intelligent strategies.
Since the TLFNRD strategy uses only information available within the Web switch in the decision-making process, the proposed strategy could be implemented in the Web switches presently used in the Web cloud. The strategy can be used in both software and hardware Web switches. It should be noted, however, that making decisions in the TLFNRD strategy requires more computing power than simple strategies like RR or LL need. Therefore, the TLFNRD Web switch should have adequate computing power.
Summing up the research results, it is clearly beneficial to use the two-level, intelligent, fuzzy-neural decision-making strategy. Results for the one-level fuzzy-neural strategy are worse. The proposed new method increases the quality of service in Web cloud systems and lowers the costs. Further research on the new solution can deliver information on its uses and features.

Summary
In this article, a new HTTP request distribution strategy for cloud-based Web systems was presented. The proposed TLFNRD strategy brings a new quality to the field of load balancing strategies using the fuzzy-neural approach. The new strategy uses a two-level decision system in which, on the first level, a group of servers is chosen to service the request, and on the second level, a single Web server is selected. On each level, a fuzzy-neural model estimates the service times for the elements being chosen.
To evaluate the TLFNRD strategy, a simulation environment was designed and implemented. The simulator was able to imitate correctly the behavior of Web clients, as well as the work of the Web switch and of both the Web and the database servers. In all of the experiments, the intelligent strategies obtained much better results than the non-intelligent strategies mostly used in popular Web cloud systems. The two-level TLFNRD strategy resulted in shorter service times than the one-level intelligent FNRD strategy, especially when the load was high.
The research results indicate that the new solution is important and that research into two-level decision-making systems should be continued.