AI-empowered game architecture and application for resource provision and scheduling in multi-clouds

Current deep learning technologies use a large number of parameters to achieve high accuracy, commonly more than a hundred million for image-related tasks. To improve both training speed and accuracy in multi-clouds, distributed deep learning is also widely applied. Reducing the network scale and improving the training speed have therefore become urgent problems in multi-clouds. To address this issue, we propose a game architecture for multi-clouds, supported by resource provision and service scheduling. Furthermore, we trained a deep learning network that ensures high accuracy while reducing the number of network parameters. An adapted game, flappy bird, is used as an experimental environment to test our neural network. Experimental results show that the decision logic of the flappy bird, including flight planning, avoidance, and sacrifice, is accurate. In addition, we publish the parameters of the neural network so that other scholars can reuse them for further research.


Introduction
IoT users, virtual reality application developers, and game platform operators can simply rent resources from cloud providers without maintaining the cloud system themselves. However, a single cloud may not offer sufficient services for applications in different regions, and a single cloud sometimes suffers performance degradation. Therefore, relying solely on a single cloud to provide all resources and services is not feasible for users who want to avoid cloud provider lock-in.
Clouds differ in resource capacity, and service request arrivals are dynamic. Cloud users will experience poor quality of experience (QoE) if resources are insufficient. To provide good QoE cost-effectively in multi-clouds, an effective resource provision and service scheduling method is critical, as it reduces network congestion and prevents system crashes. Moreover, AI applications deployed on multi-clouds should adapt their architecture to the multi-cloud scenario.
Deep learning techniques are good at dealing with complex and dynamic systems. It is common for deep learning models to have hundreds of millions of parameters and to require hundreds of megabytes of storage. Therefore, reducing the network size and improving the training speed in multi-clouds is an urgent problem. To solve this problem: (1) we trained a deep learning network that ensures high accuracy while reducing the scale of network parameters; (2) we adapted the flappy bird game, originally played by human beings, to build an evaluation environment.
Through the mouse or the keyboard, players in the game have two action options: make the bird flap its wings, or do nothing. If the bird flaps its wings, it experiences an upward force that causes an upward acceleration, and the bird flies upward. If the player does nothing, the bird falls toward the ground due to gravity. The flappy bird may touch the upper and lower pipes during flight; when it hits them, the game is over and the flappy bird receives a negative payoff. The goal of the bird is to pass as many pipes as possible to obtain the highest payoff. We trained a deep neural network that lets the computer choose freely, at each point in time, between the two actions previously chosen by humans: flap or do nothing. To increase randomness, we added random factors to the environment, such as the height of the pipes, the spacing between the upper and lower pipes, and the location of the pipes. However, the horizontal distance between consecutive pipes is fixed; this does not affect the learning ability or the generalization ability of the neural network.
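The two actions can be sketched as a per-frame update of the bird's vertical state, using the physics constants listed later in the paper (gravity 1, flap acceleration -9, max descend speed 10). The exact update rule in the authors' implementation is an assumption; this is only a minimal sketch.

```python
# Minimal sketch of the two-action dynamics; Y grows downward, as in
# screen coordinates, so negative velocity means ascending.
FLAP_ACC = -9     # velocity set when the bird flaps (upward impulse)
GRAVITY = 1       # downward acceleration per frame
MAX_VEL_Y = 10    # maximum descend speed

def step(y, vel_y, flap):
    """Advance the bird's vertical position and velocity by one frame."""
    if flap:
        vel_y = FLAP_ACC              # flapping gives an upward impulse
    elif vel_y < MAX_VEL_Y:
        vel_y += GRAVITY              # otherwise gravity pulls the bird down
    return y + vel_y, vel_y
```

With this rule, doing nothing accumulates downward speed until the cap is reached, while a single flap immediately reverses the motion.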

Related works
Cloud computing has significantly promoted the transformation of many industries [1]. Myers et al. [2] used cloud resources by attaching storage capacity and a web server. Service scheduling [3] in the multi-cloud environment is a multi-constraint, multi-objective, and multi-type optimization problem [4], where traditional basic scheduling algorithms do not consider the real load and link status of the worker nodes. The scheduling problem is to find the optimal group of resources satisfying multiple constraint objectives, which is a combinatorial optimization problem [5].
Panwar et al. [6] divided task scheduling into two stages to reduce task time and improve cloud resource utilization. Taking both execution time and cost into account, Chen et al. [4] modeled cloud scheduling [7] as an optimization problem and proposed a multi-objective ant colony system. George et al. [8] used the Cuckoo Search algorithm to minimize the computation time of tasks, while Ghasemi et al. [9] proposed a scheduling method that minimizes processing time and transmission cost. Compared with traditional scheduling algorithms, heuristic algorithms have a robust ability for optimization, but they still converge slowly and easily fall into local optima.
Considering the dynamic nature of computing resources and the heterogeneity of cloud platforms, Deep Reinforcement Learning (DRL) shows continuous decision-making ability [10] for resource allocation and service scheduling policies in cloud environments [11]. Cheng et al. [12] designed a scheduler combining resource provision and scheduling based on Deep Q-Learning, which reduced energy consumption and the task rejection rate in the cloud. Based on the Deep Q-Learning algorithm, Wei et al. [13] proposed a QoS-aware scheduling framework that reduces the average response time of jobs under variable loads. Meng et al. [14] designed an online server-side task scheduling algorithm by combining reinforcement learning with a DNN. Ran et al. [15] used the Deep Deterministic Policy Gradient (DDPG) algorithm to find the optimal task assignment structure. Zhang et al. [16] proposed a multi-task scheduling algorithm based on deep reinforcement learning to reduce job completion time. Dong et al. [17] proposed an algorithm to dynamically schedule tasks with priority associations in the cloud manufacturing environment. For industrial IoT, an AoI-aware energy control and computation offloading method [18] has been proposed to enhance intelligence.
References [6,11,19] discussed only a single service type, without considering related services and their diversity, while references [7,20] gave only resource weight coefficients. References [16,17] did not consider the data transmission cost in the scheduling of composite services. References [4,10,20,21] did not take service scheduling parallelism into account. Although artificial intelligence and game theory are well applied to this topic, they are used in specific cloud environments [22]. In addition, we compare some of the algorithms in Table 1.

AI-empowered game architecture and application
The software components of one cloud application can be wrapped as cloud services, and these cloud services can be deployed in multi-clouds once a service registration component is ready. In addition, offloading and resource allocation should be considered in certain areas. Figure 1 shows a deployment graph for multi-cloud applications, where the gateways represent service registration (Zookeeper, Eureka, Nacos, or Consul). Load balancing, fault handling, routing, etc. can be added via Eureka, Ribbon, Feign, and Hystrix services. The component for resource provision and service scheduling can be deployed on other network nodes.
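As a toy illustration of the service-registration idea (this is not the API of Zookeeper, Eureka, Nacos, or Consul; all names here are hypothetical), a registry with round-robin load balancing across clouds could look like:

```python
import itertools

# Hypothetical sketch: services register endpoints hosted in different
# clouds, and resolution rotates among them (round-robin load balancing).
class Registry:
    def __init__(self):
        self._instances = {}   # service name -> list of endpoints
        self._cursors = {}     # service name -> round-robin iterator

    def register(self, service, endpoint):
        self._instances.setdefault(service, []).append(endpoint)
        # restart the round-robin cursor over the updated instance list
        self._cursors[service] = itertools.cycle(self._instances[service])

    def resolve(self, service):
        # pick the next endpoint in round-robin order
        return next(self._cursors[service])

reg = Registry()
reg.register("render", "cloud-a:8080")
reg.register("render", "cloud-b:8080")
print(reg.resolve("render"), reg.resolve("render"))  # cloud-a:8080 cloud-b:8080
```

Real registries add health checks, fault handling, and routing policy on top of this basic lookup.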
Cloud applications, including games, can be divided into multiple software components, and these software components can be deployed in different places, such as different servers in one cloud or even in different clouds. Therefore, a distributed game-theoretical and credibility-guaranteed approach [28] is suitable for this purpose. Figure 2 shows the architecture of a cloud game deployed across different clouds, where the software components interact with each other. This architecture is suitable for a broad range of applications in multi-clouds.
The player in our game controls a flappy bird and makes it fly over obstacles made of pipes of different lengths, avoid humans, and attack wolves. The game ends when the flappy bird hits a pipe or falls to the bottom of the screen. The flappy bird obtains a positive payoff for each pipe it passes, and the player needs to score as many points as possible.
Our method is based on reinforcement learning, and Fig. 3 shows the method framework. The test begins in the process of 'Finish Train and Validation'. To reduce the number of neural network parameters, we adjusted the input layer and preprocessed the input data. The neural network parameters are trained on batches drawn from replay buffers under the current optimal strategy.
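A minimal replay buffer of the kind described above can be sketched as follows; the capacity and batch size are illustrative, not the authors' settings.

```python
import random
from collections import deque

# Sketch of a replay buffer: transitions are stored as they are played,
# and training batches are drawn uniformly at random from the buffer.
class ReplayBuffer:
    def __init__(self, capacity=50_000):
        self.buf = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buf, batch_size)
        return list(zip(*batch))  # tuples of states, actions, rewards, ...

buf = ReplayBuffer()
for t in range(100):                   # fill with dummy transitions
    buf.push(t, t % 2, 0.1, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(8)
print(len(states))  # 8
```

Sampling uniformly from the buffer decorrelates consecutive frames, which is what makes batch training on gameplay data stable.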
Algorithm 1 shows the process in each game step: the first pipe is removed if it is off-screen before the next blocker appears. Then we check whether the bird collides with the human, the wolf, the upper pipes, or the lower pipes via the function checkCrash(player, upperPipes, lowerPipes, blocker). The function returns 'hard' if the player collided with the ground or pipes, and different rewards are obtained accordingly.
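The per-step logic can be sketched as follows. The ground height, pipe width, and reward values are placeholder assumptions, and check_crash is reduced to a stub of the paper's checkCrash (the real version also tests overlap with the pipes and the human/wolf blocker).

```python
GROUND_Y = 400   # hypothetical ground height in screen coordinates

def check_crash(player, upper_pipes, lower_pipes, blocker):
    # placeholder: the real checkCrash also tests pipe and blocker overlap
    if player['y'] >= GROUND_Y:
        return 'hard'
    return None

def game_step(player, upper_pipes, lower_pipes, blocker, pipe_width=52):
    # drop the first pipe pair once it has scrolled off-screen
    if upper_pipes and upper_pipes[0]['x'] < -pipe_width:
        upper_pipes.pop(0)
        lower_pipes.pop(0)
    if check_crash(player, upper_pipes, lower_pipes, blocker) == 'hard':
        return -1.0, True    # negative reward, episode over
    return 0.1, False        # small reward for surviving the step
```

The returned (reward, done) pair is what feeds the replay buffer during training.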

Experiments and data analysis
We trained the neural network by having the AI and the player play the adapted flappy bird game, and recorded their actions during the game. Table 2 shows the parameter settings for recording the data. In addition, the neural network training process and the data collection process are shown in two videos.

Game data and analysis
The game speed, data collection speed, and neural network training speed can be accelerated in many ways. Some of the game parameters are used to simulate a real physical world. Note that X and Y are screen coordinates, and there is relative movement between the bird and the pipes. The first part of video 1 shows the results of our deep neural network in the initial state of training; the results can be observed directly in the game images. At this stage, the flappy bird flaps its wings at random, has not learned any strategy, does not avoid humans, and occasionally attacks the wolf. The second part of the video shows the results of the trained neural network model. This model makes the bird avoid all pipes, the ground, and the sky while ensuring that it attacks only wolves, not humans. In this situation, the flappy bird with the current model is suicidal.

Model data of deep neural network
Table 3 shows that our neural network needs to train only 69,506 parameters to implement all the decision logic of the flappy bird, including flight planning, avoidance, and sacrifice. This is a smaller neural network than those that often require millions or hundreds of millions of training parameters, because we optimized the algorithm and do not use Convolutional Neural Networks (CNNs). Compared with CNNs that take all information as input, we found that human knowledge and elaborate design can significantly reduce the input size of the neural network, and thus the overall size of the network in our experiments. That is why our neural network is small and training is extremely fast. In addition, the following settings are used in the neural network: activation: relu; loss: mse; optimizer: adam. The human and the wolf in the lower pipe can be regarded as two organisms blocking the progress of the flappy bird. The neural network shown in video 2 trained a cruel flappy bird: to achieve its goal of passing through most pipes, the bird attacks any creature. The goal of training the flappy bird is simply to keep it from falling or hitting the pipes, and we can see that it has already passed through 410 pipes.
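Because the network is fully connected, its parameter count is easy to audit: each dense layer contributes in x out weights plus out biases. The layer widths below are hypothetical, chosen only to land near the reported scale; the authors' exact layout (which yields 69,506) is not published in this excerpt.

```python
def dense_params(sizes):
    # weights + biases for each consecutive pair of layer widths
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# hypothetical layout: 11 input features, two hidden layers of 256 units,
# 2 output actions (flap / no-op)
print(dense_params([11, 256, 256, 2]))  # 69378
```

A count of this order confirms how small a hand-engineered input makes the network compared with a CNN over raw frames.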

Interpreting the two systems
We added some game environment parameters. For example, the spacing of the pipes is random, and further randomness lies in the timing of the appearance of humans and wolves. The entire game environment is therefore uncertain, which requires a lot of time to debug the game and the deep learning modules.
The whole software system consists of two modules. The first module is responsible for the game's operation and display, including collision and animation effects. The other module is responsible for letting the AI play the game using deep learning. Software bugs in these two modules affect each other and can cause a bug cycle, so debugging takes a certain amount of time. However, there are many skills in the software development process and in deep learning and training; using these skills can speed up software development, system debugging, and training, and can avoid endless bug cycles. Sometimes game bugs slow down deep learning, or even make it fail; sometimes deep learning bugs affect the game. It is thus a mutually influencing system. Therefore, when readers reuse our dataset, they need to use it in conjunction with the game.

Self-control and explainable AI
Some scholars now believe that neural networks have a rudimentary self-consciousness. Our neural network also shows some self-consciousness, in the sense that it exhibits three interrelated and mutually restricting psychological components: self-awareness, self-experience, and self-control. Self-awareness includes self-perception, self-analysis, and self-criticism; our bird is implemented in program code, and without self-analysis and self-perception it cannot survive long in this virtual environment. The second component is self-experience: individuals can feel self-love, self-esteem, a sense of responsibility, a sense of obligation, and a sense of superiority. The third component is self-control, whose main elements are self-restraint and self-discipline; self-control is the strongest point of our bird. Therefore, from the definitional point of view, our neural network already has a certain degree of self-awareness. By designing the parameters of our neural network, an artificial agent can be trained into a cold agent, a sentimental agent, or an agent with morality, obligation, and self-love.
In video 1, we adjusted and optimized the neural network parameters to create a new agent. The agent has a certain sense of self-consciousness when facing different social groups and can make different judgments. In other words, it can make different choices depending on whether the blocker is a person or a wolf: it can choose to commit suicide to save a human life, or sacrifice the wolf so that the bird can live longer. In the latter case, we can imagine that the bird carries important information, and the cost of passing on that information is the loss of one wolf, or of all the wolves in the path.

Conclusion
Cloud applications, including games, can be divided into multiple software components, and these components can be deployed in different clouds. We proposed an architecture suitable for a broad range of applications in multi-clouds, supported by resource provision and service scheduling. To reduce the number of neural network parameters in multi-clouds, we adjusted the input layer and preprocessed the input data. The neural network parameters are trained on batches drawn from replay buffers under the current optimal strategy. Our method is based on deep reinforcement learning, whereas current deep reinforcement learning technologies commonly use a huge number of parameters. To improve both training speed and accuracy in multi-clouds, distributed deep learning is also widely applied. An adapted game, flappy bird, is used as an experimental environment to test our neural network. Experimental results showed that the decision logic is accurate. In addition, the results show that our neural network needs to train only 69,506 parameters to implement all the decision logic of the flappy bird, including flight planning, avoidance, and sacrifice, because we optimized the algorithm and do not use Convolutional Neural Networks.

Fig. 1 Deployment graph for multi-cloud applications

Radius = sqrt((x_max - x_min)^2 + (y_max - y_min)^2 + (z_max - z_min)^2)    (1)
Dist = ||Center_a - Center_b||
Dist < Radius_a + Radius_b → intersect(a, b)

1. velocity along X of the pipes, humans, and wolves: -4
2. bird's starting velocity along Y: 0
3. bird's max velocity along Y (max descend speed): 10
4. bird's min velocity along Y (max ascend speed): -8
5. bird's downward acceleration: 1
6. bird's acceleration on flapping: -9

The actions of the AI were recorded. The neural network training process and the data collection process are shown in two videos. Video 1 is divided into two parts.
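The bounding-sphere test above can be sketched directly: each object gets a radius from its bounding-box extents, and two objects intersect when the distance between their centers is below the sum of the radii. How the points are grouped per object is an assumption of this sketch.

```python
import math

def bounding_radius(points):
    # radius from bounding-box extents, as in Eq. (1); using the full box
    # diagonal gives a conservative bound
    xs, ys, zs = zip(*points)
    return math.sqrt((max(xs) - min(xs)) ** 2 +
                     (max(ys) - min(ys)) ** 2 +
                     (max(zs) - min(zs)) ** 2)

def intersects(center_a, radius_a, center_b, radius_b):
    # Dist < Radius_a + Radius_b  ->  intersect(a, b)
    return math.dist(center_a, center_b) < radius_a + radius_b
```

This sphere test is cheap enough to run every frame for the bird, the pipes, and the human/wolf blockers.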

Table 3
Model data summary