Wavelet transforms based ARIMA-XGBoost hybrid method for layer actions response time prediction of cloud GIS services
Journal of Cloud Computing volume 12, Article number: 11 (2023)
Abstract
Layer actions response time is a critical indicator of cloud geographical information services (cloud GIS services) and is of great significance to resource allocation and schedule optimization. However, since cloud GIS services are highly dynamic, uncertain, and uncontrollable, the response time of layer actions is influenced by spatiotemporal intensity and concurrent access intensity, which poses significant challenges for predicting layer action response time. To predict the response time of layer actions more accurately, we analyzed the data association of cloud GIS services. Furthermore, based on the long-term stable trends and short-term random fluctuations in the layer actions response time series, we propose a wavelet transforms-based ARIMA-XGBoost hybrid method for cloud GIS services to improve the one-step and multi-step prediction results of layer actions response time. We generate a multivariate time series feature matrix using the historical value of the layer actions response time, the predicted value of the linear component, and the historical value of the nonlinear component. The method does not need to meet the traditional assumption that the linear and nonlinear components of the time series are additive, which relaxes the model’s requirements on the time series and enhances its flexibility. The experimental results demonstrate the superiority of our approach over previous models in predicting the layer actions response time of cloud GIS services.
Introduction
In recent years, the cloud geographic information system (cloud GIS) has emerged as a serious candidate for the next-generation GIS computing paradigm, one that uses a virtualized platform or infrastructure in a scalable and elastic environment [1]. Its services are distinguished by their ability to perform GIS analyses with browsers or small applications on cloud services. They apply location-independent resource pooling to allow any user to input, analyse, and manipulate spatial information in a shared infrastructure while also reducing implementation costs [2]. Because of these characteristics, cloud GIS services, facilitated by the internet of things, cloud computing, and big data technologies, have been integrated into aspects of urban planning and construction such as smart cities, the natural environment, and resource allocation. Examples include Google Earth Engine [3] and the Singapore Geospatial Collaborative Environment (SG-SPACE) [4]. However, massive amounts of multi-source geospatial data are constantly being generated, including remote sensing images, graphic photos, and digitally summarized text. The storage, computing, and visualization resources required for spatial information services such as query, interoperability, and virtualization are also rapidly expanding. Large numbers of user visits and service requests result in a variety of issues, including service overload [5], network congestion [6], and response timeout [7]. A critical issue is therefore how to effectively predict the quality of service (QoS) of cloud GIS services during peak and off-peak periods, so that resources can be allocated and balanced to cope with the data intensity, computing intensity, spatiotemporal intensity, and concurrent access intensity that are characteristic of the geographic information field [8].
Layer action is the smallest operation granularity of cloud GIS services and an important part of thematic analysis and spatial analysis. Its response time has a direct impact on the quality of service and the user experience (UE). Few studies have examined the QoS and UE of cloud GIS services from the perspective of layer action response time prediction. Furthermore, compared with traditional horizontally networked and vertically multi-level GIS services, cloud GIS services are characterized by appropriate separation and integration. This means that task division and assignment are necessary when storing massive multi-source geospatial data or analyzing large-scale data tasks [9]. The QoS of cloud services is also a decisive factor for task division and allocation. The layer actions response time is an important indicator of cloud GIS services. Accurate short-term and long-term predictions provide in-depth industry insights and decision support for cloud GIS service allocation and scheduling, as well as for information service status monitoring and resource optimization [10]. Such prediction is of great significance to the cloudification of GIS services [11] as well as to the coordinated development of multiple mobile terminals and geospatial information [12].
With the widespread use of cloud computing and big data technology in cloud GIS services, predicting layer action response times has become extremely difficult. On the one hand, cloud computing allows a large number of users to share geographic information resources on demand at any time via resource pooling, thereby lowering service costs. However, it also makes the environment of cloud GIS services dynamic, uncertain, and uncontrollable, which makes accurate prediction of layer actions response time difficult. On the other hand, in the analysis mode of cloud GIS services, different analysis granularities of theme, space, and layer action are available. Because of parallel computing, the response times of different granularities affect each other. Therefore, clarifying the data association logic in cloud GIS services, mining the temporal features and knowledge hidden in layer action response time series, and improving the accuracy of layer action response time prediction are critical to optimizing cloud service resource deployment and service strategy. Additionally, when the response time series is stable over time and monitoring the performance of cross-regional clusters is difficult, the layer action response time prediction problem can be treated as a time series prediction problem. One-step prediction focuses on the accuracy of the value at the next moment, whereas multi-step prediction shows the trend of the forecasted values over a longer prediction horizon [13]. Predictions over multiple time dimensions provide references for load balancing strategy optimization and service resource allocation, helping to realize the deployment and elastic invocation of cloud service resources. However, because the prediction range expands and uncertain factors increase, multi-step prediction is a more challenging problem in time series prediction [14].
Previous research on time series prediction pays more attention to one-step prediction, either pursuing one-step accuracy or achieving multi-step prediction through iterative prediction [15,16,17]. However, iterative prediction tends to accumulate errors. As a result, it remains difficult to optimise load balancing strategies and service resource allocation strategies in multiple time dimensions, which motivates studying both one-step and multi-step prediction of layer actions response time.
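The error accumulation of iterative prediction can be seen in a minimal sketch. The one-step rule below (`naive_one_step`, a hypothetical damped-trend model) is an assumption for illustration only and is not one of the models evaluated in this article. The point is that each forecast is appended to the input window, so any one-step error is fed into every later step.

```python
def iterative_forecast(history, one_step, steps):
    """Multi-step forecast obtained by feeding each prediction back as input."""
    window = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = one_step(window)
        forecasts.append(next_value)
        window.append(next_value)  # the prediction, not an observation, is reused
    return forecasts


def naive_one_step(window):
    """Hypothetical one-step model: last value plus half of the last change."""
    return window[-1] + 0.5 * (window[-1] - window[-2])


forecasts = iterative_forecast([1.0, 2.0], naive_one_step, steps=3)
print(forecasts)  # [2.5, 2.75, 2.875]
```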
Thus, to solve the above-mentioned problems in predicting layer actions response time in cloud GIS services, we propose a wavelet transforms-based ARIMA-XGBoost hybrid method, which improves the accuracy of one-step and multi-step layer actions response time prediction. The model uses the wavelet transform to decompose the time series so that the features of the layer action response time series become clearer. Additionally, there is no need to satisfy the traditional assumption that the linear and nonlinear components of the time series are additive. Instead, a multivariate time feature matrix is constructed to increase the versatility of the time series prediction model by considering the relationship between the historical value of the time series, the predicted value of the linear component, and the historical value of the nonlinear component. The hybrid of linear and nonlinear models improves the accuracy of time series prediction.
The contributions of this article are as follows:

1. We classified cloud GIS services using the top-down three-level refinement processing mode of “thematic visit - spatial analysis - layer operation.” By analyzing and mining the temporal features of layer actions response time in cloud GIS services, which shows long-term stable trends and short-term random fluctuations, we demonstrate that decomposing the layer actions response time series is beneficial to temporal feature extraction and model training.

2. The method is not based on the traditional assumption that the linear and nonlinear components of the time series are additive. We propose a method for constructing a multivariate time feature matrix from the historical layer action response time, the predicted value of the linear component, and the historical value of the nonlinear component. This method enhances the versatility of the model, and we analyze the influence of the embedding dimension of the multivariate time feature matrix on the time series prediction results.

3. We propose a hybrid model that combines the wavelet transform with linear and nonlinear components. The model achieves accurate one-step and multi-step prediction of the layer actions response time for cloud GIS services.
The remainder of this article is structured as follows. In the "Related work" section, we introduce related research. We then present the data description and model motivation in the "Data descriptions and model motivation" section. In "The WT-ARIMA-XGBoost hybrid prediction method" section, we introduce the wavelet transforms-based ARIMA-XGBoost hybrid method. In the "Experiment and analysis" section, we evaluate performance and discuss the experiments. We give our conclusions and suggestions for future work in the "Conclusion and future works" section.
Related work
Layer actions response time is an important indicator of cloud GIS services. The prediction methods for layer action response time are similar to those for web service QoS, which mainly fall into time series methods, collaborative filtering, and machine learning.
The time series method is a regression model that builds a mathematical relationship between historical time series data and the predicted value. The classical method is the autoregressive integrated moving average model (ARIMA). The concept is to exploit the linear relationship between historical time series data and the predicted value. Its advantage is that the model is simple and time-sensitive, but its prediction horizon is short, making it unsuitable for long-term forecasting, nonlinear data, or complex time series data with both linear and nonlinear characteristics [18]. Collaborative filtering is a typical recommendation algorithm. Its idea is that, given the QoS observed by users who have already used a web service, the QoS for users who have not yet used it can be predicted from the similarity of attributes and behavior preferences between users [19]. Chen, Z. et al. used collaborative filtering algorithms to predict the throughput and response time of services under large-scale data fluctuation [20]. Its advantages are simple design and fast calculation, but it is only suited to single-function and task-determined services. Moreover, the accuracy of the prediction results depends on how user similarity is defined and calculated and on which users are selected for comparison, so the algorithm lacks the ability to generalize. Machine learning has a strong ability to fit nonlinear data and more complex time series data. Guo J. et al. used the genetic algorithm-back propagation (GA-BP) method to predict the response time of virtual machine services [21]. Its advantages are high prediction accuracy and strong self-adaptability, but it has high computational overhead and slow training speed.
Furthermore, time series have special characteristics such as noise and nonstationarity, and hybrid time series models are a popular research topic [22]. Shouwen Ji et al. proposed a clustering-ARIMA-XGBoost time series prediction model [23]. First, clustering is used to select data features for different clusters; the ARIMA model then extracts the linear component of the clustered data, while the XGBoost model extracts the nonlinear component. The prediction result of this complex model depends on how well clustering performs feature selection. Determining the optimal number of clusters consumes too many computing resources, since all candidate numbers of clusters have to be explored. Zhang proposed an ARIMA-ANN hybrid time series prediction model. He first used the ARIMA model to extract the linear component of the data and then used an artificial neural network (ANN) to fit the residuals [24]. However, for this model to predict well, the time series must satisfy the assumption that its value at a given point is the sum of the linear and nonlinear components at that point, which limits the model’s ability to generalize.
Cloud GIS services need references over multiple time dimensions for flexible service resource allocation and load balancing strategy optimization [25]. One-step prediction and multi-step prediction offer different focal points and knowledge for deploying and publishing services. One-step prediction focuses on current fluctuations and the precise value at the next moment, whereas multi-step prediction focuses on global periodicity and stability. Current cloud GIS prediction methods do not consider both one-step and multi-step prediction in depth.
In summary, a single linear or nonlinear model has limitations when fitting the linear and nonlinear components of a time series simultaneously, and a more complex model is required to capture additional temporal characteristics [26]. Additionally, previous research has demonstrated that the dynamic laws that govern nature and human activities are rarely linear [27]. Cloud GIS services collect a massive number of access logs and human activity data of specific visitors. The layer actions response time, as an important part of these data, should therefore also have nonlinear characteristics that reflect the laws of human activities. In addition, existing hybrid time series methods seldom focus on cloud GIS services, and in particular on the layer actions response time. At the same time, existing time series models rarely consider the relationship between the historical value of the time series and its linear and nonlinear components.
Thus, for cloud GIS services, and layer actions services in particular, we propose a hybrid time series prediction model that combines the wavelet transform with linear and nonlinear models in order to mine the linear and nonlinear components of layer actions response time and improve the accuracy of one-step and multi-step layer actions response time prediction.
Data descriptions and model motivation
In this section, we analyze the data association logic of cloud GIS services and then mine the temporal features of layer actions response time, showing the motivation for layer actions response time prediction and time decomposition of cloud GIS services.
Layer action data for cloud GIS services
The data for this article were derived from the server-side layer actions service log and spatial analysis log of the spatial cloud planning platform. We study this platform for three reasons, which also make it a classic service application platform for cloud GIS services. First, it contains a large amount of unstructured data and provides multiple types and scales of online map services. It faithfully preserves the whole process of geographic data collection, computation, analysis, and processing, as well as the delivery of interactive geographic information services. Second, it adopts a mature modeling method in cloud GIS services. It abstracts real-world objects into geometric feature elements such as points, lines, and areas, which are then organized and expressed as geographic entities in layers. The description of spatial relationships, symbolic expression, and thematic display of elements can be improved using this model. Third, it is inspirational and extensible for cloud GIS services that build applications with ArcGIS location services. To summarize, the spatial cloud planning platform is a suitable subject for cloud GIS services research because it includes a significant quantity of log data, the typical geographic information layer modeling method in cloud GIS services, and the geographic information services application method.
From the logs of service actions and spatial analysis results, it can be concluded that cloud GIS services are based on “thematic access - spatial analysis - layer actions” processing. The same theme is subdivided into multiple layers, and multiple layer actions are associated with the same spatial analysis result record. The layer action has the smallest granularity, and its response time has a direct effect on the thematic and spatial analysis response times. This is why we predict the response time of layer actions in cloud GIS services.
A large amount of layer action service log data and spatial analysis log data are generated. The log data mainly include creation time, work file number, layer actions type, layer actions response time, layer address, layer name, project area, and theme name. We match the layer response time data with the spatial analysis results by correlating them on the work file number.
Motivation of model
We conduct statistical analysis on the response time of layer actions in cloud GIS services. The Box-Cox transformation is introduced to stabilize the variance of the observations. It enhances the normality of the data and the correlation between different time features in the layer actions data, which benefits the accuracy of the layer action response time prediction. The Box-Cox transformation is expressed as:
\[ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln y, & \lambda = 0 \end{cases} \]
where \(y\) is the (strictly positive) observed value, \(y^{(\lambda)}\) is its transformed value, and \(\lambda\) is the Box-Cox transformation parameter.
The Box-Cox transformation is a family of transformations that extends the logarithmic and power transformations [28]. When the Box-Cox transformation parameter is less than 1, high peaks are compressed; when it is greater than 1, high peaks are stretched instead.
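A minimal sketch of the transformation and its inverse is given below, using only the standard definition of the Box-Cox family. The response time values are hypothetical; the parameter 0.03304 is the value reported later in the experiments.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if abs(lam) < 1e-12:
        # Limit case lambda -> 0 is the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_boxcox(values, lam):
    """Map transformed values back to the original scale."""
    if abs(lam) < 1e-12:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


response_times = [12.0, 30.0, 45.0, 400.0]  # hypothetical raw response times
transformed = boxcox(response_times, 0.03304)
restored = inverse_boxcox(transformed, 0.03304)
```

With a parameter well below 1, the large value 400.0 is pulled much closer to the others, which is the peak compression described above.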
Considering the periodicity and stability of cloud GIS services, we sampled the average value at one-hour and one-day intervals, as illustrated in Fig. 1. According to Fig. 1, the Box-Cox transformed values of the average response time of layer actions per day mostly fluctuate in the range [3.2, 3.6], while the Box-Cox transformed values of the average response time of layer actions per hour fluctuate in the range [3, 5]. Under both sampling granularities, the average response values of layer actions occasionally jitter but remain within their respective numerical ranges.
Furthermore, we calculated common statistics such as the mean and standard deviation of layer actions response time at the sampling granularities of one day and one hour, as shown in Table 1. According to Table 1, the difference between the upper and lower quartiles of layer action response time at the one-hour granularity is approximately 0.6745, while the difference at the one-day granularity is approximately 0.3071. This demonstrates that the layer actions response time series exhibits long-term stable trends and short-term random fluctuations at the one-day and one-hour sampling granularities.
In order to predict the response time of layer actions more accurately, we choose one hour as the sampling granularity. Using the wavelet transform (with the DB2 wavelet), we decompose the layer action response time series into subsequences at different frequencies, namely the approximation coefficient subsequence (low-frequency component) and the detail coefficient subsequence (high-frequency component), as shown in Fig. 2. The data distribution of the two subsequences reveals that the fluctuation range of the approximation coefficients is relatively large, with an interval of 8 units, which can represent the disturbance characteristics of layer actions response time. The fluctuation range of the detail coefficients is small, 1.6 units, which can represent the stable characteristics of layer actions response time.
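The decomposition into equal-length subsequences can be sketched with a single-level DB2 transform. The sketch below is an assumption-laden illustration rather than the article's implementation: it uses periodic boundary handling and a hypothetical eight-point series, and a wavelet library may order the filters and treat the boundaries differently.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Daubechies-2 (four-tap) scaling filter and the matching wavelet filter.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_periodic(x):
    """Single-level DB2 decomposition with periodic extension."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("series length must be even and at least 4")
    approx = [sum(LOW[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HIGH[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def reconstruct(coeffs, filt, n):
    """Rebuild a length-n subsequence from one set of coefficients."""
    out = [0.0] * n
    for i, c in enumerate(coeffs):
        for k in range(4):
            out[(2 * i + k) % n] += filt[k] * c
    return out


series = [3.4, 3.6, 3.1, 4.8, 3.3, 3.5, 4.2, 3.0]  # hypothetical hourly values
approx, detail = dwt_periodic(series)
approx_part = reconstruct(approx, LOW, len(series))   # low-frequency subsequence
detail_part = reconstruct(detail, HIGH, len(series))  # high-frequency subsequence
```

Because the periodic DB2 transform is orthogonal, the two reconstructed subsequences have the same length as the input and sum back to it.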
The WT-ARIMA-XGBoost hybrid prediction method
In this section, the WT-ARIMA-XGBoost hybrid prediction method for layer actions response time of cloud GIS services is presented. It mainly includes the wavelet transform, a linear model (ARIMA), and a nonlinear model (XGBoost). The framework is shown in Fig. 3.
As described in the "Motivation of model" section, the layer actions response time series has both stable long-term trends and random fluctuations. The original layer actions time series is transformed into equal-length approximation coefficient and detail coefficient subsequences using the wavelet decomposition and reconstruction method. The ARIMA model is appropriate for extracting stable and linear features from the detail coefficient subsequence, whereas the XGBoost model is appropriate for fitting nonlinear features of layer action response time. Furthermore, the fitting error of the linear component of the time series is used as a correction for the prediction of the nonlinear component in the following step.
ARIMA method for stationary part of response time
Since the response time of layer actions has long-term stable trends, we created an ARIMA model to mine the stable linear relationship in the detail coefficient subsequence.
The autoregressive integrated moving average model (ARIMA) comprises three components: autoregression, integration, and moving average. To accomplish regression prediction, we mine the linear relationship between the current value, the historical data, and the linear fitting error of the response time of layer actions for cloud GIS services. The model performs well in linear regression and prediction.
The mathematical expression of the model is:
\[ \widehat{y}_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j} + \varepsilon_t \]
where \(\widehat{y}_t\) represents the predicted value of layer action response time at time t, and \(\alpha_i\) and \(\beta_j\) represent the AR model and MA model coefficients, respectively. \(\varepsilon_t\) represents the random error. p and q represent the number of terms in the AR model and MA model, respectively, which can be determined from the autocorrelation coefficient and partial autocorrelation coefficient.
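As a small worked illustration of how the AR and MA terms combine, the sketch below computes a one-step prediction from the most recent observations and residuals, with the current random error set to its expected value of zero. The coefficients and data are hypothetical, no constant term is assumed, and the series is assumed to have been differenced already (the "integration" step); the orders p = 2 and q = 2 match those used in the experiments.

```python
def arma_one_step(history, residuals, ar_coefs, ma_coefs):
    """One-step ARMA(p, q) prediction.

    history   : past (differenced) values, most recent last
    residuals : past one-step errors, most recent last
    ar_coefs  : [alpha_1, ..., alpha_p]
    ma_coefs  : [beta_1, ..., beta_q]
    """
    ar_part = sum(a * history[-i] for i, a in enumerate(ar_coefs, start=1))
    ma_part = sum(b * residuals[-j] for j, b in enumerate(ma_coefs, start=1))
    return ar_part + ma_part


history = [1.0, 2.0, 3.0]    # hypothetical values y_{t-3}, y_{t-2}, y_{t-1}
residuals = [0.1, -0.2]      # hypothetical errors eps_{t-2}, eps_{t-1}
prediction = arma_one_step(history, residuals, ar_coefs=[0.5, 0.3], ma_coefs=[0.4, 0.1])
# AR part: 0.5*3.0 + 0.3*2.0 = 2.1; MA part: 0.4*(-0.2) + 0.1*0.1 = -0.07
```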
XGBoost model for the nonstationary part of response time
The layer actions response time has the time series characteristic of short-term random fluctuations. Therefore, we establish an XGBoost model to mine the nonlinear relationship in the layer actions response time.
eXtreme Gradient Boosting (XGBoost) is a high-performance implementation of gradient tree boosting that uses classification and regression trees (CART) as base learners. The main idea is that strong learners are difficult to construct, whereas weak learners are relatively simple, and several weak learners can be combined to form a strong learner. Thus, the XGBoost model trains K weak learners step by step, each fitting the residuals of the previous round, and finally combines them to obtain a strong learner.
The mathematical expression of the model is:
\[ \widehat{y}_i = \sum_{k=1}^{K} f_k(x_i) \]
where \(f_k\) represents the k-th regression tree and \(x_i\) is the input feature vector of the i-th sample.
The objective function is expressed as:
\[ Obj = \sum_{i} L(y_i, \widehat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \]
where \(L( \bullet )\) represents the loss function, which is used to calculate the error between the predicted value and the real value of layer actions response time. \(\Omega ( \bullet )\) represents a regularization term.
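The residual-fitting idea can be sketched with plain gradient boosting on one-split regression trees (stumps) under squared loss. This is a simplified stand-in, not the XGBoost library: it omits the regularization term and the second-order gradient information that XGBoost uses. The data and the learning rate are hypothetical; the nine rounds mirror the number of regression trees used in the experiments.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on a 1-D feature (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=9, learning_rate=0.5):
    """Additive model: mean baseline plus stumps fitted to successive residuals."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


xs = [1, 2, 3, 4, 5, 6, 7, 8]                      # hypothetical feature values
ys = [3.2, 3.3, 3.1, 4.6, 4.8, 3.4, 3.3, 4.9]      # hypothetical targets
model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

On the training data, the boosted model's squared error falls below that of the constant mean baseline, since each round removes part of the remaining residual.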
Hybrid prediction method
We study the representation of linear and nonlinear components of layer actions response time series on cloud GIS services. Furthermore, our research focuses on the relationship between the historical value, the linear component, and the nonlinear component of the layer actions response time.
Specifically, it is assumed that the historical time series of the Box-Cox transformed layer actions response time at the current time t is expressed as \(Y_t = [y_t, y_{t-1}, \ldots, y_{t-m}]^T\).
Following wavelet decomposition and reconstruction, the Box-Cox transformed value of the layer action response time is expressed through the equal-length approximation and detail coefficient subsequences. The superposition relationship is as follows:
The ARIMA model is suitable for extracting the linear component from the detail coefficient subsequence of layer actions response time. The error between them can be expressed as:
The XGBoost model is suitable for extracting the nonlinear component from the layer actions response time. In order to predict the layer actions response time more accurately, the linear component fitting error of the Box-Cox transformed layer actions response time is used as a correction to the nonlinear component data, which can be expressed as
The traditional time series prediction model requires the assumption that the time series is the sum of its linear and nonlinear components.
By contrast, we are not required to adhere to this assumption. To predict layer actions response time more precisely, we construct a multivariate time series matrix of the historical value, the predicted value of the linear component, and the historical value of the nonlinear component of layer actions response time as the input of the regression model. The purpose is to find the relationship among the historical value, linear prediction value, and nonlinear value of the layer actions response time.
The mathematical expression is as follows:
Our proposed method for layer actions response time prediction has been depicted in Algorithm 1.
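The construction of the multivariate time series feature matrix can be sketched as follows. The column layout (lagged historical values, then the linear prediction for the target time, then lagged nonlinear values) and all names are assumptions made for illustration, and the input series are hypothetical. The toy call uses embedding dimensions of 3, whereas the experiments use 6.

```python
def build_feature_rows(history, linear_pred, nonlinear, m_hist, m_nonlin):
    """Build regression inputs and targets from three aligned series.

    history[t]     : transformed response time at time t
    linear_pred[t] : linear model's prediction for time t
    nonlinear[t]   : nonlinear component at time t
    Row for target time t (assumed layout):
      [y_{t-1}, ..., y_{t-m_hist}, linear_pred[t], n_{t-1}, ..., n_{t-m_nonlin}]
    """
    start = max(m_hist, m_nonlin)
    rows, targets = [], []
    for t in range(start, len(history)):
        lagged_history = [history[t - i] for i in range(1, m_hist + 1)]
        lagged_nonlinear = [nonlinear[t - i] for i in range(1, m_nonlin + 1)]
        rows.append(lagged_history + [linear_pred[t]] + lagged_nonlinear)
        targets.append(history[t])
    return rows, targets


history = [float(v) for v in range(10)]      # hypothetical series
linear_pred = [v + 0.5 for v in history]     # hypothetical linear predictions
nonlinear = [0.1 * v for v in history]       # hypothetical nonlinear component
rows, targets = build_feature_rows(history, linear_pred, nonlinear, m_hist=3, m_nonlin=3)
```

Each row can then be passed to a regression model, which learns the relationship among the three inputs without assuming that the linear and nonlinear components add up to the observed value.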
Experiment and analysis
In order to study the effectiveness of the WT-ARIMA-XGBoost hybrid prediction method, we collected the layer actions log data and spatial analysis log data of the cloud GIS services from October 28, 2019 to July 9, 2020. We then removed the data from the epidemic period (January 20, 2020 to March 20, 2020). Because different layer action types in cloud GIS services have different response time distribution characteristics, we sequentially extracted the layer processing analysis type (such as storage, linear analysis, and point analysis), thematic spatial analysis features (such as basic farmland access probability), area shape, area block number, and time characteristics to remove outliers. Furthermore, we designed separate one-step and multi-step prediction experiments for the layer actions response time in the cloud GIS services. On this basis, we calculated the hourly average response time of layer actions and smoothed the data, leaving 2211 time points. The last 100 time points are used as the test data set, and the remaining points are used as the training data set. The best Box-Cox transformation parameter in the experiment is \(\lambda = 0.03304\).
Evaluation indicators
In order to quantitatively describe the performance of the model, we choose root mean square error (RMSE), R-Square (\({R^2}\)), and mean absolute error (MAE) to evaluate the model.
RMSE measures the deviation between the predicted values and the actual values. R-Square (\({R^2}\)) reflects how well the predictions fit the actual values. MAE is the average of the absolute differences between the actual values and the predicted values. The smaller the RMSE and MAE, the better the model. A larger \({R^2}\) indicates that the predicted values are closer to the actual values. Their expressions are as follows:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} \left( y_i - \widehat{y}_i \right)^2} \]
\[ R^2 = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \widehat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \overline{y} \right)^2} \]
\[ \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \left| y_i - \widehat{y}_i \right| \]
where \(y_i\) denotes the actual values, \(\widehat{y}_i\) denotes the predicted values, \(\overline{y}\) denotes the mean of the actual values, and m is the number of predicted samples.
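The three indicators follow directly from their standard definitions. The sketch below computes them on a small hypothetical set of actual and predicted values, which are not taken from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 3.5, 4.0, 4.5]        # hypothetical actual values
predicted = [3.1, 3.4, 4.2, 4.3]     # hypothetical predictions
scores = (rmse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
# Squared errors sum to 0.10, so RMSE = sqrt(0.025), MAE = 0.15, R^2 = 1 - 0.10/1.25 = 0.92
```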
One-step prediction
One-step prediction and comparison of experimental results
One-step prediction is used to predict the average response time of layer actions in the next hour based on historical data for layer actions response time in cloud GIS services. The ARIMA model is a good linear time series prediction model, while the XGBoost model is a good nonlinear time series prediction model. The proposed WT-ARIMA-XGBoost model is compared with existing time series prediction models and their combinations, including ARIMA, XGBoost, WT-ARIMA, WT-XGBoost, and ARIMA-XGBoost. According to the truncation and tailing of the autocorrelation coefficient and partial autocorrelation coefficient, the ARIMA orders are set to p = 2 and q = 2. The number of regression trees for XGBoost is 9. The embedding dimensions of both the historical data and the nonlinear data in WT-ARIMA-XGBoost are 6.
Figure 4 and Table 2 show the RMSE, MAE, and R-Square values of the six models for one-step prediction of the layer actions response time. In one-step prediction, RMSE depicts the overall cumulative error between the predicted and actual values across time points. MAE indicates the average absolute difference between the predicted and actual values at each time point. R-Square represents the degree of agreement between the predicted and actual values of the layer actions response time.
(1) Comparison of WT-ARIMA-XGBoost hybrid model and single model
As shown in Table 2, the RMSE value of the WT-ARIMA-XGBoost hybrid method is lower than those of both the single linear model ARIMA and the single nonlinear model XGBoost, by 14.02% and 12.9%, respectively. The R-Square value of the WT-ARIMA-XGBoost model (0.9028) is greater than that of the ARIMA model (0.8686) and the XGBoost model (0.8719). Therefore, the WT-ARIMA-XGBoost hybrid model outperforms the single linear model ARIMA and the single nonlinear model XGBoost.
(2) Comparison of WT-ARIMA-XGBoost hybrid model with WT-based model
In Table 2, the RMSE value of the WT-ARIMA-XGBoost hybrid model is smaller than those of both the WT-ARIMA model and the WT-XGBoost model, by 24.56% and 5.365%, respectively. The R-Square value of the WT-ARIMA-XGBoost model (0.9028) is greater than that of the WT-ARIMA model (0.8293) and the WT-XGBoost model (0.8915).
As shown in (1) and (2), a single linear model or a single nonlinear model has limitations when applied to time series prediction problems with both linear and nonlinear components. Even when the wavelet transform is used to decompose the time series and the subsequences are predicted separately, the results may not be better than those of the models applied to the undecomposed series (the ARIMA and XGBoost models).
(3) Comparison of WT-ARIMA-XGBoost hybrid model and ARIMA-XGBoost model
In Table 2, the RMSE value of the WT-ARIMA-XGBoost hybrid model is 10.32% smaller than that of the ARIMA-XGBoost model. The R-Square value of the WT-ARIMA-XGBoost hybrid model (0.9028) is greater than that of the ARIMA-XGBoost model (0.8791). Compared with the ARIMA-XGBoost model, the WT-ARIMA-XGBoost model applies the wavelet transform to the layer action response time series and uses the resulting subsequences as input to the linear and nonlinear models, which yields better experimental results. The result shows that the layer actions response time series indeed contains both linear and nonlinear components. After the wavelet transformation of the layer actions response time series, it is easier to obtain different time features, which is beneficial to the training of the model.
(4) Comparison of ARIMAXGBoost model and single model
As given in Table 2, the ARIMAXGBoost hybrid model has a lower RMSE than the single linear model ARIMA and the single nonlinear model XGBoost, by 4.13% and 2.88%, respectively. The Rsquare value of the ARIMAXGBoost model (0.8791) is greater than that of the ARIMA model (0.8686) and the XGBoost model (0.8719). This comparison demonstrates that the linearnonlinear hybrid model outperforms the single linear or nonlinear model.
It can be seen from (3) and (4) that the ARIMAXGBoost model, as a typical linear and nonlinear hybrid prediction model, can better overcome the limitations of the single linear model and the single nonlinear model. With the wavelet transform, the WTARIMAXGBoost model extracts the different temporal features of the layer actions response time more easily than the ARIMAXGBoost model. Therefore, in the onestep prediction of the layer actions response time, the WTARIMAXGBoost model reduces the root mean square error compared to the above models while also preventing overfitting.
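The percentage reductions quoted in comparisons (1) to (4) are presumably relative RMSE reductions with respect to the baseline model. A minimal sketch of that calculation follows; the function name and the RMSE values are hypothetical and are not taken from Table 2.

```python
def relative_reduction(baseline_rmse, hybrid_rmse):
    """Percentage by which hybrid_rmse is lower than baseline_rmse."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return 100.0 * (baseline_rmse - hybrid_rmse) / baseline_rmse

# Hypothetical values: a baseline RMSE of 2.0 and a hybrid RMSE of 1.5.
example = relative_reduction(2.0, 1.5)
```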
The relationship between the embedding dimension of multivariate time series feature matrix and the prediction result of WTARIMAXGBoost model
This subsection discusses the relationship between the embedding dimension of the multivariate time series feature matrix and the prediction result of the WTARIMAXGBoost model for the case in which the historical data dimension equals the nonlinear data dimension. Since the error between the true value and the fitted value of the historical data is used as a correction of the nonlinear historical data, the dimensionality of the historical data is generally greater than that of the nonlinear historical data.
The relationship between the embedding dimension of the multivariate time series feature matrix of the WTARIMAXGBoost model and the prediction result of layer actions response time is shown in Fig. 5. The RMSE and MAE values tend to decrease as the embedding dimensions of the historical values and of the nonlinear components increase, while the Rsquare value tends to increase. Moreover, the rate of change of the RMSE, MAE, and Rsquare values is greater for embedding dimensions in the range [1,7] than in the range [8,25]. This shows that increasing the embedding dimension of the multivariate time series feature matrix can improve the onestep prediction performance of the WTARIMAXGBoost model to a certain extent, and that the historical values and nonlinear component data nearest to the prediction point have the greatest impact on the prediction result.
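The role of the two embedding dimensions can be sketched as follows. The code builds one feature row per time point from the last m1 historical values, the linear model's prediction for that point, and the last m2 values of the nonlinear component. The column ordering, the function name and the toy inputs are assumptions for illustration; the paper's exact matrix layout is defined in its method section, not here.

```python
import numpy as np

def build_feature_matrix(history, linear_pred, nonlinear, m1, m2):
    """Rows: [m1 lagged historical values | linear prediction at t | m2 lagged nonlinear values]."""
    history = np.asarray(history, dtype=float)
    linear_pred = np.asarray(linear_pred, dtype=float)
    nonlinear = np.asarray(nonlinear, dtype=float)
    start = max(m1, m2)  # first index with enough lags for both blocks
    rows, targets = [], []
    for t in range(start, len(history)):
        row = np.concatenate([
            history[t - m1:t],    # historical values before t
            [linear_pred[t]],     # predicted value of the linear component at t
            nonlinear[t - m2:t],  # nonlinear component values before t
        ])
        rows.append(row)
        targets.append(history[t])  # value to be predicted
    return np.array(rows), np.array(targets)

# Toy inputs, not data from the paper.
history = np.arange(10.0)
linear_pred = history + 0.5        # stand-in for the linear model's fitted values
nonlinear = history - linear_pred  # error between true and fitted values
X, y = build_feature_matrix(history, linear_pred, nonlinear, m1=3, m2=2)
```

Increasing m1 or m2 widens each row, which gives the nonlinear regressor more context at the cost of fewer usable training rows.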
Multistep prediction
When the prediction step is 3, the multistep prediction results of layer actions response time are shown in Fig. 6. A prediction step of 3 is used to predict the average hourly response time of layer actions over the next three hours. The XGBoost model has a total of 18 regression trees. In the ARIMA model, p=3 and q=2. The embedding dimensions of the historical data and nonlinear data of the WTARIMAXGBoost model are 3 and 5, respectively.
When the prediction step is 5, the multistep prediction results of layer actions response time are shown in Fig. 7. A prediction step of 5 is used to predict the average hourly response time of layer actions over the next five hours. The XGBoost model has a total of 50 regression trees. In the ARIMA model, p=5 and q=2. The embedding dimensions of the historical data and nonlinear data of the WTARIMAXGBoost model are 4 and 2, respectively.
The RMSE, Rsquare, and MAE values of the predicted layer actions response time under different prediction steps are shown in Table 3. In multistep prediction, RMSE illustrates the total cumulative error between the predicted and actual values along the trend, and it penalizes large errors more heavily. MAE indicates the average absolute difference between the predicted and actual values along the trend of the layer actions response time. Rsquare evaluates how well the predicted trend of the layer actions response time fits the actual trend (Figs. 6 and 7).
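For reference, the three evaluation metrics can be computed as in the sketch below, which uses their standard definitions. The function names and the sample values are illustrative and are not taken from Table 3.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def r_square(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical actual and predicted response times.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = (rmse(actual, predicted), mae(actual, predicted), r_square(actual, predicted))
```

Lower RMSE and MAE indicate a better prediction, whereas a higher Rsquare indicates a better fit, so the three metrics do not move in the same direction when prediction quality changes.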
(1) Comparison of the WTARIMAXGBoost hybrid model with other models under different prediction steps
As illustrated in Fig. 8, the RMSE and MAE values of the predicted layer actions response time increase continuously as the prediction step size increases, while the Rsquare value decreases (for the WTARIMAXGBoost model, from 0.9028 in onestep prediction to 0.6994 at step 3 and 0.5109 at step 5). This demonstrates that expanding the prediction range in multistep prediction of layer actions response time introduces more uncertainty factors, thereby increasing the difficulty of multistep prediction.
(2) Comparison of WTARIMAXGBoost hybrid model with other models under the same prediction step
As shown in Table 3, when the prediction step is 3, the RMSE of the WTARIMAXGBoost hybrid model is lower than that of the ARIMA and XGBoost models, by 6.93% and 8.64%, respectively. The Rsquare value of the WTARIMAXGBoost model (0.6994) is higher than that of the ARIMA model (0.6556) and the XGBoost model (0.6398). The RMSE of the WTARIMAXGBoost hybrid model is also lower than that of the WTARIMA and WTXGBoost models, by 13.32% and 4.93%, respectively. The Rsquare value of the WTARIMAXGBoost model (0.6994) is higher than that of the WTARIMA model (0.6028) and the WTXGBoost model (0.6674).
In Table 3, when the prediction step is 5, the RMSE of the WTARIMAXGBoost hybrid model is lower than that of the ARIMA and XGBoost models, by 3.38% and 9.44%, respectively. The Rsquare value of the WTARIMAXGBoost model (0.5109) is higher than that of the ARIMA model (0.4760) and the XGBoost model (0.4033). The RMSE of the WTARIMAXGBoost model is also lower than that of the WTARIMA and WTXGBoost models, by 0.97% and 8.02%, respectively. The Rsquare value of the WTARIMAXGBoost model (0.5109) is higher than that of the WTARIMA model (0.5012) and the WTXGBoost model (0.4219).
In Fig. 8, the prediction result of the WTARIMAXGBoost model is shown by the red line. Under the same prediction step, its RMSE value is always at the bottom of Fig. 8(a), its Rsquare value is at the top of Fig. 8(b), and its MAE value in Fig. 8(c) is smaller than that of the other models at every prediction step. This demonstrates that the WTARIMAXGBoost model is more robust than the other models.
Computational complexity of models
To analyse the computational complexity of the WTARIMAXGBoost model, we use big \({O(\cdot )}\) notation to describe the relation between the number of operations and the size of the layer actions response time series. We assume that the input layer actions response time series has size N and that multistep prediction covers M prediction time steps. The analysis focuses on both M and N.
In the ARIMA model, as given by [29], the computational complexity of ARIMA itself is \(O((N - p){p^2} + (N - q){q^2})\). So the computational complexity of the ARIMA model for onestep prediction is
The computational complexity of ARIMA model for multistep prediction is
In the XGBoost model, as shown in [30], let d be the maximum depth of a tree, K the total number of trees, and \(\left\| {{X_0}} \right\|\) the number of nonmissing entries in the training data. The computational complexity of the XGBoost model for onestep prediction is
The computational complexity of XGBoost model for multistep prediction is
In the WTARIMAXGBoost model, the computational complexity of wavelet transform processing is O(N) [31]. We constructed a multivariate time series matrix with \({m_1}\) historical values and \({m_2}\) nonlinear component values, where both sizes depend on the data and are selected as features for the model. The computational complexity of the WTARIMAXGBoost model for multistep prediction using the multipleinput multipleoutput (MIMO) prediction strategy is
Since the number of prediction time steps M is much smaller than the size N of the layer actions response time series, our WTARIMAXGBoost model reduces the computational complexity of multistep prediction, as can be seen by comparing expressions (13), (15) and (16).
Time performance of models
We evaluated the time cost of six models on log data. The time cost refers to the total time spent from the input of the layer actions response time series to the output of the predicted value, which includes model training and testing time (the testing time is the time needed to predict the time steps that follow the historical series of layer actions response time). Table 4 presents the time performance of the six models for onestep and multistep predictions.
As can be seen from Table 4, the ARIMA model takes the shortest time for onestep prediction of layer actions response time, while the WTARIMAXGBoost model takes the shortest time for multistep prediction. The major reason is that, in the WTARIMAXGBoost model, we constructed a multivariate time series matrix and used a MIMO prediction strategy. The multivariate time series matrix effectively represents the relationship between the historical values and the linear and nonlinear components of the layer actions response time, which is useful for parallel prediction with the MIMO strategy.
At the same time, the time cost of the ARIMAXGBoost linear and nonlinear combination model is slightly greater than that of the single models (ARIMA, XGBoost) and the WTbased models (WTARIMA, WTXGBoost) in onestep prediction of the layer actions response time, but it is lower in multistep prediction. When the linear and nonlinear components of the layer actions response time series are learned and predicted independently, the model training speed increases. Unlike the Iterated strategy, the MIMO strategy can predict the values of all time points in a multistep prediction in parallel. The premise is a feature matrix that strongly relates the historical values to the value to be predicted at each time point. The WTARIMAXGBoost model uses its multivariate time series matrix for this purpose, which allows it to parallelize the prediction.
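A time cost of this kind, covering both training and prediction, could be measured along the following lines. This is only a sketch: the helper name, the callables and the trivial mean model are hypothetical stand-ins, and the paper does not describe its timing harness.

```python
import time

def timed_run(train_fn, predict_fn):
    """Return the prediction and the wall-clock seconds spent on training plus prediction."""
    start = time.perf_counter()
    model = train_fn()
    prediction = predict_fn(model)
    elapsed = time.perf_counter() - start
    return prediction, elapsed

# Hypothetical stand-in model: predict the mean of the history.
history = [1.0, 2.0, 3.0]
prediction, seconds = timed_run(
    lambda: sum(history) / len(history),  # "training" computes the mean
    lambda model: model,                  # "prediction" returns it
)
```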
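The difference between the Iterated and MIMO strategies can be sketched as follows. For self-containment, the regressor here is an ordinary least-squares linear map rather than XGBoost, and the window length, horizon and synthetic sine series are assumptions chosen for illustration only.

```python
import numpy as np

def make_windows(series, window, horizon):
    """Pair each length-`window` input with the next `horizon` values."""
    X, Y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Least-squares linear map with intercept (stand-in for the real regressor)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.append(x, 1.0) @ coef

series = np.sin(0.3 * np.arange(120))  # synthetic series, not the paper's data
window, horizon = 2, 3

# MIMO: one model outputs all `horizon` future values in a single call.
X, Y = make_windows(series, window, horizon)
mimo_forecast = predict_linear(fit_linear(X, Y), series[-window:])

# Iterated: a one-step model is applied `horizon` times, feeding predictions back in.
X1, Y1 = make_windows(series, window, 1)
one_step = fit_linear(X1, Y1)
buffer = list(series[-window:])
iterated_forecast = []
for _ in range(horizon):
    nxt = float(predict_linear(one_step, np.array(buffer[-window:]))[0])
    iterated_forecast.append(nxt)
    buffer.append(nxt)
```

The Iterated loop is inherently sequential, because each step needs the previous prediction as input, whereas the MIMO forecast has no such dependency between output steps. That independence is what makes the outputs computable in parallel.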
In conclusion, our WTARIMAXGBoost model improves the accuracy of onestep and multistep predictions for layer actions response time in cloud GIS services. In terms of computational complexity of models and time performance experiments, it has been shown that our model takes less time in multistep prediction. This is related to the way we built a multivariate time series matrix and used MIMO prediction strategy. Thus, both the accuracy of the model prediction and the advantage of time performance are ensured. The gap between model accuracy and time use will expand in terms of thematic analysis, spatial analysis and parallel operations.
Conclusion and future works
The layer actions response time in cloud GIS services has a direct impact on the response time of thematic and spatial analysis, which can provide decisionmaking reference for deployment and allocation of cloud GIS services resources. We propose a WTARIMAXGBoost hybrid prediction model that takes advantage of the linear model (ARIMA) in stationary time series prediction and the nonlinear model (XGBoost) in nonstationary dataset regression. By using the time series characteristics of longterm stable trends and shortterm random fluctuations in layer actions response time series, we realize its onestep and multistep accurate prediction. To overcome the limitation of the traditional assumption that linear and nonlinear components of time series are additive, we use the historical value, predicted value, and historical value of the nonlinear component of layer actions response time to construct the multivariate time series feature matrix, which realizes accurate prediction of the layer actions response time.
Predicting layer actions response time is still a promising and challenging issue for optimising service quality and computing resource allocation of cloud GIS services. In future work, we will focus on finding the optimal solution of layer actions response time historical data dimension and nonlinear component dimension in multivariate time series feature matrix, as well as the relationship between dimension, prediction step, and sampling granularity. To improve the support of dynamic cloud GIS services, we will also consider realtime collection, online processing, and prediction of layer action response time.
Availability of data and materials
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Helmi AM, Farhan MS, Nasr MM (2018) A framework for integrating geospatial information systems and hybrid cloud computing. Comput Electr Eng 67:145–158
Bhat MA, Shah RM, Ahmad B (2011) Cloud computing: A solution to geographical information systems (GIS). Int J Comput Sci Eng 3(2):594–600
Tamiminia H, Salehi B, Mahdianpari M, Quackenbush L, Adeli S, Brisco B (2020) Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J Photogramm Remote Sens 164:152–170
Yee LS, Khoo V (2010) Spatially enabled Singapore through Singapore geospatial collaborative environment (sgspace). In: Crompvoets J, Kalantari M, Kok B (eds) Spatially enabling society: research, emerging trends and critical assessment. Leuven University Press, Leuven, pp 111–116
Li R, Dong G, Jiang J, Wu H, Yang N, Chen W (2019) Self-adaptive load-balancing strategy based on a time series pattern for concurrent user access on web map service. Comput Geosci 131:60–69
Li R, Xu T, Shi X, Fan J, Gui Z (2015) A replication strategy based on optimal load balancing for a heterogeneous distributed caching system in networked giss. Geomatics Inf Sci Wuhan Univ 40(10):1287–1293
Nourikhah H, Akbari MK, Kalantari M (2015) Modeling and predicting measured response time of cloud-based web services using long-memory time series. J Supercomput 71(2):673–696
Yang C, Goodchild M, Huang Q, Nebert D, Raskin R, Xu Y, Bambacus M, Fay D (2011) Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing? Int J Digit Earth 4(4):305–329
Chen Q, Wang L, Shang Z (2008) MRGIS: A MapReduce-enabled high performance workflow system for GIS. In: 2008 IEEE Fourth International Conference on eScience. Indianapolis, pp 646–651
Keshavarzi A, Haghighat AT, Bohlouli M (2021) Clustering of large scale qos time series data in federated clouds using improved variable chromosome length genetic algorithm (cqga). Expert Syst Appl 164:113840
Degen L, Qin’ou L (2012) Research progress and connotation of cloud gis. Prog Geogr 11:13
Xin Z, Xiaodong H, Jiawei W (2019) Cloud computing based geographical information service technologies. Comput Sci 46(6A):532–536
Taieb SB, Bontempi G, Atiya AF, Sorjamaa A (2012) A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst Appl 39(8):7067–7083
Grigorievskiy A, Miche Y, Ventelä AM, Séverin E, Lendasse A (2014) Long-term time series prediction using OP-ELM. Neural Netw 51:50–56
Wang Y, Guo Y (2020) Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost. China Commun 17(3):205–221
Büyükşahin ÜÇ, Ertekin Ş (2019) Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition. Neurocomputing 361:151–163
Binkowski M, Marti G, Donnat P (2018) Autoregressive convolutional neural networks for asynchronous time series. In: International Conference on Machine Learning. PMLR, pp 580–589
Pankratz A (2009) Forecasting with univariate Box-Jenkins models: Concepts and cases, vol 224. Wiley
Song Y (2020) Collaborative prediction of web service quality based on user preferences and services. PLoS ONE 15(12):e0242089
Chen Z, Shen L, Li F, Dianlong Y, Buanga MJP (2020) Web service QoS prediction: when collaborative filtering meets data fluctuating in big-range. World Wide Web 23(3):1715–1740
Guo J, Liu S, Zhang B, Yan Y (2014) Research on virtual machine response time prediction method based on GA-BP neural network. Math Probl Eng 2014;Article ID 141930:9. https://doi.org/10.1155/2014/141930
Zhou F, Zhou H, Yang Z, Gu L (2021) IF2CNN: Towards non-stationary time series feature extraction by integrating iterative filtering and convolutional neural networks. Expert Syst Appl 170:114527
Ji S, Wang X, Zhao W, Guo D (2019) An application of a three-stage XGBoost-based model to sales forecasting of a cross-border e-commerce enterprise. Math Probl Eng 2019;Article ID 8503252:15. https://doi.org/10.1155/2019/8503252
Zhang GP (2003) Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50:159–175
Alfaqih TM, Hassan MM (2016) Gis cloud: Integration between cloud things and geographic information systems (gis) opportunities and challenges. Int J Comput Sci Eng (IJCSE) 3(5):360–365
Dong G, Li R, Jiang J, Wu H, McClure SC (2019) Multi-granular wavelet decomposition-based support vector regression and moving average method for service-time prediction on web map service platforms. IEEE Syst J 14(3):3653–3664
Zou Y, Donner RV, Marwan N, Donges JF, Kurths J (2019) Complex network approaches to nonlinear time series analysis. Phys Rep 787:1–97
Chen B, Qin J, Yuan A (2022) Variable selection in the Box-Cox power transformation model. J Stat Plan Infer 216:15–28
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–22
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, pp 785–794
Wirsing K (2020) Time frequency analysis of wavelet and fourier transform. In: Wavelet Theory. IntechOpen, London
Acknowledgements
The authors would like to thank the National Natural Science Foundation of China, the National Key Research and Development Program of China, and the Information Center of the Department of Natural Resources of Hubei Province for supporting this article.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. U20A2091), Natural resources Science and Technology project of Hubei Province (Grant No. ZRZY2021KJ13) and Zhizhuo Research Fund on SpatialTemporal Artificial Intelligence (Grant No. ZZJJ202204).
Author information
Contributions
All authors have participated in conception and design, or analysis and interpretation of this paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, J., Cai, J., Li, R. et al. Wavelet transforms based ARIMAXGBoost hybrid method for layer actions response time prediction of cloud GIS services. J Cloud Comp 12, 11 (2023). https://doi.org/10.1186/s1367702200360z
DOI: https://doi.org/10.1186/s1367702200360z