Wavelet transforms based ARIMA-XGBoost hybrid method for layer actions response time prediction of cloud GIS services

Li, Jiang; Cai, Jing; Li, Rui; Li, Qiang; Zheng, Lina

doi:10.1186/s13677-022-00360-z

Research
Open access
Published: 20 January 2023

Journal of Cloud Computing volume 12, Article number: 11 (2023)

Abstract

Layer actions response time is a critical indicator of cloud geographical information services (cloud GIS services) and is of great significance to resource allocation and schedule optimization. However, since cloud GIS services are highly dynamic, uncertain, and uncontrollable, the response time of layer actions is influenced by spatiotemporal intensity and concurrent access intensity, which poses significant challenges for prediction. To predict the response time of layer actions more accurately, we analyzed the data association of cloud GIS services. Furthermore, based on the long-term stable trends and short-term random fluctuations in layer actions response time series, a wavelet transforms-based ARIMA-XGBoost hybrid method for cloud GIS services is proposed to improve the one-step and multi-step prediction results of layer actions response time. We generate a multivariate time series feature matrix using the historical value of the layer actions response time, the predicted value of the linear component, and the historical value of the nonlinear component. The method does not need to meet the traditional assumption that the linear and nonlinear components of the time series are additive, which relaxes the model’s requirements on the time series and enhances its flexibility. The experimental results demonstrate the superiority of our approach over previous models in the prediction of layer actions response time of cloud GIS services.

Introduction

In recent years, the cloud geographic information system (cloud GIS) has emerged as a serious candidate for the next-generation GIS computing paradigm, one that uses a virtualized platform or infrastructure in a scalable and elastic environment [1]. Its services are distinguished by their ability to perform GIS analyses with browsers or small applications on cloud services. They apply location-independent resource pooling to allow any user to input, analyse, and manipulate spatial information in a shared infrastructure while also reducing implementation costs [2]. Because of these characteristics, cloud GIS services, facilitated by the internet of things, cloud computing, and big data technologies, have been integrated into aspects of urban planning and construction such as smart cities, the natural environment, and resource allocation. Examples include Google Earth Engine [3] and the Singapore Geospatial Collaborative Environment (SG-SPACE) [4]. However, massive amounts of multi-source geospatial data are constantly being generated, including remote sensing images, graphic photos, and digitally summarized text. The storage, computing, and visualization resources required for spatial information services such as query, interoperability, and virtualization are also rapidly expanding. Numerous user visits and service requests result in a variety of issues, including service overload [5], network congestion [6], and response timeout [7]. How to effectively predict the quality of service (QoS) of cloud GIS services during peak and off-peak periods, so that resources can be allocated and balanced to address the geographic information field’s unique data intensity, computing intensity, spatiotemporal intensity, and concurrent access intensity, has become a critical issue [8].

Layer action is the smallest operation granularity of cloud GIS services and an important part of thematic analysis and spatial analysis. Its response time has a direct impact on the quality of service and the user experience (UE). Few studies have focused on the QoS and UE of cloud GIS services from the perspective of layer action response time prediction. Furthermore, compared to traditional horizontally networked and vertically multi-level GIS services, cloud GIS services are characterized by appropriate separation and integration. This means that task division and assignment are necessary when storing massive multi-source geospatial data or analyzing large-scale data tasks [9]. The QoS of cloud services is also a particularly important deciding factor for task division and allocation. The layer actions response time is an important indicator of cloud GIS services. Accurate short-term and long-term predictions provide in-depth industry insights and decision support for cloud GIS service allocation and scheduling, as well as for information service status monitoring and resource optimization [10]. They are of great significance to the cloudification of GIS services [11] as well as to the coordinated development of multiple mobile terminals and geospatial information [12].

With the widespread use of cloud computing and big data technology in cloud GIS services, predicting layer action response times has become extremely difficult. On the one hand, cloud computing allows a large number of users to share geographic information resources at any time via resource pooling and on-demand use, thereby lowering service costs. However, it also makes the environment of cloud GIS services dynamic, uncertain, and uncontrollable, which makes accurate prediction of layer actions response time difficult. On the other hand, in the analysis mode of cloud GIS services, different analysis granularities of theme, space, and layer action are available. Because of parallel computing, the response times for different granularities affect each other. Therefore, clarifying the data association logic in cloud GIS services, mining the temporal features and knowledge hidden in layer action response time series, and improving the accuracy of layer action response time prediction are critical points in optimizing cloud service resource deployment and service strategy. Additionally, when the response time series is stable over time and monitoring the performance of cross-regional clusters is difficult, the layer action response time prediction problem can be treated as a time series prediction problem. One-step prediction focuses on the accuracy of the value at the next moment, while multi-step prediction shows the trend of the forecasted value over a longer horizon [13]. Predictions in multiple time dimensions provide references for load balancing strategy optimization and service resource allocation, helping to better realize the deployment and elastic invocation of cloud service resources. However, because the prediction range expands and the number of uncertain factors increases, multi-step prediction is the more challenging problem in time series prediction [14]. Previous research on time series prediction pays more attention to one-step prediction, either pursuing one-step accuracy or achieving multi-step prediction through iterative prediction [15,16,17], but the iterative approach tends to accumulate errors. Therefore, studying both one-step and multi-step prediction of layer actions response time is needed to optimise load balancing strategies and service resource allocation strategies in multiple time dimensions.

Thus, in order to solve the above-mentioned problems in the prediction of layer actions response time in cloud GIS services, we propose a wavelet transforms-based ARIMA-XGBoost hybrid method, which improves the accuracy of both one-step and multi-step prediction of layer actions response time. The model uses the wavelet transform to decompose the time series so that the features of the layer action response time series become clearer. Additionally, there is no need to satisfy the traditional assumption that the linear and nonlinear components of the time series are additive. Instead, a multivariate time feature matrix is constructed to increase the versatility of time series prediction models by considering the relationship between the historical value of the time series, the predicted value of the linear component, and the historical value of the nonlinear component. The linear and nonlinear hybrid model improves the accuracy of time series prediction.

The contributions of this article are as follows:

1. We classify the processing of cloud GIS services using the top-down, three-level refinement mode of “thematic visit - spatial analysis - layer operation.” By analyzing and mining the temporal features of layer actions response time in cloud GIS services, which has long-term stable trends and short-term random fluctuations, we demonstrate that decomposing the layer actions response time series is beneficial to temporal feature extraction and model training.
2. We propose a method for constructing a multivariate time feature matrix from the historical values of layer action response time, the predicted value of the linear component, and the historical values of the nonlinear component. The method does not rely on the traditional assumption that the linear and nonlinear components of the time series are additive, which enhances the versatility of the model. We also analyze the influence of the embedding dimension of the multivariate time feature matrix on time series prediction results.
3. We propose a hybrid model that combines linear and nonlinear components based on the wavelet transform. The model achieves accurate one-step and multi-step prediction of the layer actions response time for cloud GIS services.

The remainder of this article is structured as follows. The "Related work" section introduces related research. The "Data descriptions and model motivation" section presents the data description and model motivation. The "The WT-ARIMA-XGBoost hybrid prediction method" section introduces the wavelet transforms-based ARIMA-XGBoost hybrid method. The "Experiment and analysis" section evaluates performance and discusses the experiments. The "Conclusion and future works" section gives our conclusions and suggestions for future work.

Related work

Layer actions response time is an important indicator of cloud GIS services. The prediction methods for layer action response time are similar to those for web service QoS, which mainly comprise time series methods, collaborative filtering, and machine learning.

The time series method is a regression model that builds a mathematical relationship between historical time series data and the predicted value. The classical method is the autoregressive integrated moving average (ARIMA) model, which exploits the linear relationship between historical time series data and the predicted value. Its advantage is that the model is simple and time-sensitive, but its prediction horizon is short, making it unsuitable for long-term forecasting, nonlinear data, or complex time series data with both linear and nonlinear characteristics [18]. Collaborative filtering is a typical recommendation algorithm. Its idea is to use the known QoS experienced by certain users of a web service, together with the similarity of attributes and behavior preferences between users, to predict the QoS for users who have not yet used the service [19]. Chen, Z. et al. used collaborative filtering algorithms to predict the throughput and response time of services under large-scale data fluctuation [20]. Its advantages are simple design and fast calculation speed, but it is only suited to single-function and task-determined services. The accuracy of the prediction results also depends on how user similarity is defined and calculated and on which users are selected for comparison, so the algorithm lacks the ability to generalize. Machine learning has a strong ability to fit nonlinear data and more complex time series data. Guo J. et al. used the genetic algorithm-back propagation (GABP) method to predict the response time of virtual machine services [21]. Its advantages are high prediction accuracy and strong self-adaptability, but it has high computational overhead and slow training speed.

Furthermore, time series have special characteristics such as noise and non-stationarity, and hybrid time series models are a popular research topic [22]. Shouwen Ji et al. proposed a clustering-ARIMA-XGBoost time series prediction model [23]. First, clustering is used to select data features for different clusters; the ARIMA model then extracts the linear component of the clustered data, while the XGBoost model extracts the nonlinear component. The prediction result of this complex model depends on how well clustering performs feature selection. Determining the optimal number of clusters also consumes considerable computing resources, since every candidate number of clusters has to be explored. Zhang proposed an ARIMA-ANN hybrid time series prediction model, which first uses the ARIMA model to extract the linear component of the data and then uses an artificial neural network (ANN) to fit the residuals [24]. However, for this model to predict well, the time series value at a given point must be the sum of the linear and nonlinear components at that point, an assumption that limits its ability to generalize.

Cloud GIS services need references in multiple time dimensions for flexible service resource allocation and load balancing strategy optimization [25]. One-step and multi-step prediction provide different focal points and knowledge for deploying and publishing services. One-step prediction focuses on current fluctuations and the precise value of the next moment, whereas multi-step prediction focuses on global periodicity and stability. Current cloud GIS prediction methods do not consider both one-step and multi-step prediction in depth.

In summary, a single linear or nonlinear model has limitations when fitting the linear and nonlinear components of a time series simultaneously, and a more complex model is required to capture additional temporal characteristics [26]. Additionally, previous research has demonstrated that the dynamic laws that govern nature and human activities are rarely linear [27]. Cloud GIS services collect a massive number of access logs and human activity data from specific visitors. The layer actions response time, as an important part of these data, should therefore also have nonlinear characteristics that reflect the laws of human activity. In addition, existing hybrid time series methods seldom focus on cloud GIS services, and in particular on layer actions response time. At the same time, existing time series models rarely consider the relationship between the historical values of a time series and its linear and nonlinear components.

Thus, for cloud GIS services, and especially for layer actions services, we propose a hybrid time series prediction model that combines linear and nonlinear models based on the wavelet transform, in order to mine the linear and nonlinear components of layer actions response time and improve the accuracy of one-step and multi-step layer actions response time prediction.

Data descriptions and model motivation

In this section, we analyze the data association logic of cloud GIS services and then mine the temporal features of layer actions response time, showing the motivation for layer actions response time prediction and time decomposition of cloud GIS services.

Layer action data for cloud GIS services

The data for this article were derived from the server-side layer actions service log and spatial analysis log of the spatial cloud planning platform. There are three reasons why the spatial cloud planning platform is a classic service application platform for cloud GIS services and why we chose it for our study. Firstly, it contains a large amount of unstructured data and provides multiple types and scales of online map services. It faithfully preserves the whole process of geographic data collection, computation, analysis, and processing, as well as the delivery of interactive geographic information services. Secondly, it adopts a mature modeling method in cloud GIS services. It abstracts real-world objects into geometric feature elements such as points, lines, and areas, which are then organized and expressed as geographic entities in layers. This model improves the description of spatial relationships, the symbolic expression, and the thematic display of elements. Thirdly, it is inspirational and extensible for cloud GIS services that build applications with ArcGIS location services. To summarize, the spatial cloud planning platform is a suitable research subject for cloud GIS services research because it includes a significant quantity of log data, the typical geographic information layer modeling method in cloud GIS services, and the geographic information services application method.

The logs of service actions and spatial analysis results show that cloud GIS services are based on “thematic access - spatial analysis - layer actions” processing. The same theme is subdivided into multiple layers, and multiple layer actions are associated with the same spatial analysis result record. The layer action has the smallest granularity, and its response time has a direct effect on the thematic and spatial analysis response times. This is why we predict the response time of layer actions in cloud GIS services.

A large amount of layer action service log data and spatial analysis log data is generated. The log data mainly include creation time, work file number, layer action type, layer action response time, layer address, layer name, project area, and theme name. We match the layer response time data with the spatial analysis results by correlating records on the work file number.

Motivation of model

We conduct statistical analysis on the response time of layer actions in cloud GIS services. The Box-Cox transformation is introduced to reduce the mean square error of the observations. It enhances the normality of the data and the correlation between different time features in the layer actions data, which benefits the accuracy of the layer action response time prediction. The Box-Cox transformation is expressed as:

$$\begin{aligned} {y^{(\lambda )}} = \left\{ \begin{array}{c} {\frac{{{y^\lambda } - 1}}{\lambda },\lambda \ne 0}\\ {\log (y),\lambda = 0} \end{array} \right. \end{aligned}$$

(1)

where $\lambda$ is the Box-Cox transformation parameter.

The Box-Cox transformation is a family of transformations that generalizes the logarithmic transformation and power transformations [28]. When the Box-Cox transformation parameter is less than 1, high peaks are compressed; when it is greater than 1, the opposite occurs and high peaks are stretched.
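As a minimal sketch of Eq. (1), the transformation and its inverse can be written in a few lines of Python. The sample response times and their units are hypothetical and serve only to illustrate the compression effect for $\lambda < 1$; the value of $\lambda$ is the one reported in the experiment section.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y, following Eq. (1)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical response times (the unit is an assumption, e.g. milliseconds).
samples = [20.0, 35.0, 400.0]
lam = 0.03304  # parameter value reported in the experiment section
transformed = [box_cox(y, lam) for y in samples]
restored = [inverse_box_cox(z, lam) for z in transformed]
```

With $\lambda$ this close to 0 the transform behaves almost like a logarithm, so the gap between the largest and smallest sample shrinks considerably after transformation.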

Considering the periodicity and stability of cloud GIS services, we sampled the average value at one-hour and one-day intervals, as illustrated in Fig. 1. According to Fig. 1, the Box-Cox conversion values of the average response time of layer actions per day mostly fluctuate in the range of [3.2,3.6], while the Box-Cox conversion value of the average response time of layer actions per hour fluctuates in the range of [3,5]. Under the two sampling granularities, the average response values of layer actions fluctuate within their respective numerical ranges, occasionally jittering but remaining within their specific numerical ranges.

Furthermore, we calculated common statistics such as the average and standard deviation of layer actions response time at sampling granularities of one day and one hour, as shown in Table 1. According to Table 1, the difference between the upper and lower quartiles of layer action response time at the one-hour granularity is approximately 0.6745, while the difference at the one-day granularity is approximately 0.3071. This demonstrates that the layer actions response time series exhibits long-term stable trends and short-term random fluctuations at the one-day and one-hour sampling granularities.

Table 1 Common statistics of layer actions response time under different sampling granularities

To predict the response time of layer actions more accurately, we choose one hour as the sampling granularity. Using the wavelet transform (with the DB2 wavelet), we decompose the layer action response time series into subsequences at different frequencies, namely the approximation coefficient subsequence (low-frequency component) and the detail coefficient subsequence (high-frequency component), as shown in Fig. 2. The data distribution of the two subsequences reveals that the fluctuation range of the approximation coefficient subsequence is relatively large, about 8 units, and can represent the disturbance characteristics of layer actions response time. The fluctuation range of the detail coefficient subsequence is small, about 1.6 units, and can represent the stable characteristics of layer actions response time.
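The idea of splitting a series into equal-length low-frequency and high-frequency subsequences can be illustrated with a short sketch. Note that it uses the one-level Haar wavelet rather than the DB2 wavelet used in this article, because Haar can be written without a wavelet library; the input values are hypothetical.

```python
def haar_split(series):
    """One-level Haar decomposition, reconstructed to full length.

    Returns (approx, detail): two lists as long as `series` with
    approx[i] + detail[i] == series[i]. Requires an even length.
    """
    if len(series) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    approx, detail = [], []
    for k in range(0, len(series), 2):
        a, b = series[k], series[k + 1]
        mean = (a + b) / 2.0        # low-frequency part of the pair
        half_diff = (a - b) / 2.0   # high-frequency part of the pair
        approx.extend([mean, mean])
        detail.extend([half_diff, -half_diff])
    return approx, detail

# Hypothetical Box-Cox transformed hourly averages.
y = [3.2, 3.6, 4.8, 3.0, 3.4, 3.5]
y_a, y_d = haar_split(y)
```

Because each subsequence is reconstructed to the original length, the two parts add back to the input sample by sample, which is the property the hybrid method relies on later.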

The WT-ARIMA-XGBoost hybrid prediction method

In this section, the WT-ARIMA-XGBoost hybrid prediction method for layer actions response time of cloud GIS services is presented. It mainly includes the wavelet transform, a linear model (ARIMA), and a nonlinear model (XGBoost). The framework is shown in Fig. 3.

As described in the "Motivation of model" section, the layer actions response time series has both stable long-term trends and random fluctuations. The original layer actions time series is transformed into equal-length approximation coefficient and detail coefficient subsequences using the wavelet decomposition and reconstruction method. The ARIMA model is appropriate for extracting stable and linear features from the detail coefficient subsequence, whereas the XGBoost model is appropriate for fitting the nonlinear features of layer action response time. Furthermore, the fitting error of the linear component of the time series is used as a correction for the prediction of the nonlinear component in the following step.

ARIMA method for stationary part of response time

Since the response time of layer actions has long-term stable trends, we created an ARIMA model to mine the stable linear relationship in the detail coefficient subsequence.

Three components comprise the autoregressive integrated moving average model (ARIMA): autoregression, integration, and moving average. To accomplish regression prediction, we mine the linear relationship between the current time value, historical data, and linear fitting error of the response time of layer actions for cloud GIS services. The model exhibits superior performance in linear regression and prediction.

The mathematical expression of the model is:

$$\begin{aligned} {\widehat{y}_t} = \sum \limits _{i = 1}^p {{\alpha _i}{y_{t - i}} + {\varepsilon _t}} - \sum \limits _{j = 1}^q {{\beta _j}} {\varepsilon _{t - j}} \end{aligned}$$

(2)

where ${\widehat{y}_t}$ represents the predicted value of layer action response time at time t, and ${\alpha _i}$ and ${\beta _j}$ represent the AR model and MA model coefficients, respectively. ${\varepsilon _t}$ represents the random error. p and q represent the number of terms in the AR model and MA model, respectively, which can be determined from the autocorrelation and partial autocorrelation coefficients.
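To make Eq. (2) concrete, the following sketch computes a single one-step forecast from given coefficients. It is an illustration under assumptions rather than the fitting procedure used in this article: the coefficients and series values are hypothetical, the unknown current error $\varepsilon_t$ is replaced by its expected value of 0, and the differencing (integration) step is omitted.

```python
def arma_one_step(history, residuals, alpha, beta):
    """One-step forecast following Eq. (2), with the unknown eps_t set to 0.

    history[-1] is y_{t-1} and residuals[-1] is eps_{t-1}.
    alpha holds the p AR coefficients, beta the q MA coefficients.
    """
    p, q = len(alpha), len(beta)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough past values")
    ar_part = sum(alpha[i] * history[-1 - i] for i in range(p))
    ma_part = sum(beta[j] * residuals[-1 - j] for j in range(q))
    return ar_part - ma_part

# Hypothetical coefficients for p = 2, q = 2; these are not fitted values.
alpha = [0.6, 0.2]
beta = [0.3, 0.1]
history = [3.4, 3.5, 3.3]        # y_{t-3}, y_{t-2}, y_{t-1}
residuals = [0.05, -0.02, 0.01]  # eps_{t-3}, eps_{t-2}, eps_{t-1}
forecast = arma_one_step(history, residuals, alpha, beta)
```

In practice the coefficients would be estimated from the training data by a statistics library; the sketch only shows how the AR and MA sums in Eq. (2) combine.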

XGBoost model for the non-stationary part of response time

The layer actions response time has the time series characteristics of short-term random fluctuations. Therefore, we establish the XGBoost model in order to mine the nonlinear relationship of the layer actions response time.

eXtreme Gradient Boosting (XGBoost) is a high-performance implementation of gradient boosting that uses classification and regression trees (CART) as base learners. The main idea is that strong learners are difficult to construct, whereas weak learners are relatively simple, and several weak learners can be combined to form a strong learner. Thus, the XGBoost model trains K weak learners step by step, each fitting the residuals of the previous rounds, and finally combines them to obtain a strong model.

The mathematical expression of the model is:

$$\begin{aligned} \overset{\wedge }{y_i} = \phi ({x_i}) = \sum \limits _{k = 1}^K {{f_k}({x_i})} ,\ {f_k} \in F \end{aligned}$$

(3)

where ${f_k}$ represents regression trees.

The objective function is expressed as:

$$\begin{aligned} L(\phi ) = \sum \limits _i {L({y_i}, \overset{\wedge }{{y_i}} )} + \sum \limits _k {\Omega ({f_k})} \end{aligned}$$

(4)

where $L( \bullet )$ represents the loss function, which is used to calculate the error between the predicted value and the real value of layer actions response time. $\Omega ( \bullet )$ represents a regularization term.
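The additive structure of Eq. (3) can be sketched with plain gradient boosting on one-dimensional inputs. This is a simplified illustration, not XGBoost itself: it uses single-split regression stumps, squared-error loss, and a fixed learning rate, and it leaves out the regularization term $\Omega$ and the second-order approximation that XGBoost applies. The data and the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def boost(xs, ys, rounds, learning_rate=0.5):
    """Additive model as in Eq. (3): each tree fits the current residuals."""
    trees = []
    preds = [0.0] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: sum(learning_rate * t(x) for t in trees)

# Hypothetical data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 3.0, 3.2, 4.8, 5.0, 4.9]
model = boost(xs, ys, rounds=9)
```

Each round shrinks the training residuals, which is the "fit the residuals of each round" behaviour described above; a real XGBoost model additionally penalizes tree complexity through $\Omega$.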

Hybrid prediction method

We study the representation of linear and nonlinear components of layer actions response time series on cloud GIS services. Furthermore, our research focuses on the relationship between the historical value, the linear component, and the nonlinear component of the layer actions response time.

Specifically, it is assumed that the historical time series of the layer actions response time Box-Cox conversion value at the current time t is expressed as ${Y_t} = {[{y_t},{y_{t - 1}},...,{y_{t - m}}]^T}$.

Following wavelet decomposition and reconstruction, the Box-Cox transform value of the layer action response time is represented by approximation and detail coefficient subsequences of the same length, ${Y_A}$ and ${Y_D}$, which superpose as follows:

$$\begin{aligned} {Y_t} = {Y_A} + {Y_D} \end{aligned}$$

(5)

The ARIMA model is suitable for extracting the linear component ${Y_L}$ from the detail coefficient subsequence of layer actions response time. The fitting error between the detail coefficient subsequence and the linear component can be expressed as:

$$\begin{aligned} res = {Y_D} - {Y_L} \end{aligned}$$

(6)

The XGBoost model is suitable for extracting nonlinear components from the layer actions response time. In order to predict the layer actions response time more accurately, the linear component fitting error of the Box-Cox conversion value of the layer actions response time is used as a correction to the nonlinear component data, which can be expressed as:

$$\begin{aligned} {Y_{NL}} = {Y_A} + res \end{aligned}$$

(7)

The traditional time series prediction model requires the assumption that the time series is the sum of its linear and nonlinear components.

By contrast, we are not required to adhere to this assumption. To predict layer actions response time more precisely, we construct a multivariate time series matrix of the historical value, the predicted value of the linear component, and the historical value of the nonlinear component of layer actions response time as the input of the regression model. The purpose is to find the relationship among the historical value, linear prediction value, and nonlinear value of the layer actions response time.

The mathematical expression is as follows:

$$\begin{aligned} \overset{\wedge }{{y_t}} = f({y_{t - 1}},{y_{t - 2}},...,{y_{t - m}},\overset{\wedge }{y_{lp}},{y_{N{L_{t - 1}}}},{y_{N{L_{t - 2}}}},...,{y_{N{L_{t - n}}}}) \end{aligned}$$

(8)
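Equations (6) to (8) can be sketched as two small helpers: one forms the nonlinear component, and the other assembles the multivariate input rows. The function names and the toy values are hypothetical, and the sketch assumes that $\widehat{y}_{lp}$ is the linear (ARIMA) forecast aligned with the target time t, which is one reading of Eq. (8).

```python
def nonlinear_component(y_a, y_d, y_l):
    """Eqs. (6)-(7): res = Y_D - Y_L, then Y_NL = Y_A + res."""
    return [a + (d - l) for a, d, l in zip(y_a, y_d, y_l)]

def build_features(y, y_nl, y_lp, m, n):
    """Rows of Eq. (8): m lags of y, the linear forecast, n lags of Y_NL.

    y, y_nl, y_lp are equal-length lists indexed by time; y_lp[t] is
    assumed to be the linear forecast for time t. Returns (X, targets).
    """
    if not (len(y) == len(y_nl) == len(y_lp)):
        raise ValueError("series must have equal length")
    start = max(m, n)
    X, targets = [], []
    for t in range(start, len(y)):
        lags_y = [y[t - i] for i in range(1, m + 1)]
        lags_nl = [y_nl[t - i] for i in range(1, n + 1)]
        X.append(lags_y + [y_lp[t]] + lags_nl)
        targets.append(y[t])
    return X, targets

# Toy values chosen for readability, not taken from the data set.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_nl = [0.1, 0.2, 0.3, 0.4, 0.5]
y_lp = [10.0, 20.0, 30.0, 40.0, 50.0]
X, targets = build_features(y, y_nl, y_lp, m=2, n=1)
```

Each row of `X` has m + 1 + n entries, and the regression model f in Eq. (8) is then trained to map a row to its target. The embedding dimensions m and n are tunable; the one-step experiment reports using 6 for both.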

Our proposed method for layer actions response time prediction has been depicted in Algorithm 1.

Experiment and analysis

In order to study the effectiveness of the WT-ARIMA-XGBoost hybrid prediction method, we collected the layer actions log data and spatial analysis log data of the cloud GIS services from October 28, 2019 to July 9, 2020. We then removed data from the epidemic period (January 20, 2020 to March 20, 2020). Because different layer action types in cloud GIS services have different response time distribution characteristics, we sequentially extracted the layer processing analysis type (such as storage, linear analysis, point analysis), thematic spatial analysis features (such as basic farmland access probability), area shape, area block number, and time characteristics in order to remove outliers. Furthermore, one-step and multi-step prediction experiments for the layer actions response time in the cloud GIS services were designed. On this basis, we calculated the average response time of layer actions within each hour and smoothed the data, leaving 2211 time points. The last 100 time points are used as the test data set, and the remaining points are used as the training data set. The best Box-Cox parameter in the experiment was $\lambda = 0.03304$.

Evaluation indicators

In order to quantitatively describe the performance of the model, we choose root mean square error (RMSE), R-Square (${R^2}$), and mean absolute error (MAE) to evaluate the model.

RMSE (root mean square error) measures the deviation between the predicted values and the actual values. R-Square (${R^2}$) reflects the goodness of fit of the model predictions. MAE (mean absolute error) is the average of the absolute differences between the actual values and the predicted values. The smaller the RMSE and MAE, the better the model. A larger ${R^2}$ indicates that the predicted values are closer to the actual values. Their expressions are as follows:

$$\begin{aligned} RMSE = \sqrt{\frac{1}{m}\sum \limits _{i = 1}^m {{{({y_i} - \widehat{{y_i}})}^2}} } \end{aligned}$$

(9)

$$\begin{aligned} {R^2} = 1\mathrm{{ - }}\frac{{\sum \nolimits _{i = 1}^m {{{({y_i} - \overset{\wedge }{{y_i}})}^2}} }}{{\sum \nolimits _{i = 1}^m {{{({y_i} - \overline{{y_i}} )}^2}} }} \end{aligned}$$

(10)

$$\begin{aligned} MAE = \frac{1}{m}\sum \limits _{i = 1}^m {\left| {y{}_i - \overset{\wedge }{{y_i}}} \right| } \end{aligned}$$

(11)

where ${y_i}$ denotes the actual values, $\widehat{{y_i}}$ denotes the predicted values, $\overline{{y_i}}$ denotes the mean of the actual values, and m is the number of predicted samples.
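The three indicators are straightforward to compute directly from their definitions. The sketch below uses the standard definitions of RMSE, MAE, and ${R^2}$; the sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over paired samples."""
    m = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / m)

def mae(actual, predicted):
    """Mean absolute error over paired samples."""
    m = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / m

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted values.
actual = [3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 6.0]
scores = (rmse(actual, predicted), mae(actual, predicted),
          r_squared(actual, predicted))
```

For this toy input only the last prediction is off by 1, so the RMSE is $\sqrt{1/3}$, the MAE is $1/3$, and ${R^2}$ is 0.5.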

One-step prediction

One-step prediction and comparison of experimental results

One-step prediction is used to predict the average response time of layer actions over the next hour based on historical data for layer actions response time in cloud GIS services. The ARIMA model is a good linear time series prediction model, while the XGBoost model is a good nonlinear time series prediction model. The proposed WT-ARIMA-XGBoost model is compared to existing time series prediction models and their combinations, including ARIMA, XGBoost, WT-ARIMA, WT-XGBoost, and ARIMA-XGBoost. According to the truncation and tailing of the autocorrelation and partial autocorrelation coefficients, the ARIMA orders are set to p=2 and q=2. The number of regression trees for XGBoost is 9. The embedding dimensions of both the historical data and the nonlinear data in WT-ARIMA-XGBoost are 6.

Figure 4 and Table 2 show the RMSE, MAE, and R-Square values of the six models for the layer actions response time in one-step prediction. In one-step prediction, RMSE depicts the overall error between the predicted and actual values across all time points. MAE indicates the average absolute difference between the predicted and actual values of the layer actions response time. R-Square represents how closely the predicted values track the actual values.

(1) Comparison of WT-ARIMA-XGBoost hybrid model and single model

As shown in Table 2, the RMSE value of the WT-ARIMA-XGBoost hybrid method is lower than that of both the single linear model ARIMA and the single nonlinear model XGBoost, by 14.02% and 12.9%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.9028) is greater than that of the ARIMA model (0.8686) and the XGBoost model (0.8719). Therefore, the WT-ARIMA-XGBoost hybrid model outperforms the single linear model ARIMA and the single nonlinear model XGBoost.

(2) Comparison of WT-ARIMA-XGBoost hybrid model with WT-Based model

In Table 2, the RMSE of the WT-ARIMA-XGBoost hybrid model is lower than that of both the WT-ARIMA model and the WT-XGBoost model, by 24.56% and 5.365%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.9028) is greater than that of the WT-ARIMA model (0.8293) and the WT-XGBoost model (0.8915).

As shown in (1) and (2), a single linear model or a single nonlinear model has limitations when applied to time series prediction problems with both linear and nonlinear components. Moreover, decomposing the series with the wavelet transform and predicting it with a single model does not necessarily give better results than the same model on the undecomposed series: WT-ARIMA (R-square 0.8293) performs worse than ARIMA (0.8686).

(3) Comparison of WT-ARIMA-XGBoost hybrid model and ARIMA-XGBoost model

In Table 2, the RMSE of the WT-ARIMA-XGBoost hybrid model is 10.32% lower than that of the ARIMA-XGBoost model. The R-square value of the WT-ARIMA-XGBoost hybrid model (0.9028) is greater than that of the ARIMA-XGBoost model (0.8791). Unlike the ARIMA-XGBoost model, the WT-ARIMA-XGBoost model first decomposes the layer actions response time series with the wavelet transform and feeds the resulting subsequences to the linear and nonlinear models, which yields better experimental results. The result shows that the layer actions response time series indeed contains both linear and nonlinear components. After the wavelet transformation of the series, different temporal features are easier to obtain, which benefits model training.

(4) Comparison of ARIMA-XGBoost model and single model

As given in Table 2, the ARIMA-XGBoost hybrid model has a lower RMSE than the single linear model ARIMA and the single nonlinear model XGBoost, by 4.13% and 2.88%, respectively. The R-square value of the ARIMA-XGBoost model (0.8791) is greater than that of the ARIMA model (0.8686) and the XGBoost model (0.8719). This comparison demonstrates that the linear-nonlinear hybrid model outperforms the single linear or nonlinear model.

It can be seen from (3) and (4) that the ARIMA-XGBoost model, as a typical linear and nonlinear hybrid prediction model, overcomes the limitations of the single linear and single nonlinear models. With the wavelet transform, the WT-ARIMA-XGBoost model extracts the different temporal features of the layer actions response time more easily than the ARIMA-XGBoost model. Therefore, in the one-step prediction of the layer actions response time, the WT-ARIMA-XGBoost model reduces the root mean square error compared with the above models while also preventing overfitting.
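
The percentages in comparisons (1) to (4) appear to be relative RMSE reductions with respect to the model being compared against. This section does not state the formula, so the definition below is an assumption. It is, however, consistent with the reported figures: chaining the 10.32% reduction from (3) with the 4.13% and 2.88% reductions from (4) reproduces the 14.02% and 12.9% reductions in (1).

$$\begin{aligned} \Delta_{\textrm{RMSE}}&= \frac{\textrm{RMSE}_{\textrm{baseline}} - \textrm{RMSE}_{\textrm{hybrid}}}{\textrm{RMSE}_{\textrm{baseline}}} \times 100\% \\ 1 - (1 - 0.1032)(1 - 0.0413)&\approx 0.1402 \\ 1 - (1 - 0.1032)(1 - 0.0288)&\approx 0.1290 \end{aligned}$$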

Table 2 Evaluation indicator values for one-step prediction of layer actions response time

The relationship between the embedding dimension of multivariate time series feature matrix and the prediction result of WT-ARIMA-XGBoost model

This subsection examines how the embedding dimension of the multivariate time series feature matrix affects the prediction result of the WT-ARIMA-XGBoost model, for the case in which the historical data dimension equals the nonlinear data dimension. In general, since the error between the true and fitted values of the historical data is used as a correction of the nonlinear historical data, the dimensionality of the historical data is greater than that of the nonlinear historical data.

The relationship between the embedding dimension of the multivariate time series feature matrix of the WT-ARIMA-XGBoost model and the prediction result of layer actions response time is shown in Fig. 5. The RMSE and MAE values tend to decrease as the embedding dimensions of the historical values and nonlinear components in the multivariate time series feature matrix increase, while the R-square value tends to increase. Moreover, RMSE, MAE, and R-square change faster over the embedding dimension range [1,7] than over the range [8,25]. This shows that increasing the embedding dimension of the multivariate time series feature matrix improves the one-step prediction performance of the WT-ARIMA-XGBoost model only to a certain extent, with diminishing gains at larger dimensions. The historical values and nonlinear component data nearest to the prediction point have the greatest impact on the prediction result.
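
The role of the two embedding dimensions can be made concrete with a small sketch of how such a feature matrix might be assembled. The row layout below (lagged historical values, then the linear-component prediction for the target hour, then lagged nonlinear-component values) is an assumption based on the description of the matrix, not the authors' code; `build_feature_matrix`, `m1`, and `m2` are illustrative names.

```python
def build_feature_matrix(history, linear_pred, nonlinear, m1, m2):
    """Return (X, y) for one-step prediction.

    Each row holds the m1 most recent observed values, the linear-model
    prediction for the target hour, and the m2 most recent nonlinear values.
    """
    history, linear_pred, nonlinear = list(history), list(linear_pred), list(nonlinear)
    if not (len(history) == len(linear_pred) == len(nonlinear)):
        raise ValueError("all three series must have the same length")
    start = max(m1, m2)              # first hour with enough lags available
    X, y = [], []
    for t in range(start, len(history)):
        row = history[t - m1:t] + [linear_pred[t]] + nonlinear[t - m2:t]
        X.append(row)
        y.append(history[t])         # target: observed response time at hour t
    return X, y

# Hypothetical series (not data from the paper).
history = [1.0, 2.0, 3.0, 4.0, 5.0]
linear_pred = [10.0, 20.0, 30.0, 40.0, 50.0]
nonlinear = [0.1, 0.2, 0.3, 0.4, 0.5]
X, y = build_feature_matrix(history, linear_pred, nonlinear, m1=2, m2=1)
print(X)
print(y)
```

Each row has m1 + m2 + 1 columns, so larger embedding dimensions give the regressor a longer view of the past at the cost of fewer usable training rows.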

Multi-step prediction

When the prediction step is 3, the multi-step prediction results of layer actions response time are shown in Fig. 6. A prediction step of 3 is used to predict the average hourly response time of layer actions over the next three hours. The XGBoost model has a total of 18 regression trees, and the ARIMA model uses p=3 and q=2. The embedding dimensions of the historical data and nonlinear data of the WT-ARIMA-XGBoost model are 3 and 5, respectively.

When the prediction step is 5, the multi-step prediction results of the layer actions response time are shown in Fig. 7. A prediction step of 5 is used to predict the average hourly layer actions response time over the next five hours. The XGBoost model has 50 regression trees, and the ARIMA model uses p=5 and q=2. The embedding dimensions of the historical data and nonlinear data of the WT-ARIMA-XGBoost model are 4 and 2, respectively.

The RMSE, R-square, and MAE values of the predicted layer actions response time under different prediction steps are shown in Table 3. In multi-step prediction, RMSE illustrates the total cumulative error between the predicted and actual values along a trend. MAE is the average absolute difference between the predicted and actual values along the trend of the layer actions response time. R-square evaluates how well the predicted trend of the layer actions response time fits the actual trend (Figs. 6 and 7).

(1) Comparison of the WT-ARIMA-XGBoost hybrid model with other models under different prediction steps

As illustrated in Fig. 8, the RMSE and MAE values of the predicted layer actions response time increase steadily as the prediction step size increases, while the R-square value decreases (for the WT-ARIMA-XGBoost model, from 0.9028 in one-step prediction to 0.6994 at step 3 and 0.5109 at step 5). This demonstrates that expanding the prediction range in multi-step prediction of layer actions response time introduces more uncertainty factors, thereby increasing the difficulty of multi-step prediction.

(2) Comparison of WT-ARIMA-XGBoost hybrid model with other models under same prediction step

As shown in Table 3, when the prediction step is 3, the RMSE of the WT-ARIMA-XGBoost hybrid model is lower than that of the ARIMA and XGBoost models, by 6.93% and 8.64%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.6994) is higher than that of the ARIMA model (0.6556) and the XGBoost model (0.6398). The RMSE of the WT-ARIMA-XGBoost hybrid model is also lower than that of the WT-ARIMA model and the WT-XGBoost model, by 13.32% and 4.93%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.6994) is higher than that of the WT-ARIMA model (0.6028) and the WT-XGBoost model (0.6674).

Table 3 Evaluation indicator values for layer actions response time under different time granularities

In Table 3, when the prediction step is 5, the RMSE of the WT-ARIMA-XGBoost hybrid model is lower than that of the ARIMA and XGBoost models, by 3.38% and 9.44%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.5109) is higher than that of the ARIMA model (0.4760) and the XGBoost model (0.4033). The RMSE of the WT-ARIMA-XGBoost model is also lower than that of the WT-ARIMA model and the WT-XGBoost model, by 0.97% and 8.02%, respectively. The R-square value of the WT-ARIMA-XGBoost model (0.5109) is higher than that of the WT-ARIMA model (0.5012) and the WT-XGBoost model (0.4219).

Fig. 8 shows that, under the same prediction step, the WT-ARIMA-XGBoost model (red line) always has the lowest RMSE in Fig. 8(a), the highest R-square in Fig. 8(b), and a smaller MAE than the other models in Fig. 8(c). This demonstrates that the WT-ARIMA-XGBoost model is more robust than the other models.

Computational complexity of models

To analyse the computational complexity of the WT-ARIMA-XGBoost model, we use big ${O(\cdot )}$ notation to describe the relation between the number of operations and the size of the layer actions response time series. We assume that the input layer actions response time series has size N and that multi-step prediction covers M time steps. The analysis focuses on both M and N.

In the ARIMA model, as given in [29], the computational complexity of ARIMA itself is $O((N - p){p^2} + (N - q){q^2})$. The computational complexity of the ARIMA model for one-step prediction is therefore

$$\begin{aligned} O((N - p){p^2} + (N - q){q^2}) \end{aligned}$$

(12)

The computational complexity of ARIMA model for multi-step prediction is

$$\begin{aligned} O(M((N - p){p^2} + (N - q){q^2})) \end{aligned}$$

(13)

In the XGBoost model, following [30], let d be the maximum depth of the trees, K the total number of trees, and $\left\| {{X_0}} \right\|$ the number of non-missing entries in the training data. The computational complexity of the XGBoost model for one-step prediction is

$$\begin{aligned} O(Kd\left\| {{X_0}} \right\| {} \log {} N) \end{aligned}$$

(14)

The computational complexity of XGBoost model for multi-step prediction is

$$\begin{aligned} O(MKd\left\| {{X_0}} \right\| {} \log {} N) \end{aligned}$$

(15)

In the WT-ARIMA-XGBoost model, the computational complexity of the wavelet transform is O(N) [31]. We construct a multivariate time series matrix from ${m_1}$ historical values and ${m_2}$ nonlinear component values; both sizes depend on the data and are selected as features for the model. The computational complexity of the WT-ARIMA-XGBoost model for multi-step prediction using the multiple-input multiple-output (MIMO) prediction strategy is

$$\begin{aligned} O(N + Kd\left\| {{X_0}} \right\| {} \log ({m_1} + {m_2} + M) + (N - p){p^2} + (N - q){q^2}) \end{aligned}$$

(16)

The prediction step M is much smaller than the size N of the layer actions response time series. Comparing expressions (13), (15), and (16), M appears as a multiplicative factor in (13) and (15), whereas in (16) it appears only inside a logarithmic term. The WT-ARIMA-XGBoost model therefore reduces the computational complexity of multi-step prediction.
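
How the three expressions grow with M can be checked numerically. The sketch below simply evaluates the terms inside the big-O expressions, which gives growth trends rather than real operation counts. It assumes that in (13) the factor M applies to the whole one-step ARIMA cost (12), and that the moving average term in (16) is (N − q)q² as in (12); all parameter values are hypothetical.

```python
import math

def arima_multistep_ops(N, M, p, q):
    """Expression (13), read as M repetitions of the one-step cost (12)."""
    return M * ((N - p) * p ** 2 + (N - q) * q ** 2)

def xgboost_multistep_ops(N, M, K, d, nnz):
    """Expression (15); nnz stands for the number of non-missing entries."""
    return M * K * d * nnz * math.log(N)

def hybrid_mimo_ops(N, M, p, q, K, d, nnz, m1, m2):
    """Expression (16): wavelet transform + one MIMO XGBoost fit + one ARIMA fit."""
    return (N + K * d * nnz * math.log(m1 + m2 + M)
            + (N - p) * p ** 2 + (N - q) * q ** 2)

# Hypothetical settings chosen only to show the trend.
N, p, q, K, d, nnz, m1, m2 = 1000, 2, 2, 50, 6, 5000, 4, 2
for M in (1, 3, 5):
    print(M,
          arima_multistep_ops(N, M, p, q),
          round(xgboost_multistep_ops(N, M, K, d, nnz)),
          round(hybrid_mimo_ops(N, M, p, q, K, d, nnz, m1, m2)))
```

Under these assumptions the first two quantities scale linearly with M, while the third grows only through the logarithm of m1 + m2 + M.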

Time performance of models

We evaluated the time cost of the six models on log data. The time cost is the total time from the input of the layer actions response time series to the output of the predicted values, which includes model training and testing time (the testing time is the time taken to predict the time steps that follow the historical series of layer actions response time). Table 4 presents the time performance of the six models for one-step and multi-step prediction.
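
A time cost of this kind, covering both training and prediction, can be measured with a wall-clock timer around the two phases. The sketch below is only an illustration of that measurement: `timed_fit_predict` is a hypothetical helper, and the mean-value "model" is a stand-in for any of the six models.

```python
import time

def timed_fit_predict(fit, predict, train_series, horizon):
    """Return (predictions, elapsed_seconds) covering training plus prediction."""
    start = time.perf_counter()
    model = fit(train_series)                            # training time
    predictions = predict(model, train_series, horizon)  # testing time
    elapsed = time.perf_counter() - start
    return predictions, elapsed

# Stand-in model: predicts the training mean for every future step.
def fit_mean(series):
    return sum(series) / len(series)

def predict_mean(model, series, horizon):
    return [model] * horizon

predictions, seconds = timed_fit_predict(fit_mean, predict_mean, [1.0, 2.0, 3.0], 3)
print(predictions, seconds)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.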

Table 4 Time performance of layer actions response time in training and testing time of six models

Table 4 shows that the ARIMA model takes the least time for one-step prediction of the layer actions response time, while the WT-ARIMA-XGBoost model takes the least time for multi-step prediction. The main reason is that in the WT-ARIMA-XGBoost model we construct a multivariate time series matrix and use a MIMO prediction strategy. The multivariate time series matrix effectively represents the relationship between the historical values and the linear and nonlinear components of the layer actions response time, which enables parallel prediction with the MIMO strategy.

In addition, the time cost of the ARIMA-XGBoost linear and nonlinear combination model is slightly greater than that of the single models (ARIMA, XGBoost) and the WT-Based models (WT-ARIMA, WT-XGBoost) in one-step prediction of the layer actions response time, but it is lower in multi-step prediction. When the linear and nonlinear components of the layer actions response time series are learned and predicted independently, the model training speed increases. Unlike the Iterated strategy, the MIMO strategy can predict the values at all time points of a multi-step prediction in parallel. The premise is a feature matrix that strongly relates the historical values to the value to be predicted at each time point. The multivariate time series matrix of the WT-ARIMA-XGBoost model provides such a feature matrix, which allows the prediction to be parallelized.
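
The difference between the Iterated and MIMO strategies can be sketched with toy models. In the Iterated strategy each prediction is fed back as input for the next step, so the steps must run in sequence; in the MIMO strategy every future step is a function of the observed window only. The two stand-in models below (a window mean and a linear trend) are hypothetical and are not the models used in the experiments.

```python
def iterated_forecast(one_step_model, window, steps):
    """Iterated strategy: each prediction becomes an input for the next step."""
    window = list(window)
    out = []
    for _ in range(steps):
        y_hat = one_step_model(window)
        out.append(y_hat)
        window = window[1:] + [y_hat]   # later steps depend on earlier predictions
    return out

def mimo_forecast(multi_output_model, window):
    """MIMO strategy: one call maps the observed window to all future steps."""
    return multi_output_model(list(window))

# Toy stand-ins for trained models.
def mean_model(window):
    return sum(window) / len(window)

def linear_trend_mimo(window, steps=3):
    slope = (window[-1] - window[0]) / (len(window) - 1)
    # Each horizon h is computed from the observed window alone.
    return [window[-1] + slope * (h + 1) for h in range(steps)]

observed = [1.0, 2.0, 3.0]
print(iterated_forecast(mean_model, observed, 2))
print(mimo_forecast(linear_trend_mimo, observed))
```

Because no MIMO output depends on another output, the per-step computations are independent and can be evaluated in parallel, whereas the Iterated loop cannot start step h + 1 before step h has finished.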

In conclusion, our WT-ARIMA-XGBoost model improves the accuracy of one-step and multi-step predictions of layer actions response time in cloud GIS services. Both the computational complexity analysis and the time performance experiments show that our model takes less time in multi-step prediction. This is a result of building a multivariate time series matrix and using the MIMO prediction strategy. Thus, the model offers both prediction accuracy and an advantage in time performance. These advantages in accuracy and time cost are expected to become more pronounced in thematic analysis, spatial analysis, and parallel operations.

Conclusion and future works

The layer actions response time in cloud GIS services has a direct impact on the response time of thematic and spatial analysis, and its prediction can provide a decision-making reference for the deployment and allocation of cloud GIS service resources. We propose a WT-ARIMA-XGBoost hybrid prediction model that takes advantage of the linear model (ARIMA) in stationary time series prediction and the nonlinear model (XGBoost) in non-stationary dataset regression. By exploiting the long-term stable trends and short-term random fluctuations of the layer actions response time series, we achieve accurate one-step and multi-step prediction. To overcome the traditional assumption that the linear and nonlinear components of a time series are additive, we construct the multivariate time series feature matrix from the historical value of the layer actions response time, the predicted value of the linear component, and the historical value of the nonlinear component.

Predicting layer actions response time is still a promising and challenging issue for optimising service quality and computing resource allocation of cloud GIS services. In future work, we will focus on finding the optimal solution of layer actions response time historical data dimension and nonlinear component dimension in multivariate time series feature matrix, as well as the relationship between dimension, prediction step, and sampling granularity. To improve the support of dynamic cloud GIS services, we will also consider real-time collection, online processing, and prediction of layer action response time.

Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Helmi AM, Farhan MS, Nasr MM (2018) A framework for integrating geospatial information systems and hybrid cloud computing. Comput Electr Eng 67:145–158
2. Bhat MA, Shah RM, Ahmad B (2011) Cloud computing: A solution to geographical information systems(gis). Int J Comput Sci Eng 3(2):594–600
3. Tamiminia H, Salehi B, Mahdianpari M, Quackenbush L, Adeli S, Brisco B (2020) Google earth engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J Photogramm Remote Sens 164:152–170
4. Yee LS, Khoo V (2010) Spatially enabled Singapore through Singapore geospatial collaborative environment (sg-space). In: Crompvoets J, Kalantari M, Kok B (eds) Spatially enabling society: research, emerging trends and critical assessment. Leuven University Press, Leuven, pp 111–116
5. Li R, Dong G, Jiang J, Wu H, Yang N, Chen W (2019) Self-adaptive load-balancing strategy based on a time series pattern for concurrent user access on web map service. Comput Geosci 131:60–69
6. Li R, Xu T, Shi X, Fan J, Gui Z (2015) A replication strategy based on optimal load balancing for a heterogeneous distributed caching system in networked giss. Geomatics Inf Sci Wuhan Univ 40(10):1287–1293
7. Nourikhah H, Akbari MK, Kalantari M (2015) Modeling and predicting measured response time of cloud-based web services using long-memory time series. J Supercomput 71(2):673–696
8. Yang C, Goodchild M, Huang Q, Nebert D, Raskin R, Xu Y, Bambacus M, Fay D (2011) Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing? Int J Digit Earth 4(4):305–329
9. Chen Q, Wang L, Shang Z (2008) Mrgis: A mapreduce-enabled high performance workflow system for gis. In: 2008 IEEE Fourth International Conference on eScience. Indianapolis pp 646–651
10. Keshavarzi A, Haghighat AT, Bohlouli M (2021) Clustering of large scale qos time series data in federated clouds using improved variable chromosome length genetic algorithm (cqga). Expert Syst Appl 164:113840
11. Degen L, Qin’ou L (2012) Research progress and connotation of cloud gis. Prog Geogr 11:13
12. Xin Z, Xiao-dong H, Jia-wei W (2019) Cloud computing based geographical information service technologies. Comput Sci 46(6A):532–536
13. Taieb SB, Bontempi G, Atiya AF, Sorjamaa A (2012) A review and comparison of strategies for multi-step ahead time series forecasting based on the nn5 forecasting competition. Expert Syst Appl 39(8):7067–7083
14. Grigorievskiy A, Miche Y, Ventelä AM, Séverin E, Lendasse A (2014) Long-term time series prediction using op-elm. Neural Netw 51:50–56
15. Wang Y, Guo Y (2020) Forecasting method of stock market volatility in time series data based on mixed model of arima and xgboost. China Commun 17(3):205–221
16. Büyükşahin ÜÇ, Ertekin Ş (2019) Improving forecasting accuracy of time series data using a new arima-ann hybrid method and empirical mode decomposition. Neurocomputing 361:151–163
17. Binkowski M, Marti G, Donnat P (2018) Autoregressive convolutional neural networks for asynchronous time series. In: International Conference on Machine Learning. PMLR, pp 580–589
18. Pankratz A (2009) Forecasting with univariate Box-Jenkins models: Concepts and cases, vol 224. Wiley
19. Song Y (2020) Collaborative prediction of web service quality based on user preferences and services. PLoS ONE 15(12):e0242089
20. Chen Z, Shen L, Li F, Dianlong Y, Buanga MJP (2020) Web service qos prediction: when collaborative filtering meets data fluctuating in big-range. World Wide Web 23(3):1715–1740
21. Guo J, Liu S, Zhang B, Yan Y (2014) Research on virtual machine response time prediction method based on GA-BP neural network. Math Probl Eng 2014;Article ID 141930:9. https://doi.org/10.1155/2014/141930
22. Zhou F, Zhou H, Yang Z, Gu L (2021) If2cnn: Towards non-stationary time series feature extraction by integrating iterative filtering and convolutional neural networks. Expert Syst Appl 170:114527
23. Ji S, Wang X, Zhao W, Guo D (2019) An application of a three-stage xgboost-based model to sales forecasting of a cross-border e-commerce enterprise. Math Probl Eng 2019;Article ID 8503252:15. https://doi.org/10.1155/2019/8503252
24. Zhang GP (2003) Time series forecasting using a hybrid arima and neural network model. Neurocomputing 50:159–175
25. Alfaqih TM, Hassan MM (2016) Gis cloud: Integration between cloud things and geographic information systems (gis) opportunities and challenges. Int J Comput Sci Eng (IJCSE) 3(5):360–365
26. Dong G, Li R, Jiang J, Wu H, McClure SC (2019) Multigranular wavelet decomposition-based support vector regression and moving average method for service-time prediction on web map service platforms. IEEE Syst J 14(3):3653–3664
27. Zou Y, Donner RV, Marwan N, Donges JF, Kurths J (2019) Complex network approaches to nonlinear time series analysis. Phys Rep 787:1–97
28. Chen B, Qin J, Yuan A (2022) Variable selection in the box-cox power transformation model. J Stat Plan Infer 216:15–28
29. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–22
30. Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. New York pp 785–794
31. Wirsing K (2020) Time frequency analysis of wavelet and fourier transform. In: Wavelet Theory. IntechOpen, London

Acknowledgements

The authors would like to thank the National Natural Science Foundation of China, the National Key Research and Development Program of China, and the Information Center of Department of Natural Resources of Hubei Province for supporting this article.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U20A2091), Natural resources Science and Technology project of Hubei Province (Grant No. ZRZY2021KJ13) and Zhizhuo Research Fund on Spatial-Temporal Artificial Intelligence (Grant No. ZZJJ202204).

Author information

Authors and Affiliations

State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, 430079, China
Jiang Li, Jing Cai & Rui Li
Information Center of Department of Natural Resources of Hubei Province, Wuhan, 430071, China
Jiang Li, Qiang Li & Lina Zheng

Contributions

All authors have participated in conception and design, or analysis and interpretation of this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rui Li.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, J., Cai, J., Li, R. et al. Wavelet transforms based ARIMA-XGBoost hybrid method for layer actions response time prediction of cloud GIS services. J Cloud Comp 12, 11 (2023). https://doi.org/10.1186/s13677-022-00360-z

Received: 09 December 2021
Accepted: 03 November 2022
Published: 20 January 2023
DOI: https://doi.org/10.1186/s13677-022-00360-z
