QoS-based ranking and selection of SaaS applications using heterogeneous similarity metrics

The plethora of cloud application services (Apps) in the cloud business apps e-marketplace often leads to service choice overload. Meanwhile, existing SaaS e-marketplaces employ keyword-based inputs that consider neither the quantitative nor the qualitative quality of service (QoS) attributes that characterise cloud-based services. Also, existing QoS-based cloud service ranking approaches assume that cloud application services are characterised by quantitative QoS attributes alone, and have therefore employed quantitative similarity metrics for ranking. However, cloud service QoS requirements are heterogeneous in nature, comprising both quantitative and qualitative QoS attributes; hence, a cloud service ranking approach that embraces these heterogeneous QoS dimensions is essential for more objective cloud service selection. In this paper, we propose the use of heterogeneous similarity metrics (HSM) that combine quantitative and qualitative dimensions for QoS-based ranking of cloud-based services. Using a synthetically generated cloud services dataset, we evaluated the ranking performance of five HSM using the Kendall tau rank correlation coefficient and precision as accuracy metrics, benchmarked against one HSM. The results show a significant rank order correlation of the Heterogeneous Euclidean-Eskin Metric, Heterogeneous Euclidean-Overlap Metric, and Heterogeneous Value Difference Metric with human similarity judgment, compared to the other metrics used in the study. Our results confirm the applicability of HSM for QoS ranking of cloud services in a cloud service e-marketplace with respect to users' heterogeneous QoS requirements.


Introduction
Cloud computing is a model of service provisioning in which dynamically scalable and virtualized resources, including infrastructure, platform, and software, are delivered and accessed as services over the internet [1,2]. The popularity of the cloud attracts a variety of providers that offer a wide range of cloud-based services to users in an e-marketplace environment, culminating in an exponential increase in the number of available functionally equivalent cloud services [3,4]. Currently, there exist a number of cloud-based digital distribution services, such as Saasmax.com and Appexchange.com (i.e. cloud e-marketplaces), which host SaaS cloud services (business cloud apps) designed to provide specific user-oriented services when selected. The proliferation of cloud application services in the cloud e-marketplace, without a systematic framework to guide the selection of the most relevant ones, usually leaves users with the problem of which service to select, a phenomenon that can be described as service choice overload [5][6][7][8]. These existing cloud service e-marketplaces elicit keyword-based search queries that do not allow users to indicate their preferences in terms of quality of service (QoS) requirements, and they present search results as an unordered list of icons that must be explored individually by a user before making a decision [9]. This mode of presentation does not enable the user to discriminate among services in terms of their suitability with respect to the user's request, which complicates decision making [10]. Decision making can be simplified, and service choice overload reduced, by considering the user's QoS requirements and ranking services based on their QoS attributes, so that users gain quicker insight into the services that are most likely to satisfy their requirements.
QoS attributes are measurable non-functional attributes that describe and distinguish services and form the basis for service selection [11,12]. However, QoS attributes are usually heterogeneous in nature, covering both quantitative and qualitative (or categorical) attributes. The Service Measurement Index (SMI) [13] defines seven main categories to be considered when comparing the QoS of cloud services, which combine quantitative and qualitative measures. These are Accountability, Agility, Assurance, Financial, Performance, Security and Privacy, and Usability. Each category has multiple attributes, which are either quantitative or qualitative in nature. For example, quantitative attributes such as service response time, accuracy, availability, and cost can be measured using relevant software and hardware monitoring tools, whereas qualitative attributes such as usability, flexibility, suitability, operability, and elasticity cannot be quantified directly and are mostly deduced from user experience. These qualitative attributes are measured on an ordinal scale consisting of a set of predefined qualifier tags such as good, high, medium, fair, and excellent [13][14][15]. Most of the cloud service selection approaches hitherto reported in the literature have overlooked critical qualitative dimensions of QoS requirements, such as security and privacy, usability, accountability, and assurance, when formulating a basis for cloud service ranking and selection.
A number of cloud service selection approaches are based on a content-based recommendation scheme that explores the similarity between the QoS attributes of the user's requirements and the feature descriptions of specific cloud services in order to rank them [16][17][18][19]. Most of these approaches consider only quantitative attributes when ranking services, on the assumption that all QoS attributes are quantitative in nature, and therefore use quantitative similarity metrics such as the exponential weighted difference metric or the weighted difference metric [17]. This assumption cannot adequately model the heterogeneous nature of QoS requirements, which is a prerequisite for a credible basis for comparing and ranking cloud services. There are also instances, such as [20,21], where specific qualitative attributes such as security or usability were quantified so that homogeneous distance metrics could be applied to them for decision making. The drawback of this approach is that, since cloud QoS attributes are usually heterogeneous in nature, heterogeneous metrics are more likely to generalize better on heterogeneous data [22,23]. This limits approaches that quantify qualitative attributes for the purpose of cloud service ranking and selection.
In order to achieve an effective QoS-based ranking of cloud services in cloud service e-marketplaces, there is a need for a service selection approach that considers both the quantitative and qualitative QoS dimensions that characterise cloud services and that can rank cloud services accurately with respect to user requirements using heterogeneous similarity metrics.
In this paper, we propose the use of heterogeneous similarity metrics that combine quantitative and qualitative dimensions to rank cloud services in a cloud e-marketplace context based on QoS attributes. An experimental study of five heterogeneous similarity metrics was conducted, using a simulated dataset of cloud services, to ascertain their suitability for cloud service ranking and selection. This is in contrast to previous work in the domain of cloud service selection, which has relied on quantitative similarity metrics alone.
The remainder of this paper is organised as follows: Section "Background and Related Work" provides background to the context of this work and discusses related work. Section "Heterogeneous Similarity Metrics for Cloud Service Ranking and Selection" describes the five heterogeneous similarity metrics used in this study, while the empirical results of comparing the ranking performance of the metrics are presented in Section "Experimental Evaluation and Results". The findings of this study are discussed in Section "Discussion". Section "Conclusion" concludes the paper with a brief note and an overview of further work.

Background and related work
The relevant concepts that underpin this study and an overview of related work are presented in this section.

Cloud service e-marketplace
The e-marketplace of cloud services provides an electronic emporium where service providers offer a wide range of services for users to select from [24][25][26]. Similar to Amazon or Alibaba, which deal in commodity products, the goal of a cloud service e-marketplace such as SaaSMax or AppExchange is to provide a facility for finding and consuming cloud services by allowing users to search for suitable business apps that offer user-oriented services matching their QoS requirements. However, unlike commodity products, cloud services possess QoS attributes that distinguish functionally equivalent services from each other. The profitability of a cloud service e-marketplace depends on users' ability to find and select suitable services that meet their QoS requirements easily and quickly. However, most existing cloud service e-marketplaces do not consider QoS information from users but rely on keyword matching, and the results are not ranked in a manner that makes the differences among the services obvious with respect to users' requirements. This leads to service choice overload, because a large number of services are presented as an unordered list of icons that requires the user to investigate the differences between the services by checking them one after the other. Discriminating among services based on their QoS information is a remedy for service choice overload, as the cloud service QoS model encompasses Key Performance Indicators for decision making [27]. Moreover, the QoS model comprises the important comparable characteristics of each service and is suitable for matching user QoS requirements to services' QoS attributes [28]. One of the most comprehensive International Organization for Standardization (ISO) certified QoS models for cloud services is the Service Measurement Index (SMI) [13].

Service measurement index
The Service Measurement Index (SMI) was developed by the Cloud Services Measurement Initiative Consortium (CSMIC). The SMI is a framework of critical characteristics, associated attributes, and metrics that can be used to compare and evaluate cloud-based services from different service providers [27,29]. SMI was designed as a standard method to measure any type of cloud service (i.e. XaaS) based on user requirements. The SMI is a hierarchical framework with seven top-level categories, namely Accountability, Agility, Assurance, Financial, Performance, Security and Privacy, and Usability, and each category is further broken into four or more attributes. In the SMI QoS model, some metrics are quantitative in nature while others are qualitative. Quantitative QoS metrics are those that can be measured and quantified (e.g. response time, throughput), whereas qualitative QoS metrics are subjective in nature and can only be inferred from user feedback (e.g. security, usability). Cloud services can therefore be assessed and ranked on both QoS metric dimensions, quantitative and qualitative, by comparing the similarity between the user's QoS requirements and the service's QoS properties, which is a content-based approach.

QoS similarity-driven cloud service ranking
Similarity is a measure of proximity between two or more objects or variables [30], and it has been applied in domains that require distance computation. Similarity can be measured on two types of data: quantitative (also called numerical) data and qualitative (also called categorical or nominal) data [31]. Many metrics have been proposed for computing similarity on either quantitative or qualitative data. However, few metrics have been proposed to handle datasets containing a mixture of both, and such metrics usually combine quantitative and qualitative distance functions. For quantitative data, a generic method for computing distance is the Minkowski metric [32], with widely used specific instances such as the Manhattan (order 1) and Euclidean (order 2) distances. Computing similarity is more direct for quantitative data than for qualitative data, because quantitative values can be completely ordered, while comparing two qualitative values is less straightforward [31]. For example, the overlap metric [33] assigns a similarity value of 1 when two qualitative values are the same and 0 otherwise. In the context of selecting cloud services from the list of available services, ranking services based on the heterogeneous QoS model requires similarity metrics that can handle mixed QoS data. The notion of similarity considered in this paper is between vectors with the same set of QoS properties that may differ in their QoS values, i.e. users' QoS requirements and service QoS descriptions.
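To make these quantitative and qualitative building blocks concrete, the following minimal Python sketch implements the Minkowski distance with its Manhattan (p = 1) and Euclidean (p = 2) instances, together with the overlap similarity [33] for qualitative values. The function names are illustrative and are not taken from any library.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("order p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Minkowski distance of order 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Minkowski distance of order 2."""
    return minkowski(x, y, 2)


def overlap_similarity(a, b):
    """Overlap metric [33]: 1 if two qualitative values are identical, 0 otherwise."""
    return 1 if a == b else 0


print(euclidean([0, 0], [3, 4]))             # 5.0
print(manhattan([0, 0], [3, 4]))             # 7.0
print(overlap_similarity("High", "Medium"))  # 0
```

Note that overlap treats every mismatch identically, which is exactly the limitation that motivates the frequency-aware qualitative metrics discussed later.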

Related work
The success of a cloud service e-marketplace hinges on adequate support for satisfactory selection based on the QoS requirements of the user. So far in the literature, the approaches used for cloud service ranking and selection can be broadly classified as content-based filtering, collaborative filtering, and multi-criteria decision-making methods. Collaborative filtering-based approaches include CloudRank, a personalised ranking prediction framework that uses a greedy algorithm. It was proposed in [18] to predict QoS rankings by leveraging the past usage experiences of similar cloud service users on a set of cloud services. The ranking is achieved by computing the similarity between the user-provided QoS requirements and those of other users in the past. Similar users are identified based on these similarity values, and services are ranked accordingly. In contrast to our work, CloudRank [18] does not compute vector similarity between cloud services and user-defined QoS requirements.
CloudAdvisor, a Recommendation-as-a-Service platform, was proposed in [34] for recommending optimal cloud offerings based on a user's preference requirements. Users supply preference values for each property (energy level, budget, performance, etc.) of the cloud offerings, and the platform recommends available optimal cloud offerings that match the user's requirements. Service recommendations in [34] are determined by solving a constraint optimization model, and users can compare several offerings automatically derived by benchmarking-based approximations. However, the QoS dimensions considered in [34] are mainly quantitative and do not reflect the holistic heterogeneous QoS model of cloud services.
Selecting cloud services in the face of many QoS attributes is a type of Multi-criteria Decision Making (MCDM) problem [14]. Considering the multiple QoS criteria involved in selecting cloud services, [14] proposed a ranking mechanism based on the Analytic Hierarchy Process (AHP) that assigns weights to non-functional attributes in order to rank cloud services quantitatively. Apart from the complexity of computing pairwise comparisons of the attributes of the cloud service alternatives, this approach is most suitable when the number of cloud services is small, which is not the case in a cloud service e-marketplace comprising numerous services. Moreover, in the approach proposed in [14], users cannot specify the desired values of the QoS service properties, and services are ranked based on quantitative QoS attributes alone.
Content-based filtering approaches include [17], in which a ranked list of services that best match user requirements is returned based on the nearness of the user's QoS requirements to the QoS properties of cloud services in the marketplace. Specifically, Rahman et al. [17] proposed a multi-criteria approach that selects, by comparison, the services from a list that best match the user's QoS requirements. The authors introduced two methods, Weighted Difference and Exponential Weighted Difference, for computing similarity values. However, [17] assumes that all cloud service QoS attributes are quantitative, thereby ignoring the qualitative QoS attributes of services. In [35], a QoS-driven approach called MSSOptimiser, which supports service selection for multi-tenant cloud-based software applications (Software as a Service, SaaS), was proposed. In that work, certain qualitative and non-numerical QoS parameters, such as reputation, were mapped to numerical values based on a predefined semantics-based hierarchical structure of all possible values of a non-numerical QoS parameter, in order to quantify the qualitative parameters. Also, in [20], a Multi-attribute Decision-Making framework for cloud adoption (MADMAC) was proposed. The framework allows the comparison of multiple attributes with diverse units of measurement in order to select the best alternative. It requires the definition of Attributes, Alternatives and Attribute Weights to construct a Decision Matrix and arrive at a relative ranking that identifies the optimal alternative. MADMAC used an adapted Likert-type scale from 1 to 10 to convert all qualitative attributes to their quantitative equivalents, where 1 indicates very unfavourable, 5 indicates neutral, 6 indicates favourable, and 10 indicates a near-perfect solution. However, none of these approaches considered a standard cloud services measurement and comparison model such as SMI, which means that the QoS attributes used covered only a limited range of heterogeneous dimensions (qualitative and quantitative), which may not provide a sufficiently robust basis for decision making on cloud services.
In contrast to previous approaches, our approach considers the heterogeneity of the cloud QoS model, which combines quantitative and qualitative QoS data. To the best of our knowledge, this represents a first attempt to use heterogeneous similarity metrics for QoS ranking and selection of services in the context of a cloud service e-marketplace.

Heterogeneous similarity metrics for cloud service ranking and selection
By giving due consideration to the heterogeneous nature of the cloud services QoS model, this paper proposes the use of heterogeneous similarity metrics (HSM) for cloud service ranking and selection. In this section, we present an overview of HSM, the rationale for the HSM chosen in this study, and a description of the five selected HSM for cloud service ranking and selection.

Overview of heterogeneous similarity metrics
To measure the similarity between quantitative data, metrics such as the Minkowski metric [32], its derivatives (Manhattan and Euclidean), and the Chebyshev and Canberra metrics have been proposed. Metrics such as Overlap [33], Eskin [36], Lin [37] and Goodall [38] have been proposed for qualitative similarity computations. However, these quantitative or qualitative metrics alone are insufficient for handling heterogeneity unless they are combined into a unified metric that applies different similarity metrics to different types of QoS attributes [22]. The resulting combination can be referred to as a heterogeneous similarity metric (HSM) [22]. The authors of [22] proposed the Heterogeneous Euclidean-Overlap Metric (HEOM) and the Heterogeneous Value Difference Metric (HVDM) for computing similarity on heterogeneous datasets. HEOM employs the range-normalized Euclidean metric (Eq. 4) for quantitative QoS attributes and the Overlap metric for qualitative QoS attributes, while HVDM uses the standard-deviation-normalized Euclidean distance (Eq. 7) and the value difference metric for quantitative and qualitative QoS attributes, respectively. HEOM and HVDM have been applied to feature selection and instance-based learning in real-world classification tasks [22].
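A minimal Python sketch of HEOM as described above, following the definition in [22]: each quantitative attribute contributes its range-normalized difference, each qualitative attribute contributes the overlap distance (0 on a match, 1 otherwise), and the contributions are combined in a Euclidean sum. The parameter names `is_qualitative` and `ranges` are hypothetical; the ranges are assumed to be the minimum and maximum of each quantitative attribute over the services available in the marketplace. The missing-value rule of [22] is omitted.

```python
import math


def heom(x, y, is_qualitative, ranges):
    """Heterogeneous Euclidean-Overlap Metric (a distance: smaller means more similar).

    x, y           -- equal-length QoS vectors (user requirements and one service)
    is_qualitative -- is_qualitative[i] is True when attribute i is categorical
    ranges         -- ranges[i] = (min_i, max_i) for quantitative attributes, None otherwise
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if is_qualitative[i]:
            d = 0.0 if a == b else 1.0  # overlap distance
        else:
            lo, hi = ranges[i]
            span = hi - lo
            # Range-normalized difference; a constant attribute contributes nothing.
            d = abs(a - b) / span if span > 0 else 0.0
        total += d * d
    return math.sqrt(total)


# Hypothetical vectors: response time (ms), availability (%), security level
user = [300.0, 99.9, "Medium"]
service = [400.0, 99.5, "Low"]
print(heom(user, service, [False, False, True], [(100.0, 500.0), (99.0, 100.0), None]))
```

Range normalization keeps attributes with large numeric scales (such as response time in milliseconds) from dominating attributes measured on small scales.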

Rationale for selected qualitative metrics
A number of qualitative similarity metrics have been proposed in the literature, and we selected at least one qualitative metric from each of the categories defined in [31] to create additional heterogeneous similarity metrics for QoS-based cloud service ranking and selection. The categories are as follows.
Metrics that fill diagonal entries only: qualitative metrics in this category include Overlap [33] and Goodall [38]. In the overlap metric, the similarity between two multivariate data points is directly proportional to the number of attributes or dimensions on which they match. However, the overlap metric does not distinguish between the different values taken by an attribute, as it treats all matches and mismatches in the same manner. The Goodall metric, on the other hand, takes into account the frequency distribution of the attribute values in a given dataset and computes the similarity between two qualitative attribute values by assigning higher similarity to a match when the attribute value is frequent.
Metrics that fill off-diagonal entries only: an example in this category is the Eskin metric [36]. The Eskin metric gives more weight to mismatches that occur on attributes that take many values. In addition, the maximum value is attained when all the attributes have unique values.
Metrics that fill both diagonal and off-diagonal entries: the Lin metric [37] is a typical example. The Lin metric is applied in contexts that involve ordinal, string, word and semantic similarities. It assigns higher weights to matches on frequent values and lower weights to mismatches on infrequent values.
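The per-attribute similarity terms of two of these qualitative metrics can be sketched in Python as summarised in [31]: Eskin returns 1 on a match and n²/(n² + 2) on a mismatch, where n is the number of values the attribute can take; Lin returns 2 log p(x) on a match and 2 log(p(x) + p(y)) on a mismatch, where p is the sample probability of a value in the dataset. This is an illustrative sketch only: the overall Lin measure in [31] also normalises the summed terms by a per-pair weight, which is omitted here, and Goodall is left out because [31] describes several Goodall variants and the one used in this study is not specified. The example column is hypothetical.

```python
import math
from collections import Counter


def eskin_similarity(a, b, n_values):
    """Eskin per-attribute similarity: 1 on a match, n^2 / (n^2 + 2) on a mismatch."""
    if a == b:
        return 1.0
    n2 = n_values ** 2
    return n2 / (n2 + 2)


def sample_probability(column):
    """p(x): relative frequency of each value x in one qualitative attribute column."""
    counts = Counter(column)
    total = len(column)
    return {value: count / total for value, count in counts.items()}


def lin_similarity(a, b, p):
    """Lin per-attribute term: 2 log p(a) on a match, 2 log(p(a) + p(b)) otherwise.

    Values are <= 0; a term closer to 0 means higher similarity.
    """
    if a == b:
        return 2 * math.log(p[a])
    return 2 * math.log(p[a] + p[b])


security = ["High", "High", "Low", "Medium"]  # hypothetical attribute column
p = sample_probability(security)
print(eskin_similarity("High", "Low", 3))  # 9/11, about 0.818
print(lin_similarity("High", "High", p))   # 2 log 0.5
```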

Five heterogeneous similarity metrics for cloud service ranking and selection
Apart from HEOM and HVDM, we introduce three additional HSM by combining existing similarity metrics used for either quantitative or qualitative data alone. The new HSM are the Heterogeneous Euclidean-Eskin Metric (HEEM), the Heterogeneous Euclidean-Lin Metric (HELM), and the Heterogeneous Euclidean-Goodall Metric (HEGM). HEEM combines the range-normalized Euclidean distance for quantitative QoS attributes with the Eskin metric [36] for qualitative QoS attributes. HELM and HEGM also use the range-normalized Euclidean distance for quantitative QoS values, but HELM applies the Lin metric and HEGM the Goodall metric to qualitative QoS values.
In all, the five HSM considered in this paper are HEOM (Eq. 1), HVDM (Eq. 5), HEEM (Eq. 9), HELM (Eq. 12) and HEGM (Eq. 15). The components for measuring the quantitative and qualitative aspects of the data are shown in Table 1. The mathematical equations that describe each HSM are presented below, assuming that X and Y are vectors representing, respectively, the user's QoS requirements and the QoS vector of a cloud service $s$ belonging to the service list $S$, such that $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_m)$, where $x_i$ and $y_i$ are the values of the i-th QoS attribute $q_i$ in the user's requirements and in the cloud service $s$, respectively.
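Whichever HSM is chosen, it acts as a distance between X and a service vector Y, so ranking the service list S reduces to sorting the services by increasing distance to X and keeping the first k. The Python sketch below shows this step under that assumption; `rank_services` and the toy mismatch-count distance are hypothetical stand-ins, not any of the five HSM.

```python
from typing import Callable, Dict, List, Optional, Sequence


def rank_services(user: Sequence, services: Dict[str, Sequence],
                  distance: Callable[[Sequence, Sequence], float],
                  k: Optional[int] = None) -> List[str]:
    """Order service ids from most to least similar to the user's requirement vector X.

    A smaller distance means the service's QoS vector Y is closer to X. Python's
    sort is stable, so services at equal distance keep their input order.
    """
    ranked = sorted(services, key=lambda sid: distance(user, services[sid]))
    return ranked if k is None else ranked[:k]


def mismatch_count(x, y):
    """Toy stand-in distance: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))


services = {"s1": [1, "Low"], "s2": [1, "High"], "s3": [2, "Low"]}
print(rank_services([1, "High"], services, mismatch_count))       # ['s2', 's1', 's3']
print(rank_services([1, "High"], services, mismatch_count, k=2))  # ['s2', 's1']
```

Passing the metric as a function keeps the ranking step independent of which HSM is being evaluated.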
Each of the heterogeneous metrics is described in detail below.

Heterogeneous Euclidean-overlap metric (HEOM)
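$$\mathrm{HEOM}(X, Y) = \sqrt{\sum_{i=1}^{m} d_{q_i}(x_i, y_i)^2} \qquad (1)$$
where
$$d_{q_i}(x, y) = \begin{cases} \mathrm{overlap}(x, y) & \text{if } q_i \text{ is qualitative} \\ \mathrm{rn\_diff}_i(x, y) & \text{if } q_i \text{ is quantitative} \end{cases} \qquad (2)$$
and overlap(x, y) and rn_diff_i(x, y) are defined as
$$\mathrm{overlap}(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$
$$\mathrm{rn\_diff}_i(x, y) = \frac{|x - y|}{\max_i - \min_i} \qquad (4)$$
where $\max_i$ and $\min_i$ are the maximum and minimum values of QoS attribute $q_i$ over the available services.

Heterogeneous value difference metric (HVDM)
$$\mathrm{HVDM}(X, Y) = \sqrt{\sum_{i=1}^{m} d_{q_i}(x_i, y_i)^2} \qquad (5)$$
where
$$d_{q_i}(x, y) = \begin{cases} \mathrm{normalized\_vdm}_i(x, y) & \text{if } q_i \text{ is qualitative} \\ \mathrm{normalized\_diff}_i(x, y) & \text{if } q_i \text{ is quantitative} \end{cases} \qquad (6)$$
$$\mathrm{normalized\_diff}_i(x, y) = \frac{|x - y|}{4\sigma_{q_i}} \qquad (7)$$
$$\mathrm{normalized\_vdm}_i(x, y) = \sqrt{\sum_{c=1}^{C} \left| P_{q_i,x,c} - P_{q_i,y,c} \right|^2} \qquad (8)$$
Here, $\sigma_{q_i}$ is the standard deviation of the values of QoS attribute $q_i$; $N_{q_i,x}$ is the number of instances (cloud app services) available in the marketplace that have value x for QoS attribute $q_i$; $N_{q_i,x,c}$ is the number of instances available in the marketplace that have value x for QoS attribute $q_i$ and output class c; C is the number of output classes in the problem domain (in this case, C = 3, corresponding to High, Medium and Low); and $P_{q_i,x,c}$ is the conditional probability of output class c given that QoS attribute $q_i$ has the value x, i.e. $P(c \mid q_i = x)$, computed as $P_{q_i,x,c} = N_{q_i,x,c} / N_{q_i,x}$. If $N_{q_i,x} = 0$, then $P(c \mid q_i = x)$ is taken to be 0.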
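The HVDM computation (standard-deviation-normalized differences for quantitative attributes, and the value difference metric built from $P(c \mid q_i = x)$ for qualitative attributes) can be sketched in Python following the definitions in [22]. Assumptions: each service carries a class label c (the text sets C = 3, High, Medium and Low, but how services are labelled is not shown, so the labels below are hypothetical); σ is taken as the population standard deviation of the attribute over the available services; and the qualitative term is the normalized form $\sqrt{\sum_c |P_{q_i,x,c} - P_{q_i,y,c}|^2}$ from [22].

```python
import math
import statistics
from collections import Counter


def class_conditional_probs(values, labels):
    """P(c | q_i = x) = N_{q_i,x,c} / N_{q_i,x} for every observed value x of one attribute."""
    n_x = Counter(values)
    n_xc = Counter(zip(values, labels))
    classes = sorted(set(labels))
    probs = {x: {c: n_xc[(x, c)] / n_x[x] for c in classes} for x in n_x}
    return probs, classes


def normalized_vdm(a, b, probs, classes):
    """Value difference term; an unseen value (N_{q_i,x} = 0) gets probability 0 for every class."""
    pa, pb = probs.get(a, {}), probs.get(b, {})
    return math.sqrt(sum((pa.get(c, 0.0) - pb.get(c, 0.0)) ** 2 for c in classes))


def normalized_diff(a, b, sigma):
    """Quantitative term: |a - b| / (4 * sigma)."""
    return abs(a - b) / (4 * sigma) if sigma > 0 else 0.0


def attribute_sigma(column):
    """Population standard deviation of one quantitative attribute (an assumption)."""
    return statistics.pstdev(column)


def hvdm(x, y, is_qualitative, stats):
    """stats[i] is sigma for quantitative attributes, (probs, classes) for qualitative ones."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if is_qualitative[i]:
            probs, classes = stats[i]
            d = normalized_vdm(a, b, probs, classes)
        else:
            d = normalized_diff(a, b, stats[i])
        total += d * d
    return math.sqrt(total)


# Hypothetical marketplace column for security, with hypothetical class labels
security = ["High", "High", "Low", "Low"]
labels = ["Medium", "High", "Medium", "Medium"]
vdm_stats = class_conditional_probs(security, labels)
sigma_rt = attribute_sigma([100.0, 300.0])  # hypothetical response times (ms)
print(hvdm([100.0, "High"], [300.0, "Low"], [False, True], [sigma_rt, vdm_stats]))
```

Unlike overlap, the VDM term lets two different qualitative values count as close when they are associated with similar class distributions.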
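Heterogeneous Euclidean-Eskin metric (HEEM)
HEEM (Eq. 9) applies the range-normalized difference (Eq. 4) to quantitative QoS attributes and the Eskin metric [36] to qualitative QoS attributes. The Eskin component uses $n_i$, the number of values that QoS attribute $q_i$ can assume (e.g. for the security QoS attribute, denoted $q_{security}$, $n_{security} = 3$, corresponding to the values High, Medium and Low).

Heterogeneous Euclidean-Lin metric (HELM)
HELM (Eq. 12) applies the range-normalized difference (Eq. 4) to quantitative QoS attributes and the Lin metric [37] to qualitative QoS attributes.

Heterogeneous Euclidean-Goodall metric (HEGM)
HEGM (Eq. 15) applies the range-normalized difference (Eq. 4) to quantitative QoS attributes and the Goodall metric [38] to qualitative QoS attributes.
The Lin and Goodall components depend on $p_{q_i}(x)$ and $p^2_{q_i}(x)$, the sample probabilities of QoS attribute $q_i$ taking the value x in the dataset (in this case, the services available on the e-marketplace), where $p_{q_i}(x) = N_{q_i,x} / N$ and N denotes the total number of services.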
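The following Python sketch is one plausible reading of HEEM, not the paper's exact equation. It rests on an assumption of ours: the Eskin similarity S is turned into a distance as 1 − S, so a mismatch on a qualitative attribute with $n_i$ values contributes $1 - n_i^2/(n_i^2 + 2) = 2/(n_i^2 + 2)$, which is then combined with the range-normalized differences of the quantitative attributes in a Euclidean sum. HELM and HEGM would follow the same pattern with a Lin- or Goodall-based term in place of the hypothetical helper `eskin_distance`.

```python
import math


def eskin_distance(a, b, n_values):
    """Assumed distance form: 1 - Eskin similarity, i.e. 0 on a match, 2 / (n^2 + 2) otherwise."""
    if a == b:
        return 0.0
    return 2.0 / (n_values ** 2 + 2)


def heem(x, y, is_qualitative, ranges, n_values):
    """Sketch of a Heterogeneous Euclidean-Eskin distance.

    ranges[i]   -- (min_i, max_i) for quantitative attributes, None otherwise
    n_values[i] -- n_i, the number of values qualitative attribute i can take
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if is_qualitative[i]:
            d = eskin_distance(a, b, n_values[i])
        else:
            lo, hi = ranges[i]
            d = abs(a - b) / (hi - lo) if hi > lo else 0.0
        total += d * d
    return math.sqrt(total)


# Security has n_security = 3 values: High, Medium, Low
print(eskin_distance("High", "Low", 3))  # 2/11, about 0.1818
print(heem([0.0, "High"], [2.0, "Low"], [False, True], [(0.0, 2.0), None], [None, 3]))
```

Under this reading, a qualitative mismatch costs less when the attribute can take many values, which matches the Eskin behaviour described in the rationale section.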

Experimental evaluation and results
In this section, we present an experimental assessment of the ranking accuracy of the five selected HSM on a synthetically generated dataset of cloud services. A synthetic QoS dataset was used because we could not find a real QoS dataset for cloud services that fits the context of our experiment; Alkalbani et al. [39] noted the paucity of viable datasets for cloud services. The Blue Pages dataset in [39] is the closest cloud services dataset we found, but it does not contain QoS data. Rather, it provides data on service offerings, such as the service name, the date the service was founded, service category, free trial (yes/no), mobile app (yes/no), starting price, service description, service type, and provider link, as extracted from two cloud service review sites, getapp.com and cloudreviews.com, which does not suit the purpose of this study. However, we found previous studies on cloud services that relied on synthetically generated or simulated datasets to perform experiments [40][41][42][43], which motivated our decision to use a synthetically generated dataset. To synthesise the dataset, 6 attributes were selected from 6 categories of the SMI (see Table 2). The SMI was used as the basis for data synthesis because it provides a standardised method for measuring and comparing cloud-based business services [14]. The 6 selected attributes, comprising 3 quantitative and 3 qualitative attributes, were those considered relevant to the context of SaaS: service response time, availability, cost, security, usability, and flexibility. The goal of the experiment is to investigate the ranking accuracy of the HSM compared to a gold standard obtained by human similarity judgment.

Dataset preparation
The data values for the selected SMI attributes were synthesised based on examples from previous evaluation studies [44][45][46][47] and related papers on cloud service selection, such as [14,28,41,47], that revealed acceptable data formats for quantitative attributes such as response time, cost, and availability. We generated random qualifier values for the qualitative attributes, namely usability, security, and flexibility. Consequently, we used a total of six QoS attributes with the typical data formats shown in Table 3. For simplicity, we limited the qualifier values for usability, security, and flexibility to High, Medium and Low. We simulated multiple instances of the adopted format for the six attributes to obtain a dataset of 63 services, which were then sorted by response time in ascending order. To deploy our approach in a real scenario, the QoS attributes of a service would have to be specified by the service provider and made accessible to the user as part of the service documentation that a user considers when deciding which service to select. One available means of doing this is to leverage the relevant SMI measurement templates provided by the Cloud Services Measurement Initiative Consortium (CSMIC) [48].
Furthermore, the initial set of SMI templates by CSMIC has been extended by Scott Feuless in [49] to evolve metrics and SMI scored frameworks that enable specific SMI attributes to be scored by an organisation. The purpose of the SMI scored framework [50] is to enable a customer to evaluate a cloud service in order to make the right choice. By using the SMI scored framework or a similar model, the cumulative scores for specific SMI categories, and the scores for individual SMI attributes of a cloud service, can be obtained. However, determining the cumulative scores for each SMI attribute is a manual process that is qualitatively driven by experts within an organization. Thus, having SMI scored frameworks (or similar scoring models) for several cloud services creates the basis for applying the HSM proposed in this paper. The HSM can be applied for automated, real-time ranking and selection of cloud services in order to determine the best cloud service offerings among several alternatives. This offers a major advantage over using a manually generated SMI scored framework [50] for ranking and selection of cloud services.
Selected SMI attributes, categories, descriptions and data formats:
QoS attribute | SMI category | Description | Type | Data format
Service response time | Performance | The time between when a request is made and when a response is constructed for the user [13] | Quantitative | ms (2, 3, 0.5, etc.)
Availability | Assurance | The likelihood that the service will be in operation without downtime over a period of time [13] | Quantitative | % (99.5, 99.9, 99.99, etc.)
Cost | Financial | The acquisition or usage cost of a service to a user [13] | Quantitative | $ (20, 30, 40, 5)
Usability | Usability | The ease with which a service can be used, operated, learned, understood, and installed by the user [13,42] | Qualitative | Categorical {High, Medium, Low}
Security management | Security and Privacy | A rating of the extent to which a service can satisfy user security requirements in terms of access control, privacy, data, infrastructure, etc. [13] | Qualitative | Categorical {High, Medium, Low}
Flexibility | Agility | A rating of the ability to add or remove predefined features from a service in order to accommodate users' preferences [13] | Qualitative | Categorical {High, Medium, Low}
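The dataset synthesis procedure described under "Dataset preparation" can be sketched in Python as below: six attributes per service (three quantitative, three qualitative with High, Medium and Low), 63 services, sorted by response time in ascending order. The numeric ranges are illustrative assumptions loosely based on the example data formats for these attributes, not the generator the authors used; the function name, field names and seed are hypothetical.

```python
import random

LEVELS = ("High", "Medium", "Low")


def synthesise_services(n, seed=None):
    """Generate n hypothetical SaaS QoS records (3 quantitative + 3 qualitative attributes)."""
    rng = random.Random(seed)
    services = []
    for i in range(n):
        services.append({
            "id": f"s{i + 1}",
            "response_time_ms": round(rng.uniform(0.5, 500.0), 2),  # assumed range
            "availability_pct": rng.choice([99.5, 99.9, 99.99]),    # assumed levels
            "cost_usd": rng.randint(5, 150),                        # assumed range
            "usability": rng.choice(LEVELS),
            "security": rng.choice(LEVELS),
            "flexibility": rng.choice(LEVELS),
        })
    # As in the text, the synthesised services are sorted by ascending response time.
    services.sort(key=lambda s: s["response_time_ms"])
    return services


dataset = synthesise_services(63, seed=42)
print(len(dataset), dataset[0])
```

A fixed seed makes the synthetic dataset reproducible, so every metric is evaluated on exactly the same services.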

Evaluation metrics

Kendall tau coefficient
Kendall's tau coefficient, denoted τ, measures the ordinal association between two rankings. It ranges from −1 to 1: τ = 1 when the top-k list produced by an HSM has exactly the same order as the gold standard, and τ = −1 when the order is completely reversed. The Kendall tau coefficient is computed as follows:
$$\tau = \frac{C - D}{k(k-1)/2}$$
where C is the number of concordant pairs, D is the number of discordant pairs, and k is the number of top-k items produced by the methods.
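A direct Python translation of τ = (C − D)/(k(k − 1)/2), assuming both rankings order the same k services without ties; the text does not say how services that appear in only one top-k list are handled, so the sketch rejects that case. Under these assumptions the result is the tau-a variant, which coincides with the tau-b variant computed by default in `scipy.stats.kendalltau` when there are no ties. The example rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two tie-free orderings of the same k items: (C - D) / (k(k-1)/2)."""
    k = len(rank_a)
    if k < 2 or len(set(rank_a)) != k or len(rank_b) != k or set(rank_a) != set(rank_b):
        raise ValueError("need two orderings of the same k >= 2 distinct items")
    position_b = {item: pos for pos, item in enumerate(rank_b)}
    concordant = discordant = 0
    for i, j in combinations(range(k), 2):
        # rank_a places rank_a[i] before rank_a[j]; the pair is concordant if rank_b agrees.
        if position_b[rank_a[i]] < position_b[rank_a[j]]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (k * (k - 1) / 2)


gold = ["s4", "s1", "s7", "s2"]  # hypothetical human top-4
hsm = ["s4", "s7", "s1", "s2"]   # hypothetical HSM top-4
print(kendall_tau(gold, hsm))    # (5 - 1) / 6, about 0.667
```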

Precision metric
Precision, a measure used in information retrieval, was adapted here to evaluate the relevance of the output obtained from each metric with respect to the content of the gold standard. Precision is the fraction of the cloud services returned by an HSM that are contained in the gold standard. The gold standard output was used as the benchmark: for each metric, we determined how many of the top-k services it returned are contained in the gold standard, and we computed the precision of each metric as k was varied. We define precision as:
$$\mathrm{Precision} = \frac{|TKS \cap GS|}{|TKS|}$$
where TKS is the set of top-k cloud services returned by the HSM and GS is the set of services in the gold standard.
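A minimal Python sketch of this precision computation, assuming GS is taken as the top-k services of the gold-standard ranking so that both lists have the same length; the function name and the example rankings are hypothetical.

```python
def precision_at_k(hsm_ranking, gold_ranking, k):
    """|TKS ∩ GS| / |TKS|, with TKS the HSM's top-k and GS the gold standard's top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    tks = hsm_ranking[:k]
    gs = set(gold_ranking[:k])
    return sum(1 for service in tks if service in gs) / len(tks)


gold = ["s2", "s1", "s4", "s3", "s5"]
hsm = ["s1", "s2", "s3", "s5", "s4"]
for k in (2, 3, 5):
    print(k, precision_at_k(hsm, gold, k))
```

Unlike Kendall tau, this measure ignores order inside the top-k list and only checks membership.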

Experiment design and protocol
We recruited 12 undergraduate students in Computing and Engineering fields (male = 9, female = 3), on the basis that 12 participants offer an acceptably tight confidence interval [51]. We used one of the services from the dataset as the user requirements and asked participants to rank the remaining 63 services according to their similarity to the user requirements. The selected user requirements vector R is {302.75, 126, 99.99, Medium, Low, Low}, corresponding to the values for Response Time, Cost, Availability, Usability, Security Management, and Flexibility, respectively.
To simplify the similarity judgement exercise, we converted the QoS values of the services in the dataset into line graphs, such that the user requirements are plotted against each of the remaining 63 services, and the qualitative values High, Medium and Low were mapped to the numerical values 50, 30 and 10, respectively, for illustration purposes. For example, Fig. 1 shows the line graph of the user requirements together with another service, based on the QoS information contained in Table 4.
The participants were given a 15-minute tutorial explaining the purpose of the experiment and providing basic training on the similarity evaluation exercise. After the training, the participants were shown the 63 line graphs and asked to agree or disagree (on a 1 to 7 Likert scale) with the proposition: 'The two line graphs are similar.' The questionnaire contained 63 items corresponding to the 63 services being ranked. We analysed the responses from the 12 participants and computed the mean response to each item as an indication of the participants' collective view of which service is most similar to the user requirements. We aggregated the responses from all participants by computing the median response for each of the 63 items in the questionnaire. The median scores were sorted in descending order to indicate the degree of similarity of the 63 services to the user requirements: a higher median score indicates higher similarity, and a lower median score indicates lower similarity.
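The median-based aggregation can be sketched as follows. Each service's Likert scores (1 to 7) are reduced to their median, which is the mean of the two middle scores when the count is even, as it is with 12 participants. Services are then sorted by descending median. The array layout responses[service][participant] and the names LikertAggregation and rankByMedian are hypothetical. Tie handling is not described in the paper, so this sketch keeps tied services in questionnaire order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: aggregates 1-7 Likert responses per service by median
// and orders services from most to least similar (highest median first).
final class LikertAggregation {
    private LikertAggregation() {}

    static double median(int[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no responses");
        }
        int[] sorted = scores.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle score; even count: mean of the two middle scores.
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // responses[s][p] is participant p's score for service s. Returns service
    // indices sorted by descending median; List.sort is stable, so ties keep item order.
    static List<Integer> rankByMedian(int[][] responses) {
        double[] medians = new double[responses.length];
        List<Integer> order = new ArrayList<>();
        for (int s = 0; s < responses.length; s++) {
            medians[s] = median(responses[s]);
            order.add(s);
        }
        order.sort(Comparator.comparingDouble((Integer s) -> medians[s]).reversed());
        return order;
    }
}
```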
The HSM were implemented in Java and used to rank the 63 services in this experiment with respect to the user requirements. The simulation was run on an HP Pavilion with an Intel Core i3-3217U CPU at 1.80 GHz and 4.00 GB RAM, running 64-bit Windows 8.1 on an x64-based processor. The rankings produced by the HSM were compared with the ranking produced by the human subjects using the Kendall tau coefficient, while the accuracy of each ranking was measured with the precision metric, using the gold standard as the benchmark.
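The Java sketch below illustrates how one of the metrics could rank services. It uses HEOM in its standard form from the literature. Numeric attributes use the range-normalised absolute difference, qualitative attributes use the overlap measure (0 if equal, 1 otherwise), and the per-attribute distances are combined as the square root of their sum of squares. Services are then sorted by ascending distance to the requirements vector. This is not the authors' implementation. The class HeomRanker, the Object[] encoding of QoS vectors, and the attribute ranges passed in are assumptions, and missing values are not handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not the authors' implementation) of the standard HEOM:
// numeric attributes use |x - y| / range, nominal attributes use overlap
// (0 if equal, 1 otherwise), combined as sqrt(sum of squared distances).
// Missing values are not handled.
final class HeomRanker {
    private HeomRanker() {}

    // ranges[a] = max - min of numeric attribute a over the dataset (unused for nominal ones).
    static double heom(Object[] x, Object[] y, double[] ranges) {
        if (x.length != y.length || x.length != ranges.length) {
            throw new IllegalArgumentException("vectors and ranges must have equal length");
        }
        double sum = 0.0;
        for (int a = 0; a < x.length; a++) {
            double d;
            if (x[a] instanceof Number && y[a] instanceof Number) {
                double diff = Math.abs(((Number) x[a]).doubleValue() - ((Number) y[a]).doubleValue());
                d = ranges[a] > 0 ? diff / ranges[a] : 0.0;
            } else {
                d = x[a].equals(y[a]) ? 0.0 : 1.0; // overlap metric for qualitative values
            }
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns service indices from most to least similar (smallest distance first).
    static List<Integer> rank(Object[] requirements, Object[][] services, double[] ranges) {
        double[] dist = new double[services.length];
        List<Integer> order = new ArrayList<>();
        for (int s = 0; s < services.length; s++) {
            dist[s] = heom(requirements, services[s], ranges);
            order.add(s);
        }
        order.sort(Comparator.comparingDouble((Integer s) -> dist[s]));
        return order;
    }
}
```

With R as the requirements vector, a service identical to R has distance 0. A service that differs from R only in Usability (High instead of Medium) has distance 1, the maximum contribution a single attribute can make.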

Precision
High precision indicates that a heterogeneous similarity metric ranked and returned more of the relevant services contained in the gold standard. The ranking produced by HEOM was used as the gold standard and served as the benchmark for measuring the precision of the rankings produced by the other HSM in the evaluation. The values of k were 5, 10, 15, 20 and 25. As shown in Fig. 2, HEEM consistently achieved the highest precision across all values of k, followed closely by HVDM, while HELM had the lowest precision.

Discussion
Based on the results of the rank order correlation and the ranking accuracy measured by precision, HEEM performed relatively well in comparison to HVDM with respect to the ranking produced by HEOM. Although HEOM and HVDM are known heterogeneous similarity metrics that have been employed for similarity computations [22,52], this paper is the first to apply these metrics, together with the three proposed in this paper, to rank cloud services while considering the heterogeneous nature of the cloud service QoS model. Applying HSM to rank cloud services provides a more credible basis for cloud service ranking and selection. In this paper, we have considered the heterogeneous dimensions of the QoS model that defines cloud services, which have hitherto been overlooked by previous cloud service ranking and selection approaches. The results of the experimental evaluations show that HEEM is not only a promising metric for ranking heterogeneous datasets but can also be applied to accurately rank cloud services in cloud service e-marketplace contexts with respect to user requirements. More generally, the results show the suitability of HSM for ranking cloud services in a cloud service e-marketplace; more specifically, HEOM, HEEM, and HVDM show considerable ranking accuracy compared to HEGM and HELM. Therefore, a cloud service selection approach that uses HSM to rank cloud services is more suitable than approaches that consider only quantitative QoS attributes.

Conclusion
The emergence of cloud service e-marketplaces such as AppExchange, SaaSMax, and Google Play Store as one-stop shops for the demand and supply of SaaS applications further contributes to the popularity of cloud computing as a preferred means of provisioning and purchasing cloud-based services. However, existing cloud e-marketplaces do not consider users' QoS requirements, and they present search results as an unordered list of icons, making it difficult for users to discriminate among the services shown. Moreover, existing cloud service ranking approaches assume that cloud services are characterised only by quantitative QoS attributes. The main objective of this paper is to extend existing approaches by ranking cloud services in accordance with user requirements while considering the heterogeneous nature of QoS attributes. We demonstrated the plausibility of applying heterogeneous similarity metrics to rank cloud services and evaluated the performance of five heterogeneous similarity metrics (two known metrics and three new ones), using the ranking produced by human judgement as a benchmark. The experimental results show that the QoS rankings obtained from HEOM, HEEM and HVDM correlate more closely with human similarity assessments than those of the other heterogeneous similarity metrics used in this study. These results confirm the suitability of heterogeneous similarity metrics for QoS-based ranking of cloud services with respect to users' QoS requirements in the context of a cloud service e-marketplace. Although we used only one user's QoS requirements as an example of QoS-based ranking of cloud services, similar studies can be performed using a variety of user QoS requirements and QoS datasets to further validate the results obtained in this paper. In future work, the proposed heterogeneous similarity metrics will be integrated into a holistic framework for cloud service selection, and further experimental evaluations will be performed to ascertain the user experience of the proposed metrics for ranking and selecting cloud services in a cloud service e-marketplace.

Fig. 1
Fig. 1 Line graphs showing cloud service QoS vs. user QoS requirements. The line graphs depict the similarity between the QoS properties of the cloud services and the QoS requirements of the user. Panel (a) shows a perfect match between the user's QoS requirements and the QoS properties of the cloud service, while Panel (b) shows a variance between the QoS properties of the cloud service and the QoS requirements of the user.

Fig. 2
Fig. 2 Precision scores of the heterogeneous similarity metrics (HEEM, HEGM, HVDM, HELM). The precision of the heterogeneous similarity metrics (HSM) measures how many of the relevant cloud services contained in the gold standard were ranked and returned by each HSM. The gold standard contained the ranking of services produced by HEOM, and it served as the benchmark for measuring the precision of the rankings produced by the other HSM, namely HEEM, HEGM, HVDM and HELM. The values of k were 5, 10, 15, 20 and 25. HEEM had the highest precision score at all values of k compared to the other HSM.

Table 1
Summary of Heterogeneous Similarity Metrics

Table 2
Definition and Description of the Six QoS Attributes
Ezenwoke et al. Journal of Cloud Computing: Advances, Systems and Applications (2018) 7:15

Table 3
Perfect Match of services and user requirements

Table 4
Difference in Service and User Requirements

Table 5
Kendall Tau Rank Correlation Coefficients