

Can functional characteristics usefully define the cloud computing landscape and is the current reference model correct?

Abstract

The NIST definition of cloud computing has been accepted by the majority of the community as the best available description to fully capture the variety of factors that determine how different stakeholders create, use or interact with cloud computing. Given the breadth of the cloud computing landscape, a need has been expressed from within different cloud activities to consider how it may best be segmented so that its diversity can be more easily understood by different stakeholders. The NIST definition considers four deployment models (Private, Public, Hybrid, Community Cloud), three service models (IaaS, PaaS, SaaS), and a number of characteristics (five in the final published version, but 13 in previous unpublished drafts). Exploring the definition further, this study aims to answer two questions: first, how can we use the affinity that different activities have with the definition’s characteristics, and second, how well does the definition describe the whole cloud ecosystem? Using a quantitative methodology, we find a clustering of cloud projects and activities that are technically aligned and therefore likely to benefit from interactions and shared learning, and we find that the final (short-list) definition is more robust than the draft (long-list) definition. Finally, we present a segmentation of the cloud landscape that we believe can best support the sharing of learning between projects in individual clusters.

Introduction

Since the emergence of cloud computing as a distinct paradigm within distributed computing, and as an important emerging market for ICT based services, there have been a number of efforts to support and encourage the adoption of cloud computing, as well as to foster a more geographically diverse cloud computing provider community. This has resulted in a large number of research and innovation projects receiving European Commission (EC) support over the past five years through the FP7 and H2020 programmes.

As part of the methodology to ensure the success of supported projects there have been regular funder-led attempts to bring projects together to share learning. Using the domain description as the key differentiator, it was thought that synergistic clusters of projects, similar enough to share learning on both technical and social best practices, would naturally emerge. Unfortunately, this has had only limited appeal: it has been unclear to participants exactly how the clusters are to be useful or effective, since the projects within them are only superficially similar and differ widely in the cloud technologies and techniques in use.

Parallel to this European experience, in the US attempts to characterise the diverse landscape of cloud computing began at the National Institute of Standards and Technology (NIST). After years of development and 15 drafts, the final version of the cloud computing definition was published in September 2011 [5]. We describe this reference model more fully below but here we highlight a key feature—the model defines a limited set of functional characteristics that can be used to derive a quantitative description of the emerging cloud computing landscape. At a time when a large number of EC supported projects are maturing, we take the opportunity to make a quantitative assessment of their affinity to these functional characteristics and to derive a robust quantitative description of the cloud landscape. We identify clusters of projects that fully segment the landscape and provide a rational basis for enhancing shared learning.

Finally, appreciating the diversity of cloud computing activities is crucial to the process of deriving consistent and useful standards for cloud adoption and interoperability. We present a segmentation of the landscape of cloud computing that we believe can best inform this process.

ISO published cloud standards

The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), together with the International Telecommunication Union (ITU), have recently released two new International Standards for cloud computing: ISO/IEC 17788, Cloud computing – Overview and vocabulary, and ISO/IEC 17789, Cloud computing – Reference architecture. These new standards provide definitions of common cloud computing terms, including those for cloud service categories such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). They also specify the terminology for cloud deployment models such as “public” and “private” cloud. Crucially, both new standards draw on previous developments by NIST, including SP 800-145 [5], which has provided the primary dataset for this study. In light of these evolving developments we regard our present study as a timely appraisal of the foundational work.

Defining a methodology

Cloud computing resides in a complicated ecosystem of stakeholders with differing requirements and expectations. Even with a broad consensus that cloud computing is a general term describing anything that delivers hosted services over the Internet, interpretations of this vary widely and the field is subject to excessive hyperbole. Consequently, there is great latitude for interpretation on what exactly constitutes cloud computing and there are many approaches that might be adopted to gain insight into this dynamic and difficult to grasp ecosystem.

We require our approach to produce repeatable, insightful, and accurate information on the landscape and its segmentation. We therefore require the method to be evidence based rather than merely prescriptive. To this end, we define a methodology as follows:

  1. Adopt the NIST characteristics of cloud computing as variables against which to make quantitative assessments.

  2. Engage with project participants in the quantitative self-assessment of affinity to the characteristic variables.

  3. Apply repeatable unsupervised machine learning techniques to these data as an evidence-based characterisation of the total cloud computing landscape.

  4. Test the robustness of the landscape as defined by the NIST characteristics.

  5. Derive a clustering of cases (projects) along the complex dimensions of cloud computing and thereby facilitate the experience of shared learning.

  6. Derive a segmentation of the landscape with respect to interactions among the variables (characteristics) to inform the process of standards development.

NIST defining characteristics of the cloud

The NIST model is the most commonly cited third-party definition of cloud computing. As previously noted, this definition took significant time and a number of iterations before final publication, indicative of the difficulties associated with reaching a consensus on which even a restricted set of experts in the field could agree.

By consensus agreement [5, p.2]

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (…) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

The NIST model intends to capture this complexity in simple, understandable characteristics. The model is composed of five essential characteristics, three service models, and four deployment models. In earlier drafts of the definition NIST included a further eight common characteristics. These were dropped in the final published version. At the time of writing the earlier draft had been replaced in all on-line references that we could find. However, we were able to collect self-assessment data on the full set of 13 characteristics. We present analyses of both the NIST long- and short-lists.

Here we provide short explanations of each of the 13 characteristics:

Essential characteristics (1-5)

(Note that these descriptions of the five essential characteristics are intended to be similar, but not necessarily identical, to the published NIST descriptions. Brevity was needed to give the projects we engaged with definitions that were succinct and easy to understand and rate.)

On-demand self-service

Consumers can log on to a website or use web services to access additional computing resources on demand, whenever they want, without human intervention in the process.

Broad network access

Because they are web-based, you can access cloud-computing services from any internet-connected device. With a web browser on a desktop machine, or even a thin client computer terminal, you can do any computing that the cloud resources provide.

Resource pooling

In multi-tenanted computing clouds the customers share a pool of computing resources with other customers, and these resources, which can be dynamically reallocated, may be hosted anywhere.

Rapid elasticity

Cloud computing enables computing resources or user accounts to be rapidly and elastically provisioned or released so that customers can scale their systems up and down at any time according to changing requirements.

Measured service

Cloud computing providers automatically monitor and record the resources used by customers or currently assigned to customers, which makes possible the pay-per-use billing model that is fundamental to the cloud-computing model.

Common characteristics (6-13)

Massive Scale

A cloud platform may, depending on the resources offered, provide individual users with access to large-scale or even massive-scale computing.

Homogeneity

In many situations, it is advantageous to both customers and providers to have essentially homogeneous systems at their disposal. Where requirements are particularly difficult or unusual, a cloud platform may be built out of non-homogeneous systems and components.

Virtualisation

Virtualisation of machines as software systems massively increases the scale of cloud resources that can be made available. Virtualisation is not an essential characteristic but it is becoming the only way that scale demands can be met by providers; customers generally don’t care either way as the virtualisation is entirely transparent.

Low cost Software

If increased scale reduces per-unit or per-use cost, then cloud computing offers a drive towards lower-cost software. It is important to note that this may not be the case across all sectors and activities.

Resilient computing

In some sectors, continuous availability of computing with zero downtime is crucial to the sector's requirements, for example, emergency and financial systems. In these sectors, requirements for resilient, rather than merely fail-safe, computing will be the norm.

Geographic distribution

Some sectors have legal requirements that physical data stores are in particular geographical jurisdictions. This places certain restrictions on providers favouring a cloud-anywhere model. More commonly, the user is not concerned about location per se.

Service orientation

Services that run on cloud frameworks are normally designed and operated in a service-oriented manner so that they can take advantage of other factors that provide resilience. This includes the ability to scale different components within the system depending on their load and capability.

Advanced security

There may be the capability to perform both system and network level security within the cloud system.

Data collection by quantitative self-assessment

Our final sample used in this analysis includes as many of the projects funded under the EC FP7-ICT-2013-10 calls 5, 7, 8 and 10 as we were able to gather data from. We approached authoritative project representatives and asked them to self-assess their project's affinity to the 13 characteristics, based on the descriptions given above, on an ordinal scale from 0 to 9, with 0 indicating no affinity (the project is indifferent to considerations of this feature) and 9 indicating the strongest affinity (the project regards this feature as crucially important).

We were able to compile responses from 37 projects, giving 38 cases in total (the SeaClouds project supplied two scorings independently from two different representatives). Table 1 gives a complete list of project names and self-assessed scores on the 13 variables.

Table 1 Self-assessed scores of 38 cases (projects) on 13 variables

Goals and techniques of analysis

Faced with a cases-by-variables data matrix (as in Table 1) the analyst seeks to summarise, simplify and explain. With high-dimensional data, there are a number of issues of interest:

  • What are the relationships among the cases?

  • What are the relationships among the variables?

  • What insights may be gained from a summarized joint representation?

  • Is the summary robust against possible variations and errors of case sampling?

Specifically for this analysis, the relationships among cases (cloud projects) let us propose a clustering of projects that are technically aligned and therefore likely to benefit from interactions and shared learning. The relationships among variables (NIST characteristics) let us present a segmentation of the cloud landscape that we believe can inform where different projects may find examples of best practice or technology choices suitable for those types of project. The joint representation informs projects about their position in the landscape, and informs standards developers about the relevant cloud activities when considering the landscape characteristics. Finally, measuring robustness for different partitions of the variables (all 13 characteristics; 5 essentials; 8 common) lets us reach conclusions about which of these partitions may be more meaningful and in that sense better.

Dimension reduction

For any high-dimensional dataset, the statistical techniques of dimension reduction are indispensable to the aim of summarizing, simplifying and explaining meaning contained in the dataset. Here we give a brief description of the linear algebra exploited by the family of multivariate dimension-reduction methods that includes principal component analysis (PCA), log-ratio analysis (LRA), correspondence analysis (CA), and various forms of discriminant analysis. All of these methods are basic decompositions (or factorizations) of a target matrix into left- and right-vector matrices representing respectively the cases (rows) and the variables (columns) of the original data matrix.

What distinguishes the various methods is the form of normalization and differential weighting of points, chosen depending on the type of data, which is applied before the decomposition into left- and right singular vector matrices.

For ordinal survey data such as ours, CA is the appropriate method [3]. The normalization in CA is the matrix of standardized residuals \(T = D_{r}^{-1/2}(P - rc^{T})D_{c}^{-1/2}\), where \(P = N/n\) is the so-called correspondence matrix, with N being the original data matrix and n its grand total. The row and column marginal totals of P are r and c respectively, and \(D_{r}\) and \(D_{c}\) are the corresponding diagonal matrices. Singular value decomposition (SVD) provides the appropriate factorization of T, with convenient properties, such that \(T = U\Gamma V^{T}\). The right singular vectors V are the contribution coordinates of the variables. A further transformation involving a scaling factor \(D_{q}\), such that \(F = D_{q}^{-1/2}U\Gamma\), defines the principal coordinates of the cases.
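
To make the decomposition concrete, the following sketch computes these CA coordinates from a cases-by-variables score matrix such as Table 1. Our own computation used custom Matlab software (see the "Computation" section); this Python/NumPy version is illustrative only, the function name is ours, and we take the scaling factor \(D_{q}\) to be \(D_{r}\), the diagonal matrix of row masses.

```python
import numpy as np

def correspondence_analysis(N):
    """Correspondence analysis of a cases-by-variables score matrix N.

    Uses the standardized residuals T = Dr^(-1/2) (P - r c^T) Dc^(-1/2)
    and the SVD T = U Gamma V^T, as described in the text.
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()                        # grand total
    P = N / n                          # correspondence matrix
    r = P.sum(axis=1)                  # row masses (marginal totals)
    c = P.sum(axis=0)                  # column masses (marginal totals)
    Dr_isqrt = np.diag(1.0 / np.sqrt(r))
    Dc_isqrt = np.diag(1.0 / np.sqrt(c))
    T = Dr_isqrt @ (P - np.outer(r, c)) @ Dc_isqrt    # standardized residuals
    U, gamma, Vt = np.linalg.svd(T, full_matrices=False)
    V = Vt.T                           # contribution coordinates of the variables
    F = Dr_isqrt @ U * gamma           # principal coordinates of the cases (D_q = D_r assumed)
    return F, V, gamma
```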

Visualization and interpretation

The biplot [2, 3] is a joint display of the two sets of points in V and F which, with the above transformation, can often be achieved on a common scale thereby avoiding the need for arbitrary independent scaling to make the biplot legible. The dimensions (Dim 1, Dim 2,... Dim N, i.e. the columns) of V and F are arranged in decreasing order of importance to the reduced-dimension solution. That means that a biplot of the first two dimensions is showing the most important relationships that can be represented on a planar 2-d plot. Cases are arranged in approximately Euclidean space so that proximity equates closely with similarity. Cases further from the origin have a stronger influence in the reduced-dimension solution; cases closer to the origin have a lesser influence. Variables are represented as vectors; the angular distance between vectors equates with correlation, and vector length again equates with relative contribution to the solution. One aspect of interpretation lies in identifying vectors with the same orientation, vectors at right angles, and vectors in opposing directions, and then identifying peripheral cases and how these are arranged with respect to the vectors. A second aspect of interpretation is to identify vectors aligned with the primary dominant dimension (Dim 1), and those aligned with the secondary dimension (Dim 2).
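
A contribution biplot of the first two dimensions can then be drawn directly from F and V. The following sketch (Python with matplotlib, illustrative only and not the software actually used; function and argument names are ours) plots cases as points in principal coordinates and variables as vectors from the origin.

```python
import matplotlib.pyplot as plt

def contribution_biplot(F, V, case_labels, var_labels):
    """Joint display of cases (points, principal coordinates F) and
    variables (vectors, contribution coordinates V) on Dim 1 and Dim 2."""
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.scatter(F[:, 0], F[:, 1], s=20, color='grey')
    for i, label in enumerate(case_labels):
        ax.annotate(label, (F[i, 0], F[i, 1]), fontsize=8)
    for j, label in enumerate(var_labels):
        ax.arrow(0, 0, V[j, 0], V[j, 1], length_includes_head=True, head_width=0.02)
        ax.annotate(label, (V[j, 0], V[j, 1]), fontsize=9, fontweight='bold')
    ax.axhline(0, linewidth=0.5)   # reference axes through the origin
    ax.axvline(0, linewidth=0.5)
    ax.set_xlabel('Dim 1')
    ax.set_ylabel('Dim 2')
    return ax
```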

An important shortcoming of the biplot is that only two, or at most three, of the dimensions of the reduced-dimension solution can be visualised together. However, there may be more dimensions that are relevant for interpretation. The singular values of the SVD decomposition, in the description above, are the square roots of the eigenvalues; these indicate the relative importance of a dimension, i.e. its contribution to capturing information content in the analysis. When the eigenvalues are scaled so that their sum equals the number of variables, a conventional rule is to regard at least all those dimensions with eigenvalues>1 as relevant. This is the conventional Kaiser-Guttman stopping rule [4, 6]. Scree plots below show eigenvalues with the Kaiser-Guttman reference line.
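
Stated compactly, and as a sketch under the same illustrative assumptions as above: square the singular values, rescale them so that they sum to the number of variables, and retain the dimensions whose scaled eigenvalue exceeds 1.

```python
import numpy as np

def kaiser_guttman(gamma, n_variables):
    """Kaiser-Guttman stopping rule applied to the CA singular values.

    Eigenvalues are the squared singular values, rescaled so that their sum
    equals the number of variables; dimensions with a scaled eigenvalue > 1
    are regarded as relevant.
    """
    eigenvalues = np.asarray(gamma) ** 2
    scaled = eigenvalues * n_variables / eigenvalues.sum()
    n_relevant = int(np.sum(scaled > 1.0))
    return scaled, n_relevant
```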

Addressing this shortcoming, we develop additional analytic displays in the form of hierarchical dendrograms based on the cosine distances between vectors in any number of dimensions. The hierarchical ordering of nodes in the tree then indicates closeness of orientation in the n-dimensional equivalent biplot.
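
A sketch of this construction is shown below, again illustrative rather than our actual Matlab implementation: the variable vectors are taken from the first n_dims columns of V, cosine distances are computed between them, and an agglomerative tree is built with SciPy. The text above does not fix a linkage criterion, so the use of average linkage here is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def variable_tree(V, n_dims):
    """Hierarchical tree of variables from cosine distances between their
    vectors in the first n_dims dimensions of the CA solution."""
    coords = np.asarray(V)[:, :n_dims]
    distances = pdist(coords, metric='cosine')   # 1 - cosine of the angle between vectors
    Z = linkage(distances, method='average')     # linkage criterion is an assumption
    return Z   # render with scipy.cluster.hierarchy.dendrogram(Z, labels=variable_names)
```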

Robustness testing

The robustness of a particular solution to possible errors and variation in sampling can be assessed by the statistical technique of bootstrapping [1]. The technique involves resampling rows from the data table, with replacement, to construct a replicate dataset of the same size as the original. The analysis is then repeated on the replicate, and the whole process is repeated a large number of times (typically 1000).
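
The resampling loop itself is straightforward, as the sketch below illustrates (reusing the illustrative correspondence_analysis function from above). How the replicate solutions are then aligned and summarised, for example by counting how often each dendrogram node recurs, is omitted here.

```python
import numpy as np

def bootstrap_ca(N, n_replicates=1000, seed=0):
    """Bootstrap the CA solution by resampling cases (rows) with replacement."""
    rng = np.random.default_rng(seed)
    N = np.asarray(N, dtype=float)
    n_cases = N.shape[0]
    solutions = []
    for _ in range(n_replicates):
        idx = rng.integers(0, n_cases, size=n_cases)   # sample row indices with replacement
        replicate = N[idx, :]                          # replicate dataset, same size as original
        solutions.append(correspondence_analysis(replicate))
    return solutions
```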

Computation

All analysis was performed on a standard desktop computer using Matlab Version 9 (R2016a) and custom software we developed for CA and contribution biplots based on [3].

Results

We performed separate analyses on three partitions of the dataset in Table 1:

  • 38-cases by 13-variables full dataset (NIST long-list).

  • 38-cases by 5-variables essential characteristics (NIST short-list).

  • 38-cases by 8-variables common characteristics (NIST residual-list).

We present results of these analyses in a series of biplots, scree plots and cluster trees as follows in Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. See the figure captions for detailed descriptions and interpretation.

Fig. 1

2-dimensional biplot of the full 13-characteristic dataset. Advanced Security is the dominant variable (i.e. the vector with the greatest magnitude) of the solution, driven largely by cases 27, 6 and 33. Diametrically opposed to Advanced Security are Massive Scale and Low Cost Software, driven largely by case 11. Virtualisation, Resource Pooling and Service Orientation contribute little to the solution. Other associations are evident, e.g. orthogonal to these are Measured Service and Geographic Distribution, driven by case 16, and opposed to this is Homogeneity, driven strongly by case 13

Fig. 2

Scree plot showing eigenvalues of the decomposition of the full 13-characteristic dataset. The dotted line shows the Kaiser-Guttman reference line for eigenvalues>1 indicating that at least four of the 13 dimensions are relevant. These account for 74.34% of the total variance in the 13-dimensional dataset. Error bars show standard errors for 1000 bootstrap replicates of the CA analysis

Fig. 3

Hierarchical dendrogram of variables in a 4-dimensional representation of the decomposition of the full 13-dimensional dataset. Advanced Security and Massive Scale are shown at opposite ends of the tree connected only by the deepest node, consistent with the view in the 2-d biplot. Measured Service and Geographic Distribution, shown very close together in 2-d, are in fact very distant when considering more relevant dimensions. Internal nodes are labelled with bootstrap percentages indicating the proportion of times in 1000 bootstrap replicates of the CA analysis that a node is observed. All values are <50%, with many values much smaller, indicating that this solution is not robust to bootstrap resampling, i.e. no grouping shown here is strongly supported by the evidence

Fig. 4

2-dimensional biplot of the 5-dimensional partitioned dataset of the Essential Characteristics. Here Measured Service and Broad Network Access are slightly dominant in the solution, but perhaps insignificantly. Broad Network Access is driven largely by cases 17, 21 and 23, Resource Pooling by cases 27, 4 and 33, Measured Service and Rapid Elasticity jointly by a larger set of cases, in decreasing significance, 16, 2, 30, 1, 31, 25, 6 and 32. The cluster of cases 13, 29, 9, 5, 36, 37, 26 and 22, drives On Demand Self Service. The remaining cases close to the centre form a distinct cluster of similarity, but do not contribute significantly to the 2-d solution

Fig. 5

Scree plot showing eigenvalues of the decomposition of the 5-dimensional partitioned dataset of the essential characteristics. The dotted line shows the Kaiser-Guttman reference line for eigenvalues>1, indicating that at least two of the five dimensions are relevant. These account for 79.66% of the total variance in the 5-dimensional dataset. Error bars show standard errors for 1000 bootstrap replicates of the CA analysis

Fig. 6

Hierarchical dendrogram of variables in a 2-dimensional representation of the decomposition of the 5-dimensional partitioned dataset for the essential characteristics. Internal nodes are labelled with bootstrap percentages indicating the proportion of times in 1000 bootstrap replicates of the CA analysis that a node is observed. The two distal nodes with values >50% indicate that no other arrangement occurred more often in the bootstrap replicates, and that they are therefore strongly supported. The association of Broad Network Access with either distal node is then the only remaining degree of freedom, and occurs roughly half of the time with each (45.5% vs. 54.5%). Looking again at the biplot in Fig. 4 we see that this would result from only slight alterations to the angle of the vector towards either On Demand Self-Service, or towards Measured Service. Overall, this is a robust result and the two associations of (i) Resource Pooling with On Demand Self-Service, and (ii) Measured Service with Rapid Elasticity are both stable and well supported by the evidence

Fig. 7

Cluster tree of cases in the 2-dimensional representation of the decomposition of the 5-dimensional partitioned dataset for the essential characteristics. Ward’s agglomerative hierarchical method was used to form the clusters. Numbered groups are referred to in the “Discussion” section

Fig. 8

2-dimensional biplot of the 8-dimensional partitioned dataset of the common characteristics. Service Orientation and Virtualisation appear almost insignificant to this solution. There is a roughly even spread of cases, though similar inferences could be made as illustrated in Fig. 1

Fig. 9

Scree plot showing eigenvalues of the decomposition of the 8-dimensional partitioned dataset for the common characteristics. The dotted line shows the Kaiser-Guttman reference line for eigenvalues>1, indicating that at least three of the eight dimensions are relevant. These account for 75.11% of the total variance in the 8-dimensional dataset. Error bars show standard errors for 1000 bootstrap replicates of the CA analysis

Fig. 10

Hierarchical dendrogram of variables in a 3-dimensional representation of the decomposition of the 8-dimensional partitioned dataset for the common characteristics. Internal nodes are labelled with bootstrap percentages indicating the proportion of times in 1000 bootstrap replicates of the CA analysis that a node is observed. All values are <50%, with a few values much smaller, indicating that this solution is not robust to bootstrap resampling, i.e. no grouping shown here is strongly supported by the evidence

Summary of results

NIST long-list

  • Figure 1: broad and roughly even spread of cases and variables.

  • Figure 2: there are at least four relevant dimensions.

  • Figure 3: bootstrap values for all relationships are <50%, some much smaller.

NIST short-list

  • Figure 4: well-clustered cases associated across an even spread of variables.

  • Figure 5: there are at least two relevant dimensions.

  • Figure 6: bootstrap values for two distal nodes >50%, one internal node ≈50%.

  • Figure 7: cluster tree of cases with broadly even bifurcations.

NIST residual-list

  • Figure 8: broad and roughly even spread of cases and variables.

  • Figure 9: there are at least three relevant dimensions.

  • Figure 10: bootstrap values for all relationships are <50%.

Discussion and conclusions

The aims of this study were to characterise the abstract landscape of cloud computing, determine whether that characterisation is robust, and then to learn something from the landscape for the benefit of cloud participants and stakeholders. The characteristics of the NIST definition of cloud computing provided the framework for a quantitative survey of participants. Standard multivariate dimension reduction techniques provided the definition of the landscape, which we visualise in biplots and cluster trees.

The robustness analysis of three partitions of the dataset (NIST long-list, short-list and residual-list) is decisive: any interpretations drawn from the cloud landscape depicted in the analyses of the long- and residual-lists are essentially meaningless if applied to the general cloud ecosystem that these sample projects were drawn from. Only the NIST short-list of 5 essential characteristics leads to a stable and robust depiction of the landscape, and general interpretations should be made only against this depiction.

Figure 4 (biplot) and Fig. 7 (cluster tree) provide the interpretive mechanisms. Five groups of projects labelled I-V in Fig. 7 are a useful focus for discussion, though clusters at any level of the tree are equally meaningful.

Cluster I

  • 23 MCN

  • 21 LEADS

  • 17 Embassy Cloud

These are the primary drivers of Broad Network Access, which is also one of the two dominant features of the landscape.

Cluster II

  • 13 CloudTeams

  • 26 OpenModeller

  • 37 Varberg

  • 36 Umea

  • 22 Leicester

  • 29 S-CASE

  • 9 CloudCatalyst

  • 5 BNCweb

The primary drivers of On Demand Self Service, with CloudTeams being the most significant to this axis, and Leicester being the least.

Cluster III

  • 27 PaaSword

  • 33 SUPERCLOUD

  • 4 BigFoot

The primary drivers of Resource Pooling, with PaaSword being the most significant.

Cluster IV

  • 35 U-QASAR

  • 34 Texel

  • 19 INPUT

  • 15 COMPOSE

  • 18 GEMMA

  • 14 CloudWave

  • 12 CloudSpaces

  • 8 CELAR

  • 38 WeNMR

  • 10 CloudLightning

  • 24 Mobizz

  • 7 Catania Sci. Gateway

  • 28 PANACEA

  • 20 IOStack

  • 11 CloudScale

  • 3 BETaaS

This is the large central cluster, not distinguished by a tendency towards any of the NIST characteristics over any other.

Cluster V

  • 16 DICE

  • 30 SeaClouds(1)

  • 2 ASCETiC

  • 32 STORM CLOUDS

  • 6 Broker@Cloud

  • 1 ARTIST

  • 31 SeaClouds(2)

  • 25 MODAClouds

This cluster subtends the two remaining characteristics, Measured Service and Rapid Elasticity, with DICE being the most significant and leaning towards Rapid Elasticity. As a final observation we note that although Measured Service is one of the dominant features of the landscape, no projects identify significantly with this feature.

Finally, we suggest that this interpretation of project clusters can provide the basis for enhanced shared learning among projects that are technically aligned on the axes of cloud characteristics, and that the further development of standards for cloud implementation and adoption can benefit from this depiction of the landscape and the association shown between projects and characteristics.

Responses to the methodology and future work

In this paper, we have presented a methodology for characterising the landscape of cloud computing based on the set of NIST defining features. The same methodology identifies the location of a project or cloud enterprise within the landscape. Taken together, the resulting biplot and the cluster tree offer rich interpretive tools in defining the current cloud computing landscape.

When we presented these results to partners in the supporting projects, and to the wider circle of EC-funded cloud-related projects, participants accepted the methodology as sound and applicable, and generally regarded the characterisation of the landscape as useful. However, several observations are noteworthy.

  • In its draft form, SP 800-145 defines characteristics of cloud computing only; however, many more IT service characteristics are applicable that are not specific to cloud computing.

  • Several of the characteristics in the draft form subsume important aspects under a single term that is too general to be meaningful, such as Privacy and Data Protection being subsumed under the term Advanced Security.

  • Despite SP 800-145 having been published over five years ago, in 2011, participants still disagree on the interpretation of certain characteristics. Measured Service, for example, is frequently misinterpreted as describing the monitoring of a cloud service system.

Several avenues of future work have emerged from this activity.

  • An automated web-hosted tool that allows the self-assessment of new cloud enterprises and shows their location within the existing landscape would be informative to new enterprises.

  • Exploring the impact of reducing the set of characteristics from 13 to five may be instructive. For instance, does the larger set define a much finer grained landscape than the reduced set, or is the information content of the reduced set sufficient?

  • Does expanding the set of characteristics to include more applicable IT service characteristics improve the resolution of the landscape, or does this perhaps significantly change the orientation of the existing characteristics?

References

  1. Efron B, Tibshirani RJ (1994) An Introduction to the Bootstrap. Chapman and Hall/CRC, New York.

  2. Greenacre M (2010) Biplots in Practice. BBVA Foundation, Madrid.

  3. Greenacre M (2013) Contribution biplots. J Comput Graph Stat 22: 107–122.

  4. Jackson DA (1993) Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74(8): 2204–2214.

  5. Mell P, Grance T (2011) The NIST definition of cloud computing. US National Institute of Standards and Technology. https://doi.org/10.6028/NIST.SP.800-145.

  6. Peres-Neto PR, Jackson DA, Somers KM (2005) How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Comput Stat Data Anal 49(4): 974–997.


Acknowledgements

The authors thank two anonymous reviewers for helpful comments and suggestions.

Funding

This work was supported by the EC Framework 7 CloudWatch project (EC grant no. 610994).

Authors’ contributions

DW proposed a landscape segmentation analysis using the NIST model characteristics as variables for a quantitative study and drafted the descriptive definitions. MD wrote the first draft of the paper and interpreted appropriate segmentations of the cluster tree. NC devised the methodology, liaised with projects on data collection, implemented the software, performed all computation, and wrote subsequent drafts of the paper with significant contributions from the co-authors. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Corresponding author

Correspondence to Neil Caithness.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Caithness, N., Drescher, M. & Wallom, D. Can functional characteristics usefully define the cloud computing landscape and is the current reference model correct?. J Cloud Comp 6, 10 (2017). https://doi.org/10.1186/s13677-017-0084-1


Keywords