A decision framework for placement of applications in clouds that minimizes their carbon footprint
© Makkes et al.; licensee Springer. 2013
Received: 14 July 2013
Accepted: 30 November 2013
Published: 17 December 2013
Cloud computing gives users much freedom in where they host their computation and storage. However, the CO2 emission of a job depends on the location and the energy efficiency of the data center where it is run. We developed a decision framework that determines whether to move computation, with its accompanying data, from a local to a greener remote data center for lower CO2 emissions. The model underlying the framework accounts for the energy consumption at the local and remote sites, as well as of the networks between them. We show that the type of network connecting the two sites has a significant impact on the total CO2 emission. Furthermore, the task’s complexity is a factor in deciding when and where to move computation.
From a user’s perspective, reducing the environmental load of computational tasks is equivalent to looking for a green data center, i.e., a data center with a low power usage effectiveness (PUE). Many data centers advertise their greenness as an added value for customers. A recent study shows that 71% of data centers measure the PUE and that the mean value is about 1.8. Another survey, of data centers in Europe, came up with a higher mean PUE. Some large data centers claim a PUE approaching the theoretical value of 1. We argue that the PUE is not the only factor to consider: the energy sources powering a data center and the network used to move the data are also important, as they determine the amount of CO2 emitted for a given task.
We present a framework that helps a user decide where to perform a task: at a local data center or remotely at a cleaner data center. The framework not only takes the CO2 emission of the data centers into account, but also estimates the CO2 emission of the transport network between them when input/output data accompanies the task. We can do this by exploiting the relation between energy produced in kWh and CO2 emission for different energy sources (see Equation 9). The CO2 emission of the network of a data center is a modest part of the data center’s total CO2 emission. However, when deciding whether offloading an individual task to an optional cleaner data center is preferable, the contribution of the network (data center LAN and transport network) can be a substantial part of the decision. This means that if the decision framework introduced in this paper is applied to all jobs of a data center, the total CO2 emission of both the data center and the optional cleaner data centers receiving offloaded tasks will decrease.
The framework can predict the total CO2 emission for different scenarios, namely interactive computation and hot or cold data storage. For each scenario we identify the equipment required in the local and the remote data center; e.g., a computational task uses different equipment than hot data storage. Subsequently we use models that include the power consumption of the devices in use. In this paper we focus on the computational scenario; the interested reader can find details of the storage scenario in [4, 5].
A common aspect of all scenarios considered is the amount of data involved. The input data determines the energy cost of the data transport part, first through the LAN of the local data center, then across the core network (Internet or light path), and finally through the LAN of the chosen remote data center. When output data plays a role, we assume the user is located near the local data center, so the energy cost associated with the output data is the cost of the local LAN in the local scenario versus the cost of the remote LAN plus the transport network in the remote scenario.
The equipment present in a data center, including the LAN devices, can be identified more realistically than the number of devices in a transport network to another data center. For the former we chose the same internal architecture for both data centers being compared; this allows us to focus purely on the sustainability of both. The latter instead depends not only on the type of network, Internet or light path, but also on the geographical location of both data centers. Therefore, our framework makes use of network models, depending on the type of network and on the location of both endpoints, to give an estimate of the minimal number of hops in the network. Furthermore, the geographical location of both endpoints determines the countries possibly crossed by the shortest-path transport network. These estimates make it possible to attach a CO2 emission to the transport network. Data on the energy types used by different European countries is available. If the transport network connects, e.g., a data center in the Netherlands with one in Austria, a considerable part of the shortest-path network will cross Germany. So the energy cost can be divided into three contributions, according to the distance spanned in each of the countries crossed. For each country we can calculate a mean CO2 emission based on the types of energy sources used in that country [6–11].
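The per-country attribution described above can be sketched as a distance-weighted mean. This is a minimal illustration in Python; the function name, the distances, and the per-country emission factors in the example are our assumptions, not values from the paper.

```python
# Hedged sketch: attributing transport-network emissions to the countries a
# shortest path crosses, weighted by the distance spanned in each country.

def path_emission_factor(segments):
    """segments: list of (distance_km, gram_co2_per_kwh) tuples, one per
    country crossed. Returns the distance-weighted mean emission factor
    in g CO2/kWh for the whole path."""
    total_km = sum(d for d, _ in segments)
    return sum(d * x for d, x in segments) / total_km

# Illustrative NL -> DE -> AT path (distances and factors are assumptions)
factor = path_emission_factor([(150, 400), (600, 500), (250, 200)])
print(factor)  # → 410.0
```

The three tuples correspond to the three contributions mentioned in the text for a Netherlands-to-Austria connection crossing Germany.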
The rules applied to assist a user in his decision can also be applied by a scheduler of a data center. If a user can specify the complexity of his task, i.e., how the computation time and/or the amount of output data scale as a function of the input data, a scheduler can determine where to schedule the job such that the emission of the task in grams of CO2 is minimal. In that case the user need not know about remote data centers and their PUEs, because this knowledge resides in the scheduler’s database.
There are different aspects one can focus on when optimizing data center infrastructure costs. We chose to concentrate on CO2 emission costs, but there are other possible focus points, such as economic costs, power utilization and infrastructure utilization. For each of these costs there is ample existing research: for economic costs the work in [12–14], for power utilization the work in [13, 14], and for infrastructure utilization the work in [12, 15].
Optimization of each of these aspects can lead to different outcomes. For example, a data center running more energy efficiently but supplied by energy produced from brown coal has a higher CO2 emission cost than a data center operating much less efficiently that is using hydro electric power.
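The brown-coal versus hydro example above can be made numeric. The emission factors and PUE values below are ballpark illustrations of ours, not figures from the paper's tables.

```python
# Hedged numeric illustration: a low-PUE site on dirty power can still emit
# far more than a high-PUE site on clean power. Factors are rough lifecycle
# estimates (lignite ~1100 g CO2/kWh, hydro ~24 g CO2/kWh), assumed here.

kwh_it = 100.0  # IT energy consumed by a task, in kWh

coal_site = 1.2 * kwh_it * 1100   # efficient (PUE 1.2), lignite-powered
hydro_site = 2.0 * kwh_it * 24    # inefficient (PUE 2.0), hydro-powered

print(coal_site, hydro_site)  # → 132000.0 4800.0
```

Despite the far better PUE, the lignite-powered site emits over an order of magnitude more CO2 for the same IT work.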
In this paper we focus on CO2 emission costs. Of particular interest to us is the ever-increasing effort in modeling the power consumption of networks and data center equipment. Understanding the power consumption of networks and computer equipment in more detail, and their behavior under different conditions, gives the opportunity to better predict the impact of cloud computing and storage on the environment, and to develop algorithms and strategies to reduce the carbon footprint. The way we predict the energy consumption of LANs and transport networks is based on the work of Baliga et al.
We distinguish different kinds of networks (LANs, Internet and light paths), each with its specific type of equipment. Our novel contribution is that we integrate and extend different models into a single decision framework for greener computing. The models used can easily be enhanced, allowing the framework to evolve. Our main impetus for the framework presented is that not only end users but also data center operators and cloud service providers should consider under what conditions it is better to host a job locally or elsewhere.
In the following sections we focus on two different aspects that contribute to Equation 1: how efficiently a data center uses its energy, and the different components used in the data center and the network.
How efficiently a data center uses its energy
In the calculation of the PUE of a data center, all equipment that is not considered a computing device, like pumps, air conditioners and lighting, is part of P_TOT only, whereas the power used by servers, storage equipment and network equipment is incorporated in both P_IT and P_TOT.
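Written out, the relation this paragraph relies on is the standard PUE definition:

```latex
\mathrm{PUE} = \frac{P_{\mathrm{TOT}}}{P_{\mathrm{IT}}}
```

where P_TOT is the total power drawn by the facility and P_IT the power drawn by the IT equipment alone, so PUE ≥ 1 with the ideal value 1 reached when no power goes to overhead such as cooling and lighting.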
The different data center and network components used
Components of a data center LAN
(Table: components of a data center LAN, listing the host (network interface), Ethernet switches, firewall, and gateway router; the table values are not reproduced here.)
where P_host, P_switch, P_firewall, and P_router are the power consumed by the host computer where the data resides, the Ethernet switches, the firewall, and the data center gateway router, respectively. The capacities of the corresponding equipment, measured in bits per second, are given by C_host, C_switch, C_firewall, and C_router.
Here, the factor U accounts for the utilization of the network equipment, expressing the fact that network equipment typically does not operate at full utilization while still consuming 100% of its power; we set this factor to 0.5.
where the factor 8 accounts for the translation of bytes into bits, as the terms P/C are measured in kW/Gb/s.
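The equation these "where" clauses describe is not reproduced in this extraction. A plausible reconstruction, following the per-bit energy models of Baliga et al. and the factors named above (the factor 8 for bytes to bits, the utilization U), is:

```latex
E_{\mathrm{LAN}} = \frac{8}{U}\left(
    \frac{P_{\mathrm{host}}}{C_{\mathrm{host}}}
  + \frac{P_{\mathrm{switch}}}{C_{\mathrm{switch}}}
  + \frac{P_{\mathrm{firewall}}}{C_{\mathrm{firewall}}}
  + \frac{P_{\mathrm{router}}}{C_{\mathrm{router}}}
\right)
```

This is a sketch only: the original equation may weight individual device terms (e.g., multiple switch tiers) differently.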
(Table: power per capacity [kW/Gb/s] for the different components in our model, with rows including host data storage and the DWDM terminal node; the numeric values are not reproduced here.)
(Table: values of the factor X used in our framework as a function of the different energy sources, listed in decreasing value of X, from sources such as gas works gas downwards; the numeric values are not reproduced here.)
Equations 13 and 14 form the basis of our decision framework. They can be used in decision policies applied by a scheduler (section ‘Decision policies’) as well as in a web calculator available to end users (section ‘Web calculator’). A scheduler makes a decision on where to place computation based on these policies, and it provides the user with detailed information on the CO2 emission cost of the chosen scenario. The complexity of tasks, i.e., how the computation time scales with the input data and how the output data scales with the input data, is a factor included in the decision framework too.
where T_processing, N_in, and N_out are respectively the computation time in CPU core hours and the amounts of input and output data, both in GBytes. Furthermore, E_LAN,local, E_LAN,remote, and E_network are the unit energy consumptions of the two data center LANs and the connecting transport network, expressed in kWh/GByte. Values for these unit energies reside in a knowledge base of the scheduler. The LAN values (derived from Equation 6 with the adopted values for network equipment) are constants for any decision policy, whereas the value for E_network depends on the type of network and on the number of hops (Equations 7 and 8). In case both light path and Internet connections are possible, the scheduler can try both transport networks; the number of hops for the connecting shortest path is also retrieved from the knowledge base. For reasons of simplicity we take E_LAN,local equal to E_LAN,remote. In an implementation, the scheduler will have knowledge of its own data center, and all values concerning a remote data center will be retrieved by issuing a proposal to the scheduler of the remote data center; in that case, values for local and remote equipment may differ.
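The scheduler's comparison described above can be sketched in code. This is a minimal stand-in for Equation 15, not the paper's actual formula: the function names, the assumed power per core, and all constants in the example are ours, and the local LAN cost, which is common to both scenarios, is dropped from the comparison.

```python
# Hedged sketch of the placement comparison in the spirit of Equation 15.
# All constants are illustrative assumptions, not the paper's values.

def compute_emissions_g(t_proc_h, pue, x_g_per_kwh, p_core_kw=0.1):
    """Grams of CO2 for the compute part of a task.

    t_proc_h: computation time in CPU core hours; pue: the site's power
    usage effectiveness; x_g_per_kwh: emission factor X of the site's
    energy mix; p_core_kw: assumed power drawn per core (our assumption).
    """
    return x_g_per_kwh * pue * p_core_kw * t_proc_h

def place(t_proc_h, n_in_gb, n_out_gb, pue_local, x_local,
          pue_remote, x_remote, e_lan, e_net, x_net):
    """Choose 'local' or 'remote' by comparing total emissions in grams.

    Remote placement pays the extra hops for moving data there and back:
    the remote LAN plus the transport network (e_lan, e_net in kWh/GByte),
    charged here at the network path's emission factor x_net.
    """
    local = compute_emissions_g(t_proc_h, pue_local, x_local)
    extra = x_net * (e_lan + e_net) * (n_in_gb + n_out_gb)
    remote = compute_emissions_g(t_proc_h, pue_remote, x_remote) + extra
    return "remote" if remote < local else "local"
```

With these stand-in numbers a compute-heavy task moves to the cleaner remote site, while a data-heavy task sent over a dirty network stays local, mirroring the trade-off the text describes.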
In Figure 8 we also see values associated with the energy production in the country of the data center. The models used are not discussed in this paper, but can be retrieved from a report. The contributions of the LAN of the local data center and of the transport network, occurring on the right-hand side of Equation 15 due to the transport of the input data, turn out to be a considerable part of the total energy consumption. This contribution is even higher if an Internet connection is chosen, due to the relatively high power consumption of the routers in the network path. If the user knows how the computation and its output data scale with the amount of input data, Equation 15 can be applied over a range of input data to see how the costs of the different components scale.
Data ranges and complexities
We introduce the complexity of a task where both the computation time and the output data scale with the input data, and define T_processing = f(x) and N_out = g(x) with x = N_in. For a task with processing time and output data both scaling linearly with the input data, O(x), we have f(x) = f1 · x + f0 and g(x) = g1 · x + g0. For a task exhibiting a processing time scaling quadratically, O(x^2), and output scaling linearly, O(x), we have f(x) = f2 · x^2 + f1 · x + f0 and g(x) = g1 · x + g0. In case the amount of input data x is specified or expressed as a range, i.e., x ∈ [X0, X1], X0 > 0, and the complexity of the job is specified, i.e., f(x) and g(x) are given, Equation 15 will decide whether local or remote processing is preferable for each x ∈ [X0, X1]. With these definitions we can assist a user or the operators of a data center in their choices of task placement with more flexible parameters. The framework has a web calculator which allows data ranges as input for the amount of input data of a task, and complexity formulas for the CPU processing time and the amount of output data as a function of the input data.
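The sweep over a data range that the web calculator performs can be sketched as follows. The cost model inside is a deliberately simplified stand-in for Equation 15, and all coefficients as well as the function name are our assumptions.

```python
# Hedged sketch: scanning an input-data range [x0, x1] with complexity
# functions f (processing time) and g (output data), as the web calculator
# does. Cost coefficients are illustrative, not the paper's constants.

def preferable_remote(f, g, x0, x1, steps=50,
                      c_compute_local=600.0, c_compute_remote=50.0,
                      c_transport=40.0):
    """Return the sampled x in [x0, x1] for which remote placement wins.

    Stand-in cost model: compute emissions scale with f(x); transport
    emissions with the data moved there and back, x + g(x).
    """
    winners = []
    for i in range(steps + 1):
        x = x0 + (x1 - x0) * i / steps
        local = c_compute_local * f(x)
        remote = c_compute_remote * f(x) + c_transport * (x + g(x))
        if remote < local:
            winners.append(x)
    return winners

# A task with quadratic compute time and linear output, O(x^2) / O(x):
f = lambda x: 0.01 * x**2
g = lambda x: 0.5 * x
xs = preferable_remote(f, g, 1.0, 100.0)
```

With these numbers the remote site wins only above a crossover input size (about 11 GByte here): below it the transport cost dominates, above it the quadratic compute term does, which is exactly the range behavior the text discusses next.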
For quadratic behavior of the computation time, it turns out to be profitable to do the computation at a cleaner remote data center even for modest complexity values. This is because the power consumption of computation nodes is relatively high. We saw that there is a difference between Internet and dedicated light path connections, due to the power consumption of routers in the former. This becomes clear if we transform Equation 15 into a decision boundary, i.e., by substituting an equal sign for the greater-than sign in the formula.
As foreseen in the Introduction, the PUE of two data centers, and even their power sources, cannot be the only criteria guiding the choice of location for a computation or data storage task. If the transport network between them is powered by dirtier energy than both data centers, the contribution of the network to the total cost in grams of CO2 for moving data can be significant. This is mostly the case if the data traverses the Internet, due to the relatively high power consumption of routers. Light path connections are preferable over Internet connections, but they are dedicated connections that require a more complex setup procedure and may not always be available to a user. For large input data sets and a computation time that is linear in the input data, it might be better to do the calculation locally if the connecting network is the Internet. The situation may be reversed when the computation time depends quadratically on the input data: the contribution of a dirty network then becomes less prominent, provided the data produced by the computation is limited and does not need to be transferred back to the user. Altogether this means that for realistic large-scale processing there is not one choice that is “always best” in terms of energy use and associated emissions.
Conclusions and future work
We have presented in this article a decision framework that allows users and data center operators to decide where to place an application in order to minimize the total CO2 emitted in the process. We have shown that, if one assumes that the two data centers being compared have the same architecture and internal structure but a different PUE, the network connection between them can play a significant role in the final selection of the site in which to compute or store data. Our framework depends not only on the models for the networks, which can be enhanced if one wishes, but also on the contents of the knowledge base it can draw upon. In the work presented here we used the energy data published by the EU and data on some European continental data centers. There are improvements we intend to include in our framework in order to obtain even more realistic carbon footprint information. For data centers that are only reachable by crossing seas, the network model should be extended with models of sea cables. Another aspect connected with the network topology used in the models is knowledge of the exact number of hops between two locations. For this, we would like to use a detailed map of the networks of different countries. Our first step in this direction will be to fill the knowledge base with detailed information on the transport topologies used between higher education and research data centers in the Netherlands, which are connected by the SURFnet network.
Marc X. Makkes is currently pursuing a PhD degree at the University of Amsterdam and works as a researcher at TNO. His research interests include distributed computing, control and information theory. Arie Taal received his Ph.D. in nuclear science at the University of Delft in 1989. Currently, he is a part-time researcher at the University of Amsterdam in the field of network engineering. Anwar Osseyran is a CEO with many years of multidisciplinary management experience in various areas of ICT, including manufacturing, information management, health informatics, high performance computing, industrial automation, and computer hardware and software. Since 2001 he has been the MD of SURFsara in Amsterdam. Dr. Paola Grosso is an assistant professor in the SNE group, leading the activities in the field of optical networking, distributed infrastructure information modeling and GreenIT. She is PI in the UvA activities in GigaPort Research on Network projects to develop network models and a control plane for lambda networks and topology handling. Her research interests are smart and sustainable cyberinfrastructures: she investigates green ICT, provisioning and design of hybrid networks for lambda services, and the development of information models for hybrid multi-domain multi-layer networks. She participated in the EU NOVI project, leading the information modeling work package, and co-chaired the NML-WG (Network Markup Language Working Group). She is currently involved in the GreenClouds and Green Software projects. http://www.science.uva.nl/~grosso.
CLF: Cooling load factor
PLF: Power load factor
PUE: Power usage effectiveness
UPS: Uninterruptible power supply
PDU: Power distribution unit
kWh: Kilowatt hour
This research was in its initial phase sponsored by SURF and AgentschapNL during the Bits-Nets-Energy project. Further funding has been provided by the Dutch national research program COMMIT and by SURFnet via its GIGAPORT project.
- Stansberry M, Kundritzki J: Uptime Institute 2012 data center industry survey. 2012. http://uptimeinstitute.com. Accessed 13 Dec 2013
- http://www.thegreenitreview.com/2013/05/european-data-centres-are-less-energy.html. Accessed 13 Dec 2013
- Brown DJ, Reams C: Toward energy-efficient computing. Commun ACM 2010, 53(3):50–58. doi:10.1145/1666420.1666438
- Taal A, Grosso P, Bomhof F: Transporting bits or transporting energy: does it matter? 2013. http://www.surf.nl/en/knowledge-and-innovation/knowledge-base/2013/research-report-transporting-bits-or-transporting-energy-does-it-matter.html. Accessed 13 Dec 2013
- Taal A, Drupsteen D, Makkes M, Grosso P: Storage to energy: modeling the carbon emission of storage task offloading between data centers. In IEEE Consumer Communications and Networking Conference (CCNC). Las Vegas: IEEE; 2014.
- IEA: CO2 emissions from fuel combustion – highlights. Paris; 2011. http://www.iea.org/co2highlights/co2highlights.pdf. Accessed 13 Dec 2013
- http://en.wikipedia.org/wiki/Emission_intensity. Accessed 13 Dec 2013
- Sovacool BK: Valuing the greenhouse gas emissions from nuclear power: a critical survey. Energy Policy 2008, 36(8):2950–2963. doi:10.1016/j.enpol.2008.04.017
- http://ec.europa.eu/energy/energy_policy/doc/factsheets/mix/mix_nl_en.pdf. Accessed 13 Dec 2013
- http://ec.europa.eu/energy/energy_policy/doc/factsheets/mix/mix_de_en.pdf. Accessed 13 Dec 2013
- http://ec.europa.eu/energy/energy_policy/doc/factsheets/mix/mix_at_en.pdf. Accessed 13 Dec 2013
- Rogers O, Cliff D: Options, forwards and provision-point contracts in improving cloud infrastructure utilisation. J Cloud Comput 2012, 1:1–22.
- Qureshi A, Weber R, Balakrishnan H, Guttag J, Maggs B: Cutting the electric bill for internet-scale systems. ACM SIGCOMM Comput Commun Rev 2009, 39(4):123–134. doi:10.1145/1594977.1592584
- Rao L, Liu X, Xie L, Liu W: Minimizing electricity cost: optimization of distributed internet data centers in a multi-electricity-market environment. In INFOCOM, 2010 Proceedings IEEE. San Diego: IEEE; 2010:1–9.
- Kant K: Data center evolution: a tutorial on state of the art, issues, and challenges. Comput Netw 2009, 53(17):2939–2965. doi:10.1016/j.comnet.2009.10.004
- Baliga J, Ayre R, Hinton K, Tucker RS: Green cloud computing: balancing energy in processing, storage, and transport. Proc IEEE 2011, 99:149–167.
- Tucker RS: Optical packet-switched WDM networks: a cost and energy perspective. In Optical Fiber Communication Conference, Optical Society of America. San Diego: IEEE; 2008:243–252.
- Baliga J, Ayre R, Hinton K, Tucker R: Energy consumption in wired and wireless access networks. Commun Mag IEEE 2011, 49(6):70–77.
- A calculator for a road to cleaner computing. http://sne.science.uva.nl/bits2energy/index.html. Accessed 13 Dec 2013
- http://www.surfnet.nl/en/Hybride_netwerk/SURFinternet/Pages/kaart.aspx#topologie. Accessed 13 Dec 2013
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.