With the consultation and initial design completed, the next phase of the project involved the development of a Cloud Computing prototype. The decision was made to base our prototype on an existing Cloud Computing system; to this end, CometCloud [15] was selected. This section first describes CometCloud, its features and why it was selected for use in this project. We then outline the architecture of our Cloud Computing prototype, named “CloudBIM”.
CometCloud
The CometCloud system was utilised for this project due to its successful deployment in other data-sharing scenarios within the computational finance area [15]. CometCloud uses a Linda-like tuple space referred to as “CometSpace” – which is implemented using a peer-to-peer overlay network. In this way, a virtual shared space for storing data can be implemented by aggregating the capability of a number of distributed storage and compute resources. CometCloud therefore provides a scalable backend deployment platform that can combine resources across a number of different providers dynamically – a key requirement for a project in the AEC sector.
The overarching goal of CometCloud is to realize a virtual computational cloud with resizable computing capability, which integrates local computational environments and public cloud services on-demand, and provides abstractions and mechanisms to support a range of programming paradigms and application requirements. Specifically, CometCloud enables policy-based autonomic cloudbridging and cloudbursting. Autonomic cloudbridging enables on-the-fly integration of local computational environments (datacenters, Grids) and public cloud services (such as Amazon EC2 and Eucalyptus), and autonomic cloudbursting enables dynamic application scale-out to address dynamic workloads and spikes in demand. Cloudbridging is useful when specialist capability available in-house needs to be integrated with high throughput computation that can be outsourced to an external cloud provider such as Amazon. Cloudbursting, on the other hand, enables scale-out of in-house computation and may not necessarily involve a change in capability between in-house and outsourced providers.
CometCloud is based on a decentralized coordination substrate, and supports highly heterogeneous and dynamic cloud/Grid infrastructures, integration of public/private clouds and cloudbursts. The coordination substrate (based on a distributed Linda-based model) is also used to support a decentralized and scalable task space that coordinates the scheduling of tasks, submitted by a dynamic set of users, onto sets of dynamically provisioned workers on available private and/or public cloud resources based on their Quality of Service (QoS) constraints such as cost or performance. These QoS constraints along with policies, performance history and the state of resources are used to determine the appropriate size and mix of the public and private clouds that should be allocated to a specific application request. Additional details about CometCloud can be found at [3].
In this way, CometCloud differs from other Cloud computing environments currently available – as the focus in this system is specifically on bridging different distributed environments through the distributed tuple space implementation. Figure 5 illustrates the architecture of the CometCloud system – which consists of: (i) an infrastructure layer – enabling various data access and management capabilities to be supported (such as replication, routing, etc.); (ii) a service layer – enabling a number of common services to be supported on the infrastructure, such as pub/sub and content/resource discovery; and (iii) a programming layer – which enables the other two layers to be accessed in a number of ways using various programming models (such as map/reduce, master/worker and bag-of-tasks). In practice, an application may not use all of these capabilities, as in our scenario, which makes use of the master/worker paradigm. More details about the architecture, its use and source code downloads can be found in [3,16]. Various cloud bridging solutions are now available, such as IBM’s Cast Iron Cloud Integration [17], part of the WebSphere suite of tools for developing and deploying applications across different environments. Cast Iron enables integration, through plug-ins, with a number of IBM products (such as DB2) and systems from other vendors, such as SAP and Salesforce CRM – thereby enabling integration between in-house systems and public & private Cloud environments. Many such systems, however, remain proprietary to particular vendors and are hard to customise to particular use scenarios.
As illustrated in Figure 5, at a lower level the CometCloud system is made up of a set of computational resources each running the CometCloud overlay. When the CloudBIM system is initialised, a set number of workers is launched on these resources, but additional workers can be started as required. Communication between these nodes is all done via the CometCloud communication space, represented as a set of Linda-like tuples [15] which are placed into CometSpace using one of three operations:
1. Adding a tuple - OUT;
2. Removing a tuple - IN;
3. Reading a tuple without removing it - RD.
These nodes and their communication can be structured by CometCloud to enable support for multiple development models including: Map/Reduce, Master/Worker and the implementation of workflows (as described above).
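To make this coordination pattern concrete, the sketch below shows the three operations on a Linda-like space. It is written in Python purely for illustration – CometCloud itself is a Java framework with its own API, so the class and method names used here are hypothetical stand-ins rather than CometCloud calls.

import threading

class TupleSpace:
    """A minimal in-memory stand-in for CometSpace supporting OUT, IN and RD."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):                 # OUT: add a tuple to the space
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template):         # a template field set to None acts as a wildcard
        for tup in self._tuples:
            if all(tup.get(k) == v for k, v in template.items() if v is not None):
                return tup
        return None

    def rd(self, template):             # RD: read a matching tuple without removing it
        with self._cond:
            while (tup := self._match(template)) is None:
                self._cond.wait()
            return tup

    def take(self, template):           # IN: read and remove a matching tuple
        with self._cond:
            while (tup := self._match(template)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup

# Example: a master OUTs a task; a worker RDs it and later INs (removes) it.
space = TupleSpace()
space.out({"type": "CloudBIMTask", "taskid": "42", "query": "get ..."})
print(space.rd({"type": "CloudBIMTask"}))    # non-destructive read
print(space.take({"type": "CloudBIMTask"}))  # destructive read removes the task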
The CloudBIM prototype
The CloudBIM prototype was constructed using CometCloud’s Master/Worker programming model and consists of three main components: a client, and a set of masters and workers. The architecture of the CloudBIM prototype is shown in Figure 6.
The flexibility of utilising CometCloud allows these components to be deployed in multiple configurations, such as those shown in Figures 7 and 8. Figure 7 shows a configuration where a master node is deployed within each organisation working on the project but worker nodes are deployed externally - on a third-party cloud services provider such as Amazon or Azure. An alternative configuration is shown in Figure 8, where masters and workers are deployed within organisations in addition to some worker nodes deployed externally.
The following sections describe the implementation of the three main components: masters, workers and the two clients that have been developed - a web-based interface and a plug-in for Google Sketchup.
Implementation of master and worker nodes
Masters
The CloudBIM master nodes do not store any data (other than temporary caching for performance). These master nodes act only as gateways to the CloudBIM system. They are responsible for generating XML tasks that are inserted into the CometCloud coordination space. These XML tasks essentially wrap the queries that have been provided by the user (via the client) along with data needed internally by the cloud system. The format of these tasks is shown below:
<CloudBIMTask>
  <TaskId>Unique ID of Query</TaskId>
  <AuthToken>Authorisation Token</AuthToken>
  <MasterName>Name of Master that is Origin of Query</MasterName>
  <DuplicationCount>Number of times the data is to be duplicated</DuplicationCount>
  <InternalFlag>Flags whether this is an Internal Task to be ignored by all master nodes</InternalFlag>
  <Query>User Query</Query>
</CloudBIMTask>
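For illustration, a master could assemble a task in this format as shown in the sketch below, which uses Python’s standard library purely to show the structure of the message; it is not the actual master implementation, and the example query, token and master name are hypothetical.

import uuid
import xml.etree.ElementTree as ET

def build_task(query, auth_token, master_name, duplication_count=3, internal=False):
    task = ET.Element("CloudBIMTask")
    ET.SubElement(task, "TaskId").text = str(uuid.uuid4())            # unique ID of the query
    ET.SubElement(task, "AuthToken").text = auth_token                # authorisation token
    ET.SubElement(task, "MasterName").text = master_name              # originating master
    ET.SubElement(task, "DuplicationCount").text = str(duplication_count)
    ET.SubElement(task, "InternalFlag").text = str(internal).lower()  # internal tasks are ignored by masters
    ET.SubElement(task, "Query").text = query                         # the user query
    return ET.tostring(task, encoding="unicode")

print(build_task("fetchdoc Discipline=architecture,ProjectStage=concept all",
                 "token-123", "master-A"))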
Workers
Each worker within the CloudBIM system holds a portion of the governance model and a subset of all the actual artefact data within the BIM. This ensures that all data is replicated, providing resilience if individual workers go offline. The workers, in addition to storing the data, are also responsible for validating each query they receive against the governance model to determine if they should execute it, i.e. ensuring that user A has authority to update artefact B before doing so.
The interaction between masters and workers is key to how the CloudBIM system functions. This communication is done using CometCloud’s distributed coordination space. Masters place XML tasks into this space and these are then read by the workers. Use of this distributed communication space allows a variety of communication patterns to be utilised depending on the type of task being executed. These tasks can be broken down into one of three types: (i) tasks that read data; (ii) tasks that add data; and (iii) tasks that remove data.
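The sketch below illustrates this worker-side dispatch: the worker inspects the wrapped query, checks it against its portion of the governance model and only then acts on it. The governance and storage objects, and their methods, are hypothetical placeholders rather than the actual CloudBIM code.

import xml.etree.ElementTree as ET

def handle_task(task_xml, governance, local_store):
    task = ET.fromstring(task_xml)
    token = task.findtext("AuthToken")
    query = task.findtext("Query")
    verb = query.split()[0]                        # e.g. 'get', 'adddoc', 'delete'
    if not governance.permits(token, verb):        # governance model is consulted first
        return None                                # unauthorised queries are not executed
    if verb in ("get", "fetchdoc"):                # (i) tasks that read data
        return local_store.read(query)
    if verb in ("add", "update", "adddoc"):        # (ii) tasks that add data
        return local_store.write(query)
    if verb == "delete":                           # (iii) tasks that remove data
        return local_store.delete(query)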
Figure 9 describes how data is retrieved from the system. Firstly, a task containing an appropriate query is placed into the communication space by the master node. Each worker will read this task (using the non-destructive RD function) and will determine whether it has the capability to fulfil the query. If it has this ability - and the permissions of the user identified by the token contained within the query match those enforced by the governance model - then the data will be returned to the master node. While this process is undertaken, the master node will monitor the data that is returned to it and, once it has received all the replies (or a timeout is exceeded), it will remove the query task from the communication space.
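A sketch of the master-side collection step is given below: replies are gathered until all expected workers have answered or a timeout elapses, after which the caller removes the query task from the space. The queue-based delivery of replies is an assumption made for illustration; in practice replies arrive through CometCloud.

import queue
import time

def collect_replies(reply_queue, expected_workers, timeout_s=30):
    replies = []
    deadline = time.monotonic() + timeout_s
    while len(replies) < expected_workers:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                                   # timeout exceeded
        try:
            replies.append(reply_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return replies   # the caller then removes (INs) the query task from the space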
Figure 10 shows the similar process undertaken for adding new data to the system. When this type of query is executed, the system must ensure that the data is duplicated across the cloud. When the master receives the data from the user it will cache the data, so that no delays occur for the user while duplication takes place. The query is then inserted into the communication space. The first available worker will remove the task, decrement the duplication count and then, as long as the duplication count is above zero, re-insert the task. On task re-insertion, the worker will request the data from the master. This process repeats until the duplication count reaches zero. As in the previous example, the authorisation token is used to determine who can add data to the system.
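This duplication step can be sketched as follows: the worker removes the task, decrements the count, re-inserts the task while the count remains above zero, and then pulls the cached data from the master. The representation of the task as a dictionary of fields, and the helper objects (space, master, local_store), are illustrative assumptions.

def handle_add_task(space, master, local_store, template):
    task = space.take(template)                    # IN: remove the task from the space
    count = int(task["duplication_count"]) - 1     # one copy will be stored by this worker
    if count > 0:
        task["duplication_count"] = count
        space.out(task)                            # re-insert for the next available worker
    data = master.fetch(task["taskid"])            # request the cached data from the master
    local_store.store(task["taskid"], data)        # hold a local copy of the artefact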
The final scenario is where data is removed from the system. This process is similar to that outlined in Figure 9 - in this case a task is inserted into the communication space by the master and all worker nodes that hold the specified data will remove it (assuming the user requesting the deletion meets the requirements of the governance model). Each worker node will then send a confirmation to the master node which, once it has received all the acknowledgements (or a time-out has been exceeded), will remove the task from the communication space.
Fault tolerance
The CloudBIM system also has mechanisms for fault tolerance and the ability to expand its pool of workers as required. This is an essential property to ensure availability of BIM data. As mentioned previously, the underlying CometCloud architecture consists of a pool of resources/machines running the CometCloud overlay. When the CloudBIM system is launched, a set of workers, defined by IP addresses in a configuration file, is initialised using nodes from this pool. If a worker fails, the procedure outlined in Figure 11 is followed. When a query is issued, the master node will count the number of workers that process the query; if a single worker repeatedly fails to respond within a certain time frame (the number of failures and the time-out value are configurable), then the worker is considered to have failed. While this is taking place, user requests are still being processed because the BIM data will still be available from other workers in the system (due to data duplication). Only in the case of multiple simultaneous failures would users be unable to retrieve data. In cases where a worker (or set of workers) loses connection for a long period of time (a timeout value set by an administrator), the worker will be removed from the system.
Once a worker has permanently failed, it is removed from the current list of workers and a new worker is added from a pool of nodes that can be added to the cloud. This is done by communicating with the CometCloud overlay that will be running on the waiting node and instructing it to initialise itself as a CloudBIM worker. Once this is done, the CometCloud overlay must then be restarted to enable correct routing of messages to the new worker.
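A simplified sketch of this detection and replacement policy is shown below; the failure threshold and the handling of the spare pool are configurable, and the class shown is an illustration of the policy rather than the CloudBIM source.

class WorkerMonitor:
    def __init__(self, workers, spare_pool, max_failures=3):
        self.failures = {w: 0 for w in workers}     # consecutive missed responses per worker
        self.spare_pool = list(spare_pool)          # nodes waiting in the CometCloud pool
        self.max_failures = max_failures

    def record(self, worker, responded):
        if responded:
            self.failures[worker] = 0               # any reply resets the count
            return
        self.failures[worker] += 1
        if self.failures[worker] >= self.max_failures:
            self.replace(worker)                    # worker is considered to have failed

    def replace(self, failed):
        del self.failures[failed]                   # remove the failed worker from the current list
        if self.spare_pool:
            new = self.spare_pool.pop(0)            # initialise a waiting node as a CloudBIM worker
            self.failures[new] = 0                  # overlay restart and synchronisation follow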
Finally, once the new worker has joined the communication space, synchronisation may be needed to ensure that there is sufficient duplication of BIM data. This entire process takes place transparently to the user and is done as follows:
●Each worker will send the new worker the IDs of the BIM artefacts that it holds (by placing an internal task into the communication space).
●The new worker will calculate which artefact IDs need additional duplication based on this data.
●The new worker will request the artefacts needed directly from the workers that hold them.
The same process is followed when a new worker needs to be added to the system from the pool to improve system throughput. This process is also followed when a worker that has been offline re-joins the system; this means that it can retrieve a fresh set of data from other workers in the CloudBIM system, removing the risk of any invalid (outdated or deleted) data becoming available to users.
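The synchronisation exchange above can be sketched as follows: given the artefact IDs reported by each existing worker, the new worker determines which artefacts fall below the desired replication level and chooses a worker to request each one from. The data structures and the replication target are illustrative assumptions.

from collections import Counter

def plan_synchronisation(ids_per_worker, target_copies=2):
    # ids_per_worker maps an existing worker's name to the set of artefact IDs it reported
    copies = Counter(aid for ids in ids_per_worker.values() for aid in ids)
    needed = {aid for aid, n in copies.items() if n < target_copies}
    # for each under-replicated artefact, pick a worker that holds it to request it from
    return {aid: next(w for w, ids in ids_per_worker.items() if aid in ids)
            for aid in needed}

plan = plan_synchronisation({"worker-1": {"a1", "a2"}, "worker-2": {"a2"}})
print(plan)   # {'a1': 'worker-1'} - artefact a1 needs a further copy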
The key aspect of this fault tolerance process is that there are “spare” workers available for use in the pool. This can be ensured in one of several ways as shown in Figure 12:
●By supplying the system with a list of IP addresses of nodes that have CometCloud installed and can be utilised.
●By utilising third-party cloud providers to spawn additional virtual machines based on a defined policy. Currently, this has been implemented by Rutgers using Amazon EC2 [3].
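A simple policy combining these two options might look like the sketch below: pre-configured nodes are used first, and a third-party provider is only used when the local list is exhausted. The configuration keys and the spawn call are hypothetical; CometCloud’s own EC2 integration [3] handles the actual provisioning.

config = {
    "spare_nodes": ["10.0.0.21", "10.0.0.22"],           # nodes with CometCloud installed
    "cloudburst": {"provider": "ec2", "max_instances": 4},
}

def next_spare_worker(config, spawn_cloud_instance):
    if config["spare_nodes"]:
        return config["spare_nodes"].pop(0)               # reuse an in-house node first
    return spawn_cloud_instance(config["cloudburst"])     # otherwise burst to a cloud provider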
Integrating cloud computing and Google Sketchup
Within the CloudBIM system the client is responsible for providing the interface between users and the local master node. This is done by providing a user interface which converts users’ actions into queries, expressed in a query language, that are then communicated to the master node. We implemented two clients: a web-based interface and a plug-in for Google Sketchup (a commonly used tool in the AEC industry). The Google Sketchup plug-in is shown in Figure 13.
The decision to utilise a query language was made to enable two possible usages of the system:
1. As a capability that could be integrated into a custom user interface implemented for a specific project.
2. As a capability integrated within existing software as a plug-in (such as existing CAD systems like Autodesk Revit [18] or Google Sketchup [19]).
This allows third parties to leverage the functionality provided by the CloudBIM system. An example of this would be a company that utilises its own proprietary software tools; using the CloudBIM query language, the company could integrate its existing software tools with the CloudBIM system, possibly by developing a plug-in for its CAD software or by integrating CloudBIM into an existing project management intranet system.
The prototype CloudBIM query language is specified below in EBNF (Extended Backus-Naur Form) notation.
CLOUDBIMQUERY = DOCUMENTUPLOAD | DOCUMENTDOWNLOAD | GOVERNANCEQUERY
GOVERNANCEQUERY = UPDATEQUERY | OTHERQUERY
UPDATEQUERY = 'update',' ',OBJECT,' ',FIELDLIST,' ','set',' ',FIELDLIST
OTHERQUERY = ('get' | 'add' | 'delete'),' ',OBJECT,' ',FIELDLIST
DOCUMENTDOWNLOAD = 'fetchdoc',' ',FIELDLIST,[' ','all']
DOCUMENTUPLOAD = 'adddoc',' ',FIELDLIST,[' ','(',RELATION,',',RELATION,')']
OBJECT = 'ProjectStage' | 'GateRequirement' | 'DocumentSuitability' | 'Discipline' | 'Operation' | 'Role' | 'Right' | 'User' | 'Notification' | 'Flag' | 'NotificationType'
RELATION = RELTYPE,' ',ID
RELTYPE = 'ver' | 'der' | 'comp' | 'conc'
FIELDLIST = FIELD,',',FIELD
FIELD = OBJECT,'=',FIELDVALUE
FIELDVALUE = VALUE | ('(',VALUE,',',VALUE,')')
For the sake of brevity the terms ID (a unique ID) and VALUE (a string) are not defined; also omitted are the commands used to authenticate a user. The CloudBIM query language defines six key commands: get, add, delete, update, adddoc and fetchdoc. These commands allow the manipulation of objects within the governance model; however, it should be noted that not all objects can be directly manipulated by users - some are created or updated as a side effect of other queries, e.g. specifying the relationship of a new document will lead to the automatic creation of Relationship and Transaction objects as necessary, using data supplied in the adddoc command. The adddoc and fetchdoc commands separate the uploading and downloading of documents from the manipulation of the objects within the governance model. Additionally, it is worth noting that the fetchdoc command can be used to return either all matching documents (not always desirable) or just the first match.
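For illustration, the following queries conform to the grammar above; the field values shown are hypothetical examples rather than values mandated by the governance model:

get User User=alice,Role=architect
update Role Role=architect,Discipline=structural set Right=read,Operation=approve
adddoc Discipline=architecture,DocumentSuitability=draft (ver 101,der 102)
fetchdoc Discipline=architecture,ProjectStage=concept all
delete Flag Flag=obsolete,User=alice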
CloudBIM for data processing
Workers within the CloudBIM system may also be used to launch external simulations (in batch mode), the results of which are also stored as artefacts. Access to these artefacts is then based on our governance model. This process enables the integration of third-party executable software, in addition to static artefacts that have been pre-generated and stored. The use of workers for processing operates in a similar way to that outlined previously: firstly, a task is placed into the communication space that describes (i) the program to be executed and (ii) the artefacts that are needed as input to the program. These tasks will then only be read by workers that possess the application that the task is requesting. Once a worker has read the task, it will place new internal tasks into the communication space to request any data it does not hold. Once the data has been received, the task will execute and a reply will be sent giving the artefact ID of the output data.
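This flow can be sketched as follows: a worker that has the requested application installed takes the task, requests any input artefacts it does not already hold, runs the program in batch mode and replies with the artefact ID of the output. The task fields and the helper methods on local_store are illustrative assumptions.

import subprocess
import uuid

def run_processing_task(task, local_store, request_artefact):
    if task["program"] not in local_store.installed_programs():
        return None                                        # only capable workers act on the task
    input_paths = []
    for aid in task["input_artefacts"]:
        if not local_store.has(aid):
            local_store.store(aid, request_artefact(aid))  # internal task to fetch missing data
        input_paths.append(local_store.path_for(aid))
    output_path = f"/tmp/{uuid.uuid4()}.out"
    subprocess.run([task["program"], *input_paths, output_path], check=True)   # batch execution
    return local_store.store_file(output_path)             # artefact ID of the output data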
When utilising workers for data processing, two different modes of operation are supported:
●Utilising the processing capability of the existing workers that are used for data storage.
●Utilising CometCloud’s CloudBursting capability to spawn workers solely for data processing.
We envisage the second mode as the most common mode of operation, especially in cases where the tasks being executed require either specialised software to be installed or have large resource requirements. In these cases additional workers are spawned on a cloud service such as Amazon EC2 but, because they are temporary workers, are only permitted to access the communication space via a RequestHandler, as shown in Figure 14. This restriction is imposed because we do not want external workers to process any data storage tasks, as they are temporary workers with a lifespan of a single computation task.
This ability to spawn extra “external” workers is highly useful and can be expanded to include a large number of common industry tasks:
●Energy simulations.
●Rendering of building models.
●Automatic clash detection.