Generally, a task for IoT applications is composed of several inter-dependent subtasks. The dependencies are determined by the context-awareness and hardware requirements of the components that support proper operation of the applications. To make offloading decisions in the decision engine, the inter-dependent subtask model should be partitioned into multiple sub-graphs of disjoint modules, which are then offloaded to appropriate servers based on the cost model. In this section, we give an overview of these two models, which together constitute the application model used as input to the decision engine.
Inter-dependent subtask model
The first and essential step in modelling the application is to model the inter-dependent subtasks, yielding the inter-dependent subtask model. At present, the Directed Acyclic Graph (DAG), whose vertices represent subtasks and whose edges represent dependence relations, is the most common way to model the dependence among subtasks.
For a task to be run in IoT systems, the dependence relations among subtasks, such as execution order, data flow, and control flow at different granularity levels, can be obtained statically or dynamically based on task parameters or context-awareness requirements. In static modelling, the dependence relations are derived before the task is actually executed. In dynamic modelling, the dependence relations are determined while the task is running, under the control of the decision engine of the task partitioning and offloading framework. In either case, the dependence relations are mostly modelled as a DAG. Depending on the computational scale of the task, the dependence model can contain a large number of vertices with inter-dependencies of much higher complexity than simple class diagrams.
Niu et al. [32] constructed weighted object relationship graphs (WORGs) for mobile applications. Objects and object-to-object dependences were obtained by analyzing the bytecode of tasks, and the object relationship graph (ORG) was constructed by traversing the application along method calls, as shown in Fig. 8(a), where nodes represent objects and edges represent dependences between objects. Then, the execution time of each object and the data transmitted by each method call were calculated as the weights of nodes and edges, respectively, yielding the WORG shown in Fig. 8(b).
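As a minimal illustration (not the implementation of [32]), such a weighted dependence graph can be represented as a DAG whose nodes carry execution-time weights and whose edges carry transferred-data weights; a topological order of the DAG then gives a valid execution order of the subtasks. All names and values in the sketch below are hypothetical.

```python
# Sketch of a weighted dependence graph (WORG-style): nodes carry execution-time
# weights, edges carry transferred-data weights. Values are illustrative only.
from collections import defaultdict

class WeightedDependenceGraph:
    def __init__(self):
        self.exec_time = {}            # node -> execution time (ms)
        self.data = defaultdict(dict)  # src -> {dst: transferred data (KB)}

    def add_subtask(self, name, exec_ms):
        self.exec_time[name] = exec_ms

    def add_dependence(self, src, dst, data_kb):
        # dst consumes data produced by src
        self.data[src][dst] = data_kb

    def topological_order(self):
        # Kahn's algorithm: a valid execution order of the subtasks
        indeg = {n: 0 for n in self.exec_time}
        for u in self.data:
            for v in self.data[u]:
                indeg[v] += 1
        ready = [n for n, d in indeg.items() if d == 0]
        order = []
        while ready:
            u = ready.pop()
            order.append(u)
            for v in self.data[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return order

g = WeightedDependenceGraph()
g.add_subtask("capture", 5); g.add_subtask("detect", 40); g.add_subtask("render", 10)
g.add_dependence("capture", "detect", 300)  # 300 KB image passed to detection
g.add_dependence("detect", "render", 2)     # 2 KB result passed to rendering
print(g.topological_order())                # ['capture', 'detect', 'render']
```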
Cost model
Different types of applications, such as data-oriented applications, code-oriented applications, and continuous-execution applications (Munoz et al. [33]), have different requirements for offloading computing tasks to remote servers. The application is modelled by taking into consideration the requirements of users, the network environment, and service providers, and by identifying the corresponding optimization goals and constraints to establish a cost model. Several cost factors are usually considered in cost modelling, including latency, energy consumption, quality of service, quality of experience, and economic cost.
Latency
Latency includes computing, transmission, and queuing latency. It is one of the essential metrics for evaluating the performance of mobile devices, especially for latency-sensitive applications such as emergency event processing in the Industrial Internet of Things. The servers in traditional cloud computing centers have high computing power to handle complex workloads (e.g., image processing (Kumar et al. [34]) and speech recognition (Muhammad et al. [35])). However, sending all subtasks to the cloud can congest both computing resources and network bandwidth, increasing the queuing latency and hence the total latency. To reduce the total latency, the best approach is to offload workloads that do not need to be executed locally to edge servers with sufficient computing power, which are closer to the local device than the remote cloud.
Theoretically, if all subtasks can be processed at edge servers, the latency can be significantly reduced compared to sending them to cloud servers for execution (Shi et al. [36]), because the transmission time between the edge and the cloud is saved. However, a task arriving at a resource-constrained edge node is unlikely to be executed immediately and often has to wait for execution. Therefore, the geographically closest infrastructure is not always a good choice. With a queuing model based on task priorities used to estimate the delay, it is a practical problem to dispatch subtasks to the proper layer (i.e., local, edge, or cloud), allocate bandwidth to mobile end devices, and find a suitable transmission rate. To solve this problem, the real-time network state (e.g., transmission bandwidth), the computing power, and the bandwidth usage of the edge and cloud layers must be considered to avoid unnecessary contention and delays.
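To make the trade-off concrete, the sketch below compares the total latency (transmission + queuing + computation) of one subtask across the three tiers. The parameter values and the simple congestion-dependent queuing term are assumptions for illustration, not a model taken from the cited works.

```python
# Illustrative per-tier latency comparison for one subtask.
# total = transmission + queuing + computation; all parameters are assumed.

def total_latency(data_bits, cycles, bandwidth_bps, cpu_hz, arrival_rate, service_rate):
    transmission = data_bits / bandwidth_bps if bandwidth_bps else 0.0
    computation = cycles / cpu_hz
    # Simple M/M/1-style delay term that grows as the server saturates,
    # used here as a stand-in for a priority-based queuing model.
    queuing = (1.0 / (service_rate - arrival_rate)) if service_rate > arrival_rate else float("inf")
    return transmission + queuing + computation

task = dict(data_bits=8e6, cycles=2e9)  # 1 MB input, 2 Gcycles of work (assumed)

tiers = {
    "local": dict(bandwidth_bps=None, cpu_hz=1e9,  arrival_rate=0.5,  service_rate=2.0),
    "edge":  dict(bandwidth_bps=50e6, cpu_hz=5e9,  arrival_rate=5.0,  service_rate=8.0),
    "cloud": dict(bandwidth_bps=10e6, cpu_hz=20e9, arrival_rate=20.0, service_rate=25.0),
}

for name, params in tiers.items():
    print(name, round(total_latency(**task, **params), 3), "s")
```

With these assumed numbers the edge gives the lowest total latency, but a slower uplink or a heavily loaded edge node can easily reverse the ranking, which is exactly why the dispatching decision must track the real-time network and server state.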
Energy
The lifetime of mobile devices is considered one of the indispensable factors affecting the working periods of IoT applications. Although some progress has been made in battery-related technology, there is still a gap between the rapid growth of power consumption and the battery capacity of current mobile devices. Therefore, another critical optimization goal of partitioning and offloading decision-making is to extend the battery life of mobile devices by reducing energy consumption. Many works have discussed the energy consumption optimization problem from different perspectives. For example, Zhao et al. [37] proposed an approximation algorithm for computation offloading, which was shown to save nearly 82.7% of energy compared to executing the task entirely on a mobile device.
Many related works have developed power/energy models for various components of mobile terminal devices, including central processing units (CPUs) as well as cellular and WiFi communication. Most researchers have recently used specific hardware (external devices or components attached to the mobile device) to measure the working voltage and current of devices, from which power models can be derived. However, this measurement-based approach can only obtain the power model of the whole device, not that of each hardware component (e.g., CPU, GPU, network interface). This limitation makes it challenging to analyze the energy consumption of different types of applications on mobile devices.
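The sketch below shows how such component-level power figures, once available, feed a simple offloading energy estimate on the device side: local execution costs CPU energy, while offloading costs transmission energy plus idle energy while waiting for the result. The power constants and workload values are assumptions for illustration only.

```python
# Illustrative component-level energy estimate for an offloading decision.
# The linear power figures below are assumed constants, not measured values.

P_CPU_ACTIVE_W = 2.0   # assumed CPU power while computing locally
P_WIFI_TX_W    = 1.3   # assumed radio power while transmitting
P_IDLE_W       = 0.3   # assumed device power while waiting for the result

def energy_local(cycles, cpu_hz):
    return P_CPU_ACTIVE_W * cycles / cpu_hz

def energy_offload(data_bits, bandwidth_bps, remote_exec_s):
    tx_time = data_bits / bandwidth_bps
    return P_WIFI_TX_W * tx_time + P_IDLE_W * remote_exec_s

# Offload only if the estimated transmission + idle energy is lower than
# the energy of executing the workload on the local CPU.
e_loc = energy_local(cycles=2e9, cpu_hz=1e9)
e_off = energy_offload(data_bits=8e6, bandwidth_bps=50e6, remote_exec_s=0.4)
print(f"local {e_loc:.2f} J, offload {e_off:.2f} J -> "
      f"{'offload' if e_off < e_loc else 'local'}")
```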
Quality of Service (QoS)
Due to the increasing popularity of video streaming services (e.g., YouTube and Netflix) and the exponential growth of active users, more and more research works use QoS as a modelling and optimization goal of partitioning and offloading approaches (Rausch et al. [38]; Song et al. [39]). However, it is challenging to achieve the expected QoS because of many factors, such as different client devices and request patterns, changing media content, and varying network environments. Generally speaking, the following factors should be taken into consideration to optimize the QoS of network systems. First, the variability of network resources, the unstable nature of wireless channels, and the characteristics of fixed/mobile networks in heterogeneous environments. Second, the emergence of new services (e.g., video games and virtual/augmented reality (VR/AR)), the diversity of usage environments, users' expectations, and the optimization of operational costs for mobile and service providers. Third, the resource constraints of edge servers and the various measurement and evaluation methods for QoS management when allocating resources among users with different quality of experience (QoE) preferences.
QoS is sometimes directly related to the response time of an application (which amounts to the latency metric), and the response time can then be taken as a constraint to achieve a specific QoS. In particular, QoS improvement only requires keeping the response time within a threshold, rather than minimizing it. Aazam et al. [40] proposed a resource estimation approach based on the QoE history of the cloud service customer (CSC) while enhancing the QoS. Mahmud et al. [41] proposed a delay-aware application module management policy that considers various aspects of distributed applications, including latency, in a decentralized and coordinated environment, so as to ensure QoS while meeting deadlines for all types of applications and maximizing the utilization of fog computing resources.
Quality of Experience (QoE)
With the proliferation of IoT and the computing-intensive tasks it brings, the need for computation offloading is surging, and how to optimize offloading for the best user quality of experience (QoE) becomes a fundamental question. It is therefore important to consider the characteristics of the task and make offloading decisions based on the QoE requirements of each end user. In particular, different types of end users pursue different QoE performance. For example, cloud gaming favors lower execution time during computation offloading, while unmanned aerial vehicles (UAVs) prefer offloading with low energy consumption to prolong battery life. By comprehensively considering the different QoE requirements of IoT users in terms of task execution time, task processing energy consumption, and computing cost, Luo et al. [42] proposed a QoE-driven adaptive computation offloading (QEACO) strategy based on theoretical performance analysis. With this strategy, each IoT user can make offloading decisions adapted to their own optimal QoE.
In business scenarios of IoT, in addition to providing low latency and low energy consumption, the cloud-edge collaborative computing module also needs to provide suitable caching, fast communication, and vast amounts of computing power (Huang et al. [43]). QoE can be regarded as the most direct experience in service interaction, especially in the Internet of Vehicles (IoV). He et al. [44] studied the problem of QoE-based edge task offloading in IoV and proposed an improved deep reinforcement learning (DRL) algorithm named PS-DDPG, in which a QoE model was designed by taking into account the limited vehicle cache and the unpredictable communication path caused by the diversification of transmitted information.
Economic cost
The economic cost is an essential factor both when device users select computing resources and when service providers offer solutions. Reasonable pricing of services and the optimization of economic cost are essential issues to be addressed in task partitioning and offloading. From the perspective of service providers such as YouTube and Amazon, cloud-edge collaborative computing provides lower latency and energy consumption, potentially increasing throughput and improving user experience (Shi et al. [36]). As a result, they can earn more benefits from handling the same volume of computing or storage. The service provider's investment is mainly spent on building and maintaining each tier of resources. To fully utilize the resources of each layer, the provider can charge users based on the data location and the expected resource utility. Therefore, how to build a reasonable economic cost model based on the characteristics of cloud-edge collaborative computing, so as to ensure the profitability of service providers and the acceptability of prices to users, is the focus of current research work.
Integrated model
Because of the diverse needs of users and service providers in practice, the joint optimization of multiple costs needs to be considered in task partitioning and offloading solutions. However, these costs usually conflict with each other, and it is difficult to optimize them simultaneously. For example, reducing the energy consumption of the whole system (including mobile devices, edge servers, and cloud servers) may increase the response time of applications. A compromise between these costs is needed to obtain optimal partitioning and offloading decisions. In general, there are several ways to build a cost model for joint optimization of multiple costs: 1) converting multiple costs into one cost, for example by a weighted-sum approach that assigns each cost a weight so as to approximate the optimal solution as closely as possible; 2) selecting the most important cost as the optimization objective and treating the rest as constraints; 3) obtaining the Pareto-optimal solutions (Lin et al. [45]) of the costs, from which mobile device users or service providers choose according to the actual situation.
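The first two approaches are worked out in detail in the remainder of this section. As a brief illustration of the third approach, the following sketch enumerates candidate offloading decisions and keeps only those that are Pareto-optimal with respect to two conflicting costs (energy and latency); all candidate names and values are hypothetical.

```python
# Minimal sketch of extracting Pareto-optimal (energy, latency) trade-offs
# from candidate offloading decisions; all numbers are hypothetical.

candidates = {
    "all local":   (5.0, 3.0),   # (energy in J, latency in s)
    "all edge":    (1.2, 1.8),
    "all cloud":   (0.9, 2.6),
    "split 50/50": (2.0, 1.5),
}

def pareto_front(points):
    front = {}
    for name, (e, t) in points.items():
        # A point is dominated if another point is no worse in both costs
        # and strictly better in at least one.
        dominated = any(e2 <= e and t2 <= t and (e2, t2) != (e, t)
                        for e2, t2 in points.values())
        if not dominated:
            front[name] = (e, t)
    return front

print(pareto_front(candidates))
# The decision maker then picks one point from the front according to
# the actual priorities of the users or the service provider.
```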
Let the first approach be used as an example of cost modelling. Assume that the application has n modules; for a given module i, the size of data transmission is \({{t}_{i}}\), the memory cost is \({{m}_{i}}\), and the code size is \({{c}_{i}}\). In addition, for each module, a variable \({{x}_{i}}\in \left\{ 0,1 \right\}\) is introduced to indicate whether module i is executed locally \(\left( {{x}_{i}}=0 \right)\) or remotely \(\left( {{x}_{i}}=1 \right)\). The cost model is then expressed as follows:
$$\begin{aligned} Min\ \left( {{w}_{t}}\cdot {{C}_{t}}+{{w}_{m}}\cdot {{C}_{m}}+{{w}_{c}}\cdot {{C}_{\text {c}}} \right) \end{aligned}$$
(1)
$$\begin{aligned} \mathrm {s.t.} \begin{array}{l}{{{C}_{t}}=\sum \limits _{i=1}^{n}{{c}_{i}}\cdot {{x}_{i}}+\sum \limits _{i=1}^{n}{\sum \limits _{j=1}^{n}{{t}_{j}}}\cdot \left( {{x}_{j}}\oplus {{x}_{i}} \right) }\end{array} \end{aligned}$$
(2)
$$\begin{aligned} {{C}_{m}}=\sum \limits _{i=1}^{n}{{m}_{i}}\cdot \left( 1-{{x}_{i}} \right) \end{aligned}$$
(3)
$$\begin{aligned} {{C}_{c}}=\sum \limits _{i=1}^{n}{\alpha }\cdot {{c}_{i}}\cdot \left( 1-{{x}_{i}} \right) \end{aligned}$$
(4)
In Eq. (4), \(\alpha\) is the conversion factor that maps the code size to CPU instructions. As shown in Eq. (1), the cost model combines three costs, namely the transmission cost \({{C}_{t}}\), the local device memory cost \({{C}_{m}}\), and the local device CPU occupation cost \({{C}_{c}}\), into one, where \({{w}_{t}}\), \({{w}_{m}}\), and \({{w}_{c}}\) are the weights that determine the priority of each optimization goal.
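For a small number of modules, the weighted-sum model of Eqs. (1)–(4) can be evaluated exhaustively over all \(2^{n}\) placements. The sketch below does exactly that; the module parameters, weights, and conversion factor are hypothetical values chosen only for illustration.

```python
from itertools import product

# Brute-force evaluation of the weighted-sum cost model of Eqs. (1)-(4).
# t, m, c are the per-module transmission data, memory cost and code size;
# all values and weights below are hypothetical.

t = [10, 40, 5]          # t_i: data transferred by module i
m = [8,  32, 4]          # m_i: memory cost of module i when run locally
c = [2,  20, 1]          # c_i: code size of module i
w_t, w_m, w_c = 1.0, 0.5, 0.2
alpha = 3.0              # conversion factor from code size to CPU instructions
n = len(t)

def cost(x):             # x[i] = 1 if module i is offloaded, 0 if executed locally
    C_t = sum(c[i] * x[i] for i in range(n)) \
        + sum(t[j] * (x[j] ^ x[i]) for i in range(n) for j in range(n))
    C_m = sum(m[i] * (1 - x[i]) for i in range(n))
    C_c = sum(alpha * c[i] * (1 - x[i]) for i in range(n))
    return w_t * C_t + w_m * C_m + w_c * C_c

best = min(product((0, 1), repeat=n), key=cost)
print("placement:", best, "cost:", cost(best))
```

Exhaustive search is only tractable for small n; for realistic applications the same objective is typically handled by the partitioning heuristics and optimization algorithms discussed elsewhere in this survey.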
Let the second approach be used as an example of cost modelling. Assuming that the goal is to minimize the latency and energy consumption of local devices, let \({{E}^{tot}}\) denote the total energy consumption of mobile devices, while \({{E}^{loc}}\) and \({{E}^{com}}\) denote the energy consumption of local execution and of migrating data between local devices and edge servers, respectively. Thus, the total consumed energy \({{E}^{tot}}\) can be expressed as:
$$\begin{aligned} {{E}^{tot}}={{E}^{loc}}+{{E}^{com}} \end{aligned}$$
(5)
Then assume that \({{T}^{tot}}\) denotes the execution time of the application from start to end, and that \({{T}^{loc}}\), \({{T}^{edge}}\), and \({{T}^{com}}\) denote the time needed for the part of the task executed locally, the time needed for the part of the task executed at the edge node, and the communication time needed for data migration between the local device and the edge server, respectively. Then \({{T}^{tot}}\) can be expressed as:
$$\begin{aligned} {{T}^{tot}}={{T}^{loc}}+{{T}^{edge}}+{{T}^{com}} \end{aligned}$$
(6)
Next, we choose energy consumption minimization as the final optimization objective while limiting the total execution time:
$$\begin{aligned} {Min\ {{E}^{tot}}}\\ {\text {s.t. }{{T}^{tot}}< D},\nonumber \end{aligned}$$
(7)
where D denotes the deadline to complete execution of the application. The values of \({{E}^{loc}}\) and \({{T}^{loc}}\) depend on the workload of the application running locally and the computing capability of the local device. The values of \({{E}^{com}}\) and \({{T}^{com}}\) are mainly affected by the variation of network bandwidth. \({{T}^{edge}}\) is determined by the workload of the application offloaded to the edge and the computing capability of the edge servers.
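Continuing this second example, a minimal sketch of Eqs. (5)–(7) is to enumerate candidate splits of the workload between the device and the edge, discard those that violate the deadline D, and keep the split with the lowest total energy. The workload, platform, and power parameters below are assumed values, not figures from the cited works.

```python
# Sketch of Eq. (7): minimize E_tot subject to T_tot < D over candidate
# splits of the workload between device and edge. All parameters are assumed.

CPU_LOCAL_HZ, CPU_EDGE_HZ = 1e9, 8e9   # local / edge computing capability
P_LOCAL_W, P_TX_W = 2.0, 1.3           # local compute and transmit power
BANDWIDTH_BPS = 50e6
CYCLES, DATA_BITS, D = 4e9, 8e6, 2.0   # workload, migrated data, deadline (s)

def evaluate(offload_fraction):
    t_loc = (1 - offload_fraction) * CYCLES / CPU_LOCAL_HZ
    t_edge = offload_fraction * CYCLES / CPU_EDGE_HZ
    t_com = (DATA_BITS / BANDWIDTH_BPS) if offload_fraction > 0 else 0.0
    e_loc = P_LOCAL_W * t_loc            # E_loc: energy of local execution
    e_com = P_TX_W * t_com               # E_com: energy of data migration
    return e_loc + e_com, t_loc + t_edge + t_com   # (E_tot, T_tot) per Eqs. (5)-(6)

candidates = [i / 10 for i in range(11)]            # offload 0%, 10%, ..., 100%
feasible = [(f, *evaluate(f)) for f in candidates if evaluate(f)[1] < D]
best = min(feasible, key=lambda r: r[1])            # lowest E_tot among feasible splits
print(f"offload {best[0]:.0%}: E_tot={best[1]:.2f} J, T_tot={best[2]:.2f} s")
```

Under these assumed parameters, offloading the entire workload satisfies the deadline at the lowest device energy; with a slower uplink or a tighter deadline, the feasible set shrinks and the chosen split changes accordingly.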