Towards optimized scheduling and allocation of heterogeneous resource via graph-enhanced EPSO algorithm
Journal of Cloud Computing volume 13, Article number: 108 (2024)
Abstract
Efficient allocation of tasks and resources is crucial for the performance of heterogeneous cloud computing platforms. To achieve harmony between task completion time, device power consumption, and load balance, we propose a Graph neural network-enhanced Elite Particle Swarm Optimization (EPSO) model for collaborative scheduling, namely GraphEPSO. Specifically, we first construct a Directed Acyclic Graph (DAG) to model complicated tasks and then use a Graph Neural Network (GNN) to encode the information of task sets and heterogeneous resources. Then, we treat subtasks and independent tasks as basic task units while considering virtual or physical devices as resource units. Based on this, we exploit the performance adaptation principle and conditional probability to derive the solution space for resource allocation. In addition, we employ EPSO to consider multiple optimization objectives, providing fine-grained perception and utilization of task and resource information. It also increases the diversity of particle swarms, allowing GraphEPSO to adaptively search for the global optimal solution with the highest probability. Experimental results demonstrate the superiority of our proposed GraphEPSO compared to several state-of-the-art baseline methods on all evaluation metrics.
Introduction
Nowadays, high-performance computing clusters with cloud architecture play a significant role in large-scale scientific computation and batch-processing tasks. The hardware infrastructure, user support platforms, and software services constitute the hierarchical structure of cloud computing [1]. It integrates resources through virtualization and pooling techniques, achieving the stratification and abstraction from cloud clusters to physical nodes, and then to virtual machines [2]. One of the most significant parts of cloud computing is task scheduling, which aims to allocate tasks to appropriate computing resources and ensure an acceptable response time [3, 4]. This process can be regarded as a multi-objective combinatorial optimization problem involving various considerations: (1) From the user perspective, the primary requirement is to reduce the cost of computing tasks (i.e., to minimize time consumption and financial cost); (2) For computing centers, it is essential to enhance overall resource utilization and reduce power consumption [5, 6].
Task scheduling requires considering multiple influencing factors, including task characteristics [7], resource constraints, dependencies between tasks, and the matching degree between tasks and resources. A mismatch in any of these factors can lead to suboptimal results in task scheduling. Moreover, the heterogeneity of computing resources and the ever-increasing scale of computing tasks exacerbate the complexity of task scheduling [8,9,10]. Specifically, heterogeneous computing platforms host a variety of computing resources, which exhibit different performance metrics due to differences in their underlying hardware. Nevertheless, these hardware-level differences have been neglected by existing studies [11,12,13], which inevitably leads to low task efficiency in real-world operational environments. Besides, tasks on heterogeneous computing platforms consist of multiple interdependent subtasks that should be allocated appropriate computing resources [14,15,16]. However, the complicated associations among subtasks are largely unexplored, which affects the performance of task scheduling. Furthermore, the search space expands exponentially when the platform scale and the number of tasks become enormous [17]. Finding a set of optimal solutions in such a vast solution space is challenging and time-consuming. Hence, how to dynamically manage heterogeneous resources to optimize the response time and resource consumption of large-scale computing tasks has become an emerging yet challenging problem.
To this end, we propose a heterogeneous resource optimization scheduling algorithm based on the integration of Graph Neural Networks (GNNs) and Elite Particle Swarm Optimization (EPSO). Specifically, we first construct a Directed Acyclic Graph (DAG) to capture the complicated associations among subtasks. Then, we utilize a Graph Neural Network (GNN) to encode task attributes and device status information. This process is guided by the hierarchical representation of tasks and the categorization of devices. Moreover, we further generate the solution space for cluster scheduling according to the adaptability of task attributes to resource performance and the conditional probability of currently used and idle resource states. Additionally, we employ EPSO to optimize the task scheduling, that is, to simultaneously minimize the time consumption and power consumption of tasks and to achieve load balancing for batch processing task queues. By embedding multidimensional node features in the EPSO objective function, GraphEPSO achieves the adaptive mapping of basic task units to specific virtual or physical devices. The main contributions of this paper can be summarized as:

We propose a graph-enhanced EPSO-based collaborative scheduling model called GraphEPSO to optimize task scheduling on heterogeneous computing clusters. Extensive experiments on six task scheduling datasets demonstrate the superiority of our proposed method in various metrics compared to several state-of-the-art baselines.

By adopting the graph representation learning technique, we design a scalable task state encoding method, which captures the structure, type, and data size feature information of uncertain tasks.

We develop a multi-objective optimization model aimed at concurrently enhancing task execution efficiency, mitigating device power consumption, and optimizing system load distribution.

By embedding multidimensional features into the objective function of the EPSO algorithm, GraphEPSO can automatically capture the information of tasks and resources, thereby adaptively achieving the global optimal solution.
The rest of the paper is organized as follows. In the Related work section, we review and analyze the current research status of multi-objective optimization models and algorithms. The Heterogeneous computing platform description and feature information encoding section proposes encoding techniques for heterogeneous resource attributes and task information. The Heterogeneous resource scheduling and multi-objective optimization model section develops a heterogeneous resource scheduling model and a multi-objective optimization model. The section A multi-objective optimization scheduling algorithm based on EPSO introduces an EPSO scheduling algorithm based on inertia weight and elite particle perturbations. In the Experiments and analysis section, we conduct a series of experiments to test the time consumption, power consumption, and loads, and then provide a detailed analysis of the experimental results. The Conclusion section concludes the paper and presents future work.
Related work
Scholars have already proposed numerous effective algorithms and implementation techniques to address the large-scale task scheduling problem in heterogeneous cloud computing platforms. This section mainly reviews the research and applications of multi-objective optimization model construction techniques, GNN information representation methods, and Particle Swarm Optimization (PSO) algorithms in task and resource scheduling.
Multi-objective optimization scheduling for heterogeneous computing platforms
To ensure efficient task execution while reducing the power consumption of computing clusters, Liang et al. [18] established a power consumption model for clusters. By considering the scheduling, dynamic, static, and other power consumption aspects of the cluster, they obtained the relationship between the average task completion time, power consumption, and cost. Kaur et al. [19] constructed a multi-objective optimization model for cloud data center job scheduling and virtual machine configuration based on parameters such as Service Level Agreements (SLAs), energy costs, and carbon emissions. Subsequently, they employed an enhanced heuristic method, underpinned by a greedy strategy, to refine the solution of this model.
Kishor et al. [20] used a game-theoretic framework to formulate the bi-objective optimization problem as a non-cooperative load balancing game, achieving an appropriate trade-off between the two conflicting objectives of load balancing. Guo et al. [21] selected the shortest task execution time, the lowest execution cost, and resource load balancing as the objectives for cloud computing task scheduling, and established a mathematical model to measure the effectiveness of multi-objective task scheduling. Sun et al. [22] described the task scheduling problem in heterogeneous computing environments as a Markov Decision Process (MDP) and designed task-type-aware reward functions, achieving optimization for multi-type tasks through multi-objective optimization and reward scaling. Zhang et al. [23] addressed the reliability issue of cloud computing systems by establishing an initial virtual machine fault-tolerant placement model for cloud system star-topology data centers based on five factors: SLA violation rate, resource availability, power consumption, failure rate, and fault-tolerant cost. Then they proposed a heuristic ant colony optimization algorithm to solve the model.
GNN information representation method
GNNs are neural network models based on graph-structured data, designed to extract and uncover features and patterns within graph-based data. Compared to traditional matrix-based models, GNNs are capable of capturing the relationships between nodes and the global structure of the graph, providing a strong representation of DAG task structure information and computing cluster resource information. Addressing the scheduling problem of large-scale jobs on distributed computing clusters, Mao et al. [24] designed an extensible task scheduling method that can handle DAG tasks of any shape and size. This method converts DAG task information and executor state information into features to be fed into the policy network, achieving task relationship modeling and feature extraction. Ni et al. [25] used a graph encoder to encode the input flow graph, incorporating contextual information within the graph. They employed graph embedding to represent the structured information of the flow graph and used a graph-aware decoder to capture the complex dependency relationships that affect resource allocation quality. Lin et al. [26] used GNNs to encode the node information and dependencies of DAG tasks into a set of node embedding vectors. Then they utilized a graph attention mechanism to weigh the nodes according to their importance. This enabled the GNN to allocate more attention to critical nodes in DAG applications, thereby enhancing the model's information processing capabilities. Luo et al. [27] designed a graph neural network composed of node information embedding and fully connected feedforward networks, which uses GNNs to perceive the dependencies between tasks. By leveraging the connectivity of the DAG, they pooled information to each node, stacked the node embedding GNN layers together, enabling a node to integrate information from all reachable nodes, and used a policy network to select tasks for execution. Wang et al. [28] used a Graph Convolutional Network (GCN) to encode the graph structure of tasks, representing information such as subtask types and execution order by extracting graphical features of the tasks. Song et al. [29] employed a GCN to learn high-level feature representations of each node in the original DAG, including execution cost, communication cost, out-degree, and in-degree, and obtained scalable task state information within the system. They proposed a GCN based on bidirectional messaging that allows the network to learn both top-down and bottom-up computational pattern metrics and implements bidirectional messaging from parent to child nodes.
Particle swarm multi-objective optimization scheduling algorithm
The PSO algorithm is a stochastic search algorithm designed by simulating the foraging behavior of bird swarms, which utilizes swarm intelligence to search for the optimal solution in the entire search space. Inspired by the PSO algorithm, Mansouri et al. [30] proposed a hybrid task scheduling strategy called FMPSO, which combines a fuzzy system with an improved PSO technique. They used four improved velocity update methods and a roulette selection technique to enhance the global search capability of particles, overcoming the local optimality and other drawbacks of the PSO algorithm. Bansal et al. [31] presented a multi-objective optimization scheduling framework based on a PSO scheduling model, which evaluates the budget cost using performance and budget constraints and provides feedback on the quality of task scheduling solutions. This approach can adjust the solution quality based on individual particle optimality and swarm particle optimality, solving the slow convergence problem of the PSO algorithm.
In addressing the scheduling challenges of large-scale cloud workflows, Wang et al. [32] proposed a dynamic swarm learning-based distributed particle swarm optimization algorithm (DPSO). The entire population is randomly partitioned into multiple groups, which collaboratively evolve using a master-slave multi-group distributed model. This method enhances the diversity of the population and employs a dynamic swarm learning strategy to dynamically change the group size, thereby controlling the learning intensity and balancing the diversity and convergence of DPSO. Tang et al. [33] formalized the cloud task scheduling problem with budget constraints and proposed a Random Matrix PSO Scheduling Algorithm (RMPSO), which uses random integer matrices to represent particle positions and feasible task scheduling solutions, aiming to achieve the optimal total cost of cloud services. They also designed a multi-core parallel RMPSO algorithm to reduce the time complexity of policy execution. Miao et al. [34] proposed an Adaptive Particle Individual Best Position (AP) selection method, which uses non-dominant particles to update the individual best position of a given particle. They introduced a new function to measure the gap between the individual best position or global best position of a given particle and its current position. The velocity vector is updated based on the calculated gap using a "roulette wheel" selection process. Li et al. [35] infused local and global guiding information into the particle swarm's update process, pioneering a particle swarm cooperation technique. This refinement significantly enhanced the global search and convergence capabilities of the particle swarm optimization algorithm.
Heterogeneous computing platform description and feature information encoding
This section describes the environment of the heterogeneous computing cluster, providing a detailed classification of resources and tasks. Subsequently, an encoding mechanism for resource and task information is designed based on Graph Neural Network (GNN) technology. The important symbols and explanations used in this paper are shown in Table 1.
Heterogeneous platform resources and task descriptions
A heterogeneous computing platform consists of multiple servers of different types, each housing various types of computing, storage, and network resources. The server set is denoted as \(S = \left\{s_1,s_2,\cdots ,s_{|S|} \right\}\), where \(|S|\) signifies the total number of servers. The computing device type set is defined as \(RT = \left\{C_1,C_2,\cdots ,C_m,\cdots ,G_1,G_2,\cdots ,G_n,\cdots \right\}\), with \(C_m\) representing CPU models and \(G_n\) representing GPU models. From the composition of the heterogeneous platform and the categorization of resources by type and performance, a performance metric set for platform resources is derived and formally expressed as \(\Omega _1 = \left\{S_s,RT_s^k,SP_s^k,BW_s,MC_s,DC_s\right\}\). In this context, \(S_s\) refers to server s in the platform; \(RT_s^k\) indicates the type of computing device on \(S_s\), where k is the device index; \(SP_s^k\) denotes the computing speed per unit time of the k-th device on \(S_s\); \(BW_s\) represents the bandwidth of \(S_s\), while \(MC_s\) indicates the memory capacity of \(S_s\), and \(DC_s\) represents the disk capacity of \(S_s\).
In task and resource scheduling, the basic resource unit is a computing device with specific types and performance capabilities. A server is capable of housing numerous diverse computing devices, each with the potential to execute multiple tasks concurrently. The type of resources required by the task must match the type of computing devices available, and the server's memory and disk capacity must meet the minimum requirements for task execution. In batch processing of tasks, the resource state is dynamically changing, and devices of the same type but different models have varying power consumption and computing capabilities.
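The matching constraints in the preceding paragraph can be illustrated with a minimal Python sketch. The class names `Device` and `Server` and the helper `feasible_devices` are hypothetical and introduced only for illustration; the fields mirror the attributes in \(\Omega _1\), and the check covers only type matching and the server-level memory and disk requirements.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    dev_type: str   # device model label, e.g. "C1" (CPU) or "G1" (GPU)
    speed: float    # computing speed per unit time (SP_s^k in the text)


@dataclass
class Server:
    name: str
    bandwidth: float        # BW_s
    memory_capacity: float  # MC_s
    disk_capacity: float    # DC_s
    devices: list = field(default_factory=list)


def feasible_devices(server, required_type, min_memory, min_disk):
    """Return the devices on `server` that could host a subtask.

    Assumption: a device is feasible when its type equals the required
    type and the server meets the memory and disk minimums.
    """
    if server.memory_capacity < min_memory or server.disk_capacity < min_disk:
        return []
    return [d for d in server.devices if d.dev_type == required_type]


# Illustrative values only, not taken from the paper.
s1 = Server("s1", bandwidth=10.0, memory_capacity=64.0, disk_capacity=500.0,
            devices=[Device("C1", 2.0), Device("G1", 8.0)])
candidates = feasible_devices(s1, "G1", min_memory=16.0, min_disk=100.0)
```

The dynamic resource state mentioned in the text (devices becoming busy or idle) is deliberately left out of this sketch.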
Tasks received by heterogeneous computing systems can be categorized into two groups: complex tasks and independent tasks. Complex tasks are capable of being divided into subtasks, with each subtask demanding different types of resources and performance. A DAG can be used to describe the constraint relationships between subtasks. Independent tasks, on the other hand, consist of only a single subtask and require only a single type of resource.
The process of resource allocation primarily encompasses two steps: the first step is to clearly identify the resource requirements of the tasks and their execution priorities, while the second step involves assigning the appropriate resources to the tasks. When executing resource scheduling, we first convert the attribute information of the tasks into encoded representations. Subsequently, based on the specific needs of the tasks, we allocate resources to establish a mapping between tasks and resources.
Heterogeneous resource information encoding
In the unified management and allocation of heterogeneous devices, it is essential to encode the attribute features of each independent device unit. Device state information is encoded using a GNN, which is utilized to abstract the device information within the system. The GNN comprises a single embedding layer that performs information aggregation through graph convolutions. The goal is to abstract all resource attributes of a server, such as \(S_s\), \(RT_s^k\), \(P_s^k\), \(BW_s\), \(MC_s\), \(DC_s\), into feature vectors, which serve as input to the scheduling decision-making module. The network structure of the device encoding module is depicted in Fig. 1.
In Fig. 1, the attribute information of different resources first enters the embedding layer and is transformed into their respective embedding vectors. Subsequently, these embedding vectors are fused by a nonlinear transformation function to generate a server-level representation of the resource information. The calculation formula is shown in Eq. 1,
where \(\widetilde{f}(\cdot )\) is a nonlinear transformation function, \(EM_r\) represents the embedding vector of resource r, and \(\tilde{S} = \left\{CPU_s^1,\ldots ,CPU_s^n,GPU_s^1,\ldots ,GPU_s^m,P_s^k,BW_s,M_s,DC_s \right\}\) represents the resource set of server \(S_s\).
When encoding heterogeneous resources, the same encoding method is applied to devices of different types and performance levels. Task units are capable of identifying the required devices based on the type and performance of the resources needed for execution.
Task information encoding
In heterogeneous platforms, there is a significant variation in the device performance requirements for different computing tasks. Serial and parallel computing patterns are widely present in these tasks, leading to complex algorithmic processes. The platform system must accommodate the diverse needs of multiple tasks, which often exhibit significant differences in type and data volume. Furthermore, the divisibility of tasks exacerbates the heterogeneity among tasks, thereby increasing the complexity of task scheduling.
In the batch processing mode, the task scheduling algorithm is executed when a batch of tasks arrives at the computing platform. The task set is defined as \(V = \left\{v_1,v_2,\cdots ,v_{|V|}\right\}\), where \(|V|\) indicates the number of tasks. If a task \(v_h\) is divided into \(n_h\) subtasks, it can be represented as \(v_h = \left\{g_1^h,g_2^h,\cdots ,g_{n_h}^h\right\}\). Each subtask \(g_i^h\) requires computing resources of type \(GT_i^h\), which can take values from the set \(\left\{C_1,C_2,\cdots ,C_m,\cdots ,G_1,G_2,\cdots ,G_n,\cdots \right\}\). The task information model for heterogeneous computing platforms can be formally expressed as \(\Omega _2 = \left\{V,v_h,DAG_h,g_i^h,GT_i^h,L_i^h,MC_i^h,DC_i^h\right\}\). Here, V denotes the set of all tasks submitted by users, \(DAG_h\) represents the constraint relationships and communication lengths between the subtasks of task \(v_h\), \(L_i^h\) is the size of subtask \(g_i^h\), \(MC_i^h\) is the memory capacity required by \(g_i^h\), and \(DC_i^h\) is the disk capacity required by \(g_i^h\).
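As a hedged illustration of the task information model \(\Omega _2\), the sketch below stores a task as a set of subtasks plus weighted DAG edges and derives a dependency-respecting execution order. The class and method names (`Subtask`, `Task`, `predecessors`, `execution_order`) are hypothetical; the topological sort relies on the standard-library `graphlib` module, which requires Python 3.9 or later.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    sid: str
    res_type: str  # required resource type GT_i^h, e.g. "C1" or "G1"
    size: float    # subtask size L_i^h
    memory: float  # required memory capacity MC_i^h
    disk: float    # required disk capacity DC_i^h


@dataclass
class Task:
    tid: str
    subtasks: dict = field(default_factory=dict)  # sid -> Subtask
    edges: dict = field(default_factory=dict)     # (parent, child) -> communication volume

    def predecessors(self):
        """Map each subtask id to the set of its direct predecessors."""
        preds = {sid: set() for sid in self.subtasks}
        for parent, child in self.edges:
            preds[child].add(parent)
        return preds

    def execution_order(self):
        """Return one valid order; raises graphlib.CycleError if not a DAG."""
        return list(TopologicalSorter(self.predecessors()).static_order())


# A complex task with a diamond-shaped DAG (illustrative values only).
task = Task("v1")
for sid in ("g1", "g2", "g3", "g4"):
    task.subtasks[sid] = Subtask(sid, "C1", size=1.0, memory=1.0, disk=1.0)
task.edges = {("g1", "g2"): 2.0, ("g1", "g3"): 1.0,
              ("g2", "g4"): 3.0, ("g3", "g4"): 0.5}
order = task.execution_order()
```

An independent task is simply a `Task` with one subtask and no edges, so both task categories share the same representation.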
Due to the variations in the composition of subtasks and algorithmic processes across different tasks, the structures of DAGs exhibit significant diversity. This complexity prevents fixed-size encoding methods from being directly applied to uncertain tasks. Consequently, there is a need to design an encoding technique that can accommodate arbitrary numbers and topological structures of subtasks within DAGs. Moreover, a parameter-sharing mechanism is essential to address the issue of extensive training parameters.
Graph Embedding [36] is a technique that maps high-dimensional dense matrices of graph data to low-dimensional dense vectors. In this study, we employ graph convolutional operations to encode the feature information of DAGs, performing calculations in a bottom-up manner based on directed edge connections. Throughout the convolutional process, the feature information of parent nodes is preserved, and each subtask node is recursively encoded into a feature vector. Building on this, we utilize graph convolutional operations to aggregate subtask-related information at the task level, encoding tasks as embedding vectors according to the DAG structure.
Applying GNN to encode task information involves transforming task attribute features into a set of vector representations. We adopt a hierarchical recursive approach to generate the embedding vectors \(x_i^h\) for subtasks \(g_i^h\), the embedding vector \(y^h\) for tasks \(v_h\), and the global embedding vector z for the task set V. This method not only allows for scalability of task information but also maintains the global integrity of feature information. The detailed encoding process is illustrated in Fig. 2.
Figure 2 depicts the task encoding process, in which symbols of varying shapes signify subtasks with distinct resource needs. Similarly shaped symbols that differ in size indicate subtasks of a consistent computational type but with varying magnitudes. The edge length \(e_{i,j}^h\) in the DAG indicates the amount of data communicated between subtasks.

(1) Subtask information embedding
Given the stage attribute feature vector \(vec_i^h\) corresponding to the subtask \(g_i^h\) within task \(v_h\), we employ graph convolution to construct the embedding for this subtask:
In Eq. 2, \(x_i^h\) represents the embedding vector of \(g_i^h\). During the encoding process, starting from the root nodes of all subtasks in task \(v_h\), attribute information is propagated from parent task nodes to child nodes through message passing. In each message-passing step, the parent node of subtask node \(g_i^h\) summarizes the information from all ancestral nodes. The embedding calculation formula is as follows:
where \(\hat{a}(\cdot )\) and \(\hat{b}(\cdot )\) are nonlinear transformation functions, implemented as lightweight neural networks; Anc(i) represents the set of ancestor nodes of \(g_i^h\). The first term in Eq. 3 is a nonlinear aggregation operation based on graph convolution that summarizes the embedding information of \(g_i^h\) about all of its ancestor nodes.
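The ancestor aggregation can be sketched in Python under an explicit assumption about the form of Eq. 3, namely \(x_i^h = \hat{b}\left(\sum _{u \in Anc(i)} \hat{a}(x_u^h)\right) + vec_i^h\), which is one reading consistent with the description above. The functions `a_hat` and `b_hat` below are fixed `tanh` placeholders standing in for the learned networks, and `embed_subtasks` is a hypothetical name.

```python
import math
from graphlib import TopologicalSorter


def a_hat(vec):
    # Placeholder for the learned per-message transformation.
    return [math.tanh(v) for v in vec]


def b_hat(vec):
    # Placeholder for the learned transformation of the aggregated message.
    return [math.tanh(0.5 * v) for v in vec]


def embed_subtasks(features, parents):
    """Compute one embedding per subtask.

    features: node -> feature vector (vec_i^h), all of equal length.
    parents:  node -> set of direct parent nodes in the DAG.
    Assumed rule: x_i = b_hat(sum over ancestors u of a_hat(x_u)) + vec_i.
    """
    ancestors, emb = {}, {}
    # Parents are visited before children, so ancestor embeddings exist.
    for node in TopologicalSorter(parents).static_order():
        anc = set()
        for p in parents.get(node, ()):
            anc |= {p} | ancestors[p]
        ancestors[node] = anc
        agg = [0.0] * len(features[node])
        for u in anc:
            agg = [s + m for s, m in zip(agg, a_hat(emb[u]))]
        emb[node] = [b + f for b, f in zip(b_hat(agg), features[node])]
    return emb


# Chain g1 -> g2 -> g3 with two-dimensional features (illustrative values).
features = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}
parents = {"g1": set(), "g2": {"g1"}, "g3": {"g2"}}
embeddings = embed_subtasks(features, parents)
```

Because the same two functions are applied at every node, the number of parameters does not grow with the size or shape of the DAG, which is the parameter-sharing property the encoding requires.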

(2) Task information embedding
GNN can compute the embedding for each DAG task. Let \(y_{(0)}^h = \left[vec_1^h;vec_2^h;\dots ;vec_n^h\right]\) denote the feature matrix of \(v_h\), n represent the number of subtasks contained in \(v_h\), and \(E^h\) be the adjacency matrix of \(v_h\). Then, the embedding encoding process for task \(v_h\) is as shown in Eq. 4.
where \(\widetilde{E}=E^{h}+I\), \(\widetilde{D}\) is the diagonal degree matrix of \(\widetilde{E}\), W represents the trainable weights, and \(\sigma (\cdot )\) is the activation function.
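The symbols defined here match the widely used symmetric-normalized graph convolution rule, \(y_{(l+1)}^h = \sigma \left(\widetilde{D}^{-1/2}\widetilde{E}\widetilde{D}^{-1/2}y_{(l)}^hW\right)\). Assuming Eq. 4 takes this standard form (an assumption, not a statement of the authors' exact formula), one propagation step can be sketched in plain Python with ReLU as the activation; `gcn_layer` is a hypothetical name.

```python
import math


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(adj, feats, weight):
    """One step of ReLU(D^-1/2 (E + I) D^-1/2 * Y * W).

    adj:    n x n adjacency matrix E^h of the task DAG.
    feats:  n x d feature matrix y_(l)^h (one row per subtask).
    weight: d x d' weight matrix W (trainable in practice, fixed here).
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    e_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degree of each node; self-loops guarantee it is at least 1.
    deg = [sum(row) for row in e_tilde]
    norm = [[e_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Two subtasks with one edge g1 -> g2 and scalar features (illustrative).
hidden = gcn_layer([[0.0, 1.0], [0.0, 0.0]], [[1.0], [2.0]], [[1.0]])
```

Stacking several such layers lets each node aggregate information from nodes several hops away in the DAG.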

(3) Task set global embedding
Regarding the encoding of the task set, we treat the task-level nodes as the child nodes of the global summary nodes. Subsequently, we generate the global embedding vectors of the task set through message passing, as shown in Eq. 5.
where \(|V|\) is the number of tasks, and \(\tilde{a}(\cdot )\) and \(\tilde{b}(\cdot )\) are nonlinear transformation functions.
The embedding vectors at these three levels collectively capture the individual characteristics of subtasks, their interdependencies, and the global information of the task set. Based on these multilevel embedding vectors, a fully connected feedforward network can be employed to further compute the priority scores for each subtask. The computation process is as shown in Eq. 6.
Utilizing the Gumbel-softmax sampling technique, we calculate the probability distribution for each subtask according to its corresponding priority score and subsequently select the subtask with the highest priority probability. The calculation process is as shown in Eq. 7.
where \(score_i^h\) is the priority score of subtask \(g_i^h\), and \(\hat{P}(i)\) is the probability distribution over subtask priorities, from which the subtask with the highest probability is selected. \(\hat{\rho }_{i}^{h}{\sim }U[0,1]\) is uniformly distributed random noise, and \(\tau\) denotes the temperature coefficient.
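A minimal sketch of this selection step follows, assuming the usual Gumbel-softmax construction: each score is perturbed with Gumbel noise \(-\log (-\log \hat{\rho })\), divided by the temperature \(\tau\), and passed through a softmax. The function names and the clamping constant are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(scores, tau=1.0, rng=random):
    """Return a probability vector over subtasks.

    scores: priority scores score_i^h (used directly as logits).
    tau:    temperature; smaller values sharpen the distribution.
    rng:    any object with a random() method returning U[0, 1) samples.
    """
    noisy = []
    for s in scores:
        # Clamp rho away from 0 and 1 so both logarithms are defined.
        rho = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(rho))
        noisy.append((s + gumbel) / tau)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def pick_subtask(scores, tau=1.0, rng=random):
    """Select the index with the highest sampled probability."""
    probs = gumbel_softmax(scores, tau, rng)
    return max(range(len(probs)), key=probs.__getitem__), probs


# Seeded generator for reproducibility; scores are illustrative only.
index, probs = pick_subtask([2.0, 0.5, 1.0], tau=0.5, rng=random.Random(0))
```

The injected noise means that lower-scored subtasks are occasionally selected, which preserves exploration while still favoring high-priority subtasks.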
Heterogeneous resource scheduling and multi-objective optimization model
This section describes the allocation process of heterogeneous resources and proposes a multi-objective optimization model for task queue processing time, power consumption, and load.
Heterogeneous resource scheduling model
The batch task set V comprises independent tasks and divisible tasks, which can generate a task queue \(Q_v\). Among these, divisible complex tasks are represented by a DAG to depict the logic of subtask execution. The h-th divisible complex task \(v_h\) can be represented by a DAG as \(DAG_{v_h}=(V_{v_h},E_{v_h})\), where \(V_{v_h} = \left\{g_{1}^{h},g_{2}^{h},\ldots ,g_{n}^{h}\right\}\) is the set of subtasks for \(v_h\), and \(E_{v_h} = \left\{\left(g_i^h,g_j^h\right)\left| g_i^h,g_j^h{\in }V_{v_h}\right.\right\}\) is the set of associations and communication lengths between subtasks. Integration of \(DAG_{v_h}\) into the task queue \(Q_v\) results in a comprehensive and granular workflow representation. During task execution, subtasks are distributed across various partitions of the task queue, tailored to their specific resource and performance needs. This approach ensures that the synchronization and efficiency of executing complex tasks are maintained.
Given a batch task queue \(Q_v\) and the platform resource set D, the goal of resource allocation is to predict the task-resource allocation relationship graph \({\varPhi }\). Specifically, tasks \(v_h\) within \(Q_v\) are matched with subsets of D, while each subtask \(g^h\) within \(v_h\) is assigned to a particular device \(d_i\) within the selected subset. Our methodology deploys subtasks and specific devices as the fundamental units for scheduling, employing both GNN and EPSO algorithms to devise an optimized scheduling policy. The overall framework and algorithm flow are shown in Fig. 3.
In Fig. 3, the symbols C, G, and M represent task queues with CPU, GPU, and mixed resource demands, respectively. Triangles and circles represent CPU and GPU resources, and hexagons represent memory, disk, and network resources. \(P(\cdot )\) and \(MK(\cdot )\) represent the calculation of probability and adaptability.
The information flow for developing a heterogeneous resource scheduling model can be delineated as follows:

(1) Resource Partitioning and Task Queue Multiple Input Module. This module first creates multiple resource input queues by categorizing platform resources based on type and performance. Then, it generates multiple task input queues based on each task's demand for resource type and performance.

(2) Task Information and Resource Feature Encoding Module. This module uses graph embedding techniques to generate encodings of task attribute information and device state information.

(3) Subtask and Resource Allocation Module. This module generates subtask-device allocations by applying a performance adaptation principle and conditional probability calculations, thereby forming a collection of subtask-resource scheduling schemes.
The scheduling scheme can be conceptualized as a series of resource allocation predictions. Each step in the scheduling process relies on the current system status, the global attribute features of the task queue \(Q_v\), and the compatibility between the performance requirements of subtasks and available resources. Allocation of subtasks \(g_i\) from \(Q_v\) to devices \(d_i\) is conditional upon the global attributes of \(Q_v\) and the assignments of other subtasks. For any given sequence of subtasks \(\left\{g_1,g_2,\dots ,g_i\right\}\), the problem can be formulated as follows:
The device assignment of a subtask is usually strongly influenced by the device assignments of its predecessor subtasks. The dependency between the new assignment \(d_{g_i}\) and all previous assignments depends on the global properties of \(Q_v\), so the joint probability expressed in Eq. 8 cannot be simply decomposed. Consequently, we adopt an approximate decomposition of Eq. 8 to simplify this problem. If the predecessor subtasks of \(g_i\) are assigned, then the joint probability can be approximated as:
where \(D^{(up)}(g_i)\) refers to the assignments of all predecessor subtasks of \(g_i\).
In this manner, the task-to-device assignment can be completed recursively, resulting in a collection of task-resource scheduling schemes. Furthermore, the current study has embedded heuristic information in the initial optimization of task-device pairings, effectively narrowing down the scheduling space and search domain.
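As a hedged reading of the decomposition step, the chain-rule factorization of the joint assignment probability and its approximation can be written as below. The symbols \(\varPhi\), \(Q_v\), \(d_{g_i}\), and \(D^{(up)}(g_i)\) are taken from the text; the exact form of the two expressions is an assumption based on the description of Eqs. 8 and 9.

```latex
P\left(\varPhi \mid Q_v\right)
  = \prod_{i} P\left(d_{g_i} \mid d_{g_1}, \dots, d_{g_{i-1}}, Q_v\right)
  \approx \prod_{i} P\left(d_{g_i} \mid D^{(up)}(g_i), Q_v\right)
```

The approximation replaces the full assignment history \(d_{g_1}, \dots , d_{g_{i-1}}\) with the assignments of the predecessors of \(g_i\) only, so each factor can be evaluated as soon as those predecessors have been placed.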
To assess the alignment between subtask requirements and resource performance, we employ the Minkowski distance to measure the similarity of attribute features. Given that the encodings of subtask resource demands and device attributes are represented by \(A(a_1,a_2,\dots ,a_n)\) and \(B(b_1,b_2,\dots ,b_n)\), respectively, the matching degree is as follows:
where p is the power exponent, and the value range is [0, \(\infty\)].
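The Minkowski distance between two encodings is \(\left(\sum _{i=1}^{n}\left| a_i-b_i\right| ^p\right)^{1/p}\). The sketch below computes it and picks the device whose attribute encoding lies closest to a subtask's demand encoding. Treating the smallest distance as the best match is an assumption, as are the function names; the sketch requires \(p > 0\), since the exponent \(1/p\) is undefined at \(p = 0\), and handles \(p = \infty\) as the maximum coordinate difference.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance between equal-length vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    if p == float("inf"):
        # Limit case: Chebyshev distance.
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def best_matching_device(demand, devices, p=2.0):
    """Return the name of the device whose encoding is closest to `demand`.

    devices: mapping from device name to its attribute encoding.
    """
    return min(devices, key=lambda name: minkowski_distance(demand, devices[name], p))


# Illustrative encodings only.
print(minkowski_distance([0, 0], [3, 4], p=2))  # 5.0 (Euclidean)
print(minkowski_distance([0, 0], [3, 4], p=1))  # 7.0 (Manhattan)
choice = best_matching_device([1.0, 2.0], {"d1": [5.0, 5.0], "d2": [1.0, 2.5]})
```

With \(p = 1\) every attribute difference contributes equally, while larger values of p let the single largest mismatch dominate the matching degree.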

(4) Multi-objective optimization scheduling scheme generation module. This module transforms the resource allocation in task queue batch processing into a search problem for optimal solutions. Within the framework of multiple objectives, it employs the EPSO search policy to obtain optimal scheduling solutions.
Time, power and load models and multi-objective optimization
In this section, we develop a comprehensive model that integrates time, power consumption, and load. Additionally, we propose a multi-objective optimization scheduling model.
Time model and optimization objective
Given the computing resource \(R_s^k\) and its processing capability \(P_s^k\) per unit time, along with the size \(L_i^h\) of subtask \(g_i^h\), the execution time for \(g_i^h\) on \(R_s^k\) can be determined as \(ST_{h,i}^{s,k}=\frac{L_{i}^{h}}{P_{s}^{k}}\).
Accounting for network transmission costs, the transmission time between subtasks \(g_i^h\) and \(g_j^h\) is denoted as \(TransT(g_i^h,g_j^h)\). When \(g_i^h\) and \(g_j^h\) are executed on the same server, transmission costs are negligible. However, if they are located on different servers, these costs are determined by the lower bandwidth of the two servers. A flag \(Loc(g_i^h,g_j^h)\) is used to determine whether \(g_i^h\) and its preceding subtask \(g_j^h\) are carried out on the same server. The volume of communication data between \(g_i^h\) and \(g_j^h\) is represented by \(Comm(g_i^h,g_j^h)\), while \(BW_{g_j^h}\) denotes the network bandwidth of the server where \(g_j^h\) is located. Consequently, the transmission time between \(g_i^h\) and \(g_j^h\) can be expressed using Eq. 11.
The objective of time optimization is to minimize the average completion time (AvgT) across the tasks. Assume that the computing resource allocated to subtask \(g_i^h\) is \(R_s^k\) and that the earliest idle time of \(R_s^k\) is \(IdeT_{h,i}^{s,k}\). The actual completion time of the predecessor subtask \(g_j^h\) of \(g_i^h\) is recorded as \(FT_j^h\). Consequently, the actual starting time of \(g_i^h\) is as in Eq. 12.
Given that the actual execution time for subtask \(g_i^h\) is \(ST_{h,i}^{s,k}\), the corresponding actual completion time \(FT_i^h\) can be represented as follows:
For task \(v_h\), the objective is to minimize the maximum completion time across all subtasks. The maximum completion time for \(v_h\) is denoted as \(MaxFT_h={\mathop {max}\limits _i}FT_i^h\). Consequently, the maximum completion time for the entire task set can be represented as \(SysFT=\sum \nolimits _{h=1}^{V}MaxFT_h\).
For the entire computational platform, the optimization objective is to keep the waiting time for all tasks as short as possible, that is, to minimize the average completion time for all tasks. The average completion time for the task is given by \(AvgT=\frac{1}{V}SysFT\). Therefore, time optimization involves solving the minimization problem as stated in Eq. 14.
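The time model above can be sketched as follows. The class and function names are hypothetical, and the forms of the transmission time (data volume divided by the lower of the two bandwidths) and of the start time (the later of the device idle time and the arrival of all predecessor outputs) are assumptions read from the prose rather than the exact Eqs. 11 to 13.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Placement:
    server: int        # index of the server hosting the predecessor
    bandwidth: float   # network bandwidth of that server
    finish: float      # actual completion time FT of the predecessor


def exec_time(size: float, capability: float) -> float:
    """ST = L / P: subtask size over processing capability per unit time."""
    return size / capability


def trans_time(comm: float, pred: Placement, server: int, bandwidth: float) -> float:
    """Zero on the same server, else data volume over the lower bandwidth."""
    if pred.server == server:
        return 0.0
    return comm / min(pred.bandwidth, bandwidth)


def finish_time(size: float, capability: float, idle_time: float, server: int,
                bandwidth: float, preds: List[Tuple[Placement, float]]) -> float:
    """preds holds (placement, communication volume) for each predecessor."""
    ready = max(
        (p.finish + trans_time(c, p, server, bandwidth) for p, c in preds),
        default=0.0,
    )
    start = max(idle_time, ready)                  # assumed form of Eq. 12
    return start + exec_time(size, capability)     # FT = start + ST


def avg_completion_time(finish_by_task: Dict[str, List[float]]) -> float:
    """AvgT = (1/V) * sum over tasks of the latest subtask finish time."""
    return sum(max(f) for f in finish_by_task.values()) / len(finish_by_task)


pred = Placement(server=0, bandwidth=10.0, finish=4.0)
remote = finish_time(10.0, 5.0, idle_time=6.0, server=1, bandwidth=5.0,
                     preds=[(pred, 20.0)])
local = finish_time(10.0, 5.0, idle_time=6.0, server=0, bandwidth=10.0,
                    preds=[(pred, 20.0)])
```

In this example, placing the subtask on the predecessor's server avoids the transfer delay, so the subtask starts as soon as the device is idle.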
Power consumption model and optimization objectives
In the practical computing context, the power consumption of the platform can be bifurcated into two distinct categories: dynamic power usage, which is generated during task execution, and inherent power usage, which occurs when the device is in an idle state. The ratio of dynamic to inherent power consumption is estimated to be approximately 7:3 when the device is under full load.
This study computes power consumption by focusing solely on computing devices. For each device \(R_s^k\) on server \(S_s\), we denote its power consumption rate as \(UE_s^k\). Consequently, the aggregate power consumption rate for server \(S_s\) is denoted as \(SE_s=\sum \nolimits _{k=1}^{K_s}UE_s^k\).
The inherent power consumption of \(R_s^k\) is denoted as \(P_{s,k}^{inh}\), and the dynamic power consumption as \(P_{s,k}^{dyn}\). Given that the execution time of subtask \(g_i^h\) on \(R_s^k\) is \(ST_{h,i}^{s,k}\), the inherent power consumption of the system can be calculated as \(SysW^{inh}=\frac{3}{7}\times \sum \nolimits _{s=1}^{S}\sum \nolimits _{k=1}^{K_s}UE_{s}^{k}\times ST_{h,i}^{s,k}\). The total power consumption can be accumulated according to the actual operation, which can be calculated as \(SysW^{tol}=\sum \nolimits _{s=1}^{S}\sum \nolimits _{k=1}^{K_{s}}(UE_{s}^{k}\times ST_{h,i}^{s,k})+SysW^{inh}=SysW^{dyn}+SysW^{inh}\). Consequently, the objective of minimizing power consumption can be articulated through the following optimization problem.
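The power accounting above can be sketched as follows. The function name and the input layout (one pair of power consumption rate and execution time per executed subtask) are assumptions made for illustration; the 3/7 factor is the inherent-to-dynamic ratio stated in the text.

```python
from typing import Iterable, Tuple


def system_power(rates_and_times: Iterable[Tuple[float, float]],
                 inherent_ratio: float = 3.0 / 7.0) -> Tuple[float, float, float]:
    """Return (dynamic, inherent, total) consumption.

    rates_and_times holds one (UE, ST) pair per executed subtask:
    the power consumption rate of the device and the execution time on it.
    """
    dynamic = sum(rate * time for rate, time in rates_and_times)
    inherent = inherent_ratio * dynamic      # SysW^inh = 3/7 * SysW^dyn
    return dynamic, inherent, dynamic + inherent


# Hypothetical workload: two subtasks on devices with different rates.
dynamic, inherent, total = system_power([(70.0, 1.0), (35.0, 2.0)])
```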
Load balancing model and optimization objectives
For load balancing, this study takes into account the working hours and resource utilization of every server within the cluster.

(1) Time-based load balancing model. When performing task processing, each device works for as consistent a time as possible. The load model can be expressed as:
$$LOAD=\frac{\min _{s\in [1,S]}time_{s}}{\max _{s\in [1,S]}time_{s}}. \tag{16}$$
(2) Resource utilization-based load model. Based on the resource utilization of each server, the relative ratios of different resources can be calculated. Among them, CPU relative ratio: \(CPU_s={U_{Cs}}/{AL_C}\), GPU relative ratio: \(GPU_s={U_{Gs}}/{AL_G}\), memory relative ratio: \(MEM_s={U_{Ms}}/{AL_M}\), network relative ratio: \(NET_s={U_{Ns}}/{AL_N}\) and disk relative ratio: \(DISK_s={U_{Ds}}/{AL_D}\).
Here, \(AL_C\) indicates the average CPU load, while \(U_{Cs}\) represents the mean CPU usage rate of \(S_s\). A calculation result nearing 1 suggests that server resource utilization is approximately at the average level. Deviations from this indicate overloading or underloading.
In Eq. 17, \(Load_s=\lambda _s^1\ln (CPU_s)+\lambda _s^2\ln (GPU_s)+\lambda _s^3\ln (MEM_s)+\lambda _s^4\ln (NET_s)+\lambda _s^5\ln (DISK_s)\), \(\lambda _s^1\),\(\lambda _s^2\),\(\lambda _s^3\),\(\lambda _s^4\),\(\lambda _s^5\) represent the weights of the five indicators of CPU, GPU, memory, network, and disk of the server, where the value range of \(\lambda\) is \(0\sim 1\), and \(\lambda _s^1+\lambda _s^2+\lambda _s^3+\lambda _s^4+\lambda _s^5=1\).
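A minimal sketch of the two load measures follows. The function names, dictionary keys, and example weights are hypothetical; only Eq. 16 and the expression for \(Load_s\) are taken from the text.

```python
import math
from typing import Dict, Sequence


def time_load(busy_times: Sequence[float]) -> float:
    """Eq. 16: shortest over longest per-server working time (1 = balanced)."""
    return min(busy_times) / max(busy_times)


def resource_load(usage: Dict[str, float], cluster_avg: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Load_s: weighted sum of ln(usage / cluster average) over resources."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * math.log(usage[r] / cluster_avg[r]) for r, w in weights.items())


# Hypothetical weights for CPU, GPU, memory, network, and disk.
weights = {"cpu": 0.3, "gpu": 0.3, "mem": 0.2, "net": 0.1, "disk": 0.1}
average = {"cpu": 0.5, "gpu": 0.4, "mem": 0.6, "net": 0.2, "disk": 0.3}
at_average = resource_load(average, average, weights)
busy = dict(average, cpu=0.9)          # same server with a hotter CPU
above_average = resource_load(busy, average, weights)
```

Because each ratio enters through a logarithm, a server at the cluster average contributes 0, an overloaded server a positive value, and an underloaded server a negative value.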
Therefore, the load balancing comprehensive optimization objective model can be expressed as:
where \(\alpha\) is the regulating factor, and \(0<\alpha <1\).
Multi-objective optimization model
The multi-objective optimization model can be obtained by integrating the objectives of time, power, and load:
Within the multi-objective optimization model for task set scheduling on heterogeneous computing platforms, the respective metrics are either dynamically monitorable or computable from monitoring data.
A multi-objective optimization scheduling algorithm based on EPSO
The allocation of resources in task queues for batch processing involves a multi-objective optimization problem, which can be conceptualized as a search for optimal solutions within the task and resource allocation space. In such heterogeneous computing environments, the substantial volume of tasks submitted, paired with the assortment of resource varieties, leads to an extensive spectrum of potential strategies for task-resource allocation. Furthermore, task scheduling is subject to both dynamic changes and uncertainty, with each instance of task selection and resource allocation prompting a shift in the system's state.
Multi-objective scheduling policy based on EPSO
This section aims to explore the problem of task and resource optimization scheduling and propose a search mechanism based on the self-adaptive EPSO. Based on the construction of the task and resource node information model, this study enhances the comprehensiveness and adaptability of information utilization by integrating multi-dimensional node features into the fitness function of the EPSO algorithm.
EPSO algorithm
The EPSO algorithm, with its adaptive weight and elite strategies, has emerged as an efficient technique for addressing multi-objective optimization challenges. During the initial and final phases of iteration, the algorithm adjusts the inertia weight via an adaptive strategy, while an effective elite guidance mechanism improves the diversity of particle exploration within the solution space. This approach promotes rapid convergence and increases the likelihood of identifying the global optimal solution. The flowchart of the EPSO algorithm is illustrated in Fig. 4.
In Fig. 4, the EPSO algorithm first initializes the positions and velocities of the particles. Here, the initial positions of the particles correspond to various mapping schemes between tasks and resources. Subsequently, the quality of the solutions is assessed by calculating the fitness value for each particle. To maintain the convergence and diversity of the particle swarm, the algorithm uses elite particles to guide the other particles in adjusting their velocities and positions in each iteration. Elite particles are the \(\mathcal {B}\) particles with the highest fitness values in the current and previous iterations. Particles are randomly selected from the two generations of elites and compared. The winner particle replaces the local best solution in the velocity update equation, while the loser particle replaces the global best solution. The calculation process is shown in Eqs. 20 and 21.
where \({\upsilon }_{\gamma }(t+1)\) represents the velocity vector of the \({\gamma }\)-th particle at step \(t+1\), \(\theta _{1}(t)\) is the winner elitist particle for guiding individual optimization learning, \(\theta _{2}(t)\) is the loser elitist particle for guiding global optimization learning, rand is a random number between 0 and 1, and \(\theta _{\gamma }(t)\) represents the current position of the particle. Here, we utilize the elite particles from two consecutive iterations to guide the evolution of other particles, thereby enhancing the diversity of particles in the solution space. In the initial stage of the algorithm, since the population contains only one generation of particles, two particles are randomly selected from the \(\mathcal {B}\) elitist particles of this generation for comparison. Based on the fitness values of individual particles, the probability of being selected is given by the roulette wheel strategy:
Using a roulette wheel selection strategy can avoid the decrease in algorithm efficiency caused by randomness during the selection of elite particles. By extracting highly efficient elite guiding particles, the optimization efficiency and convergence stability of the algorithm are ensured.
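The elite guidance step can be sketched as follows. This is an illustration under stated assumptions rather than the exact Eqs. 20 to 22: the update is taken to have the standard PSO form with the winner in place of the personal best and the loser in place of the global best, the learning factors `c1` and `c2` are hypothetical names, selection probability is assumed proportional to fitness, and positions are treated as continuous vectors.

```python
import random
from typing import List, Sequence, Tuple


def pick_guides(cur: Sequence[List[float]], cur_fit: Sequence[float],
                prev: Sequence[List[float]], prev_fit: Sequence[float],
                rng: random.Random) -> Tuple[List[float], List[float]]:
    """Roulette-pick one elite per generation and return (winner, loser)."""
    i = rng.choices(range(len(cur)), weights=cur_fit, k=1)[0]
    j = rng.choices(range(len(prev)), weights=prev_fit, k=1)[0]
    if cur_fit[i] >= prev_fit[j]:
        return cur[i], prev[j]
    return prev[j], cur[i]


def update_particle(position: List[float], velocity: List[float],
                    winner: List[float], loser: List[float],
                    w: float, c1: float, c2: float,
                    rng: random.Random) -> Tuple[List[float], List[float]]:
    """One velocity and position step guided by the two elites."""
    new_x, new_v = [], []
    for x, v, t1, t2 in zip(position, velocity, winner, loser):
        nv = w * v + c1 * rng.random() * (t1 - x) + c2 * rng.random() * (t2 - x)
        new_v.append(nv)
        new_x.append(x + nv)
    return new_x, new_v


rng = random.Random(0)
winner, loser = pick_guides([[1.0, 1.0]], [2.0], [[0.0, 0.0]], [1.0], rng)
# With c1 = c2 = 0 the step reduces to pure inertia.
x, v = update_particle([0.0, 0.0], [1.0, 2.0], winner, loser,
                       w=1.0, c1=0.0, c2=0.0, rng=rng)
```

Fitness values must be positive for the roulette weights to be valid; a scheduling implementation would also need to map the continuous position back to a discrete task-resource assignment.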
Optimization based on inertia weights
Inertia weight is a crucial parameter in the particle swarm optimization algorithm that balances global and local search capabilities. A higher inertia weight results in greater particle velocities within the search space, facilitating extensive global exploration. Conversely, a lower inertia weight reduces particle velocities, promoting intensive local search activities near known local optimal solutions.
The linearly decreasing inertia weight strategy [37] involves setting a higher initial value for the inertia weight at the beginning of the algorithm's execution. Subsequently, this value is decreased linearly throughout iterations. Compared to the PSO with a fixed inertia weight, this approach enhances the fine-tuning capabilities of the particle swarm. The linearly decreasing inertia weight is defined in Eq. 23:
where \(\omega _{max}\) and \(\omega _{min}\) represent predefined constants, Max is the maximum number of iterations, and iter indicates the current iteration count. Throughout the algorithm's iteration, the inertia weight diminishes progressively as it is multiplied by a diminishing factor. The values of \(\omega _{max}-\omega _{min}\) and \(\omega _{min}\) can be adjusted to control the weight's range, thereby enhancing the algorithm's local search proficiency in its latter stages. However, this approach does not adapt well to the dynamic changes of the problem at hand.
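A minimal sketch of a linearly decreasing inertia weight follows, assuming the common form in which the weight falls from \(\omega _{max}\) at the first iteration to \(\omega _{min}\) at the last. The function name and the default bounds of 0.9 and 0.4 are illustrative assumptions, not values taken from this paper.

```python
def linear_inertia(iteration: int, max_iter: int,
                   w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Inertia weight that decreases linearly from w_max to w_min."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return w_max - (w_max - w_min) * iteration / max_iter


# Weight schedule over a hypothetical run of 100 iterations.
schedule = [linear_inertia(t, 100) for t in range(101)]
```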
To more flexibly accommodate the needs of particle search behavior during optimization, a Sigmoid nonlinear inertia weight strategy is proposed in the literature [38]. The Sigmoid function exhibits rapid growth in the initial phase, decelerates as it nears a threshold, and eventually plateaus. This property is advantageous for particles to quickly locate the optimal solution in the early stages of optimization and to refine their search in later stages. The formula is as follows:
where u is the constant to adjust sharpness of the function and \(\kappa\) is the constant to set partition of sigmoid function. However, the Sigmoid function relies heavily on constant terms. This reliance can cause a problem. When the algorithm runs, if the weight adjustment reduces the search capability at any point, the algorithm might get stuck in a local optimum.
The Sigmoid-like inertia weight strategy [39] is achieved through a piecewise function that embodies the characteristics of a sigmoid function. This strategy enables particles to switch adaptively between linearly and nonlinearly decreasing inertia weight strategies. It facilitates effective global and local search across different stages of the optimization process. The formula is as follows:
where \(\varrho\) is used to define the transition region between the linearly decreasing inertia weight and the nonlinearly decreasing inertia weight.
Inspired by this method, we propose a piecewise nonlinear approach based on an exponential change of the inertia weight strategy [40]. During the initial phase of particle search, the inertia weight is kept constant at a higher fixed value. This approach enables the PSO to perform a comprehensive search across the solution space, enhancing its global optimization capabilities. In the later stages of the algorithm, we employ a nonlinear dynamic decrement strategy for the inertia weight. This strategy progressively strengthens the algorithm's local search capabilities while preventing it from getting trapped in local optima. In the experimental scenarios of this paper, we achieved the best results by setting the fixed weight to 1 and the transition region to 2/3. The piecewise nonlinear decreasing inertia weight calculation formula is as follows:
In Eq. 26, \(\mu\) and \(\pi\) are exponential adjustment factors. This method enhances the flexibility of weight adjustment by employing an exponential adjustment factor to control the rate of weight change. Such a flexible weight adjustment mechanism is better suited for complex optimization problems. Additionally, the adaptive weights help maintain a balance between the particles' exploration and exploitation capabilities, which promotes the convergence of the EPSO algorithm to the global optimal solution.
Search for the optimal scheduling solution
A fitness function is employed to evaluate the quality of the solutions corresponding to particles and to guide particles in selecting the optimal positions. When crafting the fitness function, it is essential to consider a range of factors, including task completion time, power consumption, and load balancing, to optimize time efficiency and resource utilization. This is encapsulated in the multi-objective optimization functions \(F_1\), \(F_2\), and \(F_3\), as detailed in Eq. 19. To combine these three distinct optimization objectives into a single objective, the primary focus shifts to identifying the optimal value of the fitness function. Individuals with higher fitness function values are deemed superior. The objective function is derived by taking the logarithm of each objective, then its reciprocal, and summing the results. The precise fitness function is depicted in Eq. 27.
The adaptive elite particle swarm algorithm enhances the diversity of the particle population through its sophisticated search mechanism and dynamic weight adjustment strategies, thereby facilitating efficient exploration for optimal solutions within the solution space. The position vector of each particle represents the mapping between resources and tasks for a single scheduling instance. During the iterative process of the algorithm, it progressively searches for the global optimal solution, culminating in the best possible task scheduling outcome. The implementation process of EPSO is presented in Algorithm 1:
Experiments and analysis
In this section, we first design an experiment to evaluate the effectiveness and feasibility of the proposed algorithm. Subsequently, we compare the proposed algorithm with existing efficient algorithms on multiple performance metrics.
Experiment environment
For the experimental setup and algorithmic model training, we employed an Inspur server. The hardware specifications of the server are detailed in Table 2.
CloudSim [41] is a scalable simulation framework capable of modeling cloud computing infrastructure and services. The experiment used the CloudSim platform for task scheduling simulation and added new functionality for GPU resource allocation. To mirror the diversity of hardware resource types and task types found in authentic environments, a range of computing devices was established, exhibiting varying levels of performance and power consumption, as detailed in Table 3. Additionally, the experiment instantiated heterogeneous servers, each characterized by unique resource attributes, quantities, and hardware performance metrics, as specified in Table 4.
The heterogeneous computing cluster consists of 60 servers. The information obtained through simulation is as follows.

(1) Task completion time. The Monitor tool tracks the execution status of individual tasks, including whether they are hung, in progress, or completed. The duration from the start of the first task in the cluster to the end of the final task is the total task completion time.

(2) Power consumption (only computing devices are considered). Dynamic power consumption is obtained based on predefined power consumption rates and task execution times for different CPU and GPU types.

(3) Load. The Monitor tool captures the service duration provided by each server for task execution and the server's resource utilization. These metrics are analyzed and compared across servers to determine the degree of load balancing.
Dataset
The experiment utilized two datasets (i.e., the Alibaba-cluster-trace-v2018 and the Random tasks) to train and evaluate the proposed algorithm.
Alibaba-cluster-trace-v2018
The Alibaba-cluster-trace-v2018 [42] dataset provides an extensive record of the operational logs for a cluster of approximately 4000 machines within Alibaba's production environment across 8 days. It encompasses detailed static information and dynamic operational data for around 9000 online services and 4 million batch-processing jobs. The dataset is structured around six monitoring log files that detail the activities of servers, online services, and batch-processing tasks. For this study, we primarily utilize a portion of the data from the instance runtime monitoring file (batch_instance).
In a heterogeneous cluster environment, multiple tasks submitted by different users are present, each with unique resource requirements. For instance, compute-intensive tasks demand substantial computational resources, while IO-intensive tasks require more input/output resources. Compute-intensive tasks include both CPU and GPU types. The following adjustments were made to the raw data in order to model the richness of task types in heterogeneous clouds:

(1) The hierarchy of jobs, tasks, and instances was uniformly represented using a task-subtask structure, as depicted in Fig. 5;

(2) The subtask size is the difference between the start time and the end time of the instance in the original data table;

(3) Random output data sizes were allocated to each instance, simulating the data transfer volumes between subtasks;

(4) Since the instances in the original dataset utilized only CPUs for computation, additional computation resource types were introduced. Each instance was randomly assigned a computation type (such as CPU or GPU);

(5) Tasks were categorized into independent task groups and DAG task groups based on whether they comprised multiple dependent subtasks.
The revised task attributes are presented in Table 5, which illustrates the dependencies among subtasks through their names. For instance, "M3_2_6" denotes that M2 and M6 are predecessors of M3, while M3 is a successor to both M2 and M6. In other words, M3 is contingent upon the completion of M2 and M6 before it can commence. Independent tasks serve as the fundamental, indivisible units of execution and scheduling. DAG tasks consist of multiple subtasks, some of which have data dependencies and must receive outputs from their preceding tasks. Subtasks without data dependencies can be executed in parallel on different processors. When two subtasks with data dependencies are on different servers, the required data is transferred over the network. The size of subtask output data can quantify the overhead associated with inter-server data transfer.
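The naming convention can be decoded with a short parser, sketched below. The function names are hypothetical, and the sketch assumes that every DAG subtask name consists of a letter prefix with the subtask number followed by underscore-separated predecessor numbers, as in the "M3_2_6" example; names that do not fit this pattern are rejected.

```python
import re
from typing import Dict, Iterable, List, Tuple


def parse_subtask_name(name: str) -> Tuple[int, List[int]]:
    """Split a name such as 'M3_2_6' into (subtask id, predecessor ids)."""
    head, *rest = name.split("_")
    match = re.fullmatch(r"[A-Za-z]+(\d+)", head)
    if match is None or not all(tok.isdigit() for tok in rest):
        raise ValueError(f"unrecognised subtask name: {name!r}")
    return int(match.group(1)), [int(tok) for tok in rest]


def build_predecessors(names: Iterable[str]) -> Dict[int, List[int]]:
    """Map each subtask id to the ids it must wait for."""
    return dict(parse_subtask_name(n) for n in names)


# Hypothetical task made of four subtasks.
dag = build_predecessors(["M2", "M6", "M3_2_6", "M7_3"])
```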
Random tasks
In the design of random distributed computing tasks, the number of subtasks, n, serves as a parameter to modulate the overall task set size. During task generation, the number of subtasks is restricted to a defined range, typically \(1<n<30\), to cater to a diversity of computational demands. Furthermore, an equal number of subtasks of CPU and GPU types were created to ensure task diversity. The execution times and communication costs for subtasks are generated according to the Computation-Communication Cost Ratio (CCR) [26], which defines the average ratio between execution time and communication cost. When generating data on communication costs, the average execution time is fixed in advance, while the average communication cost is calculated based on the CCR value. Tasks with a higher CCR exhibit lower relative communication costs among subtasks, indicating a computation-intensive nature where execution time is the predominant factor. Conversely, tasks with a lower CCR are characterized by higher communication costs, indicative of a data-intensive nature. By adjusting the number of subtasks, n, and the CCR, tasks with distinct characteristics can be generated, thus enabling the simulation of various task requirements in real-world distributed computing environments.
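The generation procedure can be sketched as follows: the average execution time is fixed first, and the average communication cost follows from the CCR. The function name, the default average execution time, and the uniform spread of 50% around each mean are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def generate_costs(n_subtasks: int, n_edges: int, ccr: float,
                   avg_exec: float = 10.0, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw execution times and communication costs for one random task."""
    rng = random.Random(seed)
    # CCR = average execution time / average communication cost.
    avg_comm = avg_exec / ccr
    exec_times = [rng.uniform(0.5 * avg_exec, 1.5 * avg_exec)
                  for _ in range(n_subtasks)]
    comm_costs = [rng.uniform(0.5 * avg_comm, 1.5 * avg_comm)
                  for _ in range(n_edges)]
    return exec_times, comm_costs


# Same seed, two CCR levels: only the communication costs change scale.
exec_hi, comm_hi = generate_costs(8, 10, ccr=10.0)   # computation-intensive
exec_lo, comm_lo = generate_costs(8, 10, ccr=0.1)    # data-intensive
```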
Experiment results
Training
From Alibaba-cluster-trace-v2018, we randomly selected six task sets of different sizes, with the number of tasks ranging from 100 to 600. The model is trained on these different datasets to adapt it to different task volumes. The inertia weight was determined using a piecewise nonlinear decreasing strategy, as detailed in the "Optimization based on inertia weights" section. All other training parameters were configured as presented in Table 6.
To depict the training performance of the model across varying data sizes, we processed the fitness values as follows: initially, we recorded the optimal fitness value achieved by the model for each number of tasks; subsequently, we computed the difference between the fitness value at each iteration and the optimal fitness value; finally, we amplified this difference by a factor of 1.5 to enhance the visual representation. The processed results are illustrated in Fig. 6, which delineates the trend of fitness values over 500 iterations for datasets comprising 200, 400, and 600 tasks, respectively.
Figure 6 illustrates that as the iteration count increases, the fitness values gradually converge towards a consistent level. Although larger task numbers increase the computational complexity and problem-solving difficulty, the algorithm still rapidly converges toward the global optimum without discernible oscillations in the curve. This suggests that the solution is comparatively stable and dependable. Analyzing the training outcomes confirms that the algorithm is capable of identifying superior solutions.
Utilizing an identical processing method, we evaluated the changes in fitness values under different inertia weight strategies within a dataset comprising 400 tasks. The outcomes are presented in Fig. 7.
In Fig. 7, the characteristics of the Sigmoid function lead to oscillations in the results when using a nonlinear inertia weight strategy, and the overall convergence rate is slower. The piecewise nonlinear decreasing inertia weight strategy remains between \(\omega _{min}\) and \(\omega _{max}\) throughout the search process, enabling the fastest convergence to the optimal solution. In contrast, the Sigmoid-like strategy rapidly decreases to near \(\omega _{min}\) after the set number of iterations, resulting in a slower convergence rate.
Baseline algorithms and evaluation metrics
The study chose a set of representative algorithms from the domain of task scheduling as a baseline to assess the efficacy of the proposed method. A concise overview of these algorithms is provided below:

MSJF [43]: This is an enhanced traditional approach that assigns tasks based on task length and equipment configuration.

MGGS [44]: This is an optimized task scheduling algorithm based on the combination of genetic algorithm and greedy strategy.

HDPSO [45]: This method is a swarm intelligence algorithm that exploits multi-stage hybrid discrete particle swarm to optimize the task scheduling.

QLHEFT [46]: This is another state-of-the-art method that utilizes Q-learning and earliest completion time allocation to employ a heterogeneous scheduling policy.
To assess the performance of the models, we employed four metrics to evaluate their scheduling outcomes: makespan, scheduling length ratio, speedup, and load balancing efficiency.

(1) Makespan: The time span from when a task is submitted to when it is completed.

(2) Schedule Length Ratio (SLR): The makespan is compared to the aggregate duration required for executing each subtask on the critical path on a single computing device, yielding a ratio that quantifies their relationship:
$$SLR=\frac{makespan}{\sum _{g\in CP_{MIN}}\min _{d\in D} \left\{ST_{g,d}\right\}}, \tag{28}$$
where \(CP_{MIN}\) denotes the set of subtasks located on the critical path, D stands for all computing devices, and \(ST_{g,d}\) specifies the duration necessary for executing subtask g on device d. The denominator calculates the cumulative execution time for each subtask on the critical path when run on optimally configured devices. Consequently, an SLR value closer to 1 indicates a more effective scheduling outcome.

(3) Speedup: The ratio of the minimum time required to execute all subtasks in a task sequentially using the best configured compute node to the makespan of the task.
$$speedup=\frac{\min _{d\in D} \left\{\sum _{g\in \nu }ST_{g,d}\right\}}{makespan}. \tag{29}$$
This measure is employed to assess the effectiveness of the scheduling algorithm in leveraging parallel computing to expedite task execution.

(4) Load balance: The load balancing degree reflects the working hours and resource usage of each server in the cluster. The closer the load value is to 1, the more balanced the cluster load is.
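Eqs. 28 and 29 can be computed directly from a table of execution times, as sketched below. The function names, the nested-dictionary layout, and the example values are hypothetical; the critical path is assumed to be known in advance.

```python
from typing import Dict, Sequence

ExecTable = Dict[str, Dict[str, float]]   # exec_time[g][d] = ST of g on d


def slr(makespan: float, exec_time: ExecTable, critical_path: Sequence[str]) -> float:
    """Eq. 28: makespan over the best-case critical-path execution time."""
    lower_bound = sum(min(exec_time[g].values()) for g in critical_path)
    return makespan / lower_bound


def speedup(makespan: float, exec_time: ExecTable) -> float:
    """Eq. 29: best single-device sequential time over the makespan."""
    devices = next(iter(exec_time.values())).keys()
    best_sequential = min(
        sum(times[d] for times in exec_time.values()) for d in devices
    )
    return best_sequential / makespan


# Hypothetical task with three subtasks and two device types.
table = {
    "g1": {"cpu": 4.0, "gpu": 2.0},
    "g2": {"cpu": 3.0, "gpu": 6.0},
    "g3": {"cpu": 5.0, "gpu": 5.0},
}
slr_value = slr(10.0, table, ["g1", "g3"])
speedup_value = speedup(10.0, table)
```

A lower SLR and a higher speedup both indicate a better schedule, so the two metrics move in opposite directions as the makespan shrinks.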
Performance evaluation
In the initial phase, the Alibaba-cluster-trace-v2018 was employed to assess the efficacy of the scheduling algorithms. To ensure statistically valid outcomes, each algorithm was run independently five times across six datasets of diverse magnitudes. Figure 8 shows the performance of different scheduling algorithms on different-sized datasets.
From Fig. 8a, it can be seen that when the number of user tasks is small, the cluster resources are sufficient, and the performance differences among various algorithms are not significant. As the number of tasks increases, the optimal solution search ability of the GraphEPSO algorithm in complex environments begins to show. When the task volume reaches 600, the makespan of the GraphEPSO algorithm is 8.68% less than that of the QLHEFT algorithm. In Fig. 8b, the average SLR of the GraphEPSO algorithm is less than that of other algorithms, indicating that the algorithm is closer to the ideal solution. In Fig. 8c, the speedup of the GraphEPSO algorithm is higher than that of other algorithms, indicating that the algorithm can effectively utilize parallel computing resources. Figure 8d describes the load balancing degree of all algorithms. Compared with the other 4 algorithms in the comparative experiments, the GraphEPSO algorithm exhibits a better load-balancing effect. This indicates that the scheduling algorithm can not only improve resource utilization efficiency but also better balance the workload of multiple servers in heterogeneous clusters, avoiding performance degradation caused by overload or unbalanced workload of a single server.
The GraphEPSO algorithm aims to minimize system power consumption while shortening task execution time. To quantitatively evaluate the performance of the model in power consumption optimization, we compared the power consumption generated by GraphEPSO and other scheduling algorithms under different task numbers. The experimental results are shown in Fig. 9.
As shown in Fig. 9, the GraphEPSO algorithm achieves the lowest total power consumption of the cluster. The reason for this is the efficiency and rationality of GraphEPSO in resource allocation, which reduces the idle time of servers and thereby lowers the inherent power consumption of the servers.
Subsequently, we utilized randomly generated distributed computing tasks to evaluate the performance of the trained model. In this evaluation, we fixed the number of tasks at 200, with each task including a variable number of subtasks between 1 and 30. Recognizing the significant differences in communication costs and computational costs associated with different task types, we established six distinct CCR levels: 0.1, 0.5, 1, 2, 5, and 10. This approach enabled the generation of diverse task queues, encompassing both compute-intensive and data-intensive workloads. The results of the experiment are shown in Fig. 10.
Figure 10a displays the average makespan of all algorithms under different CCR values. As CCR increases, the communication cost between subtasks decreases relative to the computational cost, and the average makespan of each algorithm shows a decreasing trend. Among all algorithms, the GraphEPSO algorithm has the shortest average makespan. This is because GraphEPSO tends to reduce communication costs by assigning parent and child subtasks to the same server. When the communication cost is much larger than the computational cost, the GraphEPSO algorithm effectively shortens the makespan. Figure 10b shows that GraphEPSO has the smallest average SLR, indicating that the algorithm can process tasks of different intensities more quickly and reduce task waiting time. Figure 10c displays the impact of CCR on average speedup. As CCR increases, the amount of data transmitted between subtasks decreases, and the parallel execution efficiency of each algorithm on multiple processors increases, resulting in corresponding increases in average speedup. The GraphEPSO algorithm exhibits the best average speedup, demonstrating that this scheduling algorithm can make full use of multiprocessor parallel execution for distributed computing tasks. Figure 10d shows that the GraphEPSO algorithm has the highest load balancing degree, indicating that it can adapt to system load changes more quickly and achieve relatively reasonable resource utilization.
The experiment evaluated the power consumption of servers produced by different scheduling algorithms when processing distributed computing task queues at different CCR levels. The experimental results, as shown in Fig. 11, indicate that the GraphEPSO algorithm performs better than other algorithms in terms of power consumption. This result suggests that the GraphEPSO algorithm has better information processing capabilities for tasks of different intensities, enabling more efficient utilization of resources and reducing the idle time overhead of the cluster.
In summary, the GraphEPSO algorithm proposed in this paper optimizes the representation of tasks and computational resources through scalable state information encoding, effectively extracting global information about tasks and resources. Through the guiding role of elite particles and adaptive weights in learning, it can better balance local and global search capabilities in complex solution spaces. GraphEPSO enables tasks to be matched to appropriate computational resources in better order, fully utilizing the computing power of heterogeneous computational resources. As a result, it effectively reduces the maximum completion time and power consumption of tasks, while maximizing load balancing among multiple servers in the cluster.
Conclusion
This paper studies the multi-objective optimization (i.e., time efficiency, platform power consumption, and load balancing) of task queue batching in heterogeneous computing clusters. We propose a collaborative scheduling model and algorithm that integrates GNNs with EPSO. By transforming task and resource information into a graph structure and using GNNs for node feature representation and information propagation, our approach effectively captures and leverages the characteristic information of tasks and resources. Furthermore, we introduce EPSO as a task-scheduling decision-making mechanism that efficiently searches for optimal solutions within the feasible solution space of resource allocation. On the CloudSim simulation platform, we conducted scheduling experiments on task sets of varying sizes and CCRs. Compared to the best values of the comparative algorithms, the optimal solutions found by our algorithm showed an improvement of 8.65% and 7.57% in time efficiency, a reduction of 7.71% and 6.37% in power consumption indicators, and an increase of 3.59% and 5.15% in load balancing indicators. Nevertheless, the algorithmic flow of our proposed method is relatively complex, and there is room for improvement in the factors considered during the modeling phase. Future research will focus on further improvements and examine the impact of system reliability and security on the scheduling of heterogeneous computing clusters.
Availability of data and materials
No datasets were generated or analysed during the current study.
Abbreviations
GNN: Graph Neural Network
EPSO: Elite Particle Swarm Optimization
DAG: Directed Acyclic Graph
PSO: Particle Swarm Optimization
SLAs: Service Level Agreements
MDP: Markov Decision Process
GCN: Graph Convolutional Network
DPSO: Distributed PSO
RMPSO: Random Matrix PSO
AP: Adaptive Particle Individual Best Position
CCR: Computation-Communication Cost Ratio
GA: Genetic Algorithm
HDPSO: Hybrid Discrete Particle Swarm Optimization
SLR: Schedule Length Ratio
Funding
This research was supported by the National Key Research and Development Program (2018YFC1406200).
Author information
Contributions
All authors contributed to the study conception and design. Conceptualization by Zhen Zhang and Shaohua Xu. Data collection and preprocessing were performed by Jinyu Zhang and Long Huang. Visualization was completed by Jinyu Zhang. The first draft of this manuscript was written by Zhen Zhang and Chen Xu. The authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Consent has been granted by all authors and there is no conflict.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, Z., Xu, C., Xu, S. et al. Towards optimized scheduling and allocation of heterogeneous resource via graphenhanced EPSO algorithm. J Cloud Comp 13, 108 (2024). https://doi.org/10.1186/s13677024006704