Towards optimized scheduling and allocation of heterogeneous resource via graph-enhanced EPSO algorithm

Efficient allocation of tasks and resources is crucial for the performance of heterogeneous cloud computing platforms. To achieve harmony between task completion time, device power consumption, and load balance, we propose a Graph neural network-enhanced Elite Particle Swarm Optimization (EPSO) model for collaborative scheduling, namely GraphEPSO. Specifically, we first construct a Directed Acyclic Graph (DAG) to model the complicated tasks, thereby using Graph Neural Network (GNN) to encode the information of task sets and heterogeneous resources. Then, we treat subtasks and independent tasks as basic task units while considering virtual or physical devices as resource units. Based on this, we exploit the performance adaptation principle and conditional probability to derive the solution space for resource allocation. Besides, we employ EPSO to consider multiple optimization objectives, providing fine-grained perception and utilization of task and resource information. It also increases the diversity of particle swarms, allowing GraphEPSO to adaptively search for the global optimal solution with the highest probability. Experimental results demonstrate the superiority of our proposed GraphEPSO compared to several state-of-the-art baseline methods on all evaluation metrics.


Introduction
Nowadays, high-performance computing clusters with cloud architectures play a significant role in large-scale scientific computation and batch-processing tasks. The hardware infrastructure, user support platforms, and software services constitute the hierarchical structure of cloud computing [1]. Cloud computing integrates resources through virtualization and pooling techniques, achieving stratification and abstraction from cloud clusters to physical nodes, and then to virtual machines [2]. One of the most significant parts of cloud computing is task scheduling, which aims to allocate tasks to appropriate computing resources and ensure an acceptable response time [3,4]. This process can be regarded as a multi-objective combinatorial optimization problem involving various considerations: (1) from the user perspective, the primary requirement is to reduce the cost of computing tasks (i.e., minimizing time consumption and financial cost); (2) for computing centers, it is essential to enhance overall resource utilization and reduce power consumption [5,6].
Task scheduling requires considering multiple influencing factors, including task characteristics [7], resource constraints, dependencies between tasks, and the matching degree between tasks and resources. A mismatch in any of these factors can lead to suboptimal scheduling results. Moreover, the heterogeneity of computing resources and the ever-increasing scale of computing tasks exacerbate the complexity of task scheduling [8-10]. Specifically, heterogeneous computing platforms host a variety of computing resources, which exhibit different performance metrics due to differences in their underlying hardware. Nevertheless, these hardware-level differences have been neglected by existing studies [11-13], which inevitably leads to low task efficiency in real-world operational environments. In addition, tasks on heterogeneous computing platforms consist of multiple interdependent subtasks that should be allocated appropriate computing resources [14-16]. However, the complicated associations among subtasks are largely unexplored, which affects the performance of task scheduling. Furthermore, the search space expands exponentially as the platform scale and the number of tasks grow [17]. Finding a set of optimal solutions in such a vast solution space is challenging and time-consuming. Hence, how to dynamically manage heterogeneous resources to optimize the response time and resource consumption of large-scale computing tasks has become a pressing yet challenging problem.
To this end, we propose a heterogeneous resource optimization scheduling algorithm based on the integration of Graph Neural Networks (GNNs) and Elite Particle Swarm Optimization (EPSO). Specifically, we first construct a Directed Acyclic Graph (DAG) to capture the complicated associations among subtasks. Then, we utilize a Graph Neural Network (GNN) to encode task attributes and device status information. This process is guided by the hierarchical representation of tasks and the categorization of devices. Moreover, we further generate the solution space for cluster scheduling according to the adaptability of task attributes to resource performance and the conditional probability of currently used and idle resource states. Additionally, we employ EPSO to optimize the task scheduling, that is, to simultaneously minimize the time consumption and power consumption of tasks and to achieve load balancing for batch processing task queues. By embedding multi-dimensional node features in the EPSO objective function, GraphEPSO achieves the adaptive mapping of basic task units to specific virtual or physical devices. The main contributions of this paper can be summarized as:
• We propose a graph-enhanced EPSO-based collaborative scheduling model called GraphEPSO to optimize task scheduling on heterogeneous computing clusters. Extensive experiments on six task scheduling datasets demonstrate the superiority of our proposed method in various metrics compared to several state-of-the-art baselines.
• By adopting the graph representation learning technique, we design a scalable task state encoding method, which captures the structure, type, and data size feature information of uncertain tasks.
• We develop a multi-objective optimization model aimed at concurrently enhancing task execution efficiency, mitigating device power consumption, and optimizing system load distribution.
• By embedding multi-dimensional features into the objective function of the EPSO algorithm, GraphEPSO can automatically capture the information of tasks and resources, thereby adaptively achieving the global optimal solution.
The rest of the paper is organized as follows. The "Related work" section reviews and analyzes the current research status of multi-objective optimization models and algorithms. The "Heterogeneous computing platform description and feature information encoding" section proposes encoding techniques for heterogeneous resource attributes and task information. The "Heterogeneous resource scheduling and multi-objective optimization model" section develops a heterogeneous resource scheduling model and a multi-objective optimization model. The "A multi-objective optimization scheduling algorithm based on EPSO" section introduces an EPSO scheduling algorithm based on inertia weight and elite particle perturbations. The "Experiments and analysis" section reports a series of experiments on time consumption, power consumption, and load, followed by a detailed analysis of the results. The "Conclusion" section concludes the paper and presents future work.

Related work
Scholars have already proposed numerous effective algorithms and implementation techniques to address the large-scale task scheduling problem in heterogeneous cloud computing platforms. This section mainly reviews the research and applications of multi-objective optimization model construction techniques, GNN information representation methods, and Particle Swarm Optimization (PSO) algorithms in task and resource scheduling.

Multi-objective optimization scheduling for heterogeneous computing platforms
To ensure efficient task execution while reducing the power consumption of computing clusters, Liang et al. [18] established a power consumption model for clusters. By considering the scheduling, dynamic, static, and other power consumption aspects of the cluster, they obtained the relationship between the average task completion time, power consumption, and cost. Kaur et al. [19] constructed a multi-objective optimization model for cloud data center job scheduling and virtual machine configuration based on parameters such as Service Level Agreements (SLAs), energy costs, and carbon emissions. Subsequently, they employed an enhanced heuristic method, underpinned by a greedy strategy, to refine the solution of this model. Kishor et al. [20] used a game-theoretic framework to formulate the bi-objective optimization problem as a non-cooperative load balancing game, achieving an appropriate trade-off between the two conflicting objectives of load balancing. Guo et al. [21] selected the shortest task execution time, the lowest execution cost, and resource load balancing as the objectives for cloud computing task scheduling, and established a mathematical model to measure the effectiveness of multi-objective task scheduling. Sun et al. [22] described the task scheduling problem in heterogeneous computing environments as a Markov Decision Process (MDP) and designed task-type-aware reward functions, achieving optimization for multi-type tasks through multi-objective optimization and reward scaling. Zhang et al. [23] addressed the reliability issue of cloud computing systems by establishing an initial virtual machine fault-tolerant placement model for cloud system star-topology data centers based on five factors: SLA violation rate, resource availability, power consumption, failure rate, and fault-tolerant cost. Then they proposed a heuristic ant colony optimization algorithm to solve the model.

GNN information representation method
GNNs are neural network models based on graph structure data, designed to extract and uncover features and patterns within graph-based data. Compared to traditional matrix-based models, GNNs are capable of capturing the relationships between nodes and the global structure of the graph, providing a strong representation of DAG task structure information and computing cluster resource information. Addressing the scheduling problem of large-scale jobs on distributed computing clusters, Mao et al. [24] designed an extensible task scheduling method that can handle DAG tasks of any shape and size. This method converts DAG task information and executor state information into features to be fed into the policy network, achieving task relationship modeling and feature extraction. Ni et al. [25] used a graph encoder to encode the input flow graph, incorporating contextual information within the graph. They employed graph embedding to represent the structured information of the flow graph and used a graph-aware decoder to capture the complex dependency relationships that affect resource allocation quality. Lin et al. [26] used GNNs to encode the node information and dependencies of DAG tasks into a set of node embedding vectors. Then they utilized a graph attention mechanism to weigh the nodes according to their importance. This enabled the GNN to allocate more attention to critical nodes in DAG applications, thereby enhancing the model's information processing capabilities. Luo et al. [27] designed a graph neural network composed of node information embedding and fully connected feedforward networks, which uses GNNs to perceive the dependencies between tasks. By leveraging the connectivity of the DAG, they pooled information to each node and stacked the node embedding GNN layers together, enabling a node to integrate information from all reachable nodes, and used a policy network to select tasks for execution. Wang et al. [28] used a Graph Convolutional Network (GCN) to encode the graph structure of tasks, representing information such as subtask types and execution order by extracting graphical features of the tasks. Song et al. [29] employed a GCN to learn high-level feature representations of each node in the original DAG, including execution cost, communication cost, out-degree, and in-degree, and obtained scalable task state information within the system. They proposed a GCN based on bi-directional messaging that allows the network to learn both top-down and bottom-up computational pattern metrics and implements bidirectional messaging from parent to child nodes.

Particle swarm multi-objective optimization scheduling algorithm
The PSO algorithm is a stochastic search algorithm designed by simulating the foraging behavior of bird swarms, which utilizes swarm intelligence to search for the optimal solution in the entire search space. Inspired by the PSO algorithm, Mansouri et al. [30] proposed a hybrid task scheduling strategy called FMPSO, which combines a fuzzy system with an improved PSO technique. They used four improved velocity update methods and a roulette selection technique to enhance the global search capability of particles, overcoming the local optimality and other drawbacks of the PSO algorithm. Bansal et al. [31] presented a multi-objective optimization scheduling framework based on a PSO scheduling model, which evaluates the budget cost using performance and budget constraints and provides feedback on the quality of task scheduling solutions. This approach can adjust the solution quality based on individual particle optimality and swarm particle optimality, solving the slow convergence problem of the PSO algorithm.
To address the scheduling challenges of large-scale cloud workflows, Wang et al. [32] proposed a dynamic swarm learning-based distributed particle swarm optimization algorithm (DPSO). The entire population is randomly partitioned into multiple groups, which collaboratively evolve using a master-slave multi-group distributed model. This method enhances the diversity of the population and employs a dynamic swarm learning strategy to dynamically change the group size, thereby controlling the learning intensity and balancing the diversity and convergence of DPSO. Tang et al. [33] formalized the cloud task scheduling problem with budget constraints and proposed a Random Matrix PSO Scheduling Algorithm (RMPSO), which uses random integer matrices to represent particle positions and feasible task scheduling solutions, aiming to achieve the optimal total cost of cloud services. They also designed a multi-core parallel RMPSO algorithm to reduce the time complexity of policy execution. Miao et al. [34] proposed an Adaptive Particle Individual Best Position (AP) selection method, which uses non-dominant particles to update the individual best position of a given particle. They introduced a new function to measure the gap between the individual best position or global best position of a given particle and its current position. The velocity vector is updated based on the calculated gap using a "roulette wheel" selection process. Li et al. [35] infused local and global guiding information into the particle swarm's update process, yielding a particle swarm cooperation technique. This refinement enhanced both the global search capability and the convergence of the particle swarm optimization algorithm.

Heterogeneous computing platform description and feature information encoding
This section describes the environment of the heterogeneous computing cluster, providing a detailed classification of resources and tasks. Subsequently, an encoding mechanism for resource and task information is designed based on Graph Neural Network (GNN) technology. The important symbols and explanations used in this paper are shown in Table 1.

Heterogeneous platform resources and task descriptions
A heterogeneous computing platform consists of multiple servers of different types, each housing various types of computing, storage, and network resources. The server set is denoted as $S = \{s_1, s_2, \dots, s_{|S|}\}$, where $|S|$ signifies the total number of servers. The computing device type set is defined as $RT = \{C_m, G_n\}$, with $C_m$ representing CPU models and $G_n$ representing GPU models. From the composition of the heterogeneous platform and the categorization of resources by type and performance, a performance metric set for platform resources is derived and formally expressed as $D = \{S_s, RT_s^k, SP_s^k, BW_s, MC_s, DC_s\}$, where $S_s$ refers to server $s$ in the platform; $RT_s^k$ indicates the type of computing device on $S_s$, where $k$ is the device index; $SP_s^k$ denotes the computing speed per unit time of the $k$-th device on $S_s$; $BW_s$ represents the bandwidth of $S_s$; $MC_s$ indicates the memory capacity of $S_s$; and $DC_s$ represents the disk capacity of $S_s$.
In task and resource scheduling, the basic resource unit is a computing device with specific types and performance capabilities. A server is capable of housing numerous diverse computing devices, each with the potential to execute multiple tasks concurrently. The type of resources required by the task must match the type of computing devices available, and the server's memory and disk capacity must meet the minimum requirements for task execution. In batch processing of tasks, the resource state is dynamically changing, and devices of the same type but different models have varying power consumption and computing capabilities.
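The matching constraints stated above (device type must match, and the hosting server must offer enough memory and disk) can be illustrated with a minimal Python sketch. The `Device` and `Subtask` classes, their field names, and `feasible_devices` are hypothetical names introduced only for this illustration; they are not part of GraphEPSO as described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical resource unit: one computing device on a server."""
    server: str       # identifier of the hosting server
    kind: str         # device type, e.g. "CPU" or "GPU"
    speed: float      # computing speed per unit time
    free_mem: float   # memory still available on the hosting server
    free_disk: float  # disk space still available on the hosting server


@dataclass
class Subtask:
    """Hypothetical task unit with its resource demands."""
    name: str
    kind: str    # required device type
    mem: float   # minimum memory demand
    disk: float  # minimum disk demand


def feasible_devices(task, devices):
    """Keep devices whose type matches and whose server meets the minimum demands."""
    return [
        d for d in devices
        if d.kind == task.kind
        and d.free_mem >= task.mem
        and d.free_disk >= task.disk
    ]
```

Such a filter only narrows the candidate set; choosing among the remaining devices is the job of the scheduling model described later.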
Table 1 Important symbols and explanations

| Symbol | Explanation |
| --- | --- |
| $S$ | The set of servers. |
| $RT$ | The set of computing device types. |
| $SP$ | The computing speed of the device. |
| $BW$ | The network bandwidth of the server. |
| $MC$ | The memory capacity of the server. |
| $DC$ | The disk capacity of the server. |
| $V$ | The set of tasks. |
| $v_h$ | The $h$-th task. |
| $g_i^h$ | The $i$-th subtask in $v_h$. |
| $E_{v_h}$ | The constraint relationships and communication lengths among subtasks in $v_h$. |
| $GT_i^h$ | The type of computing resources required by $g_i^h$. |
| $MC_i^h$ | The demand for memory space by $g_i^h$. |
| $DC_i^h$ | The demand for disk space by $g_i^h$. |
| $Q$ | The task queue. |
| $ST$ | The runtime of the subtask. |
| $IdeT$ | The earliest idle time of the computing device. |
| $AcT$ | The actual start time of the subtask. |
| $FT$ | The actual completion time of the subtask. |
| $SysFT$ | The total completion time of the task set. |
| $UE$ | The power consumption of the computing device. |
| $SE$ | The power consumption of the server. |
| $p_{best}$ | The local optimal solution of the particle swarm. |
| $g_{best}$ | The global optimal solution of the particle swarm. |

Tasks received by heterogeneous computing systems can be categorized into two groups: complex tasks and independent tasks. Complex tasks are capable of being divided into subtasks, with each subtask demanding different types of resources and performance. A DAG can be used to describe the constraint relationships between subtasks. Independent tasks, on the other hand, consist of only a single subtask and require only a single type of resource.
The process of resource allocation primarily encompasses two steps: the first step is to clearly identify the resource requirements of the tasks and their execution priorities, while the second step involves assigning the appropriate resources to the tasks. When executing resource scheduling, we initially convert the attribute information of the tasks into vector representations. Subsequently, based on the specific needs of the tasks, we allocate resources to establish a mapping between tasks and resources.

Heterogeneous resource information encoding
In the unified management and allocation of heterogeneous devices, it is essential to encode the attribute features of each independent device unit. Device state information is encoded using a GNN, which is utilized to abstract the device information within the system. The GNN comprises a single embedding layer that performs information aggregation through graph convolutions. The goal is to abstract all resource attributes of a server, such as $S_s$, $RT_s^k$, $P_s^k$, $BW_s$, $MC_s$, and $DC_s$, into feature vectors, which serve as input to the scheduling decision-making module. The network structure of the device encoding module is depicted in Fig. 1.
In Fig. 1, the attribute information of different resources first enters the embedding layer and is transformed into their respective embedding vectors. Subsequently, these embedding vectors are fused by a nonlinear transformation function to generate a server-level representation of the resource information. The calculation formula is shown in Eq. 1, where $f(\cdot)$ is a nonlinear transformation function, $EM_r$ represents the embedding vector of resource $r$, and $r$ ranges over the resource set of server $S_s$.
When encoding heterogeneous resources, the same encoding method is applied to devices of different types and performance levels. Task units are capable of identifying the required devices based on the type and performance of the resources needed for execution.

Task information encoding
In heterogeneous platforms, there is a significant variation in the device performance requirements for different computing tasks. Serial and parallel computing patterns are widely present in these tasks, leading to complex algorithmic processes. The platform system must accommodate the diverse needs of multiple tasks, which often exhibit significant differences in type and data volume. Furthermore, the divisibility of tasks exacerbates the heterogeneity among tasks, thereby increasing the complexity of task scheduling.

Fig. 1 The framework of the device encoding network
In the batch processing mode, the task scheduling algorithm is executed when a batch of tasks arrives at the computing platform. The task set is defined as $V = \{v_1, v_2, \dots, v_{|V|}\}$. The task information model for heterogeneous computing platforms can be formally expressed as the tuple $(V, DAG_h, L_i^h, MC_i^h, DC_i^h)$. Here, $V$ denotes the set of all tasks submitted by users, $DAG_h$ represents the constraint relationships and communication lengths between the subtasks of task $v_h$, $L_i^h$ is the size of subtask $g_i^h$, $MC_i^h$ is the memory capacity required by $g_i^h$, and $DC_i^h$ is the disk capacity required by $g_i^h$. Due to the variations in the composition of subtasks and algorithmic processes across different tasks, the structures of DAGs exhibit significant diversity. This complexity prevents fixed-size encoding methods from being directly applied to uncertain tasks. Consequently, there is a need to design an encoding technique that can accommodate arbitrary numbers and topological structures of subtasks within DAGs. Moreover, a parameter-sharing mechanism is essential to address the issue of extensive training parameters.
Graph Embedding [36] is a technique that maps high-dimensional dense matrices of graph data to low-dimensional dense vectors. In this study, we employ graph convolutional operations to encode the feature information of DAGs, performing calculations in a bottom-up manner based on directed edge connections.
Throughout the convolutional process, the feature information of parent nodes is preserved, and each subtask node is recursively encoded into a feature vector. Building on this, we utilize graph convolutional operations to aggregate subtask-related information at the task level, encoding tasks as embedding vectors according to the DAG structure.
Applying a GNN to encode task information involves transforming task attribute features into a set of vector representations. We adopt a hierarchical recursive approach to generate the embedding vectors $x_i^h$ for subtasks $g_i^h$, the embedding vector $y_h$ for tasks $v_h$, and the global embedding vector $z$ for the task set $V$. This method not only allows for scalability of task information but also maintains the global integrity of feature information. The detailed encoding process is illustrated in Fig. 2.
Figure 2 depicts the task encoding process, in which symbols of varying shapes signify subtasks with distinct resource needs. Similarly shaped symbols that differ in size indicate subtasks of a consistent computational type but with varying magnitudes. The edge length $e_{i,j}^h$ in the DAG indicates the amount of data communicated between subtasks.

(1) Subtask information embedding
Given the stage attribute feature vector $vec_i^h$ corresponding to the subtask $g_i^h$ within task $v_h$, we employ graph convolution to construct the embedding for this subtask, as given in Eq. 2, where $x_i^h$ represents the embedding vector of $g_i^h$. During the encoding process, starting from the root nodes of all subtasks in task $v_h$, attribute information is propagated from parent task nodes to child nodes through message passing. In each message-passing step, the parent node of subtask node $g_i^h$ summarizes the information from all ancestral nodes. The embedding is calculated as in Eq. 3, where $\hat{a}(\cdot)$ and $b(\cdot)$ are nonlinear transformation functions, implemented as lightweight neural networks, and $Anc(i)$ represents the set of ancestor nodes of $g_i^h$. The first term in Eq. 3 is a nonlinear aggregation operation based on graph convolution that summarizes the embedding information of all ancestor nodes of $g_i^h$.

Fig. 2 The structure of the task encoding network

(2) Task information embedding
A GNN can compute the embedding for each DAG task. Let $y_h^{(0)} = [vec_1^h; vec_2^h; \dots; vec_n^h]$ denote the feature matrix of $v_h$, $n$ represent the number of subtasks contained in $v_h$, and $E_h$ be the adjacency matrix of $v_h$. Then, the embedding encoding process for task $v_h$ is as shown in Eq. 4:

$$y_h^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\, \tilde{E}\, \tilde{D}^{-1/2}\, y_h^{(l)}\, W^{(l)}\right) \qquad (4)$$

where $\tilde{E} = E_h + I$, $\tilde{D}$ is the diagonal degree matrix of $\tilde{E}$, $W^{(l)}$ represents the trainable weights, and $\sigma(\cdot)$ is the activation function.
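The subtask-level message passing described in part (1) can be illustrated with a small Python sketch. It assumes one plausible form of the aggregation, $x_i^h = a\big(\sum_{j \in Anc(i)} b(x_j^h)\big) + vec_i^h$, which matches the textual description (a nonlinear aggregation over ancestors as the first term) but is an assumption rather than the paper's exact equation. The functions `ancestors` and `embed_subtasks` are hypothetical names, and `tanh` and a fixed scaling stand in for the lightweight neural networks.

```python
import math


def ancestors(parents, node):
    """Return all transitive ancestors of `node`, given a node -> parent list map."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def embed_subtasks(features, parents, a=math.tanh, b=lambda v: 0.5 * v):
    """Embed each subtask as a(sum of b(ancestor embeddings)) + own feature vector.

    `a` and `b` are toy element-wise stand-ins for trainable networks (assumption).
    """
    # Visit parents before children so ancestor embeddings already exist.
    order, done = [], set()

    def visit(n):
        if n in done:
            return
        for p in parents.get(n, []):
            visit(p)
        done.add(n)
        order.append(n)

    for n in features:
        visit(n)

    x = {}
    for n in order:
        dim = len(features[n])
        agg = [0.0] * dim
        for anc in ancestors(parents, n):
            for k in range(dim):
                agg[k] += b(x[anc][k])
        x[n] = [a(agg[k]) + features[n][k] for k in range(dim)]
    return x
```

Because the same `a` and `b` are reused at every node, the sketch handles DAGs with any number of subtasks, which is the property the encoding scheme relies on.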

(3) Task set global embedding
Regarding the encoding of the task set, we treat the task-level nodes as the child nodes of the global summary nodes. Subsequently, we generate the global embedding vectors of the task set through message passing, as shown in Eq. 5, where $|V|$ is the number of tasks, and $\tilde{a}(\cdot)$ and $b(\cdot)$ are nonlinear transformation functions.
The embedding vectors at these three levels collectively capture the individual characteristics of subtasks, their interdependencies, and the global information of the task set. Based on these multi-level embedding vectors, a fully connected feedforward network can be employed to further compute the priority scores for each subtask. The computation process is as shown in Eq. 6.
Utilizing the Gumbel-softmax sampling technique, we calculate the probability distribution for each subtask according to its corresponding priority score and subsequently select the subtask with the highest priority probability. The calculation process is as shown in Eq. 7, where $score_i^h$ is the priority score of subtask $g_i^h$, $P(i)$ is the probability distribution of subtask priority used for selecting the subtask with the highest probability, $\rho_i^h \sim U[0, 1]$, and $\tau$ denotes the temperature coefficient.
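As an illustration of the sampling step, the following Python sketch applies the standard Gumbel-softmax construction: uniform noise $\rho \sim U[0, 1]$ is turned into Gumbel noise $-\log(-\log \rho)$, added to each score, divided by the temperature $\tau$, and passed through a softmax. This is the textbook form and is assumed here to correspond to the paper's Eq. 7; `gumbel_softmax_probs` and `select_subtask` are hypothetical names.

```python
import math
import random


def gumbel_softmax_probs(scores, tau=1.0, rng=random):
    """Return Gumbel-softmax probabilities for a list of priority scores."""
    noisy = []
    for s in scores:
        # Clamp rho away from 0 and 1 so both logarithms are defined.
        rho = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(rho))
        noisy.append((s + gumbel) / tau)
    m = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_subtask(scores, tau=1.0, rng=random):
    """Pick the index with the highest sampled probability, and return the probabilities."""
    probs = gumbel_softmax_probs(scores, tau, rng)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

A lower `tau` sharpens the distribution toward the largest noisy score, while a higher `tau` flattens it and encourages exploration.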

Heterogeneous resource scheduling and multi-objective optimization model
This section describes the allocation process of heterogeneous resources and proposes a multi-objective optimization model for task queue processing time, power consumption, and load.

Heterogeneous resource scheduling model
The batch task set $V$ comprises independent tasks and divisible tasks, which can generate a task queue $Q_v$. Among these, divisible complex tasks are represented by a DAG to depict the logic of subtask execution. The $h$-th divisible complex task $v_h$ can be represented by a DAG as $DAG_{v_h} = (G_{v_h}, E_{v_h})$, where $G_{v_h} = \{g_1^h, g_2^h, \dots, g_n^h\}$ is the set of subtasks for $v_h$, and $E_{v_h}$ is the set of associations and communication lengths between subtasks. Integrating $DAG_{v_h}$ into the task queue $Q_v$ results in a comprehensive and granular workflow representation. During task execution, subtasks are distributed across various partitions of the task queue, tailored to their specific resource and performance needs. This approach ensures that the synchronization and efficiency of executing complex tasks are maintained.
Given a batch task queue $Q_v$ and the platform resource set $D$, the goal of resource allocation is to predict the task-resource allocation relationship graph $\Phi$. Specifically, tasks $v_h$ within $Q_v$ are matched with subsets of $D$, while each subtask $g_i^h$ within $v_h$ is assigned to a particular device $d_i$ within the selected subset. Our methodology deploys subtasks and specific devices as the fundamental units for scheduling, employing both GNN and EPSO algorithms to devise an optimized scheduling policy. The overall framework and algorithm flow are shown in Fig. 3.
In Fig. 3, the symbols C, G, and M represent task queues with CPU, GPU, and mixed resource demands, respectively. Triangles and circles represent CPU and GPU resources, and hexagons represent memory, disk, and network resources. $P(\cdot)$ and $MK(\cdot)$ represent the calculation of probability and adaptability.
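The DAG representation and the C/G/M queues can be made concrete with a short Python sketch. It orders the subtasks of one task with Kahn's algorithm, so that every subtask appears after its predecessors, and labels a task by the device types its subtasks need. The labeling rule (all CPU gives C, all GPU gives G, anything else gives M) is an assumption made for illustration, and `topological_order` and `queue_label` are hypothetical names.

```python
from collections import deque


def topological_order(subtasks, edges):
    """Kahn's algorithm. `edges` maps (parent, child) -> communication length."""
    indegree = {g: 0 for g in subtasks}
    children = {g: [] for g in subtasks}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)

    ready = deque(g for g in subtasks if indegree[g] == 0)
    order = []
    while ready:
        g = ready.popleft()
        order.append(g)
        for c in children[g]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    if len(order) != len(subtasks):
        raise ValueError("the subtask graph contains a cycle")
    return order


def queue_label(resource_types):
    """Assumed rule: 'C' for CPU-only, 'G' for GPU-only, 'M' for mixed demands."""
    kinds = set(resource_types)
    if kinds == {"CPU"}:
        return "C"
    if kinds == {"GPU"}:
        return "G"
    return "M"
```

For a diamond-shaped task with edges g1→g2, g1→g3, g2→g4, and g3→g4, the resulting order starts with g1 and ends with g4, which is the order in which a scheduler could safely consider the subtasks.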
The information flow for developing a heterogeneous resource scheduling model can be delineated as follows:
(1) Resource Partitioning and Task Queue Multiple Input Module. This module first creates multiple resource input queues by categorizing platform resources.
The scheduling scheme can be conceptualized as a series of resource allocation predictions. Each step in the scheduling process relies on the current system status, the global attribute features of the task queue $Q_v$, and the compatibility between the performance requirements of subtasks and available resources. Allocation of subtasks $g_i$ from $Q_v$ to devices $d_i$ is conditional upon the global attributes of $Q_v$ and the assignments of other subtasks. For any given sequence of subtasks $g_1, g_2, \dots, g_i$, the problem can be formulated as the joint probability in Eq. 8. The device assignment of a subtask is usually highly influenced by the device assignments of its predecessor subtasks. The dependency between the new assignment $d_{g_i}$ and all previous assignments depends on the global properties of $Q_v$, so the joint probability expressed in Eq. 8 cannot simply be decomposed. Consequently, we adopt an approximated decomposition of Eq. 8 to simplify this problem. If the predecessor subtasks of $g_i$ are assigned, then the joint probability can be approximated as in Eq. 9, where $D^{(up)}(g_i)$ refers to the assignments of all predecessor subtasks of $g_i$.
In this manner, the task-to-device assignment can be completed recursively, resulting in a collection of task-resource scheduling schemes. Furthermore, the current study has embedded heuristic information in the initial optimization of task-device pairings, effectively narrowing down the scheduling space and search domain.
To assess the alignment between subtask requirements and resource performance, we employ the Minkowski distance to measure the similarity of attribute features. Given that the encodings of subtask resource demands and device attributes are represented by $A(a_1, a_2, \dots, a_n)$ and $B(b_1, b_2, \dots, b_n)$, respectively, the matching degree is as follows:

$$MK(A, B) = \left( \sum_{i=1}^{n} |a_i - b_i|^{p} \right)^{1/p} \qquad (10)$$

where $p$ is the power exponent, and the value range is $[0, \infty]$.
(4) Multi-objective optimization scheduling scheme generation module. This module transforms the resource allocation in task queue batch processing into a search problem for optimal solutions. Within the framework of multiple objectives, it employs the EPSO search policy to obtain optimal scheduling solutions.
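The Minkowski matching degree used in the module descriptions above can be sketched in a few lines of Python. The sketch restricts the exponent to $p > 0$ so that $1/p$ is defined and treats $p = \infty$ as the Chebyshev (maximum) distance; a smaller distance is read as a better match. The helper `best_match` and the example vectors are hypothetical and only illustrate how the distance could rank candidate devices.

```python
import math


def minkowski(a, b, p=2.0):
    """Minkowski distance between two equal-length attribute vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        # Limit case: the largest coordinate difference.
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def best_match(demand, device_encodings, p=2.0):
    """Return the name of the device whose encoding is closest to the demand."""
    return min(device_encodings, key=lambda name: minkowski(demand, device_encodings[name], p))
```

With $p = 1$ the measure is the Manhattan distance and with $p = 2$ the Euclidean distance, so the exponent controls how strongly a single large attribute mismatch dominates the score.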

Time, power and load models and multi-objective optimization
In this section, we develop a comprehensive model that integrates time, power consumption, and load. Additionally, we propose a multi-objective optimization scheduling model.

Fig. 3 The architecture of the heterogeneous resource scheduling model

Time model and optimization objective
Given the computing resource $R_s^k$ and its processing capability $P_s^k$ per unit time, along with the size $L_i^h$ of subtask $g_i^h$, the execution time for $g_i^h$ on $R_s^k$ can be determined as

$$ST_{h,i}^{s,k} = \frac{L_i^h}{P_s^k}. \tag{10}$$

Accounting for network transmission costs, the transmission time between subtasks $g_i^h$ and $g_j^h$ is denoted as $TransT(g_i^h, g_j^h)$. When $g_i^h$ and $g_j^h$ are executed on the same server, transmission costs are negligible. However, if they are located on different servers, these costs are determined by the lower bandwidth of the two servers. A flag $Loc(g_i^h, g_j^h)$ is used to determine whether $g_i^h$ and its preceding subtask $g_j^h$ are carried out on the same server. The volume of communication data between $g_i^h$ and $g_j^h$ is represented by $Comm(g_i^h, g_j^h)$, while $BW_{g_j^h}$ denotes the network bandwidth of the server where $g_j^h$ is located. Consequently, the transmission time between $g_i^h$ and $g_j^h$ can be expressed using Eq. 11:

$$TransT(g_i^h, g_j^h) = \begin{cases} 0, & \text{if } Loc(g_i^h, g_j^h) \text{ indicates the same server}, \\ \dfrac{Comm(g_i^h, g_j^h)}{\min\left(BW_{g_i^h}, BW_{g_j^h}\right)}, & \text{otherwise}. \end{cases} \tag{11}$$

The objective of time optimization is to minimize the average completion time (AvgT) across the tasks. Assume that the computing resource allocated to subtask $g_i^h$ is $R_s^k$ and that the earliest idle time of $R_s^k$ is $IdeT_{h,i}^{s,k}$. The actual completion time of the predecessor subtask $g_j^h$ of $g_i^h$ is recorded as $FT_j^h$. Consequently, the actual starting time of $g_i^h$ is as in Eq. 12:

$$AcT_{h,i}^{s,k} = \max\left\{ IdeT_{h,i}^{s,k},\; FT_j^h + TransT(g_i^h, g_j^h) \right\}. \tag{12}$$

Given that the actual execution time for subtask $g_i^h$ is $ST_{h,i}^{s,k}$, the corresponding actual completion time $FT_i^h$ can be represented as follows:

$$FT_i^h = AcT_{h,i}^{s,k} + ST_{h,i}^{s,k}. \tag{13}$$

For task $v_h$, the objective is to minimize the maximum completion time across all subtasks. The maximum completion time for $v_h$ is denoted as $MaxFT_h = \max_i FT_i^h$. Consequently, the maximum completion time for the entire task set can be represented as $SysFT = \sum_{h=1}^{|V|} MaxFT_h$. For the entire computational platform, the optimization objective is to keep the waiting time for all tasks as short as possible, that is, to minimize the average completion time for all tasks. The average completion time is given by $AvgT = \frac{1}{|V|} SysFT$. Therefore, time optimization involves solving the minimization problem stated in Eq. 14:

$$\min AvgT. \tag{14}$$
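The time model can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' implementation: all function and variable names are hypothetical, subtasks are assumed to be supplied in topological order, and when a subtask has several predecessors the sketch takes the latest arrival among them, which is one natural reading of the start-time rule.

```python
from collections import defaultdict


def finish_times(sizes, preds, device_of, speed, server_of, bandwidth, comm):
    """Compute the completion time FT of every subtask of one task.

    sizes      {subtask: size L}, iterated in an assumed topological order
    preds      {subtask: [predecessor subtasks]}
    device_of  {subtask: device it is assigned to}
    speed      {device: processing capability P per unit time}
    server_of  {device: hosting server}
    bandwidth  {server: network bandwidth}
    comm       {(predecessor, subtask): communication data volume}
    """
    idle = defaultdict(float)  # earliest idle time of each device (IdeT)
    finish = {}
    for g in sizes:
        dev = device_of[g]
        exec_time = sizes[g] / speed[dev]  # ST = L / P
        ready = 0.0
        for p in preds.get(g, []):
            src, dst = server_of[device_of[p]], server_of[dev]
            # Same server: negligible cost; otherwise the lower bandwidth limits the transfer.
            trans = 0.0 if src == dst else comm[(p, g)] / min(bandwidth[src], bandwidth[dst])
            ready = max(ready, finish[p] + trans)
        start = max(idle[dev], ready)  # AcT
        finish[g] = start + exec_time  # FT = AcT + ST
        idle[dev] = finish[g]
    return finish


def average_completion_time(tasks_finish):
    """AvgT = SysFT / |V|, where SysFT sums MaxFT over all tasks."""
    return sum(max(ft.values()) for ft in tasks_finish) / len(tasks_finish)


# Hypothetical task with three subtasks: c depends on a and b.
ft = finish_times(
    sizes={"a": 10, "b": 20, "c": 10},
    preds={"c": ["a", "b"]},
    device_of={"a": "cpu0", "b": "gpu0", "c": "cpu0"},
    speed={"cpu0": 10, "gpu0": 20},
    server_of={"cpu0": "s0", "gpu0": "s1"},
    bandwidth={"s0": 5, "s1": 10},
    comm={("a", "c"): 5, ("b", "c"): 10},
)
```

In this example, subtask c waits for the output of b, which crosses servers at the lower of the two bandwidths, so the transfer rather than device availability determines its start time.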

Power consumption model and optimization objectives
In the practical computing context, the power consumption of the platform can be divided into two categories: dynamic power usage, which is generated during task execution, and inherent power usage, which occurs when the device is in an idle state. The ratio of dynamic to inherent power consumption is estimated to be approximately 7:3 when the device is under full load. This study computes power consumption by focusing solely on computing devices. For each device $R_s^k$ on server $S_s$, we denote its power consumption rate as $UE_s^k$. Consequently, the aggregate power consumption rate for server $S_s$ is denoted as $SE_s = \sum_{k=1}^{K_s} UE_s^k$. The inherent power consumption of $R_s^k$ is denoted as $P_{s,k}^{inh}$, and the dynamic power consumption as $P_{s,k}^{dyn}$. Given that the execution time of subtask $g_i^h$ on $R_s^k$ is $ST_{h,i}^{s,k}$, the inherent power consumption of the system is denoted as $SysW_{inh}$ and the dynamic power consumption as $SysW_{dyn}$; the total power consumption is accumulated according to the actual operation. Consequently, the objective of minimizing power consumption can be articulated through the following optimization problem:

$$\min \left( SysW_{dyn} + SysW_{inh} \right).$$

Load balancing model and optimization objectives
For load balancing, this study takes into account the working hours and resource utilization of every server within the cluster.
(1) Time-based load balancing model. When performing task processing, each device should work for as consistent a time as possible; the time-based load model expresses this requirement.
(2) Resource utilization-based load model. Based on the resource utilization of each server, the relative ratios of different resources can be calculated: the CPU relative ratio $CPU_s = U_{C_s}/AL_C$, the GPU relative ratio $GPU_s = U_{G_s}/AL_G$, the memory relative ratio $MEM_s = U_{M_s}/AL_M$, the network relative ratio $NET_s = U_{N_s}/AL_N$, and the disk relative ratio, which is defined analogously. In the resulting load model, $\alpha$ is the regulating factor, with $0 < \alpha < 1$.
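The relative ratios of the resource utilization-based model can be sketched as follows. This is a minimal illustration that assumes $AL_C$ is the mean CPU usage over all servers in the cluster; the function name and the monitoring values are hypothetical.

```python
def relative_ratios(usage):
    """Map {server: mean usage rate of one resource type} to {server: usage / cluster average}."""
    average = sum(usage.values()) / len(usage)
    if average == 0:
        raise ValueError("cluster average usage must be non-zero")
    return {server: value / average for server, value in usage.items()}


# Hypothetical CPU usage rates for three servers.
cpu_usage = {"S1": 0.30, "S2": 0.60, "S3": 0.90}
cpu_ratio = relative_ratios(cpu_usage)
```

A ratio close to 1 means the server sits near the cluster average for that resource; here S1 falls below the average and S3 above it.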

Multi-objective optimization model
The multi-objective optimization model is obtained by integrating the objectives of time, power, and load. Within the multi-objective optimization model for task set scheduling on heterogeneous computing platforms, the respective metrics are either dynamically monitorable or computable from monitoring data.

A multi-objective optimization scheduling algorithm based on EPSO
The allocation of resources in task queues for batch processing involves a multi-objective optimization problem, which can be conceptualized as a search for optimal solutions within the task and resource allocation space. In such heterogeneous computing environments, the substantial volume of tasks submitted, paired with the assortment of resource varieties, leads to an extensive spectrum of potential strategies for task-resource allocation. Furthermore, task scheduling is subject to both dynamic changes and uncertainty, with each instance of task selection and resource allocation prompting a shift in the system's state.

Multi-objective scheduling policy based on EPSO
This section explores the problem of task and resource optimization scheduling and proposes a search mechanism based on the self-adaptive EPSO. Based on the construction of the task and resource node information model, this study enhances the comprehensiveness and adaptability of information utilization by integrating multi-dimensional node features into the fitness function of the EPSO algorithm.

EPSO algorithm
The EPSO algorithm, with its adaptive weight and elite strategies, has emerged as an efficient technique for addressing multi-objective optimization challenges. During the initial and final phases of iteration, the algorithm adjusts the inertia weight via an adaptive strategy, while an effective elite guidance mechanism improves the diversity of particle exploration within the solution space. This approach promotes rapid convergence and increases the likelihood of identifying the global optimal solution. The flowchart of the EPSO algorithm is illustrated in Fig. 4.

As shown in Fig. 4, the EPSO algorithm first initializes the positions and velocities of the particles. Here, the initial positions of the particles correspond to various mapping schemes between tasks and resources. Subsequently, the quality of the solutions is assessed by calculating the fitness value for each particle. To maintain the convergence and diversity of the particle swarm, the algorithm guides other particles to adjust their velocities and positions through elite particles in each iteration. Elite particles are the B particles with the highest fitness values in the current and previous iterations. Particles are randomly selected from the two generations of elites and compared. The winner particle replaces the local best solution in the velocity update equation, while the loser particle replaces the global best solution. The calculation process is shown in Eqs. 20 and 21, where $\upsilon_\gamma(t+1)$ represents the velocity vector of the $\gamma$th particle at step $t+1$, $\theta_1(t)$ is the winner elitist particle for guiding individual optimization learning, $\theta_2(t)$ is the loser elitist particle for guiding global optimization learning, $rand$ is a random number between 0 and 1, and $\theta_\gamma(t)$ represents the current position of the particle. Here, we utilize the elite particles from two consecutive iterations to guide the evolution of other particles, thereby enhancing the diversity of particles in the solution space.

In the initial stage of the algorithm, since the population contains only one generation of particles, two particles are randomly selected from the B elitist particles of this generation for comparison. Based on the fitness values of individual particles, the probability of being selected is given by the roulette wheel strategy (Eq. 22). Using a roulette wheel selection strategy avoids the decrease in algorithm efficiency caused by randomness during the selection of elite particles. By extracting highly efficient elite guiding particles, the optimization efficiency and convergence stability of the algorithm are ensured.
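The elite guidance step can be sketched in Python as follows. This is a minimal illustration rather than the paper's exact Eqs. 20 to 22: it assumes the standard PSO velocity update with the winner and loser elites substituted for the local and global best, hypothetical acceleration coefficients `c1` and `c2`, positive fitness values, and continuous positions (the actual positions encode task-resource mappings).

```python
import random


def roulette_pick(elites, rng):
    """Pick one (position, fitness) pair with probability proportional to its fitness."""
    weights = [fit for _, fit in elites]  # fitness values are assumed positive
    return rng.choices(elites, weights=weights, k=1)[0]


def pick_guides(prev_elites, curr_elites, rng):
    """Draw one elite from each generation; the fitter one becomes the winner."""
    a, b = roulette_pick(prev_elites, rng), roulette_pick(curr_elites, rng)
    winner, loser = (a, b) if a[1] >= b[1] else (b, a)
    return winner[0], loser[0]


def update_particle(position, velocity, winner, loser, omega, rng, c1=2.0, c2=2.0):
    """Standard PSO update with winner/loser elites in place of the local/global best."""
    new_velocity = [
        omega * v + c1 * rng.random() * (w - x) + c2 * rng.random() * (l - x)
        for v, x, w, l in zip(velocity, position, winner, loser)
    ]
    new_position = [x + nv for x, nv in zip(position, new_velocity)]
    return new_position, new_velocity
```

In the first iteration, where only one generation exists, the same elite list can be passed for both `prev_elites` and `curr_elites`.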

Optimization based on inertia weights
Fig. 4 The flowchart of EPSO

Inertia weight is a crucial parameter in the particle swarm optimization algorithm that balances global and local search capabilities. A higher inertia weight results in greater particle velocities within the search space, facilitating extensive global exploration. Conversely, a lower inertia weight reduces particle velocities, promoting intensive local search activities near known local optimal solutions. The linearly decreasing inertia weight strategy [37] involves setting a higher initial value for the inertia weight at the beginning of the algorithm's execution. Subsequently, this value is decreased linearly throughout iterations. Compared to the PSO with a fixed inertia weight, this approach enhances the fine-tuning capabilities of the particle swarm. The linearly decreasing inertia weight is defined in Eq. 23:

$$\omega = (\omega_{max} - \omega_{min}) \cdot \frac{Max - iter}{Max} + \omega_{min}, \tag{23}$$

where $\omega_{max}$ and $\omega_{min}$ represent predefined constants, $Max$ is the maximum number of iterations, and $iter$ indicates the current iteration count. Throughout the algorithm's iteration, the inertia weight diminishes progressively as it is multiplied by a diminishing factor. The values of $\omega_{max} - \omega_{min}$ and $\omega_{min}$ can be adjusted to control the weight's range, thereby enhancing the algorithm's local search proficiency in its latter stages. However, this approach does not adapt well to the dynamic changes of the problem at hand.
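The linearly decreasing strategy can be written as a one-line function. The sketch below is illustrative; the default values 0.9 and 0.4 are commonly used settings assumed here for the example, not values reported in this paper.

```python
def linear_inertia(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight: w_max at iteration 0, w_min at max_iter."""
    return (w_max - w_min) * (max_iter - iteration) / max_iter + w_min


# Weight at the start, midpoint, and end of a hypothetical 500-iteration run.
schedule = [linear_inertia(t, 500) for t in (0, 250, 500)]
```

The weight depends only on the iteration count, which is why this schedule cannot react to the state of the search.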
To accommodate the needs of particle search behavior more flexibly during optimization, a Sigmoid nonlinear inertia weight strategy is proposed in the literature [38]. The Sigmoid function exhibits rapid growth in the initial phase, decelerates as it nears a threshold, and eventually plateaus. This property helps particles quickly locate the optimal solution in the early stages of optimization and refine their search in later stages. In its formula, $u$ is a constant that adjusts the sharpness of the function and $\kappa$ is a constant that sets the partition of the sigmoid function. However, the Sigmoid function relies heavily on these constant terms. If the weight adjustment reduces the search capability at any point during the run, the algorithm might get stuck in a local optimum.
The Sigmoid-like inertia weight strategy [39] is achieved through a piecewise function that embodies the characteristics of a sigmoid function. This strategy enables particles to switch adaptively between linearly and nonlinearly decreasing inertia weight strategies. It facilitates effective global and local search across different stages of the optimization process. In its formula, $\varrho$ defines the transition region between the linearly decreasing inertia weight and the nonlinearly decreasing inertia weight.
Inspired by this method, we propose a piecewise nonlinear approach based on an exponential change of the inertia weight strategy [40]. During the initial phase of particle search, the inertia weight is kept constant at a higher fixed value. This enables the PSO to perform a comprehensive search across the solution space, enhancing its global optimization capabilities. In the later stages of the algorithm, we employ a nonlinear dynamic decrement strategy for the inertia weight. This strategy progressively expands the algorithm's local search capabilities, preventing it from getting trapped in local optima. In the experimental scenarios of this paper, we achieved the best results by setting the fixed weight to 1 and the transition region to 2/3. The piecewise nonlinear decreasing inertia weight is calculated by Eq. 26, in which $\mu$ and $\pi$ are exponential adjustment factors. This method enhances the flexibility of weight adjustment by employing exponential adjustment factors to control the rate of weight change. Such a flexible weight adjustment mechanism is better suited for complex optimization problems. Additionally, the adaptive weights help maintain a balance between the particles' exploration and exploitation capabilities, which promotes the convergence of the EPSO algorithm to the global optimal solution.
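The shape of such a piecewise schedule can be illustrated with the following sketch. The exact form of Eq. 26 is not reproduced here: the exponential decay expression, the lower bound `w_min`, and the default values of `mu` and `pi_factor` are assumptions chosen only to show a weight that stays fixed at 1 for the first 2/3 of the iterations and then decreases nonlinearly.

```python
import math


def piecewise_inertia(iteration, max_iter, w_fixed=1.0, w_min=0.4,
                      split=2 / 3, mu=3.0, pi_factor=2.0):
    """Hypothetical piecewise schedule: constant first, exponential decrease afterwards."""
    switch = split * max_iter
    if iteration <= switch:
        return w_fixed  # global exploration phase with a fixed, high weight
    # Fraction of the decreasing phase that has elapsed, in (0, 1].
    progress = (iteration - switch) / (max_iter - switch)
    # Assumed decay form; mu and pi_factor control how fast the weight drops.
    return w_min + (w_fixed - w_min) * math.exp(-mu * progress ** pi_factor)


early = piecewise_inertia(100, 300)
late = piecewise_inertia(250, 300)
last = piecewise_inertia(300, 300)
```

Larger values of `mu` make the weight fall faster after the switch point, while `pi_factor` shapes how gradually the decrease begins.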

Search for the optimal scheduling solution
The fitness function is employed to evaluate the quality of the solutions corresponding to particles and to guide particles in selecting the optimal positions. When crafting the fitness function, it is essential to consider a range of factors, including task completion time, power consumption, and load balancing, to optimize time efficiency and resource utilization. These factors are encapsulated in the multi-objective optimization functions $F_1$, $F_2$, and $F_3$, as detailed in Eq. 19. To combine these three distinct optimization objectives into a unified objective, the primary focus shifts to identifying the optimal value of the fitness function. Individuals with higher fitness function values are deemed superior. The fitness function is derived by taking the logarithm of each objective, then its reciprocal, and summing the results. The precise fitness function is given in Eq. 27.

The adaptive elite particle swarm algorithm enhances the diversity of the particle population through its search mechanism and dynamic weight adjustment strategies, thereby facilitating efficient exploration for optimal solutions within the solution space. The position vector of each particle represents the mapping between resources and tasks for a single scheduling instance. During the iterative process, the algorithm progressively searches for the global optimal solution, culminating in the best possible task scheduling outcome. The implementation process of EPSO is presented in Algorithm 1.
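The verbal description of the fitness function (logarithm of each objective, then the reciprocal, then the sum) can be sketched as below. The sketch assumes natural logarithms, equal weighting of the three terms, and objective values greater than 1 so that every logarithm is positive; these are assumptions for illustration, and the function name is hypothetical.

```python
import math


def fitness(f1, f2, f3):
    """Sum of reciprocal logarithms of the time, power, and load objectives."""
    objectives = (f1, f2, f3)
    if any(f <= 1.0 for f in objectives):
        raise ValueError("this sketch assumes every objective value is greater than 1")
    return sum(1.0 / math.log(f) for f in objectives)


# Smaller objective values yield a higher fitness.
better = fitness(10.0, 10.0, 10.0)
worse = fitness(100.0, 100.0, 100.0)
```

Taking the logarithm compresses objectives measured on very different scales before they are combined, and the reciprocal turns a minimization target into a fitness to be maximized.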

Experiments and analysis
In this section, we first design an experiment to evaluate the effectiveness and feasibility of the proposed algorithm. Subsequently, we compare the proposed algorithm with existing efficient algorithms on multiple performance metrics.

Experiment environment
For the experimental setup and algorithmic model training, we employed an Inspur server. The hardware specifications of the server are detailed in Table 2.
CloudSim [41] is a scalable simulation framework capable of modeling cloud computing infrastructure and services. The experiment used the CloudSim platform for task scheduling simulation and added new functionality for GPU resource allocation. To mirror the diversity of hardware resource types and task types found in authentic environments, a range of computing devices was established, exhibiting varying levels of performance and power consumption, as detailed in Table 3. Additionally, the experiment instantiated heterogeneous servers, each characterized by unique resource attributes, quantities, and hardware performance metrics, as specified in Table 4.
The heterogeneous computing cluster consists of 60 servers. The information obtained through simulation is as follows.
(1) Task completion time. The Monitor tool tracks the execution status of individual tasks, including whether they are hung, in progress, or completed. The duration from the start of the first task in the cluster to the end of the final task is the total task completion time.
(2) Power consumption (only computing devices are considered). Dynamic power consumption is obtained based on predefined power consumption rates and task execution times for different CPU and GPU types.
(3) Load. The Monitor tool captures the service duration provided by each server for task execution and the server's resource utilization. These metrics are analyzed and compared across servers to determine the degree of load balancing.

Dataset
The experiment utilized two datasets (i.e., Alibaba-cluster-trace-v2018 and Random tasks) to train and evaluate the proposed algorithm.

Alibaba-cluster-trace-v2018
The Alibaba-cluster-trace-v2018 [42] dataset provides an extensive record of the operational logs for a cluster of approximately 4000 machines within Alibaba's production environment across 8 days. It encompasses detailed static information and dynamic operational data for around 9000 online services and 4 million batch-processing jobs. The dataset is structured around six monitoring log files that detail the activities of servers, online services, and batch-processing tasks. For this study, we primarily utilize a portion of the data from the instance runtime monitoring file (batch_instance).
In a heterogeneous cluster environment, multiple tasks submitted by different users are present, each with unique resource requirements. For instance, compute-intensive tasks demand substantial computational resources, while IO-intensive tasks require more input/output resources. Compute-intensive tasks include both CPU and GPU types. The following adjustments were made to the raw data in order to model the richness of task types in heterogeneous clouds:
(1) The hierarchy of jobs, tasks, and instances was uniformly represented using a task-subtask structure, as depicted in Fig. 5;
(2) The subtask size is the difference between the start time and the end time of the instance in the original data table;
(3) Random output data sizes were allocated to each instance, simulating the data transfer volumes between subtasks;
(4) Since the instances in the original dataset utilized only CPUs for computation, additional computation resource types were introduced. Each instance was randomly assigned a computation type (such as CPU or GPU);
(5) Tasks were categorized into independent task groups and DAG task groups based on whether they comprised multiple dependent subtasks.
The revised task attributes are presented in Table 5, which illustrates the dependencies among subtasks through their names. For instance, "M3_2_6" denotes that M2 and M6 are predecessors of M3, while M3 is a successor to both M2 and M6. In other words, M3 is contingent upon the completion of M2 and M6 before it can commence. Independent tasks serve as the fundamental, indivisible units of execution and scheduling. DAG tasks consist of multiple subtasks, some of which have data dependencies and must receive outputs from their preceding tasks. Subtasks without data dependencies can be executed in parallel on different processors. When two subtasks with data dependencies are on different servers, the required data is transferred over the network. The size of subtask output data can quantify the overhead associated with inter-server data transfer.
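The naming convention can be decoded with a few lines of Python. The following sketch assumes that a DAG subtask name consists of a letter prefix, the subtask's own number, and underscore-separated predecessor numbers; the function name is hypothetical, and names that do not follow this pattern are treated as independent tasks.

```python
import re


def parse_subtask_name(name):
    """Return (subtask number, [predecessor numbers]), or None for non-DAG names."""
    match = re.fullmatch(r"[A-Za-z]+(\d+)((?:_\d+)*)", name)
    if match is None:
        return None  # assumed to be an independent task
    node = int(match.group(1))
    predecessors = [int(p) for p in match.group(2).split("_") if p]
    return node, predecessors


edges = parse_subtask_name("M3_2_6")
root = parse_subtask_name("M1")
```

Applied to every subtask of a task, this yields the edge list of its DAG: each predecessor number points to the subtask that depends on it.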

Random tasks
In the design of random distributed computing tasks, the number of subtasks, $n$, serves as a parameter to modulate the overall task set size. During task generation, the number of subtasks is restricted to a defined range, typically $1 < n < 30$, to cater to a diversity of computational demands. Furthermore, an equal number of subtasks of CPU and GPU types were created to ensure task diversity. The execution times and communication costs for subtasks are generated according to the Computation-Communication Cost Ratio (CCR) [26], which defines the average ratio between execution time and communication cost. When generating data on communication costs, the average execution time is fixed in advance, while the average communication cost is calculated based on the CCR value. Tasks with a higher CCR exhibit lower relative communication costs among subtasks, indicating a computation-intensive nature where execution time is the predominant factor. Conversely, tasks with a lower CCR are characterized by higher communication costs, indicative of a data-intensive nature. By adjusting the

Training
From Alibaba-cluster-trace-v2018, we randomly selected six task sets of different sizes, with the number of tasks ranging from 100 to 600. The model is trained on these datasets to adapt it to different task volumes. The inertia weight was determined using a piecewise nonlinear decreasing strategy, as detailed in the Optimization based on inertia weights section. All other training parameters were configured as presented in Table 6.
To depict the training performance of the model across varying data sizes, we processed the fitness values as follows: first, we recorded the optimal fitness value achieved by the model for each number of tasks; second, we computed the difference between the fitness value at each iteration and the optimal fitness value; finally, we amplified this difference by a factor of 1.5 to enhance the visual representation. The processed results are illustrated in Fig. 6, which shows the trend of fitness values over 500 iterations for datasets comprising 200, 400, and 600 tasks, respectively.
Figure 6 illustrates that as the iteration count increases, the fitness values gradually converge towards a consistent level. Although a larger number of tasks increases the computational complexity and problem-solving difficulty, the algorithm still rapidly converges toward the global optimum without discernible oscillations in the curve. This suggests that the solution is comparatively stable and dependable. The training outcomes confirm that the algorithm is capable of identifying superior solutions.
Utilizing an identical processing method, we evaluated the changes in fitness values under different inertia weight strategies within a dataset comprising 400 tasks.The outcomes are presented in Fig. 7.
In Fig. 7, the characteristics of the Sigmoid function lead to oscillations in the results when using a nonlinear inertia weight strategy, and the overall convergence rate is slower. The piecewise nonlinear decreasing inertia weight strategy remains between $\omega_{min}$ and $\omega_{max}$ throughout the search process, enabling the fastest convergence to the optimal solution. In contrast, the Sigmoid-like strategy rapidly decreases to near $\omega_{min}$ after the set number of iterations, resulting in a slower convergence rate.

Baseline algorithms and evaluation metrics
The study chose a set of representative algorithms from the domain of task scheduling as baselines to assess the efficacy of the proposed method. A concise overview of these algorithms is provided below:
• MSJF [43]: This is an enhanced traditional approach that assigns tasks based on task length and equipment configuration.
• MGGS [44]: This is an optimized task scheduling algorithm based on the combination of genetic algorithm and greedy strategy.
• HDPSO [45]: This method is a swarm intelligence algorithm that exploits multi-stage hybrid discrete particle swarm to optimize the task scheduling.
• QL-HEFT [46]: This is another state-of-the-art method that utilizes Q-learning and earliest completion time allocation to employ a heterogeneous scheduling policy.
To assess the performance of the models, we employed four metrics to evaluate their scheduling outcomes: makespan, schedule length ratio, speedup, and load balancing efficiency.
(1) Makespan: The time span from when a task is submitted to when it is completed.
(2) Schedule Length Ratio (SLR): The makespan is compared to the aggregate duration required for executing each subtask on the critical path on a single computing device, yielding a ratio that quantifies their relationship:

$$SLR = \frac{makespan}{\sum_{g \in CP_{MIN}} \min_{d \in D} ST_{g,d}}, \tag{28}$$

where $CP_{MIN}$ denotes the set of subtasks located on the critical path, $D$ stands for all computing devices, and $ST_{g,d}$ specifies the duration necessary for executing subtask $g$ on device $d$. The denominator calculates the cumulative execution time for each subtask on the critical path when run on optimally configured devices. Consequently, an SLR value closer to 1 indicates a more effective scheduling outcome.
(3) Speedup: The ratio of the minimum time required to execute all subtasks in a task sequentially using the best configured compute node to the makespan of the task:

$$speedup = \frac{\min_{d \in D} \sum_{g \in \nu} ST_{g,d}}{makespan}. \tag{29}$$

This measure is employed to assess the effectiveness of the scheduling algorithm in leveraging parallel computing to expedite task execution.
(4) Load balance: The load balancing degree reflects the working hours and resource usage of each server in the cluster. The closer the load value is to 1, the more balanced the cluster load is.
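The SLR and speedup metrics can be computed directly from a table of execution times. The sketch below is illustrative: the function names, the nested-dictionary layout `exec_time[subtask][device]`, and the example values are assumptions.

```python
def schedule_length_ratio(makespan, critical_path, exec_time):
    """Makespan divided by the critical-path time on the fastest device for each subtask."""
    lower_bound = sum(min(exec_time[g].values()) for g in critical_path)
    return makespan / lower_bound


def speedup(makespan, exec_time):
    """Best single-device sequential time of all subtasks divided by the makespan."""
    devices = next(iter(exec_time.values())).keys()
    best_sequential = min(sum(exec_time[g][d] for g in exec_time) for d in devices)
    return best_sequential / makespan


# Hypothetical execution times of three subtasks on two devices.
exec_time = {
    "a": {"d0": 2.0, "d1": 4.0},
    "b": {"d0": 3.0, "d1": 1.0},
    "c": {"d0": 1.0, "d1": 2.0},
}
slr_value = schedule_length_ratio(3.0, ["a", "c"], exec_time)
speedup_value = speedup(3.0, exec_time)
```

With a makespan of 3.0, the schedule matches the critical-path lower bound (SLR of 1) and runs twice as fast as the best sequential execution on a single device.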

Performance evaluation
In the initial phase, the Alibaba-cluster-trace-v2018 dataset was employed to assess the efficacy of the scheduling algorithms. To ensure statistically valid outcomes, each algorithm was run independently five times across six datasets of diverse magnitudes. Figure 8 shows the performance of different scheduling algorithms on different-sized datasets. From Fig. 8a, it can be seen that when the number of user tasks is small, the cluster resources are sufficient, and the performance differences among various algorithms are not significant. As the number of tasks increases, the optimal solution search ability of the GraphEPSO algorithm in complex environments begins to show. When the task volume reaches 600, the makespan of the GraphEPSO algorithm is 8.68% less than that of the QL-HEFT algorithm. In Fig. 8b, the average SLR of the GraphEPSO algorithm is less than that of other algorithms, indicating that the algorithm is closer to the ideal solution. In Fig. 8c, the speedup of the GraphEPSO algorithm is higher than that of other algorithms, indicating that the algorithm can effectively utilize parallel computing resources. Figure 8d describes the load balancing degree of all algorithms. Compared with the other four algorithms in the comparative experiments, the GraphEPSO algorithm exhibits a better load-balancing effect. This indicates that the scheduling algorithm can not only improve resource utilization efficiency but also better balance the workload of multiple servers in heterogeneous clusters, avoiding performance degradation caused by overload or unbalanced workload of a single server.

Fig. 8 Scheduling performance with the different number of tasks
The GraphEPSO algorithm aims to minimize system power consumption while shortening task execution time. To quantitatively evaluate the performance of the model in power consumption optimization, we compared the power consumption generated by GraphEPSO and other scheduling algorithms under different task numbers. The experimental results are shown in Fig. 9.
As shown in Fig. 9, the GraphEPSO algorithm achieves the lowest total power consumption of the cluster. The reason for this is the efficiency and rationality of GraphEPSO in resource allocation, which reduces the idle time of servers and thereby lowers the inherent power consumption of the servers.
Subsequently, we utilized randomly generated distributed computing tasks to evaluate the performance of the trained model. In this evaluation, we fixed the number of tasks at 200, with each task including a variable number of subtasks between 1 and 30. Recognizing the significant differences in communication costs and computational costs associated with different task types, we established six distinct CCR levels: 0.1, 0.5, 1, 2, 5, and 10. This approach enabled the generation of diverse task queues, encompassing both compute-intensive and data-intensive workloads. The results of the experiment are shown in Fig. 10.
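A CCR-driven generator for one random task can be sketched as follows. This is a hypothetical illustration, not the generator used in the experiments: the uniform spread of 50% around each mean, the default average execution time, and the alternating CPU/GPU assignment are assumptions; only the relation that the average communication cost equals the average execution time divided by CCR follows from the CCR definition.

```python
import random


def random_task(ccr, avg_exec=10.0, n_min=2, n_max=29, rng=None):
    """Generate execution times, communication costs, and device types for one task."""
    rng = rng or random.Random()
    n = rng.randint(n_min, n_max)  # number of subtasks
    avg_comm = avg_exec / ccr      # CCR = average execution time / average communication cost
    exec_times = [rng.uniform(0.5, 1.5) * avg_exec for _ in range(n)]
    comm_costs = [rng.uniform(0.5, 1.5) * avg_comm for _ in range(n)]
    # Alternate types so CPU and GPU subtasks are (nearly) equal in number.
    kinds = ["CPU" if i % 2 == 0 else "GPU" for i in range(n)]
    return exec_times, comm_costs, kinds


# One hypothetical task per CCR level used in the evaluation.
tasks = {ccr: random_task(ccr, rng=random.Random(1)) for ccr in (0.1, 0.5, 1, 2, 5, 10)}
```

At a CCR of 0.1 the average communication cost is ten times the average execution time (data-intensive), whereas at a CCR of 10 it is one tenth of it (compute-intensive).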
Figure 10a displays the average makespan of all algorithms under different CCR values. As CCR increases, the communication cost between subtasks decreases relative to the computational cost, and the average makespan of each algorithm shows a decreasing trend. Among all algorithms, the GraphEPSO algorithm has the shortest average makespan. This is because GraphEPSO tends to reduce communication costs by assigning parent and child subtasks to the same server. When the communication cost is much larger than the computational cost, the GraphEPSO algorithm effectively shortens the makespan. Figure 10b shows that GraphEPSO has the smallest average SLR, indicating that the algorithm can process tasks of different densities more quickly and reduce task waiting time. Figure 10c displays the impact of CCR on average speedup. As CCR increases, the amount of data transmitted between subtasks decreases, and the parallel execution efficiency of each algorithm on multiple processors increases, resulting in corresponding increases in average speedup. The GraphEPSO algorithm exhibits the best average speedup, demonstrating that this scheduling algorithm can make full use of multi-processor parallel execution for distributed computing tasks. Figure 10d shows that the GraphEPSO algorithm has the highest load balancing degree, indicating that it can adapt to system load changes more quickly and achieve relatively reasonable resource utilization.

Fig. 9 Power consumption with the different number of tasks
The experiment evaluated the power consumption of servers produced by different scheduling algorithms when processing distributed computing task queues at different CCR levels. The experimental results, as shown in Fig. 11, indicate that the GraphEPSO algorithm performs better than other algorithms in terms of power consumption. This result suggests that the GraphEPSO algorithm has better information processing capabilities for tasks of different densities, enabling more efficient utilization of resources and reducing the idle time overhead of the cluster.
In summary, the GraphEPSO algorithm proposed in this paper optimizes the representation of tasks and computational resources through scalable state information encoding, effectively extracting global information about tasks and resources. Through the guiding role of elite particles and adaptive weights in learning, it can better balance local and global search capabilities in complex solution spaces. GraphEPSO enables tasks to be matched to appropriate computational resources in a better order, fully utilizing the computing power of heterogeneous computational resources. As a result, it effectively reduces the maximum completion time and power consumption of tasks, while maximizing load balancing among multiple servers in the cluster.

Conclusion
This paper studies the multi-objective optimization (i.e., time efficiency, platform power consumption, and load balancing) of task queue batching in heterogeneous computing clusters. We propose a collaborative scheduling model and algorithm integrating GNNs with EPSO. By transforming task and resource information into a graph structure and utilizing GNNs for node feature representation and information propagation, our approach effectively captures and leverages the characteristic information of tasks and resources. Furthermore, we introduce EPSO as a task-scheduling decision-making mechanism, efficiently searching for optimal solutions within the feasible solution space of resource allocation. In the CloudSim simulation platform, we conducted scheduling experiments on task sets of varying sizes and CCRs. The optimal solutions found by our algorithm showed an improvement of 8.65% and 7.57% in time efficiency, a reduction of 7.71% and 6.37% in power consumption indicators, and an increase of 3.59% and 5.15% in load balancing indicators compared to the best values of the comparative algorithms, achieving favorable experimental results. Nevertheless, the algorithmic flow of our proposed method is relatively complex, and there is room for improvement in the factors considered during the modeling phase. Future research will focus on further improvements and examine the impact of system reliability and security on the scheduling of heterogeneous computing clusters.

$$P(i) = \frac{\exp\left(\left(\log score_i^h - \log(-\log \rho_i^h)\right)/\tau\right)}{\sum_{j \in V} \exp\left(\left(\log score_j^{\nu(j)} - \log(-\log \rho_i^h)\right)/\tau\right)} \tag{7}$$

resources based on type and performance. Then, it generates multiple task input queues based on the task's demand for resource type and performance.
(2) Task Information and Resource Feature Encoding Module. This module uses graph embedding techniques to generate encodings of task attribute information and device state information.
(3) Subtask and Resource Allocation Module. This module generates subtask-device allocations by applying a performance adaptation principle and conditional probability calculations, thereby forming a collection of subtask-resource scheduling schemes.

Fig. 7 Comparison of different inertia weight strategies

Fig. 10 Scheduling performance with different CCR

Fig. 11 Power consumption with different CCR

$$\min \left( SysW_{dyn} + SysW_{inh} \right).$$

Here, $AL_C$ indicates the average CPU load, while $U_{C_s}$ represents the mean CPU usage rate of $S_s$. A calculation result nearing 1 suggests that server resource utilization is approximately at the average level. Deviations from this indicate overloading or underloading.

Table 2 Hardware configuration information

Table 3 Computing device performance parameters

Table 4 Heterogeneous server configuration

Table 5 Task attribute

Table 6 Training parameter settings