
Advances, Systems and Applications

Table 1 Literature Summary (Scheduling Kubernetes)

From: A survey of Kubernetes scheduling algorithms

Columns: #, Objectives, Methodology/Algorithms, Experiments, Findings, Applications, Limitations

[15]

Objectives: Develop a network-aware scheduling strategy for container-based applications in Smart City deployments.

Methodology/Algorithms: A network-aware scheduling method is proposed and implemented as an extension of the default Kubernetes scheduling mechanism.

Experiments: Evaluated with container-based Smart City applications and validated on the Kubernetes platform.

Findings: Network latency is reduced by 80% compared to the default scheduling mechanism.

Applications: Delay-sensitive and data-intensive services in Fog Computing environments.

Limitations: Further testing and deployment may reveal limitations and needed improvements.

[16]

Objectives: Cost-aware container scheduling in the public cloud.

Methodology/Algorithms: Stratus, a cluster scheduler specialized for orchestrating batch job execution on virtual clusters.

Experiments: Simulation experiments driven by cluster workload traces from Google and TwoSigma.

Findings: Stratus reduces virtual cluster scheduling costs by 17–44% compared to state-of-the-art approaches.

Applications: Batch job execution on virtual clusters in the public cloud.

Limitations: Limited to the context of batch job execution on virtual clusters in the public cloud.

[17]

Objectives: Improve performance and guarantee fairness among users of a shared cluster with interchangeable hardware resources for deep learning frameworks.

Methodology/Algorithms: AlloX, which models scheduling over interchangeable resources as min-cost bipartite matching.

Experiments: Large-scale simulations and evaluations on a small-scale hybrid CPU-GPU cluster.

Findings: AlloX can substantially shorten average job completion time, eliminate starvation, and guarantee fairness.

Applications: Scheduling jobs over interchangeable resources in a shared cluster.

Limitations: Supporting more than two interchangeable resource types may raise many problems.
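The matching formulation behind AlloX can be illustrated with a toy sketch. This is not AlloX's actual implementation: the two-job, two-resource cost matrix of estimated completion times is entirely hypothetical, and brute-force enumeration stands in for the Hungarian algorithm used at scale, which finds the same minimum-cost assignment.

```python
# Sketch: assigning jobs to interchangeable resources (here CPU vs. GPU)
# as a min-cost bipartite matching. cost[i][j] is a hypothetical estimated
# completion time of job i on resource j.
from itertools import permutations

cost = [
    [4.0, 1.0],  # job 0: slow on CPU, fast on GPU
    [2.0, 3.0],  # job 1: faster on CPU
]

def min_cost_matching(cost):
    """Return the job->resource assignment minimizing total cost.

    Enumerates permutations, which is fine for a toy example; a real
    scheduler would use the Hungarian algorithm for the same optimum.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

assignment, total = min_cost_matching(cost)
print(assignment, total)  # [1, 0] 3.0: job 0 -> GPU, job 1 -> CPU
```

The matching jointly minimizes total completion time, which is how a scheduler over interchangeable resources avoids both starvation and poor placements that a greedy per-job choice could make.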

[18]

Objectives: Optimize the initial placement of containers via task packing, adjust cluster size to changing workloads through autoscaling algorithms, and develop a rescheduling mechanism that shuts down underutilized VM instances for cost savings while preserving task progress.

Methodology/Algorithms: Heterogeneous job configurations; autoscaling algorithms; a rescheduling mechanism.

Experiments: Validated on the Australian National Cloud Infrastructure (Nectar).

Findings: Compared to the standard Kubernetes framework, the proposed solution lowers overall costs by 23% to 32% across various cloud workload patterns.

Applications: Low-cost container orchestration on Kubernetes-based cloud computing infrastructures.

Limitations: VM types could additionally be taken into consideration.

[19]

Objectives: Develop a GPU-aware resource orchestration layer for datacenters to improve resource utilization, reduce operational costs, and improve Quality of Service (QoS) for user-facing queries.

Methodology/Algorithms: Kube-Knots, a GPU-aware resource orchestration layer integrated with Kubernetes that harvests spare computation cycles through dynamic container orchestration. Two GPU-based scheduling methods (CBP and PP) are built on Kube-Knots to schedule workloads at datacenter scale.

Experiments: Evaluated CBP and PP on a ten-node GPU cluster and compared the results with state-of-the-art schedulers.

Findings: For HPC applications, CBP and PP increase cluster-wide GPU utilization by up to 80% on average, improve average job completion of deep learning workloads by up to 36%, and reduce cluster-wide energy consumption by 33%. For latency-critical queries, PP ensures end-to-end QoS by lowering QoS violations by up to 53%.

Applications: Improving resource utilization and reducing operational costs in GPU-based datacenters.

Limitations: –

[20]

Objectives: Improve data center efficiency through holistic scheduling in Kubernetes that considers virtual and physical infrastructure as well as business processes.

Methodology/Algorithms: The default Kubernetes scheduler is replaced with a proposed holistic scheduling framework whose decisions take both software and hardware models into account.

Experiments: The system was deployed in a real data center.

Findings: Reductions in power consumption of 10% to 20% were observed; an intelligent scheduler can significantly increase data center efficiency.

Applications: Improving data center efficiency through software-based solutions.

Limitations: Further research is needed in this area.

[21]

Objectives: Introduce a new Kubernetes container scheduling strategy (KCSS) that improves the scheduling of many containers submitted online.

Methodology/Algorithms: Select the best node for each newly submitted container according to multiple criteria, which are combined into a single rank using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS).

Experiments: Conducted experiments on different scenarios using data on cloud infrastructure and user needs.

Findings: KCSS improves performance compared to other container scheduling algorithms.

Applications: Container orchestration systems in industry and academia.

Limitations: Limited to the six key criteria used in the experiments; the criteria could be expanded to further improve performance in future work.
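TOPSIS itself is a standard multi-criteria decision method, so the ranking step used by KCSS can be sketched generically. The criteria below (free CPU, free memory, load), their weights, and the node values are hypothetical stand-ins, not the six criteria from the paper.

```python
# Sketch of TOPSIS node ranking: each row is a candidate node, each
# column a criterion. "benefit" marks criteria where higher is better.
import math

def topsis(matrix, weights, benefit):
    # Vector-normalize each column, then apply criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(len(weights))]
    v = [[w * row[j] / norms[j] for j, w in enumerate(weights)]
         for row in matrix]
    # Ideal best and worst value per criterion.
    best = [max(col) if benefit[j] else min(col)
            for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    # Closeness to the ideal: higher score = better node.
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Columns: free CPU (benefit), free memory (benefit), load (cost).
nodes = [[8, 32, 0.7],
         [4, 16, 0.2],
         [16, 64, 0.9]]
scores = topsis(nodes, weights=[0.4, 0.3, 0.3], benefit=[True, True, False])
ranked = sorted(range(len(nodes)), key=lambda i: -scores[i])
print(ranked)  # node 2 (most free capacity) ranks first
```

Collapsing all criteria into one closeness score is what lets a scheduler pick a single best node per container without hand-tuning pairwise trade-offs.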

[22]

Objectives: Present a topology-based GPU scheduling mechanism for Kubernetes that increases resource utilization and load balancing in a GPU cluster.

Methodology/Algorithms: Builds on the established Kubernetes GPU scheduling mechanism; the topology of the GPU cluster is modeled as a resource access cost tree, which is used to schedule and adapt to various GPU resource application scenarios.

Experiments: GaiaGPU has been used in production at Tencent.

Findings: Improved resource utilization by about 10% and improved load balancing.

Applications: Used in production at Tencent.

Limitations: –

[23]

Objectives: Develop a context-aware Kubernetes scheduler that accounts for physical, operational, and network parameters to improve service availability and performance in 5G edge computing.

Methodology/Algorithms: Real-time edge device data is integrated into the scheduler's decision algorithm.

Experiments: Comparison with the default Kubernetes scheduler.

Findings: The proposed scheduler offers improved fault tolerance along with advanced orchestration and management.

Applications: 5G edge computing.

Limitations: –

[24]

Objectives: Develop a policy-driven meta-scheduler for Kubernetes clusters that enables efficient and fair resource allocation among multiple users.

Methodology/Algorithms: Dominant Resource Fairness (DRF) policy, plus additional fairness metrics based on task resource demand and average waiting time.

Experiments: –

Findings: The proposed meta-scheduler improves fairness in multi-tenant Kubernetes clusters.

Applications: Kubernetes clusters.

Limitations: –
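DRF is a published policy, so its core loop can be sketched independently of this paper's meta-scheduler: repeatedly grant a task to the user whose dominant share (largest fraction of any one resource) is smallest. The capacities and per-task demands below reproduce the classic two-user illustration, not anything from the surveyed work.

```python
# Sketch of Dominant Resource Fairness (DRF) allocation.
def drf(capacity, demands):
    """Greedy DRF: give the next task to the user with the lowest
    dominant share, until no user's next task fits the cluster."""
    n_res = len(capacity)
    users = list(demands)
    alloc = {u: [0.0] * n_res for u in users}
    counts = {u: 0 for u in users}

    def dominant_share(u):
        # Largest fraction of any single resource held by user u.
        return max(alloc[u][r] / capacity[r] for r in range(n_res))

    def fits(u):
        return all(sum(alloc[v][r] for v in users) + demands[u][r]
                   <= capacity[r] for r in range(n_res))

    while True:
        candidates = [u for u in users if fits(u)]
        if not candidates:
            return counts
        u = min(candidates, key=dominant_share)
        for r in range(n_res):
            alloc[u][r] += demands[u][r]
        counts[u] += 1

# Classic example: 9 CPUs and 18 GB total; user A's tasks need
# <1 CPU, 4 GB>, user B's tasks need <3 CPUs, 1 GB>.
print(drf([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

Equalizing dominant shares rather than raw task counts is what makes DRF a natural fit for multi-tenant clusters where users' demands skew toward different resources.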

[25]

Objectives: Adapt Kubernetes to edge infrastructure, with a focus on network latency and self-healing capabilities.

Methodology/Algorithms: A custom Kubernetes scheduler that considers applications' delay constraints and edge reliability.

Experiments: –

Findings: The modified Kubernetes is better suited for edge infrastructure.

Applications: Edge computing.

Limitations: –

[26]

Objectives: Improve Kubernetes scheduling for performance-sensitive containerized workflows, particularly 5G edge applications.

Methodology/Algorithms: NetMARKS, a method for scheduling Kubernetes pods that exploits dynamic network metrics gathered with Istio Service Mesh.

Experiments: Validated with different workloads and processing layouts.

Findings: NetMARKS can save up to 50% of inter-node bandwidth while reducing application response time by up to 37%.

Applications: Kubernetes in 5G edge computing and machine-to-machine communication.

Limitations: –

[27]

Objectives: Create a feedback control approach to elastic container provisioning for Web systems running on Kubernetes.

Methodology/Algorithms: Combines a linear model with a varying-processing-rate queuing model to improve the accuracy of output errors.

Experiments: Evaluated on a real Kubernetes cluster.

Findings: Compared to state-of-the-art algorithms, the proposed approach achieves the lowest percentage of SLA violations and the second-lowest cost.

Applications: Elastic container provisioning in Kubernetes-based systems.

Limitations: –

[28]

Objectives: Create a dynamic Kubernetes scheduler that deploys Docker containers more effectively on a heterogeneous cluster, using historical container execution data to speed up task completion.

Methodology/Algorithms: KubCG, a dynamic scheduling platform with a new scheduler that takes into account historical container execution data as well as the Kubernetes Pod schedule.

Experiments: Conducted various tests to validate the new algorithm.

Findings: In experiments, KubCG reduced task completion time to as little as 64% of the original time.

Applications: Deployment of cloud-based services that require GPUs for tasks like deep learning and video processing.

Limitations: Further testing and validation are needed to determine the effectiveness of the algorithm across a variety of scenarios.

[29]

Objectives: Describe a new method for scheduling workloads in a Kubernetes cluster.

Methodology/Algorithms: A hybrid shared-state scheduling framework model in which scheduling decisions are made from the cluster's overall state.

Experiments: Tested the proposed scheduler's behavior under different scenarios, including failover/recovery in a deployed Kubernetes cluster.

Findings: The proposed scheduler handles situations such as priority preemption and collocation interference, and combines features of both centralized and distributed scheduling frameworks.

Applications: Optimizing resource utilization in Kubernetes clusters.

Limitations: Further testing and implementation are needed to fully evaluate the scheduler's effectiveness.

[30]

Objectives: Develop and deploy KubeHICE, a container orchestrator for heterogeneous-ISA architectures on cloud-edge platforms, and assess its efficiency and performance in handling heterogeneous-ISA clusters.

Methodology/Algorithms: KubeHICE extends open-source Kubernetes with AIM and PAS: AIM automatically locates a node suited to the ISAs supported by the containerized application, while PAS schedules containers based on the computational capacity of cluster nodes.

Experiments: KubeHICE was tested in several real-world scenarios.

Findings: KubeHICE is effective in performance estimation and resource scheduling while adding no extra overhead to container orchestration; when handling heterogeneity, it can improve CPU utilization by up to 40%.

Applications: Containerized applications on heterogeneous cloud-edge platforms.

Limitations: –

[31]

Objectives: Make the Kubernetes scheduler more efficient by incorporating the disk I/O load.

Methodology/Algorithms: A dynamic scheduling algorithm, Balanced-Disk-IO-Priority (BDI), improves the disk I/O balance between nodes; a second algorithm, Balanced-CPU-Disk-IO-Priority (BCDI), addresses the problem of unbalanced CPU and disk I/O load on a single node.

Experiments: Experimental comparison against the default Kubernetes scheduling algorithms.

Findings: The BDI and BCDI algorithms outperform the default Kubernetes scheduling algorithms: they resolve the CPU and disk I/O load imbalance on a single node and improve the disk I/O balance between nodes.

Applications: Improving the performance of Kubernetes in managing containerized applications.

Limitations: Further research may be needed to optimize the BDI and BCDI algorithms and evaluate their performance in different scenarios.
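The exact BDI/BCDI formulas are not reproduced in this summary, so the sketch below is only a plausible shape for a BCDI-style node priority: it rewards nodes with headroom on both CPU and disk I/O and penalizes imbalance between the two. The function name, the formula, and the utilization figures are all assumptions for illustration.

```python
# Hypothetical sketch of a balanced CPU/disk-I/O node score in the
# spirit of BCDI. Higher score = better placement candidate.
def bcdi_score(cpu_util, disk_io_util):
    # Average headroom: prefer nodes that are less loaded overall.
    free = 1.0 - (cpu_util + disk_io_util) / 2
    # Intra-node balance: penalize nodes where one resource is
    # saturated while the other idles (assumed penalty, not the
    # paper's formula).
    balance = 1.0 - abs(cpu_util - disk_io_util)
    return free * balance

# Hypothetical (cpu_util, disk_io_util) readings per node.
nodes = {"node-a": (0.9, 0.1), "node-b": (0.5, 0.5), "node-c": (0.4, 0.6)}
best = max(nodes, key=lambda n: bcdi_score(*nodes[n]))
print(best)  # node-b: same average load as node-a, but balanced
```

The point of such a score is that two nodes with equal average load are not equally good: the one whose CPU and disk I/O are both moderate avoids making either resource the bottleneck.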

[32]

Objectives: Investigate how Serverless frameworks built on Kubernetes can schedule pods more efficiently for large-scale concurrent applications.

Methodology/Algorithms: A scheduling approach that schedules multiple instances of the same pod concurrently, to maximize the effectiveness of pod scheduling in Serverless cloud paradigms.

Experiments: Preliminary verification of the effectiveness of the proposed algorithm.

Findings: The proposed approach can significantly reduce pod startup time while maintaining resource balance across nodes.

Applications: Improving the efficiency of pod scheduling in Serverless cloud paradigms.

Limitations: Effectiveness is only verified through preliminary experiments, and the algorithm applies only to Serverless frameworks.

[33]

Objectives: Present a Kubernetes scheduler extension and resource rescheduling approach that incorporates QoE metrics into SLOs.

Methodology/Algorithms: Uses the QoE metric proposed in the ITU-T P.1203 standard.

Experiments: The architecture is evaluated with video streaming services co-located with other services.

Findings: The average QoE is increased by 50%, and resource rescheduling raises it by a further 135%; over-provisioning is completely eliminated by the proposed architecture.

Applications: Improving QoE in cloud environments.

Limitations: Limited to the specific QoE metric used; further research may be needed to evaluate the architecture with other QoE metrics.

[34]

Objectives: Enable the safe colocation of best-effort jobs and latency-sensitive services in Kubernetes clusters to increase resource utilization, flexibly divide resources among different workload categories, and improve hardware and software isolation capabilities for containers.

Methodology/Algorithms: Zeus, built on Kubernetes extension mechanisms, schedules best-effort jobs based on actual server utilization and improves container isolation by coordinating hardware and software isolation mechanisms.

Experiments: Zeus is evaluated with latency-sensitive services and best-effort jobs in a large-scale production setting.

Findings: Zeus can increase average CPU utilization from 15% to 60% without violating SLOs, significantly improving how efficiently Kubernetes clusters use their resources.

Applications: Improving the resource utilization of Kubernetes clusters.

Limitations: –