In recent years, many organizations have faced challenges in managing large amounts of data and data-intensive computing tasks. Cloud computing has been widely used to alleviate these challenges through its on-demand services and distributed architecture. Data replication is one of the most significant strategies for decreasing access latency and improving data availability, data reliability and resource utilization by creating multiple data copies across geographically distributed data centers. When a fault occurs at a data center, existing jobs that require data access at this data center can be redirected to other data centers where data replicas are available. This paper proposes a utility-based fault handling (UBFH) approach to rescue the jobs at the faulty data center. A fault handling algorithm is then developed to determine the direction of job redirection by considering network performance and job attributes. Our main objective is to achieve better repairability, job rescue utility and job operation profit. The simulation results show that our UBFH approach outperforms the HDFS, RR and JSQ approaches in all these aspects.
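To make the redirection idea concrete, the following is a minimal illustrative sketch of utility-based target selection for a job stranded at a faulty data center. It is not the UBFH algorithm itself: the abstract does not define the utility function, so the `Job` and `DataCenter` fields, the `rescue_utility` formula (profit scaled by remaining deadline slack) and the `choose_target` helper are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Job:
    """Hypothetical job attributes (not taken from the paper)."""
    job_id: str
    data_size_gb: float  # data the job still needs to read
    deadline_s: float    # time left before the job loses its value
    profit: float        # value earned if the job completes in time


@dataclass
class DataCenter:
    """Hypothetical view of a candidate data center."""
    name: str
    bandwidth_gbps: float  # available bandwidth towards this site
    latency_s: float       # network latency towards this site
    has_replica: bool      # whether a replica of the job's data exists here


def rescue_utility(job: Job, dc: DataCenter) -> float:
    """Assumed utility: profit scaled by the deadline slack left after transfer.

    Returns 0.0 when the site holds no replica or the deadline cannot be met.
    """
    if not dc.has_replica or dc.bandwidth_gbps <= 0:
        return 0.0
    # Convert gigabytes to gigabits, then divide by bandwidth in Gbit/s.
    transfer_s = dc.latency_s + job.data_size_gb * 8 / dc.bandwidth_gbps
    if job.deadline_s <= 0 or transfer_s > job.deadline_s:
        return 0.0
    return job.profit * (1 - transfer_s / job.deadline_s)


def choose_target(job: Job, candidates: Sequence[DataCenter]) -> Optional[DataCenter]:
    """Pick the candidate with the highest utility, or None if no site is useful."""
    best = max(candidates, key=lambda dc: rescue_utility(job, dc), default=None)
    if best is None or rescue_utility(job, best) <= 0:
        return None
    return best


if __name__ == "__main__":
    job = Job("j1", data_size_gb=10.0, deadline_s=100.0, profit=50.0)
    sites = [
        DataCenter("A", bandwidth_gbps=1.0, latency_s=0.05, has_replica=True),
        DataCenter("B", bandwidth_gbps=10.0, latency_s=0.2, has_replica=True),
        DataCenter("C", bandwidth_gbps=40.0, latency_s=0.01, has_replica=False),
    ]
    target = choose_target(job, sites)
    print(target.name if target else "no feasible target")
```

In this sketch, a site without a replica is never selected regardless of its network performance, and a job whose data cannot arrive before its deadline is treated as unrescuable. A real scheme would likely also weigh queue length and compute capacity at the target site, which are left out here.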
Xie, F., Yan, J. & Shen, J. (2020). A Utility-Based Fault Handling Approach for Efficient Job Rescue in Clouds. Lecture Notes in Computer Science, 12403, 49-63. Virtual International Conference on Cloud Computing.