Dynamic and Adaptive Fault Tolerant Scheduling with QoS Consideration in Computational Grid

Haider, Sajjad

URI: http://142.54.178.187:9060/xmlui/handle/123456789/5205

Date: 2019

Abstract:

Implementing fault-tolerant scheduling in a computational grid is a challenging task. Proactive and reactive fault-tolerant scheduling techniques are commonly used in grids. Proactive approaches focus on the issues that give rise to faults, whereas reactive approaches are activated after failures have been identified. Unlike existing fault-tolerant techniques, we present a novel, hybrid, dynamic, and adaptive fault-tolerant technique that effectively combines proactive and reactive approaches. The proactive fault-tolerant orchestrator filters resources on the basis of vicinity, availability, and reliability. Existing fault tolerance techniques do not distinguish between resources during selection, but the proposed algorithm prefers local resources, which results in lower communication costs and a lower tendency towards failures. To find highly available resources, the model incorporates a newly identified parameter based on availability time, computed from the mean time between availability and the mean time between unavailability. Reliability of nodes is an indispensable consideration, and the proposed system computes it using factors such as the success or failure ratio of jobs and the types of failures encountered. The proposed model also employs an optimal resource identification algorithm that helps select optimal resources during job execution. The list of reliable and optimal grid nodes identified by the proactive fault-tolerant orchestrator is passed to the reactive fault-tolerant orchestrator. A failure detector and a failure predictor are the two components that work under the reactive fault-tolerant orchestrator; they cater for network, prediction, and temperature-based hardware failures. Push and pull models are also applied to detect errors in an efficient and timely manner. Hardware failures are predicted on the basis of device temperature, and these predictions are used to control the checkpoint intensity. Reducing the number of checkpoints based on device temperature provides several performance benefits in terms of communication cost and execution time. The performance of the proposed model is validated using the GridSim toolkit. Compared to contemporary techniques, the experimental results demonstrate the efficiency and effectiveness of the proposed model with respect to several performance metrics, including execution time, throughput, waiting and turnaround time, number of checkpoints, and energy consumption.
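
The proactive filtering step described in the abstract (vicinity, availability, reliability) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the abstract gives no formulas, so availability is assumed here to be the fraction of time a node is up, reliability is assumed to be the plain job success ratio (the failure types mentioned in the abstract are ignored), and the `Node` fields, the function names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a grid node; field names are assumptions."""
    name: str
    is_local: bool         # vicinity: True if the node is in the scheduler's own site
    mean_up_time: float    # mean length of available periods, in hours
    mean_down_time: float  # mean length of unavailable periods, in hours
    jobs_succeeded: int
    jobs_failed: int


def availability(node: Node) -> float:
    """Assumed metric: share of time the node is available."""
    total = node.mean_up_time + node.mean_down_time
    return node.mean_up_time / total if total > 0 else 0.0


def reliability(node: Node) -> float:
    """Assumed metric: job success ratio (failure types are not modelled)."""
    total = node.jobs_succeeded + node.jobs_failed
    return node.jobs_succeeded / total if total > 0 else 0.0


def filter_nodes(nodes, min_availability=0.9, min_reliability=0.8):
    """Drop nodes below the thresholds, then rank local nodes first and,
    within each group, by availability * reliability in descending order."""
    kept = [
        n for n in nodes
        if availability(n) >= min_availability and reliability(n) >= min_reliability
    ]
    return sorted(
        kept,
        key=lambda n: (not n.is_local, -availability(n) * reliability(n)),
    )
```

In this sketch a local node that passes both thresholds is always ranked ahead of a remote one, which mirrors the stated preference for local resources; a real scheduler would presumably weigh these factors in a more nuanced way.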
