A Planning-Based Approach to Failure Recovery
in Distributed Systems
by
Naveed Arshad
B.S., Ghulam Ishaq Khan Institute of Engineering Sciences
and Technology, Pakistan, 1999
M.S., University of Colorado at Boulder, USA, 2003
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
2006
This thesis entitled:
A Planning-Based Approach to Failure Recovery in Distributed Systems
written by Naveed Arshad
has been approved for the Department of Computer Science
Professor Alexander L. Wolf
Professor Dennis M. Heimbigner
Date
The final copy of this thesis has been examined by the signatories, and we find that
both the content and the form meet acceptable presentation standards of scholarly
work in the above mentioned discipline.
Arshad, Naveed (Ph.D., Computer Science)
A Planning-Based Approach to Failure Recovery in Distributed Systems
Thesis directed by Professor Alexander L. Wolf
Automated failure recovery in distributed systems is challenging because
of the myriad requirements of, and dependencies among, their components. Moreover, failure
scenarios are usually unpredictable and cannot easily be foreseen. It is therefore
not practical to enumerate all possible failure scenarios together with a way to recover a
distributed system from each of them. For this reason, present failure recovery techniques
are highly manual and have considerable downtime associated with them. In this
dissertation, we have developed a planning-based approach to automated failure recovery
in distributed component-based systems. This approach automates failure recovery
through continuous monitoring of the system, so a failure monitor always has the exact
system state available. When a failure is detected, the monitor performs
various checks to ensure that it is not a false positive or false negative. A...