Optimal Design of Fault-Tolerant Distributed Systems Based on a Recursive Algorithm

Hoang Pham, Shambhu J. Upadhyaya

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


This paper addresses the optimal design (in terms of the number of processors) of a distributed system based on a recursive algorithm for fault tolerance (RAFT). The reliability and performance of the system using RAFT are determined as functions of the reliability of the individual processors and the number of fault modes in a processor. Also discussed is how to determine the design policy when the objective is to minimize the average system cost, given the cost of each processor and the cost of system failure. Several numerical examples illustrate the results.

Reader Aids -
Purpose: Widen state of the art
Special math needed for explanations: Probability theory
Special math needed to use results: None
Results useful to: Reliability analysts
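The cost trade-off described in the abstract (more processors cost more but lower the chance of system failure) can be illustrated with a small sketch. The abstract does not give the RAFT reliability expression, so the code below substitutes a plain majority-vote model over independent, identical processors as a stand-in. The function names, the cost parameters c_proc and c_fail, and the restriction to odd processor counts are all assumptions made for illustration and are not taken from the paper.

```python
from math import comb


def majority_reliability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent processors work.

    Stand-in model only: a plain majority-vote formula, NOT the RAFT
    reliability expression from the paper. p is the per-processor reliability.
    """
    need = n // 2 + 1  # smallest number of working processors that is a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def expected_cost(n: int, p: float, c_proc: float, c_fail: float) -> float:
    """Average system cost: processor cost plus failure cost weighted by
    the probability of system failure (hypothetical cost model)."""
    return n * c_proc + c_fail * (1.0 - majority_reliability(n, p))


def optimal_n(p: float, c_proc: float, c_fail: float, max_n: int) -> int:
    """Odd processor count in [1, max_n] that minimizes the expected cost."""
    candidates = range(1, max_n + 1, 2)  # odd sizes avoid tied votes
    return min(candidates, key=lambda n: expected_cost(n, p, c_proc, c_fail))
```

Under this stand-in model, optimal_n(0.9, 1.0, 50.0, 9) returns 3: a single processor leaves too much failure risk, while five or more cost more than the extra reliability saves. The paper's own results depend on the RAFT reliability function and the number of fault modes, which this sketch does not model.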

Original language: English (US)
Pages (from-to): 375-379
Number of pages: 5
Journal: IEEE Transactions on Reliability
Issue number: 3
State: Published - Aug 1991
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Safety, Risk, Reliability and Quality
  • Electrical and Electronic Engineering


Keywords

  • Distributed system
  • Fault tolerance
  • Optimization
  • System reliability

