This paper addresses the optimal design, in terms of the number of processors, of a distributed system that uses a recursive algorithm for fault tolerance (RAFT). The reliability and performance of the system using RAFT are determined as functions of the reliability of individual processors and the number of fault modes in a processor. Also discussed is how to determine design policies when the objective is to minimize the average system cost, given the cost of each processor and the cost of system failure. Several numerical examples illustrate the results.
Reader Aids
- Purpose: Widen state of the art
- Special math needed for explanations: Probability theory
- Special math needed to use results: None
- Results useful to: Reliability analysts
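The cost trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's model: the RAFT reliability expression is not given here, so the code substitutes a simple majority model (the system works when more than half of n independent processors work) and assumes an average cost of the form n times the processor cost plus the failure cost weighted by the probability of system failure. The function names, the cost form, the restriction to odd processor counts, and the single fault mode are all assumptions made for illustration.

```python
from math import comb


def majority_reliability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent processors work.

    Stand-in model only (assumption); it is not the RAFT reliability
    expression, and it ignores multiple fault modes.
    """
    need = n // 2 + 1  # smallest number of working processors that is a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def expected_cost(n: int, p: float, processor_cost: float, failure_cost: float) -> float:
    """Assumed average cost: hardware cost plus failure cost weighted by unreliability."""
    return n * processor_cost + failure_cost * (1.0 - majority_reliability(n, p))


def best_processor_count(p: float, processor_cost: float, failure_cost: float,
                         max_n: int = 25) -> int:
    """Search odd processor counts up to max_n for the lowest assumed average cost."""
    candidates = range(1, max_n + 1, 2)  # odd n avoids ties in the majority vote
    return min(candidates, key=lambda n: expected_cost(n, p, processor_cost, failure_cost))


if __name__ == "__main__":
    # Hypothetical inputs: processor reliability 0.9, unit processor cost,
    # and three different costs of system failure.
    for failure_cost in (0.0, 100.0, 1000.0):
        n = best_processor_count(0.9, 1.0, failure_cost)
        print(failure_cost, n, round(expected_cost(n, 0.9, 1.0, failure_cost), 3))
```

Under these assumptions, a higher cost of system failure pushes the cheapest design toward more processors, which is the kind of design policy the abstract refers to. Replacing `majority_reliability` with the actual RAFT reliability function would be the first step toward reproducing the paper's results.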
All Science Journal Classification (ASJC) codes
- Safety, Risk, Reliability and Quality
- Electrical and Electronic Engineering
Keywords
- Distributed system
- Fault tolerance
- System reliability