Quantifying and improving the availability of high-performance cluster-based Internet services

Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Cluster-based servers can substantially increase performance when nodes cooperate to globally manage resources. However, in this paper we show that cooperation results in a substantial availability loss, in the absence of high-availability mechanisms. Specifically, we show that a sophisticated cluster-based Web server, which gains a factor of 3 in performance through cooperation, increases service unavailability by a factor of 10 over a non-cooperative version. We then show how to augment this Web server with software components embodying a small set of high-availability techniques to regain the lost availability. Among other interesting observations, we show that the application of multiple high-availability techniques, each implemented independently in its own subsystem, can lead to inconsistent recovery actions. We also show that a novel technique called Fault Model Enforcement can be used to resolve such inconsistencies. Augmenting the server with these techniques led to a final expected availability of close to 99.99%.
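To put the availability figures in the abstract into concrete terms, the following minimal sketch converts an availability fraction into expected downtime per year and shows what a tenfold increase in unavailability does to a baseline. The function names and the 99.9% baseline are hypothetical illustrations, not values or code from the paper; only the 99.99% figure and the factor of 10 come from the abstract.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for an availability fraction in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def scale_unavailability(availability: float, factor: float) -> float:
    """Availability after unavailability is multiplied by `factor`.

    The result is clamped so that unavailability never exceeds 1.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if factor < 0.0:
        raise ValueError("factor must be non-negative")
    unavailability = min(1.0, (1.0 - availability) * factor)
    return 1.0 - unavailability


if __name__ == "__main__":
    # 99.99% is the final expected availability quoted in the abstract.
    print(f"{downtime_minutes_per_year(0.9999):.1f} minutes of downtime per year")

    # Hypothetical baseline of 99.9% (an assumption, not a figure from the paper):
    # a tenfold increase in unavailability lowers it to roughly 99%.
    print(f"{scale_unavailability(0.999, 10):.4f}")
```

Under these assumptions, 99.99% availability corresponds to a little under an hour of downtime per year, which shows why a tenfold change in unavailability matters even when the availability percentages look close.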

Original language: American English
Title of host publication: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003
State: Published - 2003
Event: 2003 ACM/IEEE Conference on Supercomputing, SC 2003 - Phoenix, AZ, United States
Duration: Nov 15 2003 to Nov 21 2003

Publication series

Name: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003

Other

Other: 2003 ACM/IEEE Conference on Supercomputing, SC 2003
Country/Territory: United States
City: Phoenix, AZ
Period: 11/15/03 to 11/21/03

ASJC Scopus subject areas

  • Software
