Understanding application performance on shared virtual memory systems

Research output: Contribution to journalConference article

30 Citations (Scopus)

Abstract

Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches - Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC) - with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.

Original languageEnglish (US)
Pages (from-to)122-133
Number of pages12
JournalConference Proceedings - Annual International Symposium on Computer Architecture, ISCA
StatePublished - Jan 1 1996
EventProceedings of the 1996 23rd Annual International Symposium on Computer Architecture - Philadelphia, PA, USA
Duration: May 22 1996May 24 1996

Fingerprint

Computer systems
Data storage equipment
Hardware
Network protocols
Communication

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

@article{3d39ca9f78b3469d9995581f1d28c56f,
title = "Understanding application performance on shared virtual memory systems",
abstract = "Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches - Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC) - with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.",
author = "Liviu Iftode and Singh, {Jaswinder Pal} and Kai Li",
year = "1996",
month = "1",
day = "1",
language = "English (US)",
pages = "122--133",
journal = "Proceedings - International Symposium on Computer Architecture",
issn = "1063-6897",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Understanding application performance on shared virtual memory systems

AU - Iftode, Liviu

AU - Singh, Jaswinder Pal

AU - Li, Kai

PY - 1996/1/1

Y1 - 1996/1/1

N2 - Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches - Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC) - with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.

AB - Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches - Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC) - with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.

UR - http://www.scopus.com/inward/record.url?scp=0029666648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029666648&partnerID=8YFLogxK

M3 - Conference article

SP - 122

EP - 133

JO - Proceedings - International Symposium on Computer Architecture

JF - Proceedings - International Symposium on Computer Architecture

SN - 1063-6897

ER -