An Empirical Study on the Reliability of Perceiving Correlation Indices using Scatterplots

Varshita Sher, Karen Bemis, Ilaria Liccardi, Min Chen

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Scatterplots have been in use for about two centuries, primarily for observing the relationship between two variables and commonly for supporting correlation analysis. In this paper, we report an empirical study that examines how humans’ perception of correlation using scatterplots relates to the Pearson's product-moment correlation coefficient (PPMCC) – a commonly used statistical measure of correlation. In particular, we study human participants’ estimation of correlation under different conditions, e.g., different PPMCC values, different densities of data points, different levels of symmetry of data enclosures, and different patterns of data distribution. As the participants were instructed to estimate the PPMCC of each stimulus scatterplot, the difference between the estimated and actual PPMCC is referred to as an offset. The results of the study show that varying PPMCC values, symmetry of data enclosure, or data distribution does have an impact on the average offsets, while only large variations in density cause an impact that is statistically significant. This study indicates that humans’ perception of correlation using scatterplots does not correlate with computed PPMCC in a consistent manner. The magnitude of offsets may be affected not only by the difference between individuals, but also by geometric features of data enclosures. It suggests that visualizing scatterplots does not provide adequate support to the task of retrieving their corresponding PPMCC indicators, while the underlying model of humans’ perception of correlation using scatterplots ought to feature other variables in addition to PPMCC. The paper also includes a theoretical discussion on the cost-benefit of using scatterplots.
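The two quantities the abstract relies on, the PPMCC of a scatterplot's points and the offset between a participant's estimate and that value, can be made concrete with a short Python sketch. It assumes the offset is the signed difference "estimated minus actual" (the abstract does not state a sign convention, and the paper may also analyse absolute offsets). The bivariate-normal generator `sample_points` is a hypothetical stand-in for producing stimuli at a target correlation and point count; it is not the authors' stimulus procedure.

```python
import math
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's product-moment correlation coefficient (PPMCC) of paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def offset(estimated_r: float, actual_r: float) -> float:
    """Signed offset: estimated minus actual PPMCC (assumed sign convention)."""
    return estimated_r - actual_r


def sample_points(target_r: float, n: int, seed: int = 0) -> tuple[list[float], list[float]]:
    """Draw n points from a standard bivariate normal with correlation target_r.

    Hypothetical stimulus generator: the sample PPMCC only approximates target_r,
    and the approximation tightens as n grows.
    """
    if not -1.0 <= target_r <= 1.0:
        raise ValueError("target_r must lie in [-1, 1]")
    rng = random.Random(seed)
    xs: list[float] = []
    ys: list[float] = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # y = r*z1 + sqrt(1 - r^2)*z2 has unit variance and correlation r with x.
        ys.append(target_r * z1 + math.sqrt(1.0 - target_r ** 2) * z2)
    return xs, ys


if __name__ == "__main__":
    # Point count is used here as a rough proxy for "density" (an assumption).
    for n_points in (50, 500):
        xs, ys = sample_points(0.6, n_points, seed=1)
        actual = pearson_r(xs, ys)
        guess = 0.4  # a made-up participant estimate, for illustration only
        print(f"n={n_points}: actual r={actual:.3f}, offset={offset(guess, actual):+.3f}")
```

Computing the offset against the sample PPMCC of the plotted points, rather than against the target value passed to the generator, matches the abstract's definition of the "actual PPMCC" of each stimulus.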

Original language: English (US)
Pages (from-to): 61-72
Number of pages: 12
Journal: Computer Graphics Forum
Volume: 36
Issue number: 3
DOI: https://doi.org/10.1111/cgf.13168
State: Published, Jun 1 2017

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design

Cite this

Sher, Varshita; Bemis, Karen; Liccardi, Ilaria; Chen, Min. / An Empirical Study on the Reliability of Perceiving Correlation Indices using Scatterplots. In: Computer Graphics Forum. 2017; Vol. 36, No. 3. pp. 61-72.

@article{675932b4e57b4e8ba1118e166d1b0ea2,
title = "An Empirical Study on the Reliability of Perceiving Correlation Indices using Scatterplots",
abstract = "Scatterplots have been in use for about two centuries, primarily for observing the relationship between two variables and commonly for supporting correlation analysis. In this paper, we report an empirical study that examines how humans’ perception of correlation using scatterplots relates to the Pearson's product-moment correlation coefficient (PPMCC) – a commonly used statistical measure of correlation. In particular, we study human participants’ estimation of correlation under different conditions, e.g., different PPMCC values, different densities of data points, different levels of symmetry of data enclosures, and different patterns of data distribution. As the participants were instructed to estimate the PPMCC of each stimulus scatterplot, the difference between the estimated and actual PPMCC is referred to as an offset. The results of the study show that varying PPMCC values, symmetry of data enclosure, or data distribution does have an impact on the average offsets, while only large variations in density cause an impact that is statistically significant. This study indicates that humans’ perception of correlation using scatterplots does not correlate with computed PPMCC in a consistent manner. The magnitude of offsets may be affected not only by the difference between individuals, but also by geometric features of data enclosures. It suggests that visualizing scatterplots does not provide adequate support to the task of retrieving their corresponding PPMCC indicators, while the underlying model of humans’ perception of correlation using scatterplots ought to feature other variables in addition to PPMCC. The paper also includes a theoretical discussion on the cost-benefit of using scatterplots.",
author = "Varshita Sher and Karen Bemis and Ilaria Liccardi and Min Chen",
year = "2017",
month = jun,
day = "1",
doi = "10.1111/cgf.13168",
language = "English (US)",
volume = "36",
pages = "61--72",
journal = "Computer Graphics Forum",
issn = "0167-7055",
publisher = "Wiley-Blackwell",
number = "3",
}

An Empirical Study on the Reliability of Perceiving Correlation Indices using Scatterplots. / Sher, Varshita; Bemis, Karen; Liccardi, Ilaria; Chen, Min.

In: Computer Graphics Forum, Vol. 36, No. 3, 01.06.2017, pp. 61-72.

TY  - JOUR
T1  - An Empirical Study on the Reliability of Perceiving Correlation Indices using Scatterplots
AU  - Sher, Varshita
AU  - Bemis, Karen
AU  - Liccardi, Ilaria
AU  - Chen, Min
PY  - 2017/6/1
Y1  - 2017/6/1
AB  - Scatterplots have been in use for about two centuries, primarily for observing the relationship between two variables and commonly for supporting correlation analysis. In this paper, we report an empirical study that examines how humans’ perception of correlation using scatterplots relates to the Pearson's product-moment correlation coefficient (PPMCC) – a commonly used statistical measure of correlation. In particular, we study human participants’ estimation of correlation under different conditions, e.g., different PPMCC values, different densities of data points, different levels of symmetry of data enclosures, and different patterns of data distribution. As the participants were instructed to estimate the PPMCC of each stimulus scatterplot, the difference between the estimated and actual PPMCC is referred to as an offset. The results of the study show that varying PPMCC values, symmetry of data enclosure, or data distribution does have an impact on the average offsets, while only large variations in density cause an impact that is statistically significant. This study indicates that humans’ perception of correlation using scatterplots does not correlate with computed PPMCC in a consistent manner. The magnitude of offsets may be affected not only by the difference between individuals, but also by geometric features of data enclosures. It suggests that visualizing scatterplots does not provide adequate support to the task of retrieving their corresponding PPMCC indicators, while the underlying model of humans’ perception of correlation using scatterplots ought to feature other variables in addition to PPMCC. The paper also includes a theoretical discussion on the cost-benefit of using scatterplots.
UR  - http://www.scopus.com/inward/record.url?scp=85022202829&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85022202829&partnerID=8YFLogxK
DO  - 10.1111/cgf.13168
M3  - Article
VL  - 36
SP  - 61
EP  - 72
JO  - Computer Graphics Forum
JF  - Computer Graphics Forum
SN  - 0167-7055
IS  - 3
ER  - 