Introduction to scatter plots


Scatter plots use dots or symbols to represent distinct data points, each of which typically corresponds to a single measurement or a statistical summary of multiple measurements. A scatter plot is a two-way (Cartesian) graph in which each dot or symbol represents two values: one on the vertical axis and one on the horizontal axis, for example frequency of an event versus age. Scatter plots allow users to perceive the trend of one measure against another, and can also reflect the variability in the data. They are fairly intuitive and do not require specialised knowledge to understand. They can therefore be useful in communicating a benefit-risk message to a wide variety of audiences, including the general public through mass media, patients, physicians, regulators and other experts.
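To make the mapping from a pair of values to a position on the graph concrete, the following is a minimal sketch in Python that draws a scatter plot as plain text. The function name text_scatter, the grid size and the age/frequency figures are all hypothetical and chosen only for illustration; a real analysis would normally use one of the plotting packages discussed later.

```python
def text_scatter(points, width=20, height=8):
    """Render (x, y) pairs as a character grid; each '*' is one data point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1 so that identical values do not cause division by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Horizontal axis value -> column, vertical axis value -> row.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical data: (age in years, events per 1000 people).
points = [(20, 2), (30, 3), (40, 5), (50, 8), (60, 12), (70, 18)]
plot = text_scatter(points)
print(plot)
```

Each asterisk stands for one (age, frequency) pair, so the upward drift of the asterisks from left to right is the trend a reader would perceive in the plot.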

The main strength of the scatter plot, conveying the relationship between two variables (see Cleveland 1984), could also be its main weakness. Users could be unintentionally drawn to a relationship in the data that may appear significant but may not be clinically relevant. Outliers on a scatter plot may also affect the user's perception and may not be recognised as isolated incidents. Scatter plots of data on a nominal scale may be read as though they had the same interpretation as plots of data on a continuous scale, which could lead to misinterpretation of the measures.
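The influence of a single outlier can be illustrated numerically. The sketch below, using hypothetical data and a hand-written Pearson correlation helper (pearson_r is an illustrative name, not taken from any package), compares the correlation of a small data set with and without one extreme point. It is meant only as an illustration of the caution above, not as a recommended analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements with no clear linear trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 4, 1]
r_without = pearson_r(xs, ys)

# The same data plus one extreme point (an isolated incident).
r_with = pearson_r(xs + [20], ys + [20])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

In this made-up example the correlation is weak without the extreme point and strong with it, even though the other six observations are unchanged. A viewer of the corresponding scatter plot could be similarly drawn towards an apparent relationship that rests on one observation.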

Many software packages can produce scatter plots. These include statistical packages such as Stata, R, SAS and SPSS, as well as spreadsheet and visualisation software such as Microsoft Excel, Tableau, Spotfire, QlikView, IBM Many Eyes and Google Drive. Dynamic and interactive versions of scatter plots offer extra functionality such as animation, better storytelling capability and easier decoding of individual data values through dynamic annotations. Examples of dynamic and interactive scatter plots can be found at http://www.gapminder.org.