Visualizing Data

SDS 192: Introduction to Data Science

Professor Lindsay Poirier

For Today

  • Perusall quiz debriefing
  • What is data visualization?
  • Taxonomy of Data Visualizations
  • Visualization Conventions and Critiques
  • Work on Problem Solving Lab in Class

What is data visualization?

  • the translation of information into a graphical format
  • helps analysts summarize and identify patterns across large datasets
  • always involves critical judgment calls on the part of the designer

Elements of data graphics

  • visual cues/aesthetics
    • color
  • scale
  • context

Framework drawn from: Yau, Nathan. 2013. Data Points: Visualization That Means Something. 1st edition. Indianapolis, IN: Wiley.

Visual Cues

  • Where is the data positioned on the plot?
  • What is the length of shapes on the plot?
  • How large is the angle between vectors?
  • What shapes/symbols appear on the plot?
  • How much area do shapes take up on a plot?
  • How intense is the color presented on the plot?

What variables are mapped onto what visual cues?

** This is the last time you will see me use a pie chart in this class!

Color

  • Qualitative: Distinct colors used to bucket categorical data
  • Sequential: Gradient of color used to represent a uni-directional range of quantitative values
  • Divergent: Gradations of color from a neutral center used to represent a bi-directional range of quantitative values

Accessible Color Palettes

library(RColorBrewer)  # provides display.brewer.all()
display.brewer.all(colorblindFriendly = TRUE)
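
As a minimal sketch of how these palettes might be used, the R snippet below lists the palettes that RColorBrewer flags as colorblind friendly and pulls one palette of each type described above. The palette choices (Dark2, Blues, RdBu) and the variable names are illustrative assumptions, not part of the original slides.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette and a
# logical `colorblind` column.
friendly_names <- rownames(brewer.pal.info[brewer.pal.info$colorblind, ])

# One colorblind-friendly palette of each type (illustrative choices).
qualitative_cols <- brewer.pal(n = 3, name = "Dark2")  # categorical buckets
sequential_cols  <- brewer.pal(n = 5, name = "Blues")  # low-to-high range
diverging_cols   <- brewer.pal(n = 7, name = "RdBu")   # two directions from a neutral center

print(friendly_names)
print(qualitative_cols)
```

Each call returns a character vector of hex color codes that can be passed to a plotting function.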

Scale

  • Linear: Numeric values are evenly spaced on axis.
  • Logarithmic: Each evenly spaced step on the axis multiplies the value by the base of the logarithm.
  • Categorical: Categorical values are discretely placed on axis.
  • Ordinal: Categorical values are ordered on axis.
  • Percent: Percentages of a whole are evenly spaced on axis.
  • Time: Date/time values are placed on axis in years, months, days, hours, etc.
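
To make the difference between the linear and logarithmic scales concrete, here is a small sketch in base R graphics. The data frame `growth` is hypothetical, and base R plotting is used only to keep the example self-contained; it is not necessarily the plotting tool used in this class.

```r
# Hypothetical data: each step multiplies the value by 10.
growth <- data.frame(step = 1:5, value = 10^(1:5))

# Draw the same data on a linear and a logarithmic y-axis, side by side.
old_par <- par(mfrow = c(1, 2))
plot(growth$step, growth$value,
     main = "Linear scale", xlab = "Step", ylab = "Value")
plot(growth$step, growth$value, log = "y",
     main = "Logarithmic scale (base 10)", xlab = "Step", ylab = "Value")
par(old_par)

# On a base-10 log scale, consecutive values sit one unit apart.
log_gaps <- diff(log10(growth$value))
print(log_gaps)
```

On the linear axis the first few points are crowded near zero, while on the logarithmic axis the points are evenly spaced because each step is the same multiple of the last.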

Examples

Context

In every plot you submit for this class, I will be looking for five pieces of context.

  • The data’s unit of observation
  • Variables represented on the plot
  • Filters applied to the data
  • Geographic context of the data
  • Temporal (date/time range) context of the data
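
One way the five pieces of context above could be written into a plot's labels is sketched below in base R. The data, the place name, and the year are all made up for illustration, and the choice of which label carries which piece of context is an assumption rather than a class requirement.

```r
# Hypothetical data: one row per (made-up) county.
counties <- data.frame(
  county    = c("A", "B", "C"),
  hospitals = c(4, 9, 2)
)

barplot(
  height    = counties$hospitals,
  names.arg = counties$county,
  # Variables, geographic context, and temporal context go in the title.
  main = "Hospitals per County, Example State, 2020",
  # Filters applied to the data go in the subtitle.
  sub  = "Only counties with at least one hospital (hypothetical data)",
  # The unit of observation is named on the axis.
  xlab = "County (each bar is one county)",
  ylab = "Number of hospitals"
)
```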

Data Visualization Conventions

  • Edward Tufte, American statistician sometimes considered “father of data visualization”
  • Introduced the concept of “graphical integrity”
  • How do we present data as honestly as possible?

Lie Factor

  • Lie Factor = (size of effect in graphic)/(size of effect in data)
  • A lie factor of 1 means the graphic matches the data; the further it moves from 1, the more the variation drawn on the graph misrepresents the variation in the data

Tufte, The Visual Display of Quantitative Information
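
The lie factor formula can be worked through in a few lines of R. This sketch assumes Tufte's definition of "size of effect" as the absolute change divided by the first value, and uses the fuel economy example from his book (data rising from 18 to 27.5 miles per gallon, drawn as a line growing from 0.6 to 5.3 inches). The function names are hypothetical.

```r
# Size of effect: relative change from the first value to the second
# (assumed here to follow Tufte's definition).
effect_size <- function(first, second) {
  abs(second - first) / abs(first)
}

# Lie factor = (size of effect in graphic) / (size of effect in data)
lie_factor <- function(graphic_first, graphic_second, data_first, data_second) {
  effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)
}

# Graphic: 0.6 in to 5.3 in. Data: 18 mpg to 27.5 mpg.
fuel_lie_factor <- lie_factor(0.6, 5.3, 18, 27.5)
print(round(fuel_lie_factor, 1))  # 14.8
```

A result near 1 would mean the graphic changes in proportion to the data; here the graphic exaggerates the change roughly fifteenfold.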

Inconsistent Scales

Example from callingbullshit.org

Presenting Data out of Context

Example from mediamatters.org

Disproportionate Data-to-Ink Ratio

  • Ensure that the amount of ink used matches the amount of data presented
  • Data-to-ink ratio = (ink used to represent data)/(ink used to print graphic)
  • Should be as close as possible to 1
  • Another way to think about it: How much of this graph could I erase without losing data?
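
As a rough illustration of the ratio above, the sketch below computes it for two hypothetical versions of the same graph. The ink amounts are invented numbers (think of them as counts of inked pixels), and `data_ink_ratio` is a hypothetical helper, not a function from any package.

```r
# Data-to-ink ratio = (ink used to represent data) / (ink used to print graphic)
data_ink_ratio <- function(data_ink, total_ink) {
  stopifnot(total_ink > 0, data_ink >= 0, data_ink <= total_ink)
  data_ink / total_ink
}

# Hypothetical ink measurements for the same data drawn two ways.
cluttered <- data_ink_ratio(data_ink = 200, total_ink = 1000)  # heavy gridlines, borders, shading
cleaned   <- data_ink_ratio(data_ink = 200, total_ink = 250)   # most non-data ink erased

print(c(cluttered = cluttered, cleaned = cleaned))
```

The data ink is the same in both versions; only the ink that could be erased without losing data has changed, which moves the ratio from 0.2 to 0.8.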

Deviating from Norms

Example from callingbullshit.org

When can I break convention?