Pivoting Data

SDS 192: Introduction to Data Science

Professor Lindsay Poirier

For Today

  • Pivoting longer
  • Pivoting wider
  • Separating
  • Uniting

Tidy Data

  • Every observation has its own row.
  • Every variable has its own columns.
  • Every value has its own cell.

Is this tidy?

What variables are displayed on this plot?

df <- data.frame(
  date = c("10/27/2025","10/28/2025","10/29/2025"),
  Northampton_AQI = c(16, 12, 9),
  Easthampton_AQI = c(19, 13, 9),
  Hatfield_AQI = c(17, 12, 9)
)
df

Could I make this plot?

What will it look like when tidy?

a <- df |> pivot_longer(-date, 
                    names_to = "city", 
                    values_to = "aqi")

a

Pivoting Longer

  • We use pivot_longer() to pivot a datasets from wider to longer format:
  • pivot_longer() takes the following arguments:
  1. cols =: Identify a series of columns to pivot - The names of those columns will become repeated rows in the pivoted data frame, and the values in those columns will be stored in a new column.
  2. names_to =: Identify a name for the column where the column names will be store
  3. values_to =: Identify a name for the column were the values associated with those names will be stored
  4. Various arguments to support transformations to names

Example

df |> pivot_longer(cols = ends_with("AQI"), 
                    names_to = "city", 
                    values_to = "aqi") |>
  mutate(city = str_replace(city, "_AQI", ""))

Pivoting Wider

Note: I use this far less often than pivot_longer()

  • We use pivot_wider() to pivot a datasets from longer to wider format:
  • pivot_wider() takes the following arguments:
  1. names_from =: Identify the column to get the new column names from
  2. values_from =: Identify the column to get the cell values from
  3. Various arguments to support transformations to names

Example

df |> pivot_wider(names_from = "date", 
                   values_from = "aqi", 
                   names_repair = make.names)

Separating Columns

  • We use separate() to split a column into multiple columns:
  • separate() takes the following arguments:
  1. col: Identify the existing column to separate
  2. into = c(): Identify the names of the new columns
  3. sep =: Identify the characters or numeric position that indicate where to separate columns

Example

df |> pivot_longer(cols = -date,
                   names_to = "city", 
                   values_to = "values") |>
  separate(city, into = c("city", "metric"), sep = "_")

Uniting Columns

  • We use unit() to join multiple columns into one column:
  • unite takes the following arguments:
  1. ...: Identify the existing columns to unite
  2. col: Name of the new column
  3. sep =: Identify the characters or numeric position that indicate where to separate columns

Example

df |> unite(month_day, year, col = "date", sep = "/")

Let’s practice!

Steps

  1. Brainstorm the ggplot code (i.e. what goes on the x and y axis).
  2. Sketch out what the data frame should look like (i.e. what columns and rows).
  3. List the cleaning functions that you will need to to create the data frame.
  4. Write pseudo-code to create the data frame.