Pivoting Data

SDS 192: Introduction to Data Science

Professor Lindsay Poirier

For Today

Pivoting longer
Pivoting wider
Separating
Uniting

Tidy Data

Every observation has its own row.
Every variable has its own columns.
Every value has its own cell.

Is this tidy?

What variables are displayed on this plot?

df <- data.frame(
  date = c("10/27/2025","10/28/2025","10/29/2025"),
  Northampton_AQI = c(16, 12, 9),
  Easthampton_AQI = c(19, 13, 9),
  Hatfield_AQI = c(17, 12, 9)
)
df

Could I make this plot?

What will it look like when tidy?

a <- df |> pivot_longer(-date, 
                    names_to = "city", 
                    values_to = "aqi")

a

Pivoting Longer

We use pivot_longer() to pivot a datasets from wider to longer format:
pivot_longer() takes the following arguments:

cols =: Identify a series of columns to pivot - The names of those columns will become repeated rows in the pivoted data frame, and the values in those columns will be stored in a new column.
names_to =: Identify a name for the column where the column names will be store
values_to =: Identify a name for the column were the values associated with those names will be stored
Various arguments to support transformations to names

df |> pivot_longer(cols = ends_with("AQI"), 
                    names_to = "city", 
                    values_to = "aqi") |>
  mutate(city = str_replace(city, "_AQI", ""))

Pivoting Wider

Note: I use this far less often than pivot_longer()

We use pivot_wider() to pivot a datasets from longer to wider format:
pivot_wider() takes the following arguments:

names_from =: Identify the column to get the new column names from
values_from =: Identify the column to get the cell values from
Various arguments to support transformations to names

Example

Before
Code
After

df |> pivot_wider(names_from = "date", 
                   values_from = "aqi", 
                   names_repair = make.names)

Separating Columns

We use separate() to split a column into multiple columns:
separate() takes the following arguments:

col: Identify the existing column to separate
into = c(): Identify the names of the new columns
sep =: Identify the characters or numeric position that indicate where to separate columns

Example

Before
Code
After

df |> pivot_longer(cols = -date,
                   names_to = "city", 
                   values_to = "values") |>
  separate(city, into = c("city", "metric"), sep = "_")

Uniting Columns

We use unit() to join multiple columns into one column:
unite takes the following arguments:

...: Identify the existing columns to unite
col: Name of the new column
sep =: Identify the characters or numeric position that indicate where to separate columns

Example

Before
Code
After

df |> unite(month_day, year, col = "date", sep = "/")

Let’s practice!

Steps

Brainstorm the ggplot code (i.e. what goes on the x and y axis).
Sketch out what the data frame should look like (i.e. what columns and rows).
List the cleaning functions that you will need to to create the data frame.
Write pseudo-code to create the data frame.