Data Wrangling

Question

Send me a list of all of the lending IDs for the mortgage applications that were originated for borrowers with a debt-to-income ratio of greater than 60%.

#Code here
hmda_ms_2024 |>
  filter(debt_to_income_ratio == ">60%") |>
  select(lei)
# A tibble: 2,344 × 1
   lei                 
   <chr>               
 1 549300DNWSJUNAWUM003
 2 549300DMYENRV818D594
 3 549300DMYENRV818D594
 4 549300JPGMCMBEJEK584
 5 5493000V86JDJLM0OV13
 6 5493000V86JDJLM0OV13
 7 5493000V86JDJLM0OV13
 8 5493000V86JDJLM0OV13
 9 5493000V86JDJLM0OV13
10 549300L7HZ4C5U35BO37
# ℹ 2,334 more rows
Question

Send me the lending ID and action taken for the home purchases with the top 10 loan-to value ratios.

# Code Here
hmda_ms_2024 |>
  filter(loan_purpose == "Home Purchase") |>
  arrange(desc(loan_to_value_ratio)) |>
  head(10) |>
  select(lei, action_taken)
# A tibble: 10 × 2
   lei                  action_taken                         
   <chr>                <chr>                                
 1 X05BVSK68TQ7YTOSNR22 Application denied                   
 2 X05BVSK68TQ7YTOSNR22 Application denied                   
 3 EQTWLK1G7ODGC2MGLV11 Application denied                   
 4 EQTWLK1G7ODGC2MGLV11 Application denied                   
 5 5493003GQDUH26DNNH17 Application denied                   
 6 X05BVSK68TQ7YTOSNR22 Application denied                   
 7 X05BVSK68TQ7YTOSNR22 Application denied                   
 8 X05BVSK68TQ7YTOSNR22 Application denied                   
 9 EQTWLK1G7ODGC2MGLV11 Application denied                   
10 2549009A2MMICIE00762 Application approved but not accepted
Question

Send me a list of the loan to value ratios, loan amounts, and property values for each originated loan.

# Code Here
hmda_ms_2024 |>
  mutate(property_value = loan_amount / (loan_to_value_ratio/100)) |>
  select(loan_to_value_ratio, loan_amount, property_value)
# A tibble: 34,284 × 3
   loan_to_value_ratio loan_amount property_value
                 <dbl>       <dbl>          <dbl>
 1                80        265000        331250 
 2                80        205000        256250 
 3                95        145000        152632.
 4                90        485000        538889.
 5                68.3      215000        315000.
 6                NA        665000            NA 
 7                95         75000         78947.
 8                NA        315000            NA 
 9                NA        215000            NA 
10                80        145000        181250 
# ℹ 34,274 more rows
Question

How many NA values were listed in the loan to value ratio variable?

# Code Here
hmda_ms_2024 |>
  filter(is.na(loan_to_value_ratio)) |>
  nrow()
[1] 19247
hmda_ms_2024 |>
  summarize(total_na = sum(is.na(loan_to_value_ratio)))
# A tibble: 1 × 1
  total_na
     <int>
1    19247
Question

Of applicants with a debt-to-income ratio of less than 36%, what was the mean and median interest rate of originated loans for each race?

# Code Here
hmda_ms_2024 |>
  filter(debt_to_income_ratio %in% c("<20%", "20%-<30%", "30%-<36%")) |>
  group_by(derived_race) |>
  summarize(mean_interest = mean(interest_rate, na.rm = TRUE),
            median_interest = median(interest_rate, na.rm = TRUE))
# A tibble: 9 × 3
  derived_race                              mean_interest median_interest
  <chr>                                             <dbl>           <dbl>
1 2 or more minority races                           8.37            8.22
2 American Indian or Alaska Native                   7.81            7.19
3 Asian                                              6.70            6.62
4 Black or African American                          8.81            8.73
5 Free Form Text Only                              NaN              NA   
6 Joint                                              7.59            7.04
7 Native Hawaiian or Other Pacific Islander         10.8            11.4 
8 Race Not Available                                 7.30            6.88
9 White                                              7.50            7   
Question

Which 5 lending institutions lent the most? Send me a list of their IDs, the number of loans they originated, and the total originated.

# Code Here
hmda_ms_2024 |>
  filter(action_taken == "Loan originated") |>
  group_by(lei) |>
  summarize(total_loans = n(),
            total_amount = sum(loan_amount)) |>
  arrange(desc(total_amount)) |>
  head(5)
# A tibble: 5 × 3
  lei                  total_loans total_amount
  <chr>                      <int>        <dbl>
1 01J4SO3XTWZF4PP38209         834    209840000
2 Q7C315HKI8VX0SSKBS64         695    194795000
3 549300SFG15JDKI5MD22         585    157785000
4 549300XQVJ1XBNFA5536        1238    137350000
5 5493005QK4NV0ZZ5EM64         543    120115000
Question

How many loans were denied because of debt-to-income ratio for each race and debt-to-income ratio bracket? Send me a list of the race, debt-to-income ratio, and count of denied loans, sorted first by the derived race, and second by the count in descending order.

# Code Here
hmda_ms_2024 |>
  filter(denial_reason == "Debt-to-income ratio") |>
  group_by(derived_race, debt_to_income_ratio) |>
  summarize(denied_for_debt_count = n()) |>
  arrange(derived_race, desc(denied_for_debt_count))
`summarise()` has grouped output by 'derived_race'. You can override using the
`.groups` argument.
# A tibble: 38 × 3
# Groups:   derived_race [9]
   derived_race                     debt_to_income_ratio denied_for_debt_count
   <chr>                            <fct>                                <int>
 1 2 or more minority races         >60%                                     2
 2 2 or more minority races         42%-49%                                  1
 3 American Indian or Alaska Native >60%                                     3
 4 American Indian or Alaska Native 42%-49%                                  1
 5 American Indian or Alaska Native 50%-60%                                  1
 6 Asian                            >60%                                    14
 7 Asian                            50%-60%                                  4
 8 Asian                            42%-49%                                  1
 9 Black or African American        >60%                                   263
10 Black or African American        50%-60%                                124
# ℹ 28 more rows
Question

When applicants were denied, what was the most common denial reason for each race? Send me a list of the race, top denial reason, and count of loans that were denied for that reason.

# Code Here
hmda_ms_2024 |>
  filter(action_taken == "Application denied") |>
  group_by(derived_race, denial_reason) |>
  summarize(count = n()) |>
  ungroup() |>
  group_by(derived_race) |>
  filter(count == max(count))
`summarise()` has grouped output by 'derived_race'. You can override using the
`.groups` argument.
# A tibble: 9 × 3
# Groups:   derived_race [9]
  derived_race                              denial_reason        count
  <chr>                                     <chr>                <int>
1 2 or more minority races                  Credit history          15
2 American Indian or Alaska Native          Multiple                18
3 Asian                                     Debt-to-income ratio    19
4 Black or African American                 Multiple              2025
5 Free Form Text Only                       Credit history           2
6 Joint                                     Multiple                43
7 Native Hawaiian or Other Pacific Islander Multiple                 5
8 Race Not Available                        Multiple               213
9 White                                     Multiple              1424