Advanced Data Wrangling

Question

Select all columns relating the crime suspected.

#Code here
  cs_objcs cs_descr cs_casng cs_lkout cs_cloth cs_drgtr cs_furtv cs_vcrim
1        N        N        Y        N        N        N        N        N
2        N        N        Y        N        N        N        Y        N
3        N        N        N        N        N        N        Y        N
4        N        N        N        N        N        N        Y        N
5        N        N        N        N        N        N        Y        N
  cs_bulge cs_other
1        N        N
2        N        N
3        N        Y
4        N        Y
5        N        Y
Question

Select all of the first 5 columns, except the 4th column.

# Code Here
  year pct ser_num timestop
1 2011 102     185        0
2 2011 115      50        5
3 2011 100       4        7
4 2011 100       3        7
5 2011 100       1        7
Question

Send me the distinct values in the crime suspected column.

# Code Here
           crimsusp
1          BURGLARY
2               FEL
3               CPW
4 CRIMINAL TRESPASS
5           ROBBERY
Question

In which rows where a suspect was frisked and searched was either contraband or a pistol found on the suspect? Add a line of code below to determine this.

sqf_2011 |>
  select(frisked, searched, contrabn, pistol) |> 
  head(5)
  frisked searched contrabn pistol
1       Y        N        N      N
2       Y        N        N      N
3       Y        N        N      N
4       Y        N        N      N
5       Y        N        N      N
  frisked searched contrabn pistol
1       Y        Y        N      Y
2       Y        Y        Y      N
3       Y        Y        Y      N
4       Y        Y        N      Y
5       Y        Y        Y      N
Question

Send me all of the rows where the crime suspected contains “FEL.”

# Code Here
  pct ser_num crimsusp
1 115      50      FEL
2 101       8      FEL
3  70       1      FEL
4 101      11   FELONY
5  28       1   FELONY
Question

The NYPD uses the number 999 to indicate a missing age. In the code below, add a line to convert 999 to NA, using mutate().

# Code Here

sqf_2011 |>
  filter(age > 100) |>
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = 100)

Question

Add a line to the code below to create a new column called innocent that is set to “Y” if both arstmade and sumissue are “N”, and otherwise set to “N”.

#sqf_2011 |>
  #select(pct, ser_num, arstmade, sumissue) #|>
  #summarize(num_innocent = sum(innocent == "Y"))
  num_innocent
1       685022
Question

In the following code, the percent total stops is calculated out the total stops for each city. How can I adjust the code if I wanted to calculate percent total stops out of the total stops in the entire dataset.

sqf_2011 |>
  group_by(city, race) |>
  summarize(count = n()) |>
  mutate(total_stops = sum(count)) |>
  mutate(percent_total_stops = count/total_stops)
`summarise()` has grouped output by 'city'. You can override using the
`.groups` argument.
# A tibble: 45 × 5
# Groups:   city [6]
   city    race  count total_stops percent_total_stops
   <chr>   <chr> <int>       <int>               <dbl>
 1 " "     B        22          38             0.579  
 2 " "     I         1          38             0.0263 
 3 " "     P         5          38             0.132  
 4 " "     Q         7          38             0.184  
 5 " "     W         3          38             0.0789 
 6 "BRONX" A      1125      135738             0.00829
 7 "BRONX" B     64232      135738             0.473  
 8 "BRONX" I       340      135738             0.00250
 9 "BRONX" P     16311      135738             0.120  
10 "BRONX" Q     44325      135738             0.327  
# ℹ 35 more rows
`summarise()` has grouped output by 'city'. You can override using the
`.groups` argument.
# A tibble: 45 × 5
   city    race  count total_stops percent_total_stops
   <chr>   <chr> <int>       <int>               <dbl>
 1 " "     B        22      685724          0.0000321 
 2 " "     I         1      685724          0.00000146
 3 " "     P         5      685724          0.00000729
 4 " "     Q         7      685724          0.0000102 
 5 " "     W         3      685724          0.00000437
 6 "BRONX" A      1125      685724          0.00164   
 7 "BRONX" B     64232      685724          0.0937    
 8 "BRONX" I       340      685724          0.000496  
 9 "BRONX" P     16311      685724          0.0238    
10 "BRONX" Q     44325      685724          0.0646    
# ℹ 35 more rows
Question

In which precinct were individuals of each race stopped the most?

# Code Here