class: center, middle, inverse, title-slide

# Day Thirty-Six: Algorithmic Bias
## SDS 192: Introduction to Data Science
### Lindsay Poirier
### Statistical & Data Sciences, Smith College
### Spring 2022

---
class: middle, center

# What's in a name?

---

# The amazing people working on algorithmic bias

<img src="img/bias.png" width="600" />

...and many more!

---

# AI Harms

<img src="" width="300" />

> Image from Algorithmic Justice League; Credit: Megan Smith (former Chief Technology Officer of the USA)

---

# Fairness and Disparate Error

<iframe width="560" height="315" src="" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---

# How does this happen?

* Garbage in, garbage out
* Proxy discrimination
* Data bias diversion

---

# Garbage In, Garbage Out...

...describes cases where we build algorithms and other automated technologies on unrepresentative data.

<iframe width="560" height="315" src="" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---

# Proxy Discrimination

.pull-left[

* A healthcare algorithm was designed to determine which patients need extra care
* Researchers found that, at a given risk score, Black patients tended to be much sicker than white patients
* The algorithm used the amount of money patients had spent on healthcare as an indicator of health risk

> What's the problem?

]

.pull-right[

> Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations." Science 366 (6464): 447–53.

]

---
class: middle, center

---

# Predicting Child Neglect

* The Allegheny County, PA Office of Children, Youth and Families designed an algorithm to predict when children are at higher risk of experiencing neglect
* Aims to address bias in neglect determinations
* When a report comes in, the Allegheny Family Screening Tool (AFST) scores the likelihood of neglect from 1 to 20
* 131 predictive indicators, based on regression analysis of a data warehouse with over a billion records on past victims of neglect, including:
  * receiving county health or mental health treatment;
  * being reported for drug or alcohol abuse;
  * accessing Supplemental Nutrition Assistance Program benefits, cash welfare assistance, or Supplemental Security Income;
  * living in a poor neighborhood;
  * interacting with the juvenile probation system

---

# Proxy Discrimination

* Assumes that bias enters at the screening stage, while studies show it enters at the referral stage
* No actual data on neglect exists, so the algorithm relies on proxies
* A quarter of the indicators are also indicators of poverty
* Data on the use of public services is more accessible than data on private services, so it is included more often
* No indicators for private rehab or mental health counseling
* Algorithm "oversamples" the poor

---

# The Data Bias Diversion

* Belief that it is possible and desirable to do data science in a "neutral" or impartial way
* Ignores:
  * We can't have datasets without making decisions regarding what counts and how.
  * Data landscapes are already inequitable.
  * Data science tools and methods have racist legacies.
* "Models are opinions embedded in mathematics." - Cathy O'Neil

---

# Data Ethics Frameworks

* Federal Data Strategy
* Algorithmic Justice League Equitable AI
* Design Justice Principles
* Deon Data Ethics Checklist
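
---

# Proxy Discrimination: A Sketch

A minimal sketch of the healthcare example from the Proxy Discrimination slide. The numbers are made up for illustration and are not data from Obermeyer et al. The groups `"A"` and `"B"`, the patient records, and the helper names (`risk_score`, `mean_conditions_at_score`, `flagged_for_extra_care`) are all hypothetical. The assumption is that both groups have the same health needs, but group B spends half as much per condition.

```python
from statistics import mean

# Hypothetical, hand-made records: (group, chronic_conditions, dollars_spent).
# Assumption: both groups have identical health needs, but group "B" spends
# half as much per condition, so spending understates its need.
patients = [
    ("A", 1, 1000), ("A", 2, 2000), ("A", 3, 3000), ("A", 4, 4000),
    ("B", 1, 500), ("B", 2, 1000), ("B", 3, 1500), ("B", 4, 2000),
]


def risk_score(dollars_spent):
    """Proxy-based score: past spending stands in for health need."""
    return dollars_spent / 1000


def mean_conditions_at_score(records, group, score):
    """Average number of conditions for one group at a given risk score."""
    matches = [c for g, c, d in records if g == group and risk_score(d) == score]
    return mean(matches)


def flagged_for_extra_care(records, threshold):
    """(group, conditions) for patients whose proxy score meets the threshold."""
    return [(g, c) for g, c, d in records if risk_score(d) >= threshold]


# Compare how sick each group is at the same risk score.
sick_a = mean_conditions_at_score(patients, "A", 2.0)
sick_b = mean_conditions_at_score(patients, "B", 2.0)

# See who the proxy score selects for extra care.
flagged = flagged_for_extra_care(patients, 2.0)

print(f"Mean conditions at score 2.0: A={sick_a}, B={sick_b}")
print(f"Flagged for extra care: {flagged}")
```

In this toy data, a group B patient with the same risk score as a group A patient has twice as many conditions, and a group B patient with three conditions is not flagged while a group A patient with two is. The score never sees group membership; the gap comes entirely from using spending as a stand-in for need.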