Chances are your Models are Racist, Sexist, or both

At Joostware, I build ML and NLP products. Joostware is an entirely bootstrapped, garage-style product development studio, and consulting is a big part of the business. I work with clients (mostly early-stage startups) on their next big idea, and help them mature those ideas and bring them into reality. One amazing thing I notice, increasingly, is that machine learning is everywhere in the Valley’s tech consciousness. Today, no startup worth its domain name wants to admit it doesn’t use, or want to use, ML/NLP/Vision or “AI”.

In one such consulting deal, I grew particularly worried about what I saw in the data and in the production model’s outputs. ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have the power to get people arrested or exonerated, get loans approved or rejected, determine what interest rate those loans should carry, decide who shows up in your long list of prospects on Tinder, what news you read, who gets called for a job phone screen, or even who gets into college… the list goes on.

So what can you do about it?

To co-opt a related paper’s title, we can only ensure fairness through awareness. I first wrote to Suresh Venkatasubramanian (@geomblog), who I knew had done some work on this, and eventually went down the literature rabbit hole on unfairness and discrimination in machine learning.
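Concretely, awareness can start small: look at how your model’s favorable decisions break down across groups. Below is a minimal sketch in plain Python – with made-up predictions and group labels, and deliberately not the formalism from the “fairness through awareness” paper itself – of the four-fifths (80%) rule check for disparate impact.

```python
# A crude "awareness" check: the four-fifths (80%) rule for disparate impact.
# The predictions and group labels are made up for illustration; in practice
# they would come from a held-out slice of your real data.

def positive_rate(preds, groups, group):
    """Fraction of examples in `group` that received the favorable outcome (1)."""
    outcomes = [p for p, g in zip(preds, groups) if g == group]
    return sum(outcomes) / len(outcomes)

def disparate_impact(preds, groups, protected, reference):
    """Ratio of favorable-outcome rates; values below ~0.8 are a red flag."""
    return (positive_rate(preds, groups, protected) /
            positive_rate(preds, groups, reference))

if __name__ == "__main__":
    # 1 = loan approved, 0 = loan denied (toy outputs from some model)
    preds  = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    groups = ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]
    ratio = disparate_impact(preds, groups, protected="b", reference="a")
    print(f"disparate impact ratio: {ratio:.2f}")  # 0.67 here, below the 0.8 line
```

A single number like this settles nothing on its own – it ignores base rates, intersectional groups, and individual-level fairness – but it is the kind of question you can start asking of any model you already have in production.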

I must thank Suresh for providing some of the seed material for my research. I have also been heavily influenced by the work of Cynthia Dwork, Moritz Hardt (@mrtz), and Cathy O’Neil. Standing on their shoulders, I recently shared some of what I learned with friends from The Data Guild. I have detailed notes for some of these slides; if you would like to follow along with those, go directly to Google Slides.

I look forward to more such conversations in startup land everywhere (I mostly refer to startups because I work a lot with them, and they are usually the ones incentivized to move fast and break things).


  • Zalman Stern

    You may want to clarify the note text on slide 12:
    [
    The same Gillian Tett, now writes this on Chicago’s predictive policing. “The program has nothing to do with race but multi-variable equations”.
    ]
    The quoted comment is attributed to Brett Goldstein in the article snippet but the note seems to attribute it to Tett.

    • Delip Rao

      Zalman Stern, thanks for pointing that out. I have fixed the notes. Gillian wrote the article endorsing the predictive policing models used by the CPD and quotes Goldstein in support.

  • Kaleberg

    Machine learning is all about stereotyping. That’s what the whole Bayesian thing at the heart of modern data analysis is based on: P(A|B). The problem is that the 1 − P(A|B) side doesn’t get a fair hearing. This doesn’t matter when guessing the next word in a dialog, but it guarantees some bad decision-making.

    ML is all about lazy thinking, except done by machines instead of people. Often, it’s good enough, but all too often it is unfair, immoral and unsafe.

  • Joseph Savirimuthu

    Thanks for sharing, and the replies helped – I learnt a lot.

  • Algo Genius

    Great post. Many don’t consider these factors when working with demographic data. I left my previous employer because models were targeting too many poor people (hundreds per month) for theft and fraud. Meanwhile, execs were not being reviewed for financial crimes.

  • Houshalter

    So what? The problem with discrimination is that it’s done by prejudiced, irrational humans. Algorithms have no bias. They look at the data objectively, and make the most accurate prediction possible.

    Do you really believe that racist stereotypes are true, after controlling for other variables like education and income? If not, then there is nothing to worry about. Algorithms will be a thousand times less biased than humans.

    No one is suggesting that we use race as a variable. But you are going way further than that, saying that anything that even correlates with race shouldn’t be used. And literally everything correlates with race…

    If credit score correlates with race, should a bank be allowed to use it? A black person with a low credit score is just as likely to get a loan as a white person with a low credit score. There is no racial discrimination here.

    And even the group of people with low credit scores is not being treated unfairly. It just so happens that that group is way more likely to default. As a group, the interest rate is perfectly fair, and the algorithm’s predictions are perfectly accurate.

  • Pingback: It’s Good* That Word Embeddings Are Sexist | The Logorrhean Theorem

© 2016 Delip Rao. All Rights Reserved.