The idea behind Beauty.AI seemed innocent enough—an online beauty contest judged by intelligent machines. In 2016, a team of data scientists built a deep‑learning model to evaluate physical beauty by objective criteria such as facial symmetry. They tested the model on male and female entrants in five age groups, ranging from 18 to 60 and above.
More than 6,000 people from over 100 countries entered. When the system finished its judging, nearly all of the 44 winners were white. A handful were Asian. None were black.
The results, not surprisingly, set off a firestorm of criticism, causing the company to indefinitely postpone its next contest. The algorithms created to judge beauty, it seemed, had somehow picked up racist tendencies—possibly because the system was primarily trained on light‑skinned faces.
Today, the risks of similar algorithmic mishaps have only increased, as machine intelligence plays an expanding role in making consequential decisions—from hiring and promoting employees, to determining the sentence of a convicted felon, to placing individuals on the federal government’s “no fly” list.
As systems that evaluate inputs and make decisions on their own gain traction in business, a new term is entering the argot of risk management—algorithmic auditing. And for good reason: The algorithmic elements and variables embedded in intelligent applications today are overseen by data scientists and math whizzes, not experts in risk analysis, regulation, or corporate governance.
At some point, CIOs will need “to be able to explain the results of algorithmic decisions,” says Josefin Rosen, a principal analyst and algorithmic auditing expert with statistical software company SAS. “To be able to fulfil that, regular algorithm audits will be necessary.”
How to get started
Algorithmic auditing, as the name suggests, is a collection of techniques for testing whether an intelligent machine has blind spots or biases. Engineers feed the machine a variety of inputs and look for problematic patterns in decision‑making. Part data scientist, part forensic accountant, an algorithmic auditor taps an emerging set of tools and techniques to analyze risks. It’s an exercise that can be just as complex as the technology it promises to hold accountable.
There are two modes of algorithmic audit:
Direct auditing consists of code audits and other more traditional efforts, which are effective on machine learning systems and models that human auditors can deconstruct and interpret. For example, common data science methods such as decision trees and regression models generally can be analyzed and understood at the code level. This makes isolating the impact of all the inputs and variables in the model much simpler and more transparent.
Indirect auditing entails feeding widely varying sets of data into an algorithm and testing the outputs for signs of bias, anomalous behavior, or other undesirable results. Indirect examination is useful in any audit, but it is essential for deep learning systems, whose decision logic is learned from data rather than written by a programmer and cannot simply be read.
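A minimal sketch of an indirect audit, assuming an opaque model reachable only through a scoring call (the `score` function below is a hypothetical stand-in, and the numbers in it are invented): sweep one suspect input across a range of values while holding everything else fixed, and watch for abrupt shifts in the output.

```python
# Indirect audit sketch: probe a black-box scorer by varying one input
# at a time. The model's internals are assumed to be inaccessible; only
# the score it returns can be observed.

def score(applicant):
    # Hypothetical stand-in for an opaque model's predict call.
    return (0.5 + 0.004 * (applicant["income"] - 50)
            - (0.3 if applicant["zip_code"] == "60629" else 0.0))

baseline = {"income": 50, "zip_code": "60601"}

# Hold income fixed and sweep the zip code; a sharp drop for one value
# suggests the model treats that zip code very differently.
for zip_code in ["60601", "60614", "60629", "60651"]:
    probe = dict(baseline, zip_code=zip_code)
    print(zip_code, round(score(probe), 3))
```

The same loop works against any model you can call but cannot open, which is the point of the indirect approach.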
One point every CIO should keep in mind: Algorithmic audits are new—so new that standardized methodologies don’t yet exist. A growing number of university and corporate researchers are working on ways to make “black box” algorithms more easily understandable. If you like analogies, imagine the software equivalent of a radiologist injecting dye into a patient and then imaging the dye as a way of illuminating the inner workings of the body.
But these modeling tools aren’t ready for widespread corporate use. And the most immediate applications are likely to be focused on hot‑button issues such as hiring discrimination that pose the most obvious risks to business.
For now, researchers generally agree on a few basic guidelines for algorithmic audits. Here’s a look at the basics.
Check your data
The most common cause of algorithmic bias is biased data. Reducing that bias means understanding what data has been collected, how relevant it is to the problem you’re trying to solve, how complete and accurate it is, and how the raw data was cleaned. It also means having a clear understanding of which data may actually be a proxy for bias. Zip codes, for example, can be a proxy for race, so algorithms that rely heavily on zip codes—for mortgage applications, for instance, or even same‑day consumer deliveries—may bake in hidden biases.
“This proxy problem where information that you would rather not consider in the model leaks in through another variable is a very hard problem,” says Rich Caruana, an expert in algorithmic auditing at Microsoft Research, the software company’s R&D arm. Identifying these hidden proxies may take considerable trial and error.
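One rough way to hunt for such proxies, sketched below on invented toy data: check whether guessing the protected attribute from the suspect feature alone beats the overall base rate by a wide margin. If it does, that feature is leaking protected information into the model.

```python
from collections import Counter, defaultdict

# Proxy check sketch: can an "innocuous" feature (zip code) predict a
# protected attribute? The records below are invented for illustration.
records = [
    ("60601", "A"), ("60601", "A"), ("60601", "B"),
    ("60629", "B"), ("60629", "B"), ("60629", "B"),
    ("60614", "A"), ("60614", "A"), ("60614", "B"),
]

# Baseline: accuracy of always guessing the most common label.
base_rate = Counter(label for _, label in records).most_common(1)[0][1] / len(records)

by_zip = defaultdict(list)
for zip_code, label in records:
    by_zip[zip_code].append(label)

# Accuracy of guessing the majority label within each zip code.
hits = sum(Counter(labels).most_common(1)[0][1] for labels in by_zip.values())
proxy_accuracy = hits / len(records)

print(f"base rate: {base_rate:.2f}, predict-from-zip: {proxy_accuracy:.2f}")
```

A large gap between the two numbers is only a signal, not proof; as Caruana notes, confirming and untangling a real proxy takes considerable trial and error.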
Ask how the algorithm works
Understanding the components and weighting of an algorithm makes it significantly easier to perform an audit. The very act of asking for an explanation of how an algorithm works can spark change, says Michael Skirpan, founder of a new algorithmic auditing consultancy called Probable Models.
“If they can answer the question, that’s great,” Skirpan explains. “If not, they will see that algorithmic transparency is important to you.” In particular, experts recommend asking what steps developers have taken to ensure the model is not biased against different protected groups.
Run your own sniff test
One simple way to check for bias without having to analyze the source code is to run some test data points through an algorithm and check the result, says Sarah Tan, a researcher at University of California San Francisco. For example, HR managers might collect the resumes of job candidates screened out of a hiring pool and change a variable they suspect may be causing bias—such as zip code or first or last names. They would then run those resumes through the screener again and see if the machine places the applicant back into the hiring pool.
This approach is particularly useful for catching obvious racial or gender bias, though it is less effective against bias that enters through proxy variables such as zip codes. Even so, running these simple “sniff tests” can often spot the worst cases of bias quickly and affordably.
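The sniff test above can be sketched as a counterfactual pair test: run two inputs that differ only in the suspect field through the screener and flag any case where the decision flips. The `screen` function here is a hypothetical stand-in for a real resume screener, deliberately biased so the test has something to find.

```python
# Sniff-test sketch: flag candidates whose screening decision flips
# when only the suspect variable (here, zip code) is changed.

def screen(resume):
    # Hypothetical biased screener, for demonstration only.
    return resume["years_experience"] >= 3 and resume["zip_code"] != "60629"

candidates = [
    {"name": "Candidate 1", "years_experience": 5, "zip_code": "60629"},
    {"name": "Candidate 2", "years_experience": 2, "zip_code": "60614"},
]

for resume in candidates:
    original = screen(resume)
    counterfactual = screen(dict(resume, zip_code="60601"))  # swap zip only
    if original != counterfactual:
        print(resume["name"], "decision flipped when zip code changed")
```

Any flip is a red flag worth investigating: nothing about the candidate’s qualifications changed, yet the outcome did.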
Request a full audit
There are various levels of algorithmic audits. A company can hire an outside firm to come in and lead the effort. If the company has an internal data team that is developing data models, it can institute a basic auditing process that involves an extra layer of testing outcomes to look for signs of bias. “If a department in your company or some other company you’ve hired has created a model, it’s very important that you have your own test and validation of the model,” says Caruana. “You will often discover problems you didn’t anticipate.”
This is particularly important because bias—or just simply unfair results—may not be obvious initially and may only emerge as a model evolves or is trained on new data. Developers, who are used to writing code, checking it for errors, and then shipping it, aren’t used to having to work this way.
“If you train a deep model on a test set and it looks accurate, then they figure that’s sufficient,” says Caruana. “You need more tests to track the behavior of the algorithm” over time.
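One simple check to repeat over time, sketched below on invented outcome data: compare selection rates across groups each time the model is retrained. The 80% threshold here is the “four‑fifths” rule of thumb from U.S. employment guidelines, used only as an illustrative trigger for review.

```python
# Ongoing validation sketch: compare per-group selection rates and flag
# any group whose rate falls below 80% of the best-performing group's.
# Group labels and outcomes are invented for illustration (1 = selected).

outcomes = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1],
    "group_b": [1, 0, 0, 0, 1, 0, 0, 1],
}

rates = {group: sum(v) / len(v) for group, v in outcomes.items()}
best = max(rates.values())

for group, rate in rates.items():
    ratio = rate / best
    status = "OK" if ratio >= 0.8 else "REVIEW"
    print(f"{group}: rate={rate:.2f}, ratio={ratio:.2f} -> {status}")
```

Rerunning this after each retraining catches the slow drift Caruana describes, where a model that looked fair at launch becomes biased as it learns from new data.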
For many companies, auditing algorithms may still seem like a distant priority compared to other challenges posed by new technologies. But with machine intelligence taking root in just about every business function, the companies that succeed with it will be those that truly understand what’s in the black box.