I won’t pledge allegiance to Big Data

Jan. 15, 2018, 4:00 p.m.

From the Stanford Marriage Pact to micro-targeted political ads on your Facebook feed, a world governed by information and algorithms has become the new normal. However, the rise of data-driven decision-making has been accompanied by a dangerous ruse of objectivity — the false assumption that numbers must be neutral.

As a disclaimer, computer and data science offer many useful, even groundbreaking insights into the world’s most pressing social issues. They allow researchers to examine historical trends at an immense scale and make corresponding predictions to a high degree of accuracy. Furthermore, from the Poverty and Technology Lab to the Center for Spatial and Textual Analysis (CESTA), Stanford has been at the forefront of using technology to better understand the world around us.

However, the danger lies in forgetting that behind lines of code and massive datasets are humans. Smart humans, sure, but prone to laziness, biases and silly mistakes just like the rest of us. And when equipped with the transformative power of technology, these flaws — and their impact on society — become magnified. Errors in algorithms cannot be easily tracked and contained, but are instead quickly systematized and reproduced.

We’ve already begun to see the consequences of putting too much trust in data, as well as the dangers it continues to pose. “Predictive policing” models tell cops where to focus their resources based on records of past arrests, institutionalizing anti-black bias and slowing progress in a police system that already feels stuck in the 1950s. When Microsoft tested out “Tay,” a Twitter chatbot that imitated human users, she began spewing Nazi rhetoric within hours of release.

I’d like to think that the developers behind these technologies had no intention of perpetuating hate and racism. Yet it is simultaneously an ethical imperative for tech innovators to understand these tendencies and counteract them in future projects.

So what makes data so uniquely risky, especially when it comes to high-level policymaking?

First of all, data often fails to represent a population accurately. As we all know from basic statistics, survey sampling is subject to any number of issues: nonresponse, undercoverage, voluntary response bias. Furthermore, selection bias often disproportionately affects communities that are already marginalized, perhaps because they don’t have internet access, landlines or time to spare for answering questionnaires. Once you consider whose experiences tend to be institutionalized, and whose are systematically excluded from dominant narratives, even very large datasets are revealed to be less all-encompassing than they seem.
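To make the sampling point concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, as are the group names and the `Group`, `true_rate` and `surveyed_rate` helpers; nothing here comes from a real survey. It assumes a population of 10,000 in which a smaller, hard-to-reach group experiences some hardship far more often but answers the survey far less often.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    size: int             # people in the group (hypothetical)
    response_rate: float  # share of the group the survey actually reaches
    hardship_rate: float  # true share of the group experiencing the hardship


def true_rate(groups):
    """Hardship rate across the whole population."""
    total = sum(g.size for g in groups)
    return sum(g.size * g.hardship_rate for g in groups) / total


def surveyed_rate(groups):
    """Hardship rate among only the people who responded."""
    reached = sum(g.size * g.response_rate for g in groups)
    affected = sum(g.size * g.response_rate * g.hardship_rate for g in groups)
    return affected / reached


# Illustrative, made-up figures: the hard-to-reach group is smaller,
# worse off, and ten times less likely to answer.
groups = [
    Group("well-connected", size=8000, response_rate=0.50, hardship_rate=0.10),
    Group("hard-to-reach", size=2000, response_rate=0.05, hardship_rate=0.60),
]

print(f"true rate:     {true_rate(groups):.1%}")
print(f"surveyed rate: {surveyed_rate(groups):.1%}")
```

Under these assumed figures, the true hardship rate is about 20 percent, but the survey reports roughly 11 percent, and collecting more responses at the same response rates would not close that gap.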

Furthermore, data is fundamentally descriptive. It certainly tells us a lot about what has happened in the past, but doesn’t prescribe a specific course of action for the future. As a result, the normative conclusions we draw not only are subjective but also reify past misjudgments. For instance, automating part of the admissions process at St George’s Hospital Medical School conserved historical patterns of excluding women and nonwhites. Machine learning only complicates this, giving computers the power of interpretation and judgment with minimal supervision — a risk that became clear when Google’s “smart” image recognition technology labeled photos of black people “gorillas.”

Luckily, the same analytical tools that have so often been used to oppress can be repurposed for emancipatory ends, but caution must be exercised at every step of the process. The Stanford Social Innovation Review argues that technology firms must prioritize ethics in their corporate cultures, hiring practices and standards for evaluation. Others have defined best practices for data use in social sector organizations: for example, purchasing information directly from underserved communities and disaggregating data to avoid homogenizing diverse populations. Finally, Stanford can strengthen its own role in fostering human-centered engineering by taking actions such as expanding the Technology in Society requirement for computer science majors.

In an era of fake news and mass political polarization, it’s more tempting than ever to embrace decision-making strategies that seem grounded firmly in empirics and objective reality. However, data isn’t neutral, and we can’t keep acting like it is. Making the old ways of doing things even faster and more efficient does not mean we have progressed.

Instead, it’s worthwhile to slow down the engine of innovation and ask ourselves: What kind of future are our algorithms really creating?

Contact Jasmine Sun at jasminesun ‘at’ stanford.edu.

Jasmine Sun '21 is a sociology major from the greater Seattle area. She's fascinated by the future of cities, education and digital media. When not trying and failing to catch up on her Goodreads Reading Challenge, she nurtures her caffeine addiction and hosts a podcast on civic innovation.
