What happens when you randomly collate data, looking for questionable correlations?
You want to know the habits of mobile phone users? Big Data. You want to reach a targeted clientele on the Web? Big Data. You want to decode the secrets of the latest Netflix hit, or figure out which neighbourhood potholes to fix first? Big Data! All you need is a good algorithm and a decent quantity of data, and the companies that analyze Big Data promise to find all sorts of answers to our questions. But who’s asking these questions? And can we trust algorithms to make decisions?
2015 is the year of Big Data. The concept has existed for forty years already, but according to Forbes, this is the year that marks Big Data’s entry into business and governance. Scores of companies are retooling their business models to reap profits from a new source of wealth: our personal data.
Big Data Mashup
Statistical analysis has always been with us. By taking surveys, or by tabulating selected answers from a census form, we can estimate, more or less, the probability that a candidate will be elected, the number of car accidents in a year, or even the type of person most likely to repay a loan. Mistakes can be made, but numbers help uncover trends. And based on those trends, we hopefully make the right decisions.
Nowadays, we produce trends using quintillions of data points. Add to this the information collected by institutions and credit companies, the browsing history tracked by cookies (episode 2), the data from our mobile phones (episode 4), and the 50 million photos, 40 million Tweets, and billions of documents exchanged daily. Now add the data produced by fitness bracelets, “smart” objects, and gadgets, and you’ll understand why “Big” is the right adjective to describe the vast expanse of available information.
However, the truly revolutionary aspect of Big Data isn’t so much its size as the way in which all of this data can be mixed. Beyond what it says about us (or despite us), it is the correlation and cross-referencing of personal information that allow the behaviour of users to be predicted.
Knowing what you say online? Who cares! But knowing which words you use, with whom you’re exchanging them, on what network, and at what time? Now that’s a moneymaker.
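To make that concrete, here is a toy sketch of how a data broker might turn pure metadata into an advertising score. Every field name, weight, and category below is invented for illustration; real scoring models are proprietary and far more elaborate.

```python
# Toy sketch: pricing a user profile from metadata alone.
# All fields, weights, and categories are invented for illustration.

def ad_value(profile: dict) -> float:
    """Score how much an advertiser might pay to reach this user."""
    score = 0.0
    # Vocabulary hints at purchase intent (e.g. "mortgage", "stroller").
    score += 2.0 * len(profile.get("commercial_keywords", []))
    # A dense contact graph suggests an influential node worth reaching.
    score += 0.1 * profile.get("contacts", 0)
    # Activity at certain hours can signal high-intent browsing.
    if profile.get("active_hours") == "late_night":
        score += 5.0
    return score

alice = {"commercial_keywords": ["mortgage", "renovation"],
         "contacts": 300,
         "active_hours": "late_night"}
print(ad_value(alice))  # 2*2 + 0.1*300 + 5 = 39.0
```

Notice that the message content itself never appears — the keywords, the contact count, and the timestamp are enough to put a price on Alice.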
Categorization For The Win
With something as simple as a postal code, for example, average consumer income can be predicted. The Esri and Claritas agencies even claim to be able to deduce education level, lifestyle, family composition, and consumer habits from this one piece of information. Target made headlines in 2012 when it predicted a teenager’s pregnancy, before her parents were aware, based on the types of lotions, vitamins, and colours of items she had purchased.
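A minimal sketch of this kind of geodemographic lookup — with entirely fabricated postal codes, segment names, and income figures, since the real Esri and Claritas databases are commercial products — might look like this:

```python
# Toy geodemographic segmentation: one postal code in, a whole
# consumer profile out. All data below is fabricated for illustration.

SEGMENTS = {
    "H2X": {"segment": "Urban Students", "median_income": 28_000,
            "habits": ["transit pass", "streaming", "takeout"]},
    "J7V": {"segment": "Suburban Families", "median_income": 81_000,
            "habits": ["minivan", "bulk groceries", "kids' sports"]},
}

def profile_from_postal_code(code: str) -> dict:
    # Real systems fall back to broader regions; here we just say "unknown".
    prefix = code.strip().upper()[:3]
    return SEGMENTS.get(prefix, {"segment": "unknown"})

print(profile_from_postal_code("h2x 1y6")["segment"])  # Urban Students
```

The point is how little the subject contributes: a single address stands in for income, education, and lifestyle — accurately for people near the segment’s average, and wrongly for everyone else.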
For these algorithms to work properly, individuals have to be sorted into ever more precise categories. And that is where discrimination lurks, because we don’t always fit neatly into a pigeonhole.
Predictions and Discrimination
As Kate Crawford noted when she was interviewed in episode 5, it is minorities, and those who are already discriminated against, who are most affected by prediction errors. The more an individual corresponds to the “norm”, or to a predetermined category, the easier it is to take their data into account. But what happens when we are on the margins? What happens to those who don’t behave the way Amazon, Google, or Facebook predicts?
Facebook recently angered many of its users by strictly enforcing a clause of its Terms and Conditions that requires people to use their real names on the service. The purpose, says the company, was to provide a safer environment and limit hateful posts. What it didn’t account for was the deletion of accounts belonging to transgender people, indigenous people, and survivors of domestic violence, whose profiles weren’t registered under “real” names. This violated not only these users’ rights, but their privacy as well.
And what about the prejudices and discrimination that algorithms only serve to reinforce? In 2014, Chicago police rang the doorbell of a 22-year-old man named Robert McDaniel. “We’re watching you,” said one of the officers. An algorithm developed at the Illinois Institute of Technology had placed him on a list of 400 potential criminals, based on crime data about his neighbourhood, the intersections where crimes had occurred in the past, and his degrees of separation from people involved in crimes. It sounds like science fiction. And if the algorithm gets someone wrong, how is the mistake repaired?
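To show how such a “heat list” heuristic could work mechanically, here is a toy sketch: it combines graph distance to people involved in past crimes with a neighbourhood crime rate. The graph, the weights, and the formula are all invented for illustration — the actual Chicago model has never been published in this form.

```python
# Toy "heat list" sketch: rank a person by social-graph proximity to
# past crime involvement, plus local crime rate. Graph, weights, and
# formula are invented; the real model is not public in this form.
from collections import deque

def degrees_of_separation(graph, start, targets):
    """Breadth-first search: hops from `start` to the nearest target."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected at all

def risk_score(graph, person, involved, neighbourhood_crime_rate):
    hops = degrees_of_separation(graph, person, involved)
    # Closer in the graph -> higher score; unconnected -> no contribution.
    proximity = 0.0 if hops is None else 10.0 / (1 + hops)
    return proximity + 5.0 * neighbourhood_crime_rate

graph = {"robert": ["a"], "a": ["b"], "b": []}
print(risk_score(graph, "robert", involved={"b"}, neighbourhood_crime_rate=0.8))  # ≈ 7.33
```

Notice that the person being scored supplies no data at all, and nothing in the formula distinguishes a criminal associate from a cousin or a neighbour — which is precisely why a wrong prediction is so hard to appeal.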
Take the Test
We’re not going to lie to you: it’s difficult, if not impossible, to find out how we are categorized – and even harder to avoid it altogether. It all depends on the company, the algorithm, and the information that they are after. However, some tools can give us a glimpse into the ways in which the Web categorizes us:
- The Floodwatch extension (link) lets us take a quick look at all of the advertisements that have targeted us personally over a long period. Handy for retracing our browsing habits and seeing how they shape our categorization!
- Even simpler? If you are logged in to your Google account, go to the Ads Settings page. Does the profile you find there resemble you? It’s up to you whether to correct it, or you could just adopt this new identity as a form of camouflage.
The searches we make, the news we read, the dates we go on, the advertisements we see, the products we buy and the music we listen to. The stock market. The surveillance society. The police state, and the drones. All guided by a force we never see and few understand.