You want to know the habits of mobile phone users? Big Data. You want to reach a targeted clientele on the Web? Big Data. You want to decode the secrets of the latest on Netflix, or learn where to fix potholes in a neighbourhood? Big Data! All you need is a good algorithm, and a decent quantity of data, and the companies that analyze Big Data promise to find all sorts of answers to our questions. But who’s asking these questions? And can we trust algorithms to make decisions?
2015 is the year of Big Data. The concept of Big Data has existed for forty years already, but according to Forbes, this is the year that marks Big Data’s entry into the business world and governance. A bunch of companies are retuning their business models to reap profits from a new source of wealth: our personal data.
Big Data Mashup
Statistical analysis has always been with us. By taking surveys, or by tallying the answers on a census form, we can estimate, more or less, the probability that a candidate will be elected, the number of car accidents annually, or even the type of individual most likely to repay a loan. Mistakes can be made, but numbers help uncover trends. And based on those trends, we hopefully make the right decisions.
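For the curious, here is a rough idea of what "more or less" means in a survey. This is a minimal sketch using made-up numbers (1,000 people polled, 520 in favour of a candidate) and the standard textbook approximation for a 95% margin of error. The function name and the figures are illustrative assumptions, not taken from any real poll.

```python
import math

def poll_estimate(in_favour, sample_size, z=1.96):
    """Estimate a proportion from a simple random sample.

    Returns the estimated share and an approximate 95% margin of error
    (normal approximation; z=1.96 is the usual 95% multiplier).
    """
    share = in_favour / sample_size
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return share, margin

# Hypothetical survey: 520 of 1,000 respondents support the candidate.
share, margin = poll_estimate(520, 1000)
print(f"Estimated support: {share:.1%}, give or take {margin:.1%}")
```

With these invented numbers the estimate comes with a margin of roughly three percentage points, which is why a candidate polling just above 50% is never a sure thing.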
Nowadays, we produce trends using quintillions of data points. Add this to the information collected by institutions and credit companies, browsing history tracked by cookies (episode 2), the data from our mobile phones (episode 4), 50 million photos, 40 million Tweets, and billions of documents exchanged daily. Now add the data produced by sports bracelets, “smart” objects and gadgets, and you’ll understand why “Big” is the right adjective to describe the vast expanse of available information.
However, the true revolutionary aspect of Big Data isn’t so much a question of its size, as it is the way in which all of this data can be mixed. Beyond the things it says about us (or despite us), it is the correlation and mixing of personal information that allow the behaviours of users to be predicted.
Being able to know what you say online? Who cares! But knowing the words used, and with whom you are exchanging, on what network, and at what time? Now that’s a moneymaker.
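To make the idea of "mixing" concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the records, the field names, and the `build_profiles` helper are assumptions, not a description of how any real company works. It simply shows that two harmless-looking datasets, once joined on a shared user identifier, start to say something neither says alone.

```python
from collections import defaultdict

# Hypothetical, made-up records. Note that no message content is stored,
# only who talked to whom, on which network, and at what hour.
messages = [
    {"user": "u1", "network": "chat_app", "hour": 23, "contact": "u2"},
    {"user": "u1", "network": "chat_app", "hour": 23, "contact": "u2"},
    {"user": "u1", "network": "work_mail", "hour": 10, "contact": "u3"},
]
purchases = [
    {"user": "u1", "item": "concert tickets", "quantity": 2},
]

def build_profiles(messages, purchases):
    """Join both sources on the user id and derive a small profile."""
    profiles = defaultdict(lambda: {"late_night_contacts": set(), "items": []})
    for m in messages:
        # Keep only contacts reached late at night (22:00 to 04:59).
        if m["hour"] >= 22 or m["hour"] < 5:
            profiles[m["user"]]["late_night_contacts"].add(m["contact"])
    for p in purchases:
        profiles[p["user"]]["items"].append(p["item"])
    return dict(profiles)

profiles = build_profiles(messages, purchases)
print(profiles["u1"])
```

Separately, a list of timestamps and a receipt reveal very little. Together, they suggest who "u1" is close to and what they might do next, and that inference is what has commercial value.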
Categorization For The Win
With something as simple as a postal code, for example, average consumer income can be predicted. The Esri and Claritas agencies even claim to be able to deduce education level, lifestyle, family composition, and consumer habits from this one piece of information. Target made headlines in 2012 when it predicted a teenager’s pregnancy, before her parents were aware, based on the type of lotions and vitamins she purchased and the colour of the items.
For these algorithms to work properly, individuals have to be put into increasingly precise categories. And that is where discrimination lurks. Because we don’t always fit easily into a pigeonhole.
Predictions and Discrimination
As Kate Crawford stated when she was interviewed in episode 5, it is minorities, and those who are already discriminated against, who are the most affected by prediction errors. The more an individual corresponds to the “norm”, or to a predetermined category, the easier it is to take their data into consideration. But what happens when we are on the margins? What happens to those who don’t behave the way Amazon, Google, or Facebook predicts?
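A toy example can show why errors pile up at the margins. The sketch below uses entirely made-up data and a deliberately simplistic rule (predict the most common outcome for each signal). It is an assumption-laden illustration, not the method of any company named here. In this invented population, a signal means one thing for 90 people and the opposite for 10.

```python
from collections import Counter

# Made-up data: (group, signal, true_outcome).
# For the majority, signal 1 goes with "yes"; for the minority it is reversed.
people = (
    [("majority", 1, "yes")] * 45 + [("majority", 0, "no")] * 45
    + [("minority", 1, "no")] * 5 + [("minority", 0, "yes")] * 5
)

def learn_rule(data):
    """For each signal value, predict the most common outcome overall."""
    counts = {}
    for _, signal, outcome in data:
        counts.setdefault(signal, Counter())[outcome] += 1
    return {signal: c.most_common(1)[0][0] for signal, c in counts.items()}

def error_rate_by_group(data, rule):
    """Share of wrong predictions within each group."""
    errors, totals = Counter(), Counter()
    for group, signal, outcome in data:
        totals[group] += 1
        if rule[signal] != outcome:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

rule = learn_rule(people)
rates = error_rate_by_group(people, rule)
overall = sum(rule[s] != o for _, s, o in people) / len(people)
print(rates, overall)
```

The rule is right nine times out of ten overall, which looks respectable on a dashboard. Yet in this toy setup every one of its mistakes lands on the smaller group, whose behaviour the rule never learned.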
Facebook recently angered many of its users by strictly enforcing a section of its Terms and Conditions which insists people use their real names within the service. The purpose, says the company, was to provide a safer environment and limit hateful posts. What the company didn’t account for was the deletion of accounts belonging to transgender people, Indigenous people, and survivors of domestic violence, whose accounts weren’t held under “real” names. This violated not only the individual rights but also the privacy of these users.
And what about prejudices and discrimination that algorithms only serve to reinforce? In 2014, Chicago police rang the doorbell of a 22-year-old man named Robert McDaniel. “We’re watching you,” said one of the officers. An algorithm developed by the Illinois Institute of Technology had placed him on a list of 400 potential criminals because of crime data about his neighbourhood, the intersections where crimes had occurred in the past, and his degrees of separation from people involved in crimes. It’s like science fiction. And if there were a mistake, how would it be corrected?
Take the Test
We’re not going to lie to you: it’s difficult, if not impossible, to find out how we are categorized – and even harder to avoid it altogether. It all depends on the company, the algorithm, and the information that they are after. However, some tools can give us a glimpse into the ways in which the Web categorizes us:
- The extension called Floodwatch allows us to have a quick look at all of the advertisements that target us personally over a long period. Handy for retracing our browsing practices and how they affect our categorization!
- Even simpler? If you are logged in to your Google account, go to the Ad Parameters page. Does this profile resemble you? It’s up to you if you want to correct it, or you could just adopt this new identity as a form of camouflage.