Big Data: The Categorizing Machine

You want to know the habits of mobile phone users? Big Data. You want to reach a targeted clientele on the Web? Big Data. You want to decode the secrets of the latest on Netflix, or learn where to fix potholes in a neighbourhood? Big Data! All you need is a good algorithm, and a decent quantity of data, and the companies that analyze Big Data promise to find all sorts of answers to our questions. But who’s asking these questions? And can we trust algorithms to make decisions?

2015 is the year of Big Data. The concept has existed for forty years already, but according to Forbes, this is the year that marks Big Data’s entry into business and governance. Many companies are retooling their business models to profit from a new source of wealth: our personal data.

Big Data Mashup

Statistical analysis has always been with us. By taking surveys, or by tallying selected answers on a census form, we can estimate, more or less, the probability that a candidate will be elected, the number of car accidents per year, or even the type of individual most likely to repay a loan. Mistakes can be made, but numbers help uncover trends. And based on those trends, we hopefully make the right decisions.

Nowadays, we produce trends using quintillions of data points. Add this to the information collected by institutions and credit companies, browsing history tracked by cookies (episode 2), the data from our mobile phones (episode 4), 50 million photos, 40 million Tweets, and billions of documents exchanged daily. Now add the data produced by sports bracelets, “smart” objects and gadgets, and you’ll understand why “Big” is the right adjective to describe the vast expanse of available information.

However, the true revolutionary aspect of Big Data isn’t so much a question of its size, as it is the way in which all of this data can be mixed. Beyond the things it says about us (or despite us), it is the correlation and mixing of personal information that allow the behaviours of users to be predicted.

Knowing what you say online? Who cares! But knowing which words you use, with whom you are communicating, on what network, and at what time? Now that’s a moneymaker.

Categorization For The Win

With something as simple as a postal code, for example, average consumer income can be predicted. The Esri and Claritas agencies even claim to be able to deduce education level, lifestyle, family composition, and consumer habits from this one piece of information. Target made headlines in 2012 when it predicted a teenager’s pregnancy, before her parents were aware, based on the type of lotions, vitamins, and color of items purchased.

For these algorithms to work properly, individuals have to be put into increasingly precise categories. And that is where discrimination lurks, because we don’t always fit easily into a pigeonhole.

Predictions and Discrimination

As Kate Crawford stated when she was interviewed in episode 5, it is minorities, and those who are already discriminated against, who are the most affected by prediction errors. The more an individual corresponds to the “norm”, or to a predetermined category, the easier it is to take their data into consideration. But what happens when we are on the margins? What happens to those who don’t behave the way Amazon, Google, or Facebook predicts?

Facebook recently angered many of its users by strictly enforcing a section of its Terms and Conditions which insists that people use their real names within the service. The purpose, says the company, was to provide a safer environment and limit hateful posts. What it didn’t account for was the deletion of accounts belonging to transgender people, Indigenous people, and survivors of domestic violence, whose accounts weren’t held under “real” names. This violated not only the individual rights but also the privacy of these users.

And what about the prejudices and discrimination that algorithms only serve to reinforce? In 2014, Chicago police rang the doorbell of a 22-year-old man named Robert McDaniel. “We’re watching you,” said one of the officers. An algorithm developed at the Illinois Institute of Technology had placed him on a list of 400 potential criminals, based on crime data about his neighbourhood, the intersections where crimes had occurred in the past, and his degrees of separation from people involved in crimes. It’s like science fiction. And if the algorithm got it wrong, how would the mistake be corrected?
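To make “degrees of separation” concrete, here is a minimal Python sketch. It is not the Chicago algorithm, whose details are not given here; it only shows one common way to count the links between one person and anyone in a flagged group, using a breadth-first search. The contact network, the names, and the function `degrees_of_separation` are all made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, targets):
    """Return the fewest links from start to anyone in targets, or None if unreachable."""
    if start in targets:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for contact in graph.get(person, ()):
            if contact in seen:
                continue
            if contact in targets:
                return dist + 1
            seen.add(contact)
            queue.append((contact, dist + 1))
    return None


# Hypothetical contact network: each link is listed in both directions.
contacts = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
flagged = {"D"}  # assumed group of "people involved in crimes"

print(degrees_of_separation(contacts, "A", flagged))  # A -> B -> C -> D
print(degrees_of_separation(contacts, "E", flagged))  # E has no contacts at all
```

Notice that person A has done nothing in this toy network. A short distance to D comes entirely from who A happens to know, which is exactly why a number like this can reinforce existing prejudice.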

Take the Test

We’re not going to lie to you: it’s difficult, if not impossible, to find out how we are categorized – and even harder to avoid it altogether. It all depends on the company, the algorithm, and the information that they are after. However, some tools can give us a glimpse into the ways in which the Web categorizes us:

  • The extension called Floodwatch lets us look back at all of the advertisements that have targeted us personally over a long period of time. Handy for retracing our browsing practices and seeing how they affect our categorization!
  • Even simpler? If you are logged in to your Google account, go to the Ad Parameters page. Does this profile resemble you? It’s up to you whether to correct it, or you could just adopt this new identity as a form of camouflage.

Sandra Rodriguez

The minority report: Chicago’s new police computer predicts crimes, but is it racist?

Chicago police say its computers can tell who will be a violent criminal, but critics say it’s nothing more than racial profiling.

When the Chicago Police Department sent one of its commanders to Robert McDaniel’s home last summer, the 22-year-old high school dropout was surprised. Though he lived in a neighborhood well-known for bloodshed on its streets, he hadn’t committed a crime or interacted with a police officer recently. And he didn’t have a violent criminal record, nor any gun violations. In August, he incredulously told the Chicago Tribune, “I haven’t done nothing that the next kid growing up hadn’t done.” Yet, there stood the female police commander at his front door with a stern message: if you commit any crimes, there will be major consequences. We’re watching you.

What McDaniel didn’t know was that he had been placed on the city’s “heat list” — an index of the roughly 400 people in the city of Chicago supposedly most likely to be involved in violent crime. Inspired by a Yale sociologist’s studies and compiled using an algorithm created by an engineer at the Illinois Institute of Technology, the heat list is just one example of the experiments the CPD is conducting as it attempts to push policing into the 21st century.

 

Meet The Woman Who Did Everything In Her Power To Hide Her Pregnancy From Big Data

Janet Vertesi, assistant professor of sociology at Princeton University, had an idea: would it be possible to hide her pregnancy from big data? Thinking about technology—the way we use it and the way it uses us—is her professional life’s work. Pregnant women, she knew, are a marketing gold mine; a pregnant woman’s marketing data is worth 15 times as much as the average person’s. Could Vertesi, a self-declared “conscientious objector” of Google ever since 2012, when they announced to users that they’d be able to read every email and chat, navigate all the human and consumer interactions having a baby would require and keep big data from ever finding out?

Court docs show how Google slices users into “millions of buckets”

The online giant probably knows more about you than the NSA — including things you might not even tell your mother.

The first law of selling is to know your customer. This simple maxim has made Google into the world’s largest purveyor of advertisements, bringing in more ad revenue this year than all the world’s newspapers combined. What makes Google so valuable to advertisers is that it knows more about their customers — that is to say, about you — than anyone else.

Datacoup: Unlock the Value of Your Personal Data

Our mission is to help people unlock the value of their personal data.
Almost every link in the economic chain has their hand in our collective data pocket. Data brokers in the US alone account for a $15bn industry, yet they have zero relationship with the consumers whose data they harvest and sell. They offer no discernible benefit back to the producers of this great data asset – you.

My Quantified Email Self Experiment: A failure

I have an archive of my own email going back 18 years, containing 450,000 messages. One day I decided to make it searchable. Not half-searchable but fully, dynamically, programmably searchable.

My big idea was: If I can quickly look through all of my old emails I will be able to observe how my thoughts have evolved. I’ll learn something fundamental about myself and how I’ve grown as a person —for example, the difference between being in my early 20s and being 40.

This seemed like an interesting thing to do, so I did it. But the experiment was a failure, and not very edifying.
