Data isn't oil, so what is it?

Metaphors shape how we understand, and change, the world. We need a better metaphor for data.

In 2011, Stanford researchers Paul Thibodeau and Lera Boroditsky published research that showed how the way we talk about crime changes our ideas about what to do about it. They asked two groups of students to read reports about crime in their area - one using a metaphor of crime as a ‘beast’ that was rampaging through the neighbourhood, and one describing crime as a ‘virus’ that had to be stopped. Their research showed that students shown the ‘virus’ metaphor were more likely to favour policy that looked at the root causes of crime, such as social deprivation, whilst students who read the ‘beast’ metaphor story favoured enforcement policies.

In my day job, we spend a lot of time thinking about the metaphors we use to help shape people’s understanding of complex issues, and hopefully drive change. In fact, I know about the study above from a podcast we produced a few years ago called This Will Change Your Mind, looking at how the ideas that shape public thinking are developed and adopted. In my favourite episode, the hosts try to find out how the phrase ‘hole in the ozone layer’ emerged, a particularly successful environmental metaphor that drove global efforts to cut CFC emissions.

It’s an odd metaphor, as there isn’t really an ozone layer, and the hole wasn’t really a hole, but the metaphor caught on with scientists, policy makers and the public. One of the reasons for its success might have been the visceral image of a hole in the earth’s atmosphere, and the sense of breached defences it conjured. This was the era of early video games like Space Invaders and Missile Command, where the player had to defend the earth by stopping missiles and aliens raining down from the sky. I was a kid in the 80s, and the hole in the ozone layer felt as terrifying as these pixellated invaders.

Back in the early 2000s, I was working at the BBC on a project to imagine what the organisation could do with our users’ personal data. At the time, there were only a few websites that asked you to create personal accounts, and most of the data we captured was relatively anonymous server logs. I developed a list of principles that I thought should inform the BBC’s approach to managing user data, and commissioned a service design agency to develop prototypes of what this might look like as products or services.

I was then asked to present this to the BBC’s executive committee, and gave probably the worst presentation of my life. It didn’t help that I opened by explaining that this work might be important in a ‘post licence fee world’, only to be softly chided by then Director General Mark Thompson, who pointed out that this wasn’t the place to discuss that kind of idea.

But more than that gaffe, I think the biggest problem was that I couldn’t really describe what personal data actually was. Not in a technical sense - the BBC ExCo in the mid-2000s wasn’t very technically savvy anyway - but as something that mattered, and was important to a vision of what public media could do in this new century. I had lots of fine statements about what we could do with data, and how it could bring value to our audiences, but the thing itself was immaterial, a poltergeist only visible through the things it moved.

Over the next decade, the dominant metaphor for personal data became ‘oil’. As the platform giants Facebook, Google, Amazon and Apple built empires of products and services that tracked our every activity, data was discussed as a vast, untapped resource, ripe for extraction and processing into cold, hard cash.

But in the last few years, there has been a backlash against this extractive metaphor. In 2018 Cory Doctorow described Facebook’s data as an empire of low quality ‘oily rags’, recasting the metaphor as one of industrial waste, not liquid gold. In a recent speech, EU Vice President Margrethe Vestager tried to reposition data as a ‘reusable resource’, a more ecological metaphor that suggests ways of extracting value that don’t pollute the public sphere.

All these metaphors imagine personal data as a huge, passive, untapped resource - lakes of stuff that only has value when it is extracted and processed. But this framing completely removes the individual agency that created the stuff in the first place. Oil is formed by millions of years of compression and chemical transformation of algae and tiny marine animals (sorry, not dinosaurs). Data is created in real time, as we click and swipe around the internet. The metaphor might work in an economic sense, but it fails to describe what data is as a material. It’s not oil, it’s people.

At the moment, the big platforms and governments are ramping up the battle over our personal data - who can collect it, what they can do with it, and where they can send it. But this is happening at a level far above our individual experience of data. The battle rages above us, like the missiles and aliens in the video games of my 80s youth.

The discussions around data policy still feel like they are framing data as oil - as a vast, passive resource that either needs to be exploited or protected. But this data isn’t dead fish from millions of years ago - it’s the thoughts, emotions and behaviours of over a third of the world’s population, the largest record of human thought and activity ever collected. It’s not oil, it’s history. It’s people. It’s us.

If you’ve been on the internet for a while - let’s say 5-10 years - you’ve probably felt the visceral kick of seeing someone or something in your data history that caused you pain. It could be Facebook’s ‘on this day’ feature sending you a memory of a traumatic event, or scrolling through your photo library to find a photo of a deceased relative or friend. Or it could be a moment of joy - the online store where you bought a much loved item of clothing, or that perfect gift for a friend.

After a year of lockdown, seeing reminders of life before the pandemic in our camera rolls and social media updates has felt especially melancholic. Groups of us cramming together in a bar or park to get into the shot, or hugging each other at a football game. That intimacy is what our personal data records, an intimacy that seems doubly ironic when it is played back to us, isolated in our homes, through the same devices we’ve relied on to connect us during the lockdown.

This is not a passive archive - these are records of how we live now, and how people live in our memories. They can be recalled with a touch, and brought back to life, even if they are bittersweet memories. We need metaphors for data that capture the agency and visceral emotions that our personal data can generate. Metaphors that link it directly into our lives and relationships, that help us recognise that this is us - we’re the ones being traded and sold and stored and analysed and processed.

Perhaps then we’d understand how we can handle this data in a more responsible way. A metaphor that puts our personal experience at the forefront will help us find out where to draw lines in how our lives are stored and processed, and to understand that the lines will need to be different for different people. I don’t know what the right metaphor is - memory and history are the concepts I’ve been mulling over, but they have already been used in computing in ways that blur and dull them.

Maybe we should be very explicit, and refer to data as our lives. Imagine if a service had to ask your permission to ‘track your life’ or ‘share information about your life with other providers’. Already that feels grittier, more visceral, than just ‘data’.

We urgently need to come up with metaphors like this, that bring the discussion over data down from the skies above us and locate it in the minutiae of our everyday lives. Because that, after all, is what this data actually is.