(mis)adventures in software development...

03 March 2015

The Definition of Metadata

Category Technology

Tortured metaphors, metadata and mass surveillance.

Metadata used to be such an innocuous, mundane, technical term.

Until Big Government got hold of it.

And appropriated it for its own Orwellian purposes.

In technical circles, metadata is an abstract term. It means nothing in itself. What the word “metadata” refers to is entirely dependent on context.

If you’re talking about databases, metadata might be information about the structure of the database: how the data is stored.

But if you’re talking about photo files, it might be things like when and where the photo was taken, the device used to take it, and the camera settings (exposure, white balance, shutter speed).

It might be. But it depends. It’s an abstract term.

And it really, really depends on context.

The “metadata” of a photo sharing website is likely to be different to the “metadata” of an accounting application, for example.
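To make the database case a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The invoices table, its columns and the photo fields are all invented for illustration; the only point is that "metadata" for a database (its structure) looks nothing like "metadata" for a photo.

```python
import sqlite3

# Hypothetical accounting table, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.execute("INSERT INTO invoices (customer, amount) VALUES ('Acme', 120.50)")

# The data: the rows actually stored in the table.
rows = conn.execute("SELECT customer, amount FROM invoices").fetchall()

# The metadata: how that data is structured.
# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(invoices)")
]

# A photo's metadata has a completely different shape (field names made up).
photo_metadata = {
    "taken_at": "2015-03-03T09:30:00",
    "device": "ExampleCam",
    "shutter_speed": "1/250",
}

print("data:", rows)
print("database metadata:", columns)
print("photo metadata:", photo_metadata)
```

Same word, two unrelated sets of fields. Which one you mean depends entirely on what you are describing.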

It’s kind of like the word “religion”. It means different things, depending on which religion — or groups of religions — you might be referring to.

You might use the word “religion” in an abstract way, perhaps if talking about a group of religions in a general sense. But the word “religion” in itself doesn’t provide specifics. It wouldn’t be particularly enlightening to proudly proclaim a newfound faith by saying you have decided to “convert to religion”. Which religion? It doesn’t make sense to get into specifics without providing at least some context about which religion you may be referring to.

Similarly, in the context of mass surveillance, talking about metadata without specifying the exact metadata doesn’t make much sense.

Yet, this is what the proposed mandatory metadata retention laws are doing — the legislation doesn’t even define the “metadata” that service providers are supposed to retain.

This is like saying religion is important and we must believe — without specifying which religion, or what we are to believe.

The Australian Government’s proposed data retention legislation takes its legal cues from The Castle — it’s all about “the vibe” of the metadata thing. The government is arguing these data retention laws are really important for national security, but apparently not important enough for specifics.

Which is kind of like arguing that all monotheistic Abrahamic faiths are basically the same and any distinction is not worth bothering with. Sure, you could try and argue that, but you’re likely to get some pretty strong pushback in certain religious circles. I’d be willing to bet a lifetime supply of contraception on that.

With metadata as with religion, the devil is in the details.

And much like religion, the very notion of mass surveillance by the state requires a huge leap of faith.

Faith that data retention will be effective in catching criminals and preventing terrorism (despite evidence that it’s not).

Faith that the government and its agencies will not abuse the data it collects on its citizens (other governments have).

Faith that the metadata will be stored securely and won’t be hacked (unlikely/impossible).

Faith that it will not stifle freedom of speech and freedom of the press (it will by definition).

Faith despite evidence that nothing the government says about data retention is true.

And it’s really hard to have faith in data retention when the argument for it has been so poorly articulated. The subtleties of metadata as an abstract term were obviously lost on Attorney-General George Brandis when he tried to explain it in that now infamous interview on Sky News. It’s hard to have faith in our elected leaders when they are making draconian laws about technology they clearly don’t understand.

After Brandis’s trainwreck of an explanation, we now have politicians glibly reciting the minimal rote definition of metadata: that it’s “data about data”. While that might be the technical definition, it doesn’t really apply in the context of mandatory data retention. Like many terms used by politicians, “metadata” has been appropriated as a weasel word designed to deceive.

The term “metadata” is now part of the government’s attempt to make data retention seem benign — by making it sound technical; by implying metadata is somehow not actually data; by downplaying the privacy implications while exaggerating the distinction between “data” and “metadata”.

Anachronistic analogies describing metadata as “the material on the front of the envelope, not the contents inside the envelope” are misleading and inaccurate.

Metadata is a lot more than the address on the front of an envelope. Metadata is more like everything on the front and back of the envelope — including addresses, stamp, postmark and fingerprints — along with details of the pen used to write the address, where you bought the pen, where you bought the stamp, where you were when you wrote the address, who you were with, what you were wearing, what you had for breakfast that day, the date you posted the envelope, the postbox location where you posted the envelope, along with all your movements on the day you posted that envelope.

Ultimately, “metadata” is vague and abstract enough that in the hands of politicians it can mean whatever they want it to mean. This makes the technical definition irrelevant — any information that’s collected about us and our internet usage and communication is data. Don’t be fooled by false distinctions and technical connotations. The distinction between “metadata” and “data” is an arbitrary one, and in this case a misleading one. It’s all data, and it’s all private — even if it’s not considered equally private.

Sure, the content of a communication is considered more “private” than the “metadata” around that content (the contents of an email are considered very private, the time it was sent less so, for example).

But any consideration of the privacy of metadata is misleading if we only look at a single communication in isolation. Keep in mind that data retention is about our communication patterns in aggregate. It’s not just one envelope, it’s all the envelopes, and a whole lot more besides. Data retention is about gathering a large amount of data about all our digital communication and online activities, so that things can be inferred from sequences and patterns. Information that’s innocuous in isolation can become ominous in aggregate. Knowing an individual called a local doctor’s clinic might not reveal much. But knowing that call was followed by numerous calls to abortion clinics possibly does.
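As a rough sketch of what “ominous in aggregate” means in practice, consider a handful of call records. Everything here is made up: the names, the timestamps, the categories, and the call_profile helper. Note that not a single word of any conversation is recorded.

```python
from collections import Counter
from datetime import datetime

# Made-up call records: (timestamp, caller, category of the number called).
# There is no call content here at all, only "metadata".
calls = [
    (datetime(2015, 3, 2, 9, 5), "alice", "doctor"),
    (datetime(2015, 3, 2, 9, 40), "alice", "abortion clinic"),
    (datetime(2015, 3, 2, 10, 15), "alice", "abortion clinic"),
    (datetime(2015, 3, 3, 14, 0), "alice", "abortion clinic"),
    (datetime(2015, 3, 3, 16, 30), "bob", "doctor"),
]


def call_profile(records, caller):
    """Count how often a caller rang each category of number."""
    return Counter(category for _, who, category in records if who == caller)


# One record in isolation: somebody rang a doctor. Not very revealing.
print(calls[0])

# The same person's records in aggregate tell a rather more personal story.
print(call_profile(calls, "alice"))
print(call_profile(calls, "bob"))
```

A one-line grouping is all it takes. The inference comes from the pattern, not from any individual record.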

The problem is that, with enough data, everything looks ominous in aggregate. Patterns that are truly innocent but seemingly nefarious will inevitably emerge due to randomness. The web browsing history of a chemistry student might, at times, look very similar to the web browsing history of a potential bomb maker. Journalists, activists, whistleblowers and researchers often have legitimate reasons for visiting dubious corners of the internet and perhaps communicating with individuals of dubious character, and for doing so in private. But metadata by definition means no context or nuance. Metadata means everyone looks guilty. Metadata is an overbearing and constantly critical parent, always assuming the worst.

Just like we leave our scent behind as we move through the real world, in a constantly connected online world, we inevitably leave traces of our activities behind. Whether it’s an entry in a log file on an obscure server that will never be examined, or an entry in a massive social media database that will be extensively data mined, it will still be there. Even if we don’t realise. We almost certainly cannot realise just what our digital metadata trail might imply. Things we would consider completely innocuous can appear suspicious taken out of context.
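Some back-of-the-envelope arithmetic shows why. Every number below is an assumption picked for illustration, not a real statistic: a round population figure, a tiny number of genuine targets, and a pattern-matching system far more accurate than anything likely to exist.

```python
# All figures are hypothetical, chosen only to illustrate the arithmetic.
population = 20_000_000      # people whose metadata is retained
actual_targets = 100         # people genuinely up to no good
true_positive_rate = 0.99    # share of real targets the system flags
false_positive_rate = 0.01   # share of innocent people it flags anyway

innocent = population - actual_targets

flagged_targets = actual_targets * true_positive_rate
flagged_innocent = innocent * false_positive_rate

# Of everyone flagged, what share did nothing wrong?
share_innocent = flagged_innocent / (flagged_innocent + flagged_targets)

print(f"real targets flagged:    {flagged_targets:,.0f}")
print(f"innocent people flagged: {flagged_innocent:,.0f}")
print(f"share of flags that are innocent: {share_innocent:.2%}")
```

Even with these generous assumptions, the innocent people flagged outnumber the real targets by roughly two thousand to one. Cast the net over everyone and nearly everything it catches is noise.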

Like a sniffer dog tracking a scent, the government wants to be able to sniff out our online activities based on our digital data trail. But unlike a sniffer dog, the government has no leash, and no idea what it’s doing or what it’s looking for. Data retention gives government authorities the ability to look at patterns in our internet usage and communication, but no way to determine whether those patterns mean anything significant, or whether they’re just statistical noise. This means that when it comes to data retention, the government is a rabid, anxious, agitated beast ready to tear the throat out of anyone that smells a bit funny.

Data retention gives the government and its agencies too much power over us with too little oversight, for very little gain.

It’s also given “metadata” a new meaning: “metadata” is now a euphemism for government sanctioned spying.

Metadata means the government spying on all of us — but mostly the innocent — without a warrant.

Metadata means many false positives, which in turn means innocent people are more likely to be charged and possibly convicted of things they didn’t do.

It means our phones become tracking devices for the government.

It means ISPs become agents of the state, gathering information about our online activities.

It means the government will know who we communicate with, and when, and how often.

It means whistleblowers will find it much harder to expose government and corporate corruption.

It means hackers will have an irresistible honeypot of data they will try and get their hands on.

It means the risk of identity theft will increase.

It means we have a government trying to implement policy it doesn’t understand and can barely articulate.

It means wasting millions of taxpayer dollars on something that is known to be ineffective.

It means we get poorly drafted national security legislation based on fear rather than sensible, evidence-based policy.

It means online anonymity will be impossible.

It means invasion of our privacy.

It means undermining Australia’s democracy.

Misleading metaphors about envelopes don’t even begin to explain the deleterious effect mandatory data retention will have on our democracy.

Data retention is to democracy what a gatecrashing Satanist noisily sacrificing a goat is to a quiet christening: a disturbing and completely unwelcome intrusion.