Fixing Metadata’s Bad Definition
ABSTRACT: “Metadata is data about data” is a bad definition. It’s vague and recursive. How can we manage metadata if we can’t even define it clearly?
I’ve always been bothered by the expression “metadata is data about data”. That’s a bad definition: vague and recursive. It’s like saying climate is the weather of weather. But if metadata is not data about data, then what is it? I pondered this while writing my recent post, “Metadata is Data, So Manage it Like Data”. How can we manage metadata if we can’t even define it clearly?
A bad definition has practical implications. It makes misunderstandings much more likely, which can infect important processes such as data governance and data modeling. Thinking about this became an annoying itch that I couldn’t scratch. What follows is my thought process working toward a better understanding of metadata and its role in today’s data landscape.
The problem starts with language. Our lexicon hasn’t kept up with modern data’s complexity and nuance. There are three main issues with our current discourse about metadata:
- Vague language. We talk about data in terms of “data” or “metadata”. But one category encompasses the other, which makes it very difficult to differentiate between them. These broad, self-referencing terms leave the door open to being interpreted differently by different people.
- A gap in data taxonomy. We don’t have a name for the category of data that metadata describes, which creates a gap at the top of our data taxonomy. We need to fill it with a name for the data that metadata refers to.
- Metadata is contextual. The same data set can be both metadata and not metadata depending on the context. So we need to treat metadata as a role that data can play rather than a fixed category.
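
The third point, metadata as a role rather than a fixed category, may be easier to see in a small Python sketch. Everything here is hypothetical and only for illustration: the `Dataset` and `Catalog` classes, the `link` and `is_metadata_for` methods, and the sample data sets are assumptions, not part of any real tool. The idea is that no data set carries an "I am metadata" flag; whether it counts as metadata depends on which other data set you ask about.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A named collection of records. Nothing here marks it as metadata."""
    name: str
    rows: tuple


@dataclass
class Catalog:
    """Hypothetical registry of 'describes' relationships between data sets."""
    describes: list = field(default_factory=list)  # (describer, described) name pairs

    def link(self, describer: Dataset, described: Dataset) -> None:
        """Record that `describer` describes `described`."""
        self.describes.append((describer.name, described.name))

    def is_metadata_for(self, candidate: Dataset, subject: Dataset) -> bool:
        """True only if `candidate` has been linked as describing `subject`."""
        return (candidate.name, subject.name) in self.describes


# Illustrative data sets (invented for this sketch).
sales = Dataset("sales", rows=(("2024-01-05", 120.0), ("2024-01-06", 95.5)))
sales_columns = Dataset("sales_columns", rows=(("order_date", "date"), ("amount", "float")))
audit_log = Dataset("audit_log", rows=(("sales_columns", "edited", "2024-02-01"),))

catalog = Catalog()
catalog.link(sales_columns, sales)      # column definitions describe the sales data
catalog.link(audit_log, sales_columns)  # the audit log describes the column definitions

print(catalog.is_metadata_for(sales_columns, sales))      # True
print(catalog.is_metadata_for(sales_columns, audit_log))  # False
```

In this sketch `sales_columns` plays the metadata role relative to `sales`, yet it is the described data relative to `audit_log`. The role lives in the relationship, not in the data set itself.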