Metadata Is Data, So Manage It Like Data
ABSTRACT: Pervasive use of metadata to solve today’s data management problems means that metadata is itself a valuable data asset that we must proactively manage.
Managing data is a critical survival skill for any organization. Companies are investing in new data architectures and solutions—such as data fabric, data access governance, and data observability—to keep pace with expanding business appetite for data. But the key to managing data at scale is metadata. Metadata makes it possible to deliver data faster, in more forms, for more uses by enabling automation of important management steps such as documenting, classifying, and certifying data.
However, the pervasive use of metadata in these new approaches creates another data management challenge that is not yet on our radar. In the rush to implement solutions to manage data, we’re making metadata duplicative and inconsistent. In other words, we’re creating the same problems with metadata that we’re using it to fix. It’s time to recognize that metadata is data and is therefore itself a valuable asset that we must proactively manage.
In this article, we’ll explore what metadata is and how it’s used in the modern data stack. We’ll look at some of today’s solutions that rely heavily on metadata, and why it’s critical to manage metadata as carefully as we’ve learned to manage data.
What is Metadata?
Put simply, metadata is data that describes data. There are four broad categories of metadata (see Figure 1), each of which has certain uses:
- Technical metadata. If you want to know the structure of a table or find objects that have attribute names including the string “cust”, or determine whether those attributes contain numbers, letters or both, you need technical metadata. Technical metadata includes the characteristics of data that systems need to work with it, such as format, type, length, and location. It also includes technical documentation such as data models and system designs.
- Business metadata. If you want to know what “margin ratio” means and how it’s calculated, or whether a delivery address is considered PII, you need business metadata. Business metadata uses business language to provide context for the data that appears in applications, reports, and dashboards. It includes elements such as terms, definitions, classifications, and retention rules for different types of data. It also includes assignment of people to roles such as data owners and stewards.
- Operational metadata. If you want to know what data sets are used together most often, or how long it took to complete data pipeline tasks, you need operational metadata. Operational metadata provides information on how data is used and what happens when it’s used. It includes information from a variety of sources such as execution logs, rule engines, error logs, and audit registers, with dates and times of events. It also includes data quality levels and operational monitoring of data quality issues that exceed defined levels.
- Social metadata. Collaboration enables organizations to derive more value from their data. For example, many business intelligence and data catalog applications provide ways for users to ask questions, share tips, and make recommendations about data. Social metadata captures the enrichment that comes from collaboration. It includes elements such as tags, ratings, annotations, and questions and comments.
Figure 1. Types of Metadata
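To make the four categories concrete, here is a minimal sketch of how they might sit side by side for a single column in a catalog. Everything in it is an assumption made for illustration: the class names, the field names, the `find_columns` helper, and the sample values (including the choice to flag the delivery address as PII) are hypothetical and do not come from any particular catalog product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TechnicalMetadata:
    """Characteristics systems need: location, type, length."""
    table: str
    column: str
    data_type: str
    length: int


@dataclass
class BusinessMetadata:
    """Business context: term, definition, classification, ownership."""
    term: str
    definition: str
    is_pii: bool
    owner: str


@dataclass
class OperationalMetadata:
    """How the data is used and what happened when it was processed."""
    last_loaded: datetime
    load_seconds: float
    quality_score: float  # hypothetical scale from 0.0 to 1.0


@dataclass
class SocialMetadata:
    """Enrichment contributed by users through collaboration."""
    tags: List[str] = field(default_factory=list)
    rating: Optional[float] = None
    comments: List[str] = field(default_factory=list)


@dataclass
class ColumnMetadata:
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata
    social: SocialMetadata


def find_columns(catalog: List[ColumnMetadata], fragment: str) -> List[ColumnMetadata]:
    """Return entries whose column name contains the fragment (case-insensitive)."""
    needle = fragment.lower()
    return [c for c in catalog if needle in c.technical.column.lower()]


# Invented sample entries, for illustration only.
catalog = [
    ColumnMetadata(
        technical=TechnicalMetadata(table="orders", column="cust_id", data_type="integer", length=10),
        business=BusinessMetadata(term="Customer ID", definition="Unique identifier for a customer",
                                  is_pii=False, owner="sales-data-owner"),
        operational=OperationalMetadata(last_loaded=datetime(2024, 1, 15, 2, 30),
                                        load_seconds=42.0, quality_score=0.98),
        social=SocialMetadata(tags=["certified"], rating=4.5),
    ),
    ColumnMetadata(
        technical=TechnicalMetadata(table="orders", column="delivery_address", data_type="varchar", length=200),
        business=BusinessMetadata(term="Delivery address", definition="Where the order is shipped",
                                  is_pii=True, owner="logistics-data-owner"),
        operational=OperationalMetadata(last_loaded=datetime(2024, 1, 15, 2, 30),
                                        load_seconds=42.0, quality_score=0.91),
        social=SocialMetadata(comments=["Is the postcode always populated?"]),
    ),
    ColumnMetadata(
        technical=TechnicalMetadata(table="customers", column="CUST_NAME", data_type="varchar", length=80),
        business=BusinessMetadata(term="Customer name", definition="Legal name of the customer",
                                  is_pii=True, owner="sales-data-owner"),
        operational=OperationalMetadata(last_loaded=datetime(2024, 1, 14, 23, 0),
                                        load_seconds=12.5, quality_score=0.95),
        social=SocialMetadata(tags=["needs-review"]),
    ),
]

# A technical-metadata question: which attribute names include "cust"?
cust_columns = [c.technical.column for c in find_columns(catalog, "cust")]

# A business-metadata question: which columns are classified as PII?
pii_columns = [c.technical.column for c in catalog if c.business.is_pii]
```

In this sketch, the "cust" search is answered from technical metadata alone, while the PII question is answered from business metadata. Keeping the four categories as separate parts of one record is only one possible design; it keeps each question tied to the kind of metadata that answers it.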