
The ROI of monitoring data usage

Wannes Rosiers · Dec 8, 2022

More and more businesses rely heavily on data. They have left the stone age of gut-feel decisions and entered a world of data-driven ones. Within this modern mindset, they have grown from plain after-the-fact reports to automated data-driven decisioning. The way companies harvest value from data has evolved from reporting, through operational excellence and process optimization, towards the creation of new value chains. Monetizing data has become the baseline, yet companies constantly look for new, smart ways to use their data and gain a competitive advantage. This innovation takes place both in the realm of new use cases and in the choice of which data to collect. But how many companies collect data and create insights about their own data usage?

Quite often, obtaining these insights is a cumbersome process. Creating the obligatory audit reports once a year brings sleepless nights to those who actually need to write them. In a recent blog, I wrote about this audit nightmare and the discontinuity of the process.

“Such — data usage — insights should not be created by continuously sending an engineer into the dungeons of access controls in your data application landscape for a day. These insights should be a single click away, at all times.”

The ROI of usage insights

At one of my previous employers, we had the ambition to rationalize our data landscape. Sometimes you need a break to accelerate again. As we did not know who used which data or data pipelines, or whether they were used at all, we applied the same “introduce a break” strategy to data pipelines: we paused them and waited to see whether anyone started complaining.

Data not being used should not be stored, nor captured at all. Data pipelines resulting in unused data objects should not be maintained. Both introduce a purposeless cost and maintenance effort. By aggressively powering off data pipelines or tools, we could save on storage and processing power, we could limit the number of licenses for certain tools — or phase out other tools one year faster — and we would also manage to free up engineers by reducing their time spent on maintenance. I’ll leave it up to the reader to calculate the possible ROI when you can reduce storage and processing cost by 10%, as well as freeing up engineers by reducing maintenance by 20%…
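To make that calculation concrete, here is a minimal back-of-envelope sketch. All figures (cloud bill, team size, engineer cost, maintenance share) are illustrative assumptions, not numbers from this article:

```python
# Back-of-envelope ROI sketch; every figure below is an illustrative
# assumption, not a number from the article.
annual_storage_and_compute = 500_000   # assumed yearly storage + processing bill
engineers = 10                         # assumed data engineering team size
cost_per_engineer = 90_000             # assumed fully loaded yearly cost
maintenance_share = 0.30               # assumed fraction of time spent on maintenance

# The two levers from the text: -10% infra cost, -20% maintenance effort.
infra_savings = annual_storage_and_compute * 0.10
freed_maintenance = engineers * cost_per_engineer * maintenance_share * 0.20

print(f"Infra savings:      {infra_savings:,.0f}")
print(f"Maintenance freed:  {freed_maintenance:,.0f}")
print(f"Total yearly value: {infra_savings + freed_maintenance:,.0f}")
```

Even with these modest assumptions, the two levers together free up a six-figure amount per year; plug in your own numbers to see where the bigger lever sits for you.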

From lightbulb to spotlight monitoring

Companies now collect more data about their customers than ever before. The concept of a customer 360 view boils down to putting spotlights on your customers from multiple angles. Yet within the company itself, there are only a few tiny light bulbs monitoring critical elements, while the rest remains in the dark.

Just as collecting customer information should be limited to capturing lawful and purposeful data, monitoring activities should be limited to those that truly create relevant insights. More and more, we shift the responsibility of data governance to the left. Monitoring access and usage of data should enable data owners to truly take that ownership, and let them prove that legal requirements are met.

Data access analytics: what to monitor

There you have it: meet legal requirements. Take for example the European GDPR legislation, which stipulates that purpose limitation is a requirement.

Purpose limitation: Personal data should only be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.

This can be translated into the principle of data minimization:

The principle of data minimization means that a data controller should limit the collection of personal information to what is directly relevant and necessary to accomplish a specified purpose.

These principles set limitations on the data itself. On the other hand, you have the principle of least privilege, which limits the users and helps assure that you do everything to adhere to these principles.

The principle of least privilege states that a user or entity should only have access to the specific data, resources and applications needed to complete a required task.

When verifying that these principles are met — a responsibility of business data stewards — you should provide those stewards with insights regarding:

  • Which data exists and what are you responsible for?
  • Does this data belong to a special category type of data?
  • For which purpose(s) is this data being collected?
  • For which purpose(s) is this data being used?
  • Who can access this data for which purpose(s)?
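The questions above can be answered from a small registry that records, per dataset, its declared collection purposes alongside the purposes it is actually accessed for. A minimal sketch, with hypothetical field names not taken from any particular catalog product:

```python
# Hypothetical steward registry: per dataset, who owns it, whether it is a
# special category, and which purposes were declared at collection time.
datasets = {
    "customer_profiles": {
        "owner": "crm_steward",
        "special_category": True,           # contains personal data
        "collected_for": {"billing", "support"},
    },
}

# Hypothetical access grants, each tied to a stated purpose.
grants = [
    {"user": "alice", "dataset": "customer_profiles", "purpose": "billing"},
    {"user": "bob",   "dataset": "customer_profiles", "purpose": "marketing"},
]

def grant_violations(datasets, grants):
    """Flag grants whose purpose was never declared when the data was collected."""
    return [
        (g["user"], g["dataset"], g["purpose"])
        for g in grants
        if g["purpose"] not in datasets[g["dataset"]]["collected_for"]
    ]

# Bob's marketing grant violates purpose limitation: marketing was never declared.
print(grant_violations(datasets, grants))
```

The point is not the code but the shape of the data: once collection purposes and access purposes live in the same structure, purpose-limitation checks become a one-line comparison instead of a yearly archaeology exercise.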

More traditional data catalogs provide part of the solution. They focus on the discoverability of data, including its metadata. They therefore allow business data stewards to actively investigate which data they own and why this data is being collected. Often they also provide insights into the special categories of data. What they most often lack, however, is insight into why data is being used, both in general and per individual case.

The employee 360

From the cost management perspective, as well as from the legal perspective, it is the individual statistics that are interesting. For cost reduction you want to know what is being used, not just how much. For legal compliance you want to know who is using what, and why. This boils down to turning the lights on everywhere and creating an employee or data object 360 view next to the customer 360 view.

Hence, to make your data access analytics valuable, you need individual data access and usage insights. From a user perspective, your compliance team should mostly look into the validity of data access — ‘is this person allowed to access this data?’ — and the principle of least privilege — ‘is this person actually using the access they were granted?’. In short: access coverage and access usage on an individual basis.
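Given an export of grants and an access log — both schemas here are hypothetical, as the shape depends on your platform — the per-user usage ratio is a short computation:

```python
# Sketch of access coverage vs. access usage per user. The grant export and
# access log formats are illustrative assumptions, not a real platform's API.
grants = {
    "alice": {"sales_db", "hr_db", "finance_db"},  # what alice may access
    "bob":   {"sales_db"},
}
access_log = [            # (user, object) pairs observed over some period
    ("alice", "sales_db"),
    ("bob", "sales_db"),
]

def usage_ratio(grants, access_log):
    """Fraction of each user's granted objects they actually touched."""
    used = {}
    for user, obj in access_log:
        used.setdefault(user, set()).add(obj)
    return {
        user: len(used.get(user, set()) & objs) / len(objs)
        for user, objs in grants.items()
    }

# alice uses only 1 of her 3 grants: a candidate for access revocation.
print(usage_ratio(grants, access_log))
```

A low ratio is exactly the least-privilege signal described above: broad coverage, narrow usage, and therefore access that can likely be revoked.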

Such insights rapidly prove valuable if you count the time saved during audits or, even worse, during data breaches. In the case of credential breaches, access coverage indicates what is at risk, while access usage shows what is really impacted. Next to this, it also allows you to encourage employees to work with data and increase your data maturity.

Cost management via data usage

From the data object perspective, you could consider a table in a database as the counterpart of an individual employee. The same holds for a report in a reporting environment. A typical data pipeline fails a few times a year, so maintaining unused ones wastes engineering time. Storage is cheap, but not free, certainly when you store a bunch of historical copies in a data lake. Processing power, on the other hand, is paid per use in cloud environments, so cost keeps accruing as long as unused data pipelines continue to run.
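The data-object side of this can be sketched the same way: if your warehouse exposes last-access timestamps per table (the table names, dates, and 90-day threshold below are illustrative assumptions), stale objects — and hence pausable pipelines — fall out of a simple filter:

```python
import datetime as dt

# Sketch: flag tables not queried in the last 90 days, assuming last-access
# timestamps can be pulled from your warehouse's query history. All names
# and dates are made up for illustration.
last_accessed = {
    "raw.orders":          dt.date(2022, 12, 1),
    "staging.orders_v1":   dt.date(2022, 3, 14),   # output of an old pipeline
    "reports.daily_sales": dt.date(2022, 11, 30),
}

def stale_tables(last_accessed, today, max_idle_days=90):
    """Tables idle longer than the threshold: candidates for pausing."""
    return sorted(
        table for table, last in last_accessed.items()
        if (today - last).days > max_idle_days
    )

print(stale_tables(last_accessed, today=dt.date(2022, 12, 8)))
# → ['staging.orders_v1']
```

Pausing the pipeline that feeds a flagged table is the automated version of the “introduce a break” strategy described earlier: if nobody complains, the storage and the processing can go.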
