I just read Martin Fowlers blogpost on the concept of Datensparsamkeit that you can find here
My summary of his post is that he argues that organizations should only store the data they really need and not, in this day and age of big data, store all data they can get their hands on. His primary concern is that of privacy.
I fully support his concern of privacy, but reading the blogpost it got me to think about another reason for Datensparsamkeit.
In my current project we have been collecting data from various sources, internal to the organization as well as external. After all, the project is about a data warehouse and those are fundamentally about collecting data. In some situations the data was not needed right away, but for various reasons we built the interface to collect it anyway. After this, the interface was put into production and the data collection was started.
In all of the situations when we collected data that was not needed right away, we run into serious data quality issues. We had situations when the interface was broken for months, no data was collected nobody realized it. We also had a situation where a calculation to generate derived data was seriously flawed for months and nobody realized it.
Since the data was not needed right away, nobody was ensuring the quality of it and therefore the collection of the data was not only a big waste, it also led to serious rework to fix the problems.
So, in addition to the privacy concerns of Datensparsamkeit, I would also add serious data quality issues that leads to Datensparsamkeit.