My current assignment deals with a data warehouse system where some calculations should be done based on the data in the data warehouse. We also have some tables that control how the calculation should be performed (e.g. how data in separate data sets should be mapped to each other).
Over time, we have introduced more features to the calculation and most of these features need some control table structure so therefore the control tables have evolved into being more and more complex in their structure and in their interrelationships.
Now, with the great benefit of 20-20 hindsight, it is easy to see some areas of the structure of the control tables that are sub-optimal at best and downright ugly at worst. There is one example where the same column of the same control table is used to map to two separate concepts in the model, depending on what exact data is used.
Why did this happen? Why were we not able to control the structure of these important tables over time in the project. I see a few different elements of an explanation:
- We did not have a good enough understanding of the total problem when we started. (of course this is not what you typically have in an agile project, which leads us to the other explanations)
- The project, like all projects, was really pressed for time, so nobody wanted to redo work that was already invested in the early versions of the control tables
- The test and deployment processes of the project takes a long time, therefore rework was not really wanted.
So, what do I think we should have done instead
- Invested some time in some information modeling early on, so that we could have a better view of the full picture of these control tables early on
- Created reusable test cases and possibly some level of automated tests so that we could easily retest something that was reworked.
- Spent more time after the initial functionality was created to cleanup and fix these kinds of technical debt issues.