Data Vault Ensemble Modeling

Virtual Data Warehousing

Christian Hädrich, DÖRFFLER + PARTNER

“For a Data Warehouse we do not have enough time” … Sound familiar?

Data Models are purpose driven. Although you can have a “wrong” Data Model, there is usually no single “right” one. Be aware: data only represents the real world (or what we call the “truth”). Like a map, which shows only partial information about (or an image of) the landscape, data representing business objects or processes can never be complete. And just as there are maps for different purposes and levels of abstraction, some data models are better suited to some situations than others – you may need different ones even within a single company. Information is usually digitized in order to automate and speed up business processes. Data Modeling is the art of defining a structure that describes how the data is stored so that it can be processed efficiently and interpreted correctly. And just like on a map, the information is often condensed or sometimes even simplified.

When building a Data Warehouse we have to deal with two different types of Data Models: the Data Model(s) of the Source System(s) and the Data Model of the Data Warehouse. Data Models of the Source Systems are designed for the purposes of those systems and usually can’t be influenced by the needs of a Data Warehouse. The Data Model of a Data Warehouse is designed by the Data Warehouse Team. It should be able to receive all incoming data from the Source Systems and also meet the information demands of all consumers. Ideally a Core Data Warehouse model serves multiple purposes. Guess what: the more purposes you try to satisfy, the harder it can be to design (and especially to feed) your model.

In 2014 Roelant Vos published the idea of a Virtual Enterprise Data Warehouse. The idea was conceived as a result of working on improvements for the generation of Data Warehouse loading processes. It is, in a way, an evolution in ETL generation thinking. Combining Data Vault Modeling with a Persistent Historical Data Store provides additional functionality because it allows the designer to preview new designs as well as to refactor parts of the existing Data Warehouse solution. Hybrid approaches for Data Warehousing are designed to be flexible and to adapt to changes in business use and interpretation. Working with data can be complex, and often the ‘right’ answer for the purpose is the result of a series of iterations in which Business Subject Matter Experts and Data Professionals collaborate.
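To make the combination of a Persistent Historical Data Store and Data Vault structures more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under assumptions, not the approach as published: the table psa_customer, the views hub_customer, sat_customer and sat_customer_address, and the sample rows are all hypothetical, and hash keys are left out for brevity. The point it tries to show is that when hubs and satellites are defined as views over the retained history, a design can be previewed or refactored without reloading any data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Hypothetical Persistent Historical Data Store: every delta is kept.
    CREATE TABLE psa_customer (
        load_ts     TEXT NOT NULL,
        customer_no TEXT NOT NULL,
        name        TEXT,
        city        TEXT
    );

    -- Virtual hub: one row per business key with its first load time.
    CREATE VIEW hub_customer AS
    SELECT customer_no, MIN(load_ts) AS load_ts
    FROM psa_customer
    GROUP BY customer_no;

    -- Virtual satellite: descriptive attributes per key and load time.
    CREATE VIEW sat_customer AS
    SELECT customer_no, load_ts, name, city
    FROM psa_customer;
""")

# Invented sample deltas arriving from a source system.
conn.executemany(
    "INSERT INTO psa_customer VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "C1", "Alice", "Berlin"),
        ("2024-01-01", "C2", "Bob", "Hamburg"),
        ("2024-02-01", "C1", "Alice", "Munich"),
    ],
)

hub_rows = conn.execute(
    "SELECT customer_no, load_ts FROM hub_customer ORDER BY customer_no"
).fetchall()
sat_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]

# Refactoring the design: replace the satellite with a narrower one.
# Only the view definitions change; the stored history is untouched.
conn.executescript("""
    DROP VIEW sat_customer;
    CREATE VIEW sat_customer_address AS
    SELECT customer_no, load_ts, city
    FROM psa_customer;
""")

address_rows = conn.execute(
    "SELECT load_ts, city FROM sat_customer_address "
    "WHERE customer_no = 'C1' ORDER BY load_ts"
).fetchall()

print(hub_rows)
print(sat_count)
print(address_rows)
```

In a real solution such view definitions would typically be generated from metadata rather than written by hand, and a view could later be materialized as a table if performance requires it.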

In other words, the Data Warehouse model is not something you can always get right in one go. In fact, it can take a long time for a Data Warehouse model to stabilize, and in today’s fast-paced environments it may never do so. The Virtual Data Warehouse helps maintain both the mindset and the capability for a data solution to keep evolving with the business and to reduce technical debt on an ongoing basis. This mindset also enables some truly fascinating opportunities, such as the ability to maintain version control of the data model, the metadata and their relationship – to be able to represent the entire Data Warehouse as it was at a certain point in time – or even to allow different Data Models for different business domains at the same time.
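The point-in-time idea can be sketched for the data side as follows, again in Python with sqlite3. This is a simplified illustration under assumptions: the table psa_customer, the helper customers_as_of and the sample rows are hypothetical, and the sketch covers only the stored data, not the versioning of the model and its metadata. Because every delta is retained with its load timestamp, the state at any moment can be derived by taking, per business key, the latest record loaded on or before that moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical history table: every delta is kept with its load timestamp.
conn.execute(
    "CREATE TABLE psa_customer ("
    "load_ts TEXT NOT NULL, customer_no TEXT NOT NULL, city TEXT)"
)
conn.executemany(
    "INSERT INTO psa_customer VALUES (?, ?, ?)",
    [
        ("2024-01-01", "C1", "Berlin"),
        ("2024-01-01", "C2", "Hamburg"),
        ("2024-02-01", "C1", "Munich"),
    ],
)


def customers_as_of(connection, as_of):
    """Return (customer_no, city) as known at the given ISO date."""
    query = """
        SELECT p.customer_no, p.city
        FROM psa_customer AS p
        WHERE p.load_ts = (
            -- Latest load for this key that is not after the cutoff.
            SELECT MAX(q.load_ts)
            FROM psa_customer AS q
            WHERE q.customer_no = p.customer_no
              AND q.load_ts <= ?
        )
        ORDER BY p.customer_no
    """
    return connection.execute(query, (as_of,)).fetchall()


january = customers_as_of(conn, "2024-01-15")
march = customers_as_of(conn, "2024-03-01")
before_first_load = customers_as_of(conn, "2023-12-31")

print(january)
print(march)
print(before_first_load)
```

ISO-formatted date strings are used here because they compare correctly as text; a production design would also need to handle several loads with the same timestamp.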