Scalefree

Combining data in Data Vault 2.0

+3 votes
Can anyone tell me whether combined data from multiple sources should be stored in the Data Vault? I'm torn between the options. If we store combined data and the business later changes its requirements, wouldn't this create problems? On the other hand, the business rules may never change, in which case this would never be an issue. Then again, in the satellite we have a number of fields for one table, and some of these fields need a certain amount of integration for the analysis. If we don't combine the source information, we'll have to create a table with a large number of fields.

I'm wavering back and forth and would really appreciate your perspectives and advice. What would be the best solution, keeping in mind that the business rules may or may not change in the future?
asked Dec 22, 2013 in Modelling by DataVault_Alex (260 points)

4 Answers

0 votes
One reason the DV is so popular is that it is a convenient way to gather and maintain data exactly as it arrives from the sources, stored in hubs and satellites. Integration of the data is intended to happen when the data is moved from the DV into the BI cubes. Basically, the DV is just supposed to collect and store the data from the sources; the data is not meant to be changed or modified when it is collected. You could solve the problem by creating views for the data that flows out of the system and applying any necessary changes to the data at that point. You could also use service level agreements to document any changes to the data.
answered Jan 30, 2014 by Infinite (330 points)
0 votes
I really believe that each database stored in the DV is basically a copy of the original database. I also think that if the data is integrated, it should be done by a data warehouse application. Basically, the goal is to remove any unnecessary data processing.
answered Mar 10, 2014 by Annanana (600 points)
Annanana,

the structure of the data changes, and the data is historized. That is why I personally would avoid the term "copy."

Also, the preprocessing (the application of business rules) is shifted downstream, towards the business user. But you are right: restating of the data is reduced. By restating I mean processing the data in order to load it into a data warehouse model, such as third normal form, and then undoing some of those modifications to the raw data in order to present it to a business user.

-Mike
+2 votes
If you are not starting with enterprise hubs that have key alignment, I don't think that the Data Vault can actually handle the data. I believe you should separate the data of each source system into its own satellite, and there are many reasons why this is important. Satellites should attach to the same hub if they are used by the same business. I believe that this kind of setup is what makes a DV system so valuable.
answered Apr 18, 2014 by centos (650 points)
0 votes

Alex,

the Raw Data Vault integrates the data from multiple sources with the following goals (a selected list):

  • integrate the raw data from multiple operational source systems by the business key
  • historize the descriptive data (in satellites)
  • ensure the auditability of the raw data.

To achieve these goals, the meaning of the raw data should not be modified, because such modification prevents the auditability of your data. 
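The three goals above can be illustrated with a minimal Python sketch. It assumes a hub keyed by an MD5 hash of the normalized business key and a satellite that stores a new version only when a hash diff over its attributes changes. The function names, the normalization convention (trim and upper-case), and the in-memory dict and list standing in for database tables are all hypothetical, not a prescribed implementation. Every row carries a load date and record source, and nothing is updated or deleted, which is what keeps the raw data auditable.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hub_hash_key(business_key: str) -> str:
    # Hypothetical convention: trim and upper-case the business key before hashing,
    # so " c-100 " from one source and "C-100" from another map to the same hub row.
    return md5_hex(business_key.strip().upper())


def hash_diff(attributes: dict) -> str:
    # Hash over all descriptive attributes (sorted by name). Values are NOT normalized,
    # so any change in the raw data produces a new hash diff.
    return md5_hex(";".join(f"{k}={attributes[k]}" for k in sorted(attributes)))


@dataclass
class HubRow:
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str  # audit: which system delivered the key first


@dataclass
class SatRow:
    hub_key: str
    load_date: datetime
    record_source: str
    hash_diff: str
    attributes: dict = field(default_factory=dict)


def load_hub(hub: dict, business_key: str, load_date: datetime, record_source: str) -> str:
    """Integrate by business key: insert the key once, whichever source delivers it."""
    key = hub_hash_key(business_key)
    if key not in hub:
        hub[key] = HubRow(key, business_key, load_date, record_source)
    return key


def load_satellite(sat: list, hub_key: str, attributes: dict,
                   load_date: datetime, record_source: str) -> bool:
    """Historize: append a new version only if the attributes changed (insert-only)."""
    diff = hash_diff(attributes)
    versions = [r for r in sat if r.hub_key == hub_key]
    if versions and max(versions, key=lambda r: r.load_date).hash_diff == diff:
        return False  # unchanged since the latest version, nothing to store
    sat.append(SatRow(hub_key, load_date, record_source, diff, dict(attributes)))
    return True
```

For example, loading the business key `"C-100"` from a CRM and `" c-100 "` from an ERP yields a single hub row whose record source stays `CRM`. Reloading an unchanged satellite row adds nothing, while a changed name adds a second version and leaves the first one in place.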

While integration takes place, the data is not combined. A best practice (or, let's say, an important recommendation) is to separate the source systems into individual satellites. That way, each satellite can be "optimized" for its source system and can load all data at all times. The source system data, stored in individual satellites, is integrated via the parent (a hub or link).
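A minimal Python sketch of one satellite per source system, all hanging off the same hub. It assumes two hypothetical sources (`crm` and `erp`) that deliver different column sets for the same customer; the table names, the key normalization, and the in-memory structures are illustrative only, and load dates and record sources are omitted for brevity. Each satellite keeps its source's columns as delivered; the only thing the sources share is the hub hash key.

```python
import hashlib
from collections import defaultdict


def hub_hash_key(business_key: str) -> str:
    # Hypothetical convention: trim and upper-case the business key, then MD5.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


# Hub: hash key -> business key. This is the shared parent of all source satellites.
hub_customer: dict[str, str] = {}

# One satellite per source system, e.g. "sat_customer_crm" and "sat_customer_erp"
# (hypothetical names). Rows are stored unmodified, no columns are merged across sources.
satellites: dict[str, list[dict]] = defaultdict(list)


def load(source: str, business_key: str, row: dict) -> str:
    """Load one source row: register the business key in the hub, keep the row in its own satellite."""
    key = hub_hash_key(business_key)
    hub_customer.setdefault(key, business_key)
    satellites[f"sat_customer_{source}"].append({"hub_key": key, **row})
    return key


def descriptive_data(business_key: str) -> dict[str, list[dict]]:
    """Integration via the parent: collect each satellite's rows that share the hub key."""
    key = hub_hash_key(business_key)
    return {name: [r for r in rows if r["hub_key"] == key]
            for name, rows in satellites.items()}
```

Because the CRM and ERP rows never share a table, a new column in one source only affects that source's satellite, and either satellite can be loaded whenever its source delivers data.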

You're referring to "integration for analysis." This typically takes place after the raw data has been loaded into the Raw Data Vault. This kind of integration is the application of (soft) business rules, which are implemented in the Business Vault or in Information Marts (a.k.a. Data Marts).
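Because the raw satellites stay as delivered, the combined view needed for analysis can be derived downstream and simply rebuilt when a rule changes, which addresses the concern in the question. The sketch below assumes two hypothetical source satellites (CRM and ERP) keyed by hub hash key, holding only the current row per key, and two hypothetical soft rules for choosing the customer name. In practice this would typically be a Business Vault view or an Information Mart load rather than Python.

```python
from typing import Callable, Optional

Row = dict
# A soft business rule turns the source-specific rows for one key into one integrated record.
Rule = Callable[[Optional[Row], Optional[Row]], dict]


def prefer_crm(crm: Optional[Row], erp: Optional[Row]) -> dict:
    """Hypothetical soft rule: take the name from CRM, fall back to ERP."""
    name = (crm or {}).get("name") or (erp or {}).get("name")
    return {"customer_name": name, "credit_limit": (erp or {}).get("credit_limit")}


def prefer_erp(crm: Optional[Row], erp: Optional[Row]) -> dict:
    """Hypothetical alternative rule after a change in requirements: ERP name wins."""
    name = (erp or {}).get("name") or (crm or {}).get("name")
    return {"customer_name": name, "credit_limit": (erp or {}).get("credit_limit")}


def build_mart(hub_keys: list[str], sat_crm: dict[str, Row], sat_erp: dict[str, Row],
               rule: Rule) -> dict[str, dict]:
    # Derived layer: computed on read from the raw satellites, which are never modified.
    return {k: rule(sat_crm.get(k), sat_erp.get(k)) for k in hub_keys}
```

Switching `rule` from `prefer_crm` to `prefer_erp` changes only the derived output; the raw satellite dictionaries are untouched, so a changed business rule means re-deriving the mart, not reloading or restating the raw data.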

I hope that helps you a bit,

-Mike

answered Jan 8, 2015 by molschimke (1,890 points)

DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Permissions beyond the scope of this license are available in our license.

DataVault.guru | Provided by Michael Olschimke
    ...