
Historical data in Data Vault

+1 vote
Does anyone have experience with loading historical data into modern DV models? What options or suggestions do you have? What challenges might come up during the process? Back-loading history has never been an easy task for me. Any advice you can provide would be greatly appreciated, as this is a required task on the business side.
asked Dec 14, 2013 in Relational by Sabnr (290 points)

3 Answers

+1 vote
Staging areas can provide a multitude of advantages, but they vary in complexity, ranging anywhere from target databases to plain file systems. Their primary purpose is to ensure data accuracy, improve the overall efficiency of ETL processes, and support reload and recovery operations.

I have limited access to historical data in the field as a whole, and the data I have been granted access to is just as much of a hassle. For context: a hub holds nothing more than a distinct list of business keys. Hubs consist of those unique keys plus metadata that labels the origin of the data. The descriptive attributes for a record in the hub are stored in Satellite tables.
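
A minimal sketch of that hub/satellite split in Python (the class and field names are illustrative assumptions, not from the post):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class HubRow:
        """A hub row: just the unique business key plus load metadata."""
        hub_key: str          # hash of the business key
        business_key: str     # e.g. a customer number
        load_dts: datetime    # LDTS: when the row entered the warehouse
        record_source: str    # metadata labelling the origin of the data

    @dataclass(frozen=True)
    class SatelliteRow:
        """Descriptive attributes live in a satellite, keyed to the hub."""
        hub_key: str          # reference back to the hub
        load_dts: datetime    # part of the satellite's key; carries history
        hash_diff: str        # hash of the attributes, for change detection
        name: str             # illustrative descriptive attributes
        address: str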

Even so, this can be very problematic: while I am able to back-load the majority of the historical data and restore it to its original state, the original start date still remains a mystery, with the exception of a few fields.
answered Jan 14, 2014 by Adam.S (600 points)
+3 votes
The simple answer is to back-load the rows completely. I normally use a persistent "landing zone" or stage area of some form to establish the LDTS for the whole set as a primary load. Ordinary data staging areas tend to be transitory by default; a persistent one, by contrast, is designed to retain records for extended periods of time. I personally find it beneficial to keep one attached so that I am able to rebuild the DV as necessary.
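
A minimal sketch of establishing one LDTS for the whole staged set before the primary load (the row layout and column names are assumptions for illustration):

    from datetime import datetime, timezone

    def stamp_load_dts(staged_rows: list[dict]) -> list[dict]:
        """Assign a single LDTS to every staged row so the whole set
        enters the Data Vault as one consistent primary load."""
        load_dts = datetime.now(timezone.utc)
        return [{**row, "load_dts": load_dts} for row in staged_rows]

    # Rows sit in the persistent landing zone with their raw payload;
    # the LDTS is stamped once, at load time.
    batch = stamp_load_dts([{"customer_no": "C-1"}, {"customer_no": "C-2"}])
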
answered Feb 14, 2014 by Johnson (360 points)
+1 vote
At the end of the day, it all boils down to the date you choose. Because businesses vary, it is not practical to extract all the data from operational databases at the same time. In my experience, issues like these are typically resolved with designs where the LDTS stands in for the business date, extract date, or event date and/or time. The purpose of this is to ensure all of the data is ordered consistently. This makes the model less complex and easier to comprehend.

Not all businesses will require you to have a staging area. Even so, it will probably be more reasonable to stage some data, sales information for example: extracting this type of information on a regular basis could negatively impact financial records that go through a month-end settlement process. For most other cases, it is better to copy records via ETL directly into the Data Warehouse.
 
In fact, I used the following approach to test this idea (a short sketch follows the list):

- When utilizing the LDTS, insert the date and time into the database.
- Add event/date/time, in that order and format, to the hash for comparisons.
- Back-load the 'historical' time-variant data.
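
A minimal sketch of that hash construction, assuming MD5 over a pipe-delimited, upper-cased concatenation (common Data Vault conventions, but assumptions here):

    import hashlib

    def hash_diff(event: str, event_date: str, event_time: str, *attrs: str) -> str:
        """Build the comparison hash with event/date/time first, in that
        fixed order and format, followed by the descriptive attributes."""
        payload = "|".join([event, event_date, event_time, *attrs]).upper()
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    # Identical attributes but different event dates yield different
    # hashes, so a back-loaded historical row is still detectable.
    h1 = hash_diff("ORDER_PLACED", "2013-11-02", "14:05:00", "EUR", "99.00")
    h2 = hash_diff("ORDER_PLACED", "2013-11-03", "14:05:00", "EUR", "99.00")
    assert h1 != h2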

Doing this allows you to determine whether a record with the date and time of the event is already present. This can be handled in one of two ways (sketched after the list):
 
- Using an updated LDTS: derive it from the date or time recorded in the specified stage area.
- Using the solution inside the design: in this case, the event's date and time will be equal to the LDTS.
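
A minimal sketch of picking between the two (field names like event_dts and stage_extract_dts are assumptions):

    from datetime import datetime

    def resolve_ldts(row: dict, use_event_time: bool) -> datetime:
        """Pick the LDTS for a back-loaded row.

        Option 1: take the updated extract date/time from the stage area.
        Option 2: design the load so the event's date/time doubles as
                  the LDTS, and history orders itself by business time.
        """
        if use_event_time:
            return row["event_dts"]        # option 2: LDTS == event time
        return row["stage_extract_dts"]    # option 1: updated LDTS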
 
During a typical integration process, contention inside the source systems is kept to a minimum while the important data is copied from the source systems to the landing zone. Earlier methods relied on technical capabilities such as data streaming technologies, for example; these reduced overhead by decreasing the need to re-establish connections to the source systems and by optimizing lock management.

It is imperative that the ETL filters out rows that have already been processed and, in the event that overwriting becomes necessary, reloads only the changes for each individual key per integration run. Doing this preserves the DV model and avoids collateral damage.
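
A minimal sketch of that filtering step, assuming a lookup of the latest stored hash per key (the latest_hashes structure is an assumption):

    def rows_to_load(staged_rows: list[dict], latest_hashes: dict) -> list[dict]:
        """Keep a staged row only if its hash differs from the latest
        hash already stored for its key, so previously processed rows
        are dropped and each run reloads just the changes per key."""
        changed = []
        for row in staged_rows:
            if latest_hashes.get(row["hub_key"]) != row["hash_diff"]:
                changed.append(row)
                latest_hashes[row["hub_key"]] = row["hash_diff"]  # dedupe within the run too
        return changed
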
answered Mar 17, 2014 by helen.f (280 points)