settingsLogin | Registersettings
Show Menu

Loading question

+3 votes
As we all understand, Data Vault architecture allows one to establish load and recovery standards, query time-based information and establish synchronicity across content for the modeling aspects of a warehouse. I have a question concerning Data Vault.

I have completed the first segment of a design for a data structure based on DV techniques. I am now commencing with testing the design and load tables. I have run into a problem, and was wondering if other programmers have encountered the same problem and what approach they took to solve it.

I want to load a satellite for employee information from the main production system. This system is the core system used at the company. The record's effective load date is becoming a major problem for me. It is not possible to use the load end date for the employee's record because the legacy data has already been migrated into the production system. As a result, if I attempt to query with a date parameter, all records from before we started using the satellite will be excluded. There is absolutely no indication of when they started with the company using this approach, unless you want to obtain the data from the HR Department; this department maintains this type of history. Now I could just take the profile on the legacy system and upload the records following DV methods, but there is one big stumbling block.

In addition to a migration of the legacy data, some of the data was modified or enhanced to live in the current production system. There are several fields from the old records that I will need to access in order to draw an accurate image. Here's an example to show what I mean:

Hire Date=1/12/2018
Record Created Date in Production=1/1/2020
Records back-filled in system>=1/12/2018

Is it a bad idea to do a preliminary load using a place holder and update record-effective dates according to the data HR has on record?
asked Dec 7, 2013 in Implementation by neo (310 points)
recategorized Jan 9, 2015 by molschimke

3 Answers

+2 votes
Best answer
Here's a quick answer with another option—it's a bit complex, but it works. Use bi-temporality: one set of dates to track the technicality, such as DWH loading or source extract creations, and the other set of dates to monitor the business's effectiveness. With each record, compute both the business effectiveness and the technicality effectiveness. You can do this at BDV or during data consumption.
answered Feb 17, 2014 by Shiva1337 (900 points)
+1 vote
Your problem is rather easy to solve and I've provided some simple answers. All you need to do is follow the definition of your Loading Dates. These dates are system-driven dates and must be tagged with the date and time of insertion of any and all data to the warehouse. The only time that you can back-set the loading date is for a static, one-time historical load when you need to ask the question: what would the load end date have been if the loading processes were running back at the time the static data was put in place.

You'll need to add all dates and temporality you get from the source system and put them into the system structure as attributes. This action will allow you to easily coalesce, conform and align data at query time. You'll be able to organize the data by different date attributes over load time.

The only reason that the Loading Dates wouldn't work is if all source systems do not have a consistent date/time stamp to qualify as the load and end load date. For this reason, all dates that are imported from the source system, or as a result of tagging or extraction, should be considered attributes in your satellite system. Following this advice should resolve the issues you are experiencing.

If your queries persist in struggling with multiple Begin and End dates, consider using Point In Time or Bridge Tables to obtain equal-joins across all your records.
answered Dec 31, 2013 by Haag77 (290 points)
+1 vote
Once all of your data has loaded with the respective dates/times, you can create multiple views on your records timekeeper to see one or more business timelines at once. Taking this view for your current situation, you could use the hire date or several dates, where you always take the lowest date. The logic you decide to use will depend on how the business or corporation wants to view the information. Therefore, all logic should be discussed and agreed on beforehand by the programmers and the business managers. It seems to me that your problem is not so much a loading issue, but is more of a problem of deciding how to select the right data for your business managers. When you are doing loading, you should try to avoid using this logic. Because you can't trust the quality of the source data, you could end up destroying the integrity of your DWH and render it unauditable. You really want to avoid that problem if you can help it.
answered Jan 24, 2014 by Curly (300 points)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license. | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke