settingsLogin | Registersettings
Scalefree
Show Menu

Initial load of satellites

+2 votes
Our system has a history that has to be loaded from its source systems into a link-related satellite, with the link having an effectivity satellite. Our usual approach would be to load the active relationship initially, then access the history through the satellite. The problem with this approach is that we have to make the entire history of a relationship available in the raw data vault.

My question is, how can we change our loading process and/or model to accomplish this? Bear in mind we can't load the historical rows at the same time as the active row as it would cause two conflicting satellite keys (LinkID and Load Date).

We've considered a few options:
1. Adding a new column to the satellite key for the business effectivity begin date. So far this is the most likely option.
2. Performing multiple passes on the initial load, resulting in a unique Load Date for each satellite key.
3. Using the effectivity begin date as the Load Date for the initial load, then using the actual Load Date for daily incremental loads. The drawback with this option is that it overloads the meaning of Load Date with business affectivity.
asked Dec 11, 2013 in Raw Data Vault by The_IT_guy (280 points)
recategorized Jan 9, 2015 by molschimke

2 Answers

+3 votes
 
Best answer
Option 1 is certainly feasible, and is actually a pretty common solution. Just remember that one of the recurring arguments against multi-active satellites is that the grain of the satellite will be different from your upstream SQL's Hub. It will work, but I make it a point to only go this route if I have to.

The second option will also work. While it's not a terrible option, I'd recommend trying to get them into the same table, otherwise the satellites will share the exact same grain and structure. That being said, I end up using this approach in cases where we handle “actual” and periodic “history” in near real-time. It depends on the date being used for the LDTS and what you consider “most recent”.

Basic DV uses the database insertion time of the record and stores additional context in the satellite, which links to the model to provide a full temporal view.

While basic DV uses the database insertion time of the record and stores additional context in the satellite, there's an enormous array of date/time timestamps to choose from. Plus, being able to load in parallel is a nice bonus.

What it really boils down to is that it's possible to add additional context to the satellite without having to redefine the key. You simply add the business effective date to the hash as a normal attribute, check if the hash already exists in the LSAT and insert it if it doesn't. This prevents you from having to do additional comparisons later. This will also work if you switch to delta after doing your existing load, but you'll need to make sure that your delta detection is in good form before going this route. Otherwise, you'll end up repeatedly comparing each record.

For a more deterministic outcome, I like to use a Historical Staging Area to harden the LDTS before each load. Using a staging area adds complexity, but gives you a bit more control and predictability. Of course, the date that you use depends on which layer the data is currently in and where it's being delivered.
answered Feb 5, 2014 by NThompson (660 points)
+1 vote
I agree that option 1 is your best choice. After all, satellites are designed to track changes over time. Adding the effectivity begin date to the key not only provides time variance, but helps retrieve the point-in-time data without overloading the LDTS.

As an alternative, have you considered using a separate satellite? Load the historical data from the source systems and build the forward history in the separate satellite, then merge the data at a later point or at the business vault. Not only does this isolate the quality issues with historical data, but it lets you build your history in parallel with the daily load process.
answered Jan 8, 2014 by RamonDo (600 points)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...