settingsLogin | Registersettings
Scalefree
Show Menu

Data Vault Modelling Scenario When Source PK is Surrogate Key

+1 vote

Hi, we are in very early phase of Data Vault implementation and below are some the architectural issues which we are facing and need your advice on how to deal with these,

Scenarios in Data Vault:

 

Scenario # 1:  Initial load issue because Source Primary key doesn’t match with Data Vault Business key

Below is one of the scenario which we are facing because of different keys between source and Datavault

See below example:

Below is the source table where ID is the primary key,

ID    Political Party Name    Attribute # 1   Attribute #2

1       Democrat

2       Republican

3       Other

We designed our satellite where “Political Party name” is the business key which makes sense, however in source we can always duplicate the data with different ID

Please check below example,

ID    Political Party Name    Attribute # 1   Attribute #2

1       Democrat

2       Republican

3       Other

4       Democrat

We understand this is due to bad source database design, where we have Surrogate key as PK and no proper natural key. We have following issues because of this,

While performing for initial Loads – There will be two records with same business key which is causing unique identifier issue temporarily we are disabling the PK while loading the satellite.

 

Scenario # 2:

Since PK is the Surrogate key in the source application, there can be a scenario where we can do the following update, check above record ID #1 (Change form Democrat and Democrat1)

ID    Political Party Name    Attribute # 1   Attribute #2

1       Democrat1

Now when we load the Satellite we treat this as a new record and create a new satellite record, because “Democrat1” is a new key instead of updating the existing record.

 

Scenario # 3:

We capture deletes in Source system and make them inactive in satellite, so in the above scenario, after the initial load we will have 2 satellite records active which matches the source, however when the delete happens on source we have no idea to which record to be deactivated since the business key is “Democrat”

 

ID    Political Party Name    Attribute # 1   Attribute #2

1       Democrat

2       Republican

3       Other

4       Democrat         

 

Please suggest how to handle these scenarios where surrogate is key is PK in source and when we don’t have a strong natural key.

asked Nov 21, 2015 in Modelling by Mastan

1 Answer

0 votes
Hi Mastan,

regarding Scenario #1:

if you use the natural key (regardless of any design arguments) in the hub, you'll get an unique hash key in return that can be used in your satellites.

If you load your historic batches in the chronological order (as it is recommended), you should only add the natural key to the hub once (when it appears for the first time in one of your batches).

Adding the descriptive data to the satellite should not be a problem, as a new record is only added when a change to the descriptive attributes is captured.

Side comment: Because I don't know your source system very well, I don't want to argue with you about the key.

Scenario #2:

This looks awkward, as you're not integrating your data anymore.

Scenario #3:

Same as #2.

In general, the hub should solve your key issues. Only if the descriptions change, a new satellite entry is added to the dependent satellite.

Hope that helps. Let me know if you have further questions, or if I'm off with my answers. Sorry, it is already getting late here.

-Mike
answered Nov 21, 2015 by molschimke (1,890 points)
I have a simillar problem: The source system has a table like this:

os_id  os_desc
1          Windows XP
2          Windows Server 2005
3          Debian 7 Server

The os_id does not work for business key because data comming from other systems are not going to use the same sequence. So I am left with os_desc as the business key. But data comming from other systems might have different spelling so no integration whatsoever again. I believe a Same-As table is the advisable solution here, but still in the dark. (Did not read the DV2 book yet.)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...