Hi, we are in very early phase of Data Vault implementation and below are some the architectural issues which we are facing and need your advice on how to deal with these,
Scenarios in Data Vault:
Scenario # 1: Initial load issue because Source Primary key doesn’t match with Data Vault Business key
Below is one of the scenario which we are facing because of different keys between source and Datavault
See below example:
Below is the source table where ID is the primary key,
ID Political Party Name Attribute # 1 Attribute #2
1 Democrat
2 Republican
3 Other
We designed our satellite where “Political Party name” is the business key which makes sense, however in source we can always duplicate the data with different ID
Please check below example,
ID Political Party Name Attribute # 1 Attribute #2
1 Democrat
2 Republican
3 Other
4 Democrat
We understand this is due to bad source database design, where we have Surrogate key as PK and no proper natural key. We have following issues because of this,
While performing for initial Loads – There will be two records with same business key which is causing unique identifier issue temporarily we are disabling the PK while loading the satellite.
Scenario # 2:
Since PK is the Surrogate key in the source application, there can be a scenario where we can do the following update, check above record ID #1 (Change form Democrat and Democrat1)
ID Political Party Name Attribute # 1 Attribute #2
1 Democrat1
Now when we load the Satellite we treat this as a new record and create a new satellite record, because “Democrat1” is a new key instead of updating the existing record.
Scenario # 3:
We capture deletes in Source system and make them inactive in satellite, so in the above scenario, after the initial load we will have 2 satellite records active which matches the source, however when the delete happens on source we have no idea to which record to be deactivated since the business key is “Democrat”
ID Political Party Name Attribute # 1 Attribute #2
1 Democrat
2 Republican
3 Other
4 Democrat
Please suggest how to handle these scenarios where surrogate is key is PK in source and when we don’t have a strong natural key.