
How to handle different Business Key structures?

+1 vote
Hi everyone,

I have a question about how to handle different Business Key structures in multiple source systems.

In the case of a single-column BK with different data types, I would use the same hub with a data type that is capable of holding the BK from all source systems.

But how should I handle a BK (for example Customer-No) that consists of one column in source system A and two (or more) columns in source system B?

I am thinking of three approaches:

Approach 1: Split the hub into multiple hubs, one per source system or per group of structurally equivalent source systems.

- Hub_Customer_A (HUBKEY, BK_COL1, SOURCESYSTEM, ...)

- Hub_Customer_B (HUBKEY, BK_COL1, BK_COL2,  SOURCESYSTEM, ...)

=> Any satellite would be attached to either Hub_Customer_A or Hub_Customer_B, depending on the source system.
=> Any consolidation could be handled with a same-as link.

Approach 2: Model a single hub with a common set of columns

- Hub_Customer (HUBKEY, BK_COL1, BK_COL2,  SOURCESYSTEM, ...)

=> In the case of source system A, BK_COL2 would be empty, and the hash key would be calculated on the concatenation of BK_COL1 and BK_COL2 (as usual for composite keys).

Approach 3: Model a single hub with a common set of columns, plus an additional composite key column

- Hub_Customer (HUBKEY, BK_COMPOSITE, BK_COL1, BK_COL2,  SOURCESYSTEM, ...)

=> In the case of source system A, BK_COL2 would be empty, BK_COMPOSITE would be a copy of BK_COL1, and the hash key would be calculated on BK_COMPOSITE.

=> In the case of source system B, BK_COMPOSITE would be the concatenation of BK_COL1 and BK_COL2, and the hash key would again be calculated on BK_COMPOSITE. (A small sketch of the hash key computation for approaches 2 and 3 follows below.)
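To make the difference concrete, here is a rough Python sketch of how I picture the hash key computation in approaches 2 and 3. The delimiter, the null handling and the hash function are only my assumptions for illustration, not a fixed standard:

import hashlib

def hash_key(*bk_columns, delimiter=";"):
    # Nulls become empty strings, values are trimmed and upper-cased
    # before concatenation -- these rules are assumptions for this sketch.
    normalized = [(col or "").strip().upper() for col in bk_columns]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()

# Approach 2: hash the concatenation of BK_COL1 and BK_COL2;
# for source system A, BK_COL2 is simply empty.
hk2_a = hash_key("4711", None)        # source system A (single-column BK)
hk2_b = hash_key("4711", "DE")        # source system B (composite BK)

# Approach 3: build BK_COMPOSITE first, then hash only that one column.
bk_composite_a = "4711"               # copy of BK_COL1
bk_composite_b = "4711" + ";" + "DE"  # concatenation of BK_COL1 and BK_COL2
hk3_a = hash_key(bk_composite_a)
hk3_b = hash_key(bk_composite_b)

# Note: for source system A the two approaches hash different strings
# ("4711;" vs. "4711"), so the hash keys only match if the delimiter
# rules are kept consistent.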


What would be the best solution?

kind regards,

Christoph
asked Feb 25, 2016 in Modelling by christoph (170 points)

1 Answer

+2 votes
 
Best answer
Hi Christoph,

The best solution is... it depends. Here is our Jedi rule (please don't tell George Lucas, we did not license it ;-) ):

As long as you are able to recreate your source data from the Raw Data Vault model, you are able to change or refactor your solution. In that case, you are on the good side of the model. If you are losing data (or generating data that was never in the source system) when recreating your source data, you are on the dark side of the model.
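Just to illustrate what I mean by "recreate" (this is only a sketch with invented table and column names, not a prescribed test): you should be able to rebuild the staged source rows from the Raw Data Vault and compare them, for example like this:

# Minimal sketch of the "Jedi rule": the rows staged from the source system
# must be reconstructable from the Raw Data Vault. Names (hk_customer,
# bk_col1, customer_name, ...) are invented; a real check would also have
# to respect load dates / current satellite rows.

def reconstruct_source(hub_rows, sat_rows):
    # Join hub and satellite rows on the hash key, return source-shaped tuples.
    sats_by_hk = {row["hk_customer"]: row for row in sat_rows}
    reconstructed = set()
    for hub in hub_rows:
        sat = sats_by_hk.get(hub["hk_customer"], {})
        reconstructed.add((hub["bk_col1"], hub["bk_col2"], sat.get("customer_name")))
    return reconstructed

def passes_jedi_rule(staged_rows, hub_rows, sat_rows):
    # True only if nothing was lost and nothing was invented.
    return set(staged_rows) == reconstruct_source(hub_rows, sat_rows)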

Seen in this light, approach 1 will work: you can recreate the source data, but you end up with a large number of entities. Especially if you have to add new source systems or new keys, you will keep adding new entities. On the other hand, adding a new business key structure (a new hub plus a same-as link) requires no refactoring of your current solution, hence zero change impact (which is desired).

Approach 2, on the other hand, will require some refactoring (adding columns) if more key structures come in. However, you will end up with far fewer entities. Are your keys all of the same granularity, and do they all identify the same business object (do not mix business objects)? If so, this might be the best option.

Approach 3 is non-standard, but it sounds like a smart key. Wouldn't that be an option, given that your keys are so different in structure? However, this requires some more analysis. A smart key appeals to me in this case, not so much the composite key plus additional key columns. Here is the problem with the latter: think about automation. You need additional metadata to describe which key columns in your hub go into the hash key computation. Not good (there are other reasons as well, but all of them come down to standardization).
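To show what I mean by additional metadata (again, invented structures, just to illustrate the automation overhead): with approach 3 your loading logic needs a per-hub mapping of which columns feed the hash key, whereas with a standard hub layout every business key column would feed the hash key by convention:

import hashlib

# Invented metadata for this sketch: per hub, which columns go into the hash key.
HASH_KEY_COLUMNS = {
    "hub_customer": ["bk_composite"],        # approach 3: hash only the composite column
    "hub_product": ["bk_col1", "bk_col2"],   # a "regular" composite-key hub
}

def compute_hash_key(hub_name, row, delimiter=";"):
    # Hash only the columns the metadata declares for this hub.
    parts = [(row.get(col) or "").strip().upper()
             for col in HASH_KEY_COLUMNS[hub_name]]
    return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()

# Example for approach 3:
row = {"bk_composite": "4711;DE", "bk_col1": "4711", "bk_col2": "DE"}
hk = compute_hash_key("hub_customer", row)

That extra mapping is exactly the kind of non-standard metadata I would rather avoid.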

So, my recommendation for you? I don't have one. It depends on your actual data and on how often the key structures will change.

My only real recommendation here is to stay on the "good side of the model". If that doesn't settle it, add your comments / questions and I'll bring my rebel army to answer.

Hope that helps,

Mike
answered Feb 25, 2016 by molschimke (1,890 points)
selected Mar 1, 2016 by christoph