From a purely practical standpoint, the usual approach is to create a DWH key that is mapped to the business key (BK), with the BK being unique. That DWH key, generated from a sequence, is then referenced in other tables, particularly Satellites. When those tables are loaded, you use the BK to retrieve the DWH key value from a Hub and insert that DWH key into the Satellite. Keep in mind that all of these lookups lead to heavy resource usage, which causes the MPP performance problems mentioned above.
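To make the lookup cost concrete, here is a minimal sketch of the sequence-key pattern described above, using an in-memory SQLite database. The table and column names (`hub_customer`, `sat_customer`, `customer_key`, `customer_bk`) and the helper functions are hypothetical and chosen only for illustration; a real warehouse would load in bulk rather than row by row, but the extra read against the Hub is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- DWH (sequence) key
        customer_bk  TEXT NOT NULL UNIQUE                -- business key (BK)
    );
    CREATE TABLE sat_customer (
        customer_key INTEGER NOT NULL REFERENCES hub_customer (customer_key),
        name         TEXT
    );
""")

def load_hub(bk):
    """Register a business key; the database assigns the sequence key."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer (customer_bk) VALUES (?)", (bk,)
    )

def load_satellite(bk, name):
    """Load a Satellite row. Note the extra read against the Hub."""
    row = conn.execute(
        "SELECT customer_key FROM hub_customer WHERE customer_bk = ?", (bk,)
    ).fetchone()
    if row is None:
        raise LookupError(f"business key {bk!r} is not in the Hub yet")
    conn.execute(
        "INSERT INTO sat_customer (customer_key, name) VALUES (?, ?)",
        (row[0], name),
    )
    return row[0]

# The Hub must be loaded first, because the Satellite depends on its key.
load_hub("CUST-001")
load_hub("CUST-002")
key = load_satellite("CUST-002", "Bob")
```

Every Satellite row costs one read against the Hub, and the Satellite cannot be loaded until the Hub has been.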
Instead, in DV 2.0, you can use a hash of the BK in place of the DWH key. Because the hash can be computed directly from the BK, you no longer need to perform a lookup, which avoids the large performance hits and frequent database accesses. Hashes can also be integrated cleanly with NoSQL and Hadoop, although that's an entirely different topic and far too extensive to cover here. There are plenty of hashing functions to choose from, and the one you pick depends on how much collision risk you're willing to accept. For example, two of the most commonly used hashing functions, MD5 and SHA1, have both been the subject of proof-of-concept attacks demonstrating that they are vulnerable to collisions. SHA1 is a bit more secure than MD5, but is still more vulnerable than many modern hashing functions.
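As a rough illustration of the hash-key idea, the sketch below derives a key from the BK with Python's standard `hashlib` module and estimates the chance of an accidental collision using the standard birthday approximation. The function names `hash_key` and `collision_probability` are hypothetical, and the normalization step (trimming whitespace and upper-casing the BK before hashing) is an assumption for this example, not a rule stated above. The estimate covers only random collisions between ordinary keys, not the deliberately constructed collisions used in the attacks on MD5 and SHA1.

```python
import hashlib
import math

def hash_key(business_key, algorithm="md5"):
    """Derive a key from the BK alone, with no Hub lookup required."""
    # Assumption: normalize so the same BK always yields the same hash.
    normalized = business_key.strip().upper()
    digest = hashlib.new(algorithm, normalized.encode("utf-8"))
    return digest.hexdigest().upper()

def collision_probability(n_keys, hash_bits):
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_keys * (n_keys - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)

# Hub and Satellite loads can each compute the key independently.
hub_key = hash_key("cust-001")
sat_key = hash_key(" CUST-001 ")

# Accidental collision risk for one billion keys at each digest size.
p_md5 = collision_probability(10**9, 128)   # MD5 produces 128 bits
p_sha1 = collision_probability(10**9, 160)  # SHA1 produces 160 bits
```

Because the key depends only on the BK, Hubs and Satellites can be loaded in parallel, and moving to a wider digest mainly trades storage for a smaller chance of accidental collision.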