settingsLogin | Registersettings
Scalefree
Show Menu

No primary /unique data in source flat file.

0 votes
We have a situation when there is no unique business key columns in source flat file. We need to dump the complete file based on date. How to create the HUB MD5KEY for that.
asked Feb 25, 2016 in Modelling by ranjeet (120 points)

1 Answer

0 votes
Hi Ranjeet,

could you provide a sample structure for that?

Is there a surrogate key? If you have nothing else, that would be the best option (figure 4.9 on page 97 of our book "Building a Scalable Data Warehouse with Data Vault 2.0").

However, be aware to not using this strategy in general. This is the least desirable option of many. Therefore, taking a look at your structure would be helpful.

Hope that helps.

Mike
answered Feb 25, 2016 by molschimke (1,890 points)
Hi Ranjeet,
Right.  When you can't find a Business Key, it is sometimes acceptable to create one, but understand that to make it 'enterprise-wide' (or at least "wide-ish") takes a good amount of persuasion of others over a long time.  

Doing data profiling, are you able to define a subset of the table's columns that, when combined in a GROUP BY, are sufficient to guarantee a unique record, and which you could thus consider a natural key. If you then converted all of them to strings, and concatenate them together, you might call that a newly-created business key.  If you can get to that, you can then run your MD5 Hash on it, and you have a Hub Primary Key.  

A generic example of that would be 'one productid, one orderid, one datetimestamp', which could then be used to derive an (admittedly lengthy) 'OrderLineItem ID'.
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...