settingsLogin | Registersettings
Scalefree
Show Menu

Question on hubs

+2 votes
I've been reading through "Super Charge your Data Warehouse," and have learned a great deal. Still, I find myself struggling with a key element of this. I understand that Hubs contain of a number sequence, a business key, load date, and a record source. As I've come to understand it, hubs act are representative of real world assets. For example, an itinerary of products may be logged as business keys that are itemized by IDs assigned to them by a CRM system. Understandably, these numbers serve a purpose in satellite tables, condensing historical data into one place. This, however, fails to explain just why hubs are given these IDs. As Hubs lend themselves to a substantial amount of variability, changing the satellite table at any given time. Herein lies my confusion.

Am I to understand that these numbers are there in order to itemize records from stratified groups in order to give a juxtaposed view of different records?

If so, does this not complicate the view if it is possible for the same ID to be used for multiple items reflecting real-world assets?

And if this is the case, doesn't normalization of a proper record fly out the window if the same ID is also a primary key?
asked Dec 16, 2013 in Modelling by Will_Malone (220 points)

3 Answers

0 votes
A satellite getting its key from a Hub sequence?
answered Jan 18, 2014 by Mr.Fantastic (240 points)
I am aware of how a Hub's numbers is recognized as the foreign key inside links and satellites. But this doesn't explain why an entirely separate key is needed if the primary key already exists. If production assigns an ID and I can rewrite it and assign it any value I choose, then production becomes irrelevant. Or, am I understanding correctly that using the production key as the ID works just as well?
+1 vote
Hubs should rely on surrogate keys in the event of conflicts a database may have with natural keys. From Wikipedia: "In some circumstances the natural key that uniquely identifies a tuple in a relation may be cumbersome to use for software development. For example, it may involve multiple columns or large text fields. In such cases, a surrogate key can be used instead as the primary key."

From what you mentioned, using the ID assigned by the CRM might be best for avoiding confusion. If can confidently identify values as unique, then the there should be not reason to suspect conflict from varying sources.
answered Feb 20, 2014 by action.fancy.girl (580 points)
+1 vote
Using a surrogate from the source as the hub business key may not be the best way to go. Business keys should be taken naturally; a surrogate should not be necessary in most cases. It may seem like the simple solution, using a surrogate, but I prefer to avoid this method when possible. I don't suggest this without reason. If a surrogate is used, the numbers could be replicated during upgrades. If the original numbers are abandoned, the workload increases if new keys must be generated for use in a new system.
answered Mar 25, 2014 by Müller.H (260 points)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...