settingsLogin | Registersettings
Scalefree
Show Menu

referenced fields in hubs

+2 votes
Hi all, please bear with me since I'm still fairly new to DV. I'm trying to create a join between two hubs where one is using the business key and the other is from a normalized source. Say I have an Employee table in the source system with EmployeeNumber as the PK and a second Departments table with DepartmentId as a reference from Employee to Departments. So the structure would look something like this:

Hub_Employee
- SQN
- LoadDate
- RecordSource
- EmployeeNumber

Hub_Department
- SQN
- LoadDate
- RecordSource
- DepartmentName (used as the business key, assuming there are no duplicates)

Link_Employee_Department
- SQN
- EmployeeSQN
- DepartmentSQN
- LoadDate
- RecordSource

My question is how would I go about joining Employee and Department, considering Employee doesn't have a department name? From what I can see there are a few options, from querying the production system when the either one is loaded (not my favorite option) to using a reference table to go outside the regular schema, referencing it downstream, and so on.

Any other ideas? I'd appreciate your help!
asked Dec 4, 2013 in Modelling by a.miller (290 points)
retagged Jan 9, 2015 by molschimke

3 Answers

+2 votes
Since the Employee doesn't have a department name, it sounds like you're trying to add something that doesn't exist in the source, i.e. a business rule. When you want to add a relationship but that relationship doesn't exist in the source, you might want to consider adding that transformation later in the process, either in the Business Vault or in Conformed Dimensions in a star schema set up.
answered Dec 25, 2013 by centos (650 points)
Thanks for your answers! So, would DepartmentId in the employees table not qualify as data in the source since it refers to the value in the lookup table?

As a separate example, say I had a Product table with ProductId as the primary key and ProductNumber as the BK. ProductId would be used as a foreign key to other tables such as ProductOrder. How could I bring in ProductOrder when it only uses ProductID from the Products table? Again, I'm still trying to learn, so sorry if I'm missing something obvious. Thanks for all your help!
+2 votes
A bit off-topic: rarely is it ever good practice to use a name as a business key. What if the department name changes? You'd have to create a whole new instance. Links define the relationships between your BKs, so maintaining integrity and avoiding duplicates is vital . I'd recommend using an ID column like DepartmentId.

Back to the point, are you talking about using alternative keys? If you have different BKs in different tables on the source system, then you need to map them somewhere in order to identify the relationships.
answered Jan 15, 2014 by eve_182 (430 points)
+1 vote
From what I can tell (and I could be wrong) you'd need= your Employee and Department objects, two satellites to match and a link to contain the DV keys from each both Employee and Department. The nice thing about this setup is that you shouldn't need any reference tables.

If you're following DV 2.0 certified practice for generating keys using MD5 hashing (i.e., generating unique keys, keeping the keys consistent, not using MD5 as a primary key and only using MD5 as a BK when necessary) , then you should be able to calculate and load it for each object simultaneously. A huge benefit to joining objects this way is that it lets you load them in parallel. In our case, we calculate them when staging the raw DV. The hashed keys also make it easier to join and pull data from the satellites later on. During the ETL, you can use the DepartmentId FK in Employee to access the BK from Department and generate the key used in the join to populate it. Even if you're using sequence numbers, the ETL would only need to add the join on Department and obtain the key for the join. The only downside is that you'd have to sequence the loads to populate Employee and Department first, or you could run into issues when joining them.
answered Feb 5, 2014 by Annanana (600 points)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...