settingsLogin | Registersettings
Show Menu

Country - City - District modelling in Data Vault.

+1 vote
I need to implement the hierarchy for Country - City - District where 1 to many relations exist between them. Apparently, the business key for city is its name with the country while the business key for district should be its name with either the city reference or city + country.
It was suggested that each of the above mentioned entities should be separate Hubs with the name as business key. However, I disagree because there are many cities with the same name. So, if I created a hub for city with the name as business key, one record might represent multiple cities.
How this can be implemented?
asked Jan 7, 2016 in Modelling by husseinfareed (130 points)

1 Answer

0 votes
You were told right that each entity here requires a Hub and yes you are right that there a multiple cities that have the same name. The part that would be able to distinguish between these would be the link table associated containing Country, City and District. Yes a single City name may appear in multiple countries and a single district name may be associated with multiple cities but a very key point of the data vault is that it doesn't put constraints on the data it captures. It relies on these constraints being in place on the source system and if they are not and they are broken then a report can be built to show it and the issue rectified in the source system.
answered Jan 7, 2016 by liamwhales (220 points)
Hi liamwhales,
From my point of view, I disagree. As stressed by Dan Linstedt in his books, the business key should have meaning to the business and they should not require any additional keys to provide them with the grain of definition. This is not the case here, for example the City hub does not represent the city here; it represents the "City Name" as the city is represented with both city name and country.
You are right, I would say that it would be easier to implement as I suggested above and only store data against the Link and not the Hubs but yes that would not truly DV Compliant (As you are adding into the structure the requirement that people know not to add information against those Hub's and it's these sorts of hidden requirements that data vault aims to avoid)

So in which case to implement this you'd still need three Hubs, two of which would contain composite business keys. The City Hub would also contain the Country Name and the District Hub would contain the Country and City as well. However in terms of referencing there would be no join between the Countries stored in the District Hub and those stored in the other Hubs. There is no issue with having multiple columns for a Business Key in a Hub, this is mentioned in DV books as an option, you just need to be cautious as it could be an indication of a poor key choice.

To detail why I gave my first suggestion, which was incorrect from a true DV standpoint. I am aiming to keep the structure of my DV as repeatable as possible so that I can wrap it in scripts in order to automate as much of the process as possible. Having composite keys in my DV adds a huge amount of complexity into the scripts and since the process is automated people will not be able to tie data to a level which is does not arrive at, meaning that, in the case you are providing, all satellite data could only be tied to the link described in my first suggestion, thus mitigating the risk of hidden knowledge.
I have already faced that need and solved it storing that hierarchy to a satellite of something - like the order hub. It did not deal with the geography as a business key, but only as descriptive data. Do you folks say this is wrong? Or just lazyness?
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license. | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke