settingsLogin | Registersettings
Scalefree
Show Menu

Multiple versions of an entity arriving in a satellite at the same time

0 votes
Hi everybody,

I have to create a Data Vault model based on data that is arriving in flat files.
These flat files do not contain any timestamps at the row level.
Part of the data is related to cars, with information like registration number, model, mileage, etc.

My problem is that in the same file, the same vehicle - that is, the same registration number - could have different attribute values. For example, different car manufacturer, because there has been a manual error, saved but immediately corrected in the source system that provided the flat file.

The question is: how do I model this data? I started by creating a vehicle hub (with the registration number as the business key) and a vehicle satellite with the attributes. The satellite has a LoadDate and a LoadEndDate.

Let's imagine that I receive a file with the following two rows:
Registration Number | Manufacturer
AA12 ABC | Renault
AA12 ABC | BMW

Shall I insert the two rows in the satellite with the same LoadDate? If I do that, how can I reconstruct the original data later, knowing that the join with the link tables will be done on the hash key plus the LoadDate - which are the same for both rows?

Let me know if you need more details.

Thanks in advance,

Pierre-Emmanuel
asked May 19, 2016 in Modelling by pelarrou (120 points)

1 Answer

0 votes
Hi Pierre-Emmanuel,

the only possible solutions I see here is to include the LineNumber and to take the 'later' version as the newer or active one and to insert the same LoadDate.
So you are able to reconstruct the imports precisely and you can select by LoadDate and LineNumber.

Another possibility is to add for example one second to the LoadDate for every forthcoming Verision/Row if you are not loading to often or not near realtime.
Sorting of course by line-number of the import-file.
You can of course try to sort by Manufacturer but I think it is more realistic that the earlier lines are the earlier versions...

hope this helps
gherzog
answered May 23, 2016 by gherzog (380 points)
Scalefree Scalefree

Upcoming Trainings

  • July 01 to 03 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • August 05 to 07 - Berlin (German) Data Vault 2.0 Boot Camp and Certification

  • September 9 to 13 - Hanover WWDVC EU

  • September 16 to 18 - Brussels (English) Data Vault 2.0 Boot Camp and Certification

  • October 21 to 23 - Hanover (German) Data Vault 2.0 Boot Camp and Certification

  • November 18 to 20 - Amsterdam (English) Data Vault 2.0 Boot Camp and Certification

  • December 02 to 04 - Dusseldorf (German) Data Vault 2.0 Boot Camp and Certification

  • December 16 to 18 - Zurich (English) Data Vault 2.0 Boot Camp and Certification

  • Contact us for inhouse training engagements.
    Visual Data Vault: Logical Modelling for Data Vault - Free Download!
    Scalefree
    DataVault.guru is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
    Permissions beyond the scope of this license are available in our license.

    DataVault.guru | DWH Wiki | Recent questions RSS feed | Imprint | Provided by Michael Olschimke
    ...