Storage costs for high volume transactions

+2 votes
I would like to use the Data Vault methodology for a data warehouse project, but I am concerned about the storage costs the model implies for high-volume data. My project has two hubs, one link, and one satellite. The first hub, Entity Type A, holds one million entities; the second hub, Entity Type B, holds 100. The link connects every Entity Type A with every Entity Type B, and its satellite holds the context attributes for each connection. These connections are loaded every business day, and the resulting data must be retained for seven years. Storing the output of these daily loads requires a high-capacity data store, and the associated costs are prohibitive. To stay within the Data Vault model while reducing storage and the related costs, I proposed restructuring the data so that each Entity Type B becomes its own column. Is there a table structure in the Data Vault model, or an industry best practice, that would reduce the storage required for this project?
asked Dec 3, 2013 in Real-Time by Emily_White (210 points)
recategorized Jan 9, 2015 by molschimke
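
To put numbers on the question, here is a rough row-count sketch based on the figures given above. It assumes 260 business days per year (52 weeks of 5 days, holidays ignored), and it reads the proposed column layout as one row per Entity Type A per day with one column per Entity Type B. Both are assumptions, not figures from the question.

```python
# Back-of-the-envelope row counts for the scenario described in the question.
ENTITIES_A = 1_000_000          # from the question
ENTITIES_B = 100                # from the question
RETENTION_YEARS = 7             # from the question
BUSINESS_DAYS_PER_YEAR = 260    # assumption: 52 weeks x 5 days, no holidays


def rows_over_retention(rows_per_day: int) -> int:
    """Total rows accumulated over the full retention period."""
    return rows_per_day * BUSINESS_DAYS_PER_YEAR * RETENTION_YEARS


# Layout 1: one link row per (Entity A, Entity B) pair per business day.
rows_per_day_pairwise = ENTITIES_A * ENTITIES_B
total_pairwise = rows_over_retention(rows_per_day_pairwise)

# Layout 2 (assumed reading of the proposal): one row per Entity A per
# business day, with one column per Entity B.
rows_per_day_pivoted = ENTITIES_A
total_pivoted = rows_over_retention(rows_per_day_pivoted)

# The number of stored attribute values is the same in both layouts.
values_pairwise = total_pairwise
values_pivoted = total_pivoted * ENTITIES_B

if __name__ == "__main__":
    print(f"pairwise rows: {total_pairwise:,}")
    print(f"pivoted rows:  {total_pivoted:,}")
    print(f"same value count: {values_pairwise == values_pivoted}")
```

Under these assumptions the pairwise layout accumulates 182 billion rows and the column layout 1.82 billion rows. The number of attribute values is identical, so any saving from the column layout comes from not repeating per-row keys and load metadata, not from storing less data.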

3 Answers

+2 votes
Best answer
A data vault does not have to sit on top of a standard transactional database such as Oracle or Postgres. Data Vault 2.0 also supports Hadoop and NoSQL data repositories. A Hadoop cluster may be the most cost-effective option for the large data volumes in your project.
answered Feb 1, 2014 by Mij (940 points)
+2 votes
Given the high data volume, I assume that the hubs, link, and satellite in this model represent event-related data or transactions. If that is correct, a transactional link structure is likely the right choice.
answered Dec 23, 2013 by peckheart (340 points)
+1 vote
I agree that a transactional link structure is the right fit for this scenario if you are warehousing transactional data. Because transactional data does not require updates, its attributes can be stored within the link structure itself. No satellite is used with this table structure unless you denormalize the link structure further. To implement the link structure effectively, separate out the business key data, which generally belongs in the hub structure.
answered Jan 12, 2014 by berkley68 (320 points)
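
As an illustration of the transactional link described in the answers above, here is a minimal sketch using Python's built-in `sqlite3` module. All table and column names (`hub_entity_a`, `tlink_a_b`, `measure_value`, and so on), the MD5-based `hash_key` helper, and the sample values are hypothetical and chosen only for the example. They are not taken from the question or from any official Data Vault template.

```python
import hashlib
import sqlite3

# Hypothetical schema: business keys live in the hubs, and the transactional
# link carries its own context attribute, so no satellite is attached to it.
DDL = """
CREATE TABLE hub_entity_a (
    entity_a_hk   TEXT PRIMARY KEY,      -- hash of the business key
    entity_a_bk   TEXT NOT NULL UNIQUE,  -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_entity_b (
    entity_b_hk   TEXT PRIMARY KEY,
    entity_b_bk   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE tlink_a_b (
    entity_a_hk   TEXT NOT NULL REFERENCES hub_entity_a (entity_a_hk),
    entity_b_hk   TEXT NOT NULL REFERENCES hub_entity_b (entity_b_hk),
    business_date TEXT NOT NULL,
    measure_value REAL,                  -- hypothetical context attribute
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (entity_a_hk, entity_b_hk, business_date)
);
"""


def hash_key(business_key: str) -> str:
    """Assumed hashing convention: MD5 of the trimmed, upper-cased key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load two days of insert-only rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)

    a_hk, b_hk = hash_key("A-0001"), hash_key("B-01")
    load_date, source = "2013-12-03", "demo"
    conn.execute("INSERT INTO hub_entity_a VALUES (?, ?, ?, ?)",
                 (a_hk, "A-0001", load_date, source))
    conn.execute("INSERT INTO hub_entity_b VALUES (?, ?, ?, ?)",
                 (b_hk, "B-01", load_date, source))

    # Transactions are only ever inserted, never updated.
    conn.executemany(
        "INSERT INTO tlink_a_b VALUES (?, ?, ?, ?, ?, ?)",
        [
            (a_hk, b_hk, "2013-12-02", 10.5, load_date, source),
            (a_hk, b_hk, "2013-12-03", 11.0, load_date, source),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for row in demo.execute(
        "SELECT business_date, measure_value FROM tlink_a_b ORDER BY business_date"
    ):
        print(row)
```

The link rows reference the hubs only through hash keys, and the daily value sits directly in the link row. Whether a hash key or a sequence key is used, and which attributes end up in the link, depends on the actual source data.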
Scalefree is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Permissions beyond the scope of this license are available in our license. Provided by Michael Olschimke