I would like to use the
data vault methodology for a
data warehouse project, but I have concerns about how the model addresses the need for costly, high volume storage. Specifically, my project has two
hubs, a
link and
satellite. The first
hub has one million entities which I'll label Entity Type A, and the second
hub that is named EntityType B has 100 entities. A
link connects each Entity Type A with each Entity Type B, and the same
link also contains context attributes within its
satellite. This
link connection is to happen every business day, and the resulting data must be stored for seven years. A high capacity data depot is needed to contain the output of the daily transactions required, and the associated costs of this concept are prohi
bitive. In keeping with the
data vault model, I proposed to break down the concept into a column for each component of Entity Type B to reduce the storage required and related costs. I would like to know if there is a table structure within the
data vault model or the industry's
best practices that would reduce the storage required for my particular project.