The raging debates about data warehouse architectures never seem to abate. Thanks to the mammoth advances in every category of data-related technology, we have fortunately moved past the old data warehouse architecture arguments that were largely framed in “Kimball vs. Inmon” terms. We now get to grapple with service oriented architecture, federated data mashups, and cloud-based data as a service (DaaS). Aren’t we lucky!
In the past year I encountered a number of functioning data warehouses that conformed more or less to what many in the industry refer to as the hub and spoke model. These data warehouses were structured around the axiom that the data warehouse would, without question, be a highly normalized “golden copy” of the organization’s transactional, market, and reference data. It would forever remain the firm’s “one version of the truth” for all data that found its way to into their operational systems of record. Data would be sourced from the data warehouse, so that it could be scrubbed, aggregated, and transformed via a variety of extract transform and load (ETL) processes that would ultimately populate individual data marts. Because of the divergent usages, needs, and specific key drivers of each data mart, the format and structure of each mart data would often look drastically different compared to what was persisted in the main data warehouse. This hub and spoke model seems to work well for many large enterprise data warehouse implementations. However, as data warehouses grow in size and number of data points, the scalability of this approach has been coming under increased scrutiny.
While the modular approach of using ETL tools to cleanse and transform data for data mart consumption has ensured the overall integrity and wide-spread usability of the enterprise data warehouse, the overall architecture of many data warehouses remains brittle and opaque. One reason for this is that although the ETL approach provides a much higher degree of customization and control over data scrubbing and distribution, important business rules and data transformation logic (which used to be buried in stored procedures) become hidden and embedded in the ETL processes, with little support in the way of metadata or rules engines. Another pressing issue is that executing an ETL job for each data mart delta can result in massive data latency and overall performance issues.
Many data architects now use materialized views in order make building data marts a more federated and agile process, shifting the burden of prepackaging data for data marts back to the database tier. The materialized view is a pre-defined duplicate copy of remotely persisted database data. Unlike most regular views (which are stored as SQL statements that dynamically construct data), the data of a materialized view is stored on disk and refreshed by querying the underlying remote tables. With materialized views, processor intensive queries -those that would be very expensive for an ETL job to keep executing- can be cached and “prepackaged” on disk. The heavy network loads associated with ETL jobs, which often initially source or query much more data than they ultimately need, can be greatly reduced. Instead of each data mart’s portfolio of ETL processes accessing the core data warehouse server(s), they could access materialized views that reside across multiple database servers which are regularly refreshed (but physically separate) from the golden data copy. Even more nifty is that it is possible to create materialized views that are defined against other materialized views, so that even better data warehouse load distribution can be achieved. But beware of the biggest issue with materialized views: they will become outdated if they are not constantly synchronized with underlying data.
About the Author
William Laurent is one of the world's leading experts in information strategy and governance. For 20 years, he has advised numerous businesses and governments on technology strategy, performance management, and best practices—across all market sectors. William currently runs an independent consulting company that bears his name. In addition, he frequently teaches classes, publishes books and magazine articles, and lectures on various technology and business topics worldwide.. As a Senior Contributing Author for Dashboard Insight, he would enjoy your comments at email@example.com
Copyright 2011 - Dashboard Insight - All Rights Reserved.