Even though multidimensional analysis and OLAP (online analytical processing) tools have been around for the greater part of two decades, I have noticed that many non-technical business managers charged with reporting duties or cube-based analysis tasks remain lost when it comes to getting the most out of their analytical tools and supporting architectures. They may possess a decent understanding of their OLAP tool’s functional palette and a passable knowledge of star schema, snowflake and other dimensional model constructs, yet they often lack the knowledge of how to glean the best performance out of their OLAP platform (e.g., conducting volumetric analysis or taking care to map OLAP-driven workflow to the most appropriate processing hardware). And the problem that always winds up looming largest for OLAP administrators (from novice report creators to seasoned architects) is that of “data explosion,” which will inevitably result in a data cube of massive proportions, with long load and creation times.
In multidimensional models, derived and calculated values may wind up occupying the lion’s share of the data cube, exceeding raw base data values by orders of magnitude. As would be expected, such large data sets quickly become unmanageable and difficult to traverse or distribute; scalability and performance become severely restricted. Data explosion can result in data cubes/databases being hundreds, if not thousands, of times larger than initially anticipated. Addressing data explosion problems by throwing additional or more expensive hardware and processing power at each and every phase of the multidimensional data lifecycle (calculate, build, distribute, etc.) will often not attain the desired result. The processor-to-performance yield curve has a tendency to flatten out rather swiftly, diminishing the return on infrastructure investment (ROI).
To this point, for both the seasoned and the novice OLAP professional, close attention must be paid to the number of dimensions contained in the OLAP model as well as the number of calculated levels in each of these dimensions. If any of these dimensions is sparsely populated, the chances become even greater that the data explosion monster will rear its ever-lurking head. It is of extreme importance that the OLAP data architect carefully plans and structures his/her multidimensional models with the utmost foresight, i.e., painstakingly estimating the compound growth factor (CGF) of data cubes and the system memory that may be required if data explosions start to occur.
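As a rough illustration of that kind of estimate, here is a minimal Python sketch. It rests on several assumptions that are not taken from any particular OLAP product: each dimension is described only by its number of base (leaf) members and its number of derived (calculated or aggregate) members, the worst case is taken to be that every derived cell ends up populated, and the CGF is taken to be the n-th root of the output-to-input cell ratio for a cube with n dimensions. The function names and the sample dimension sizes are hypothetical.

```python
from math import prod


def cell_counts(dims):
    """Return (base_cells, total_cells) for a cube.

    dims is a list of (base_members, derived_members) tuples, one per
    dimension. base_cells counts only leaf-level intersections;
    total_cells also counts every intersection involving a derived member.
    """
    base = prod(b for b, _ in dims)
    total = prod(b + d for b, d in dims)
    return base, total


def explosion_estimate(dims, density):
    """Estimate worst-case growth for a sparsely populated cube.

    density is the fraction of base cells that actually hold input data.
    Assumption: in the worst case every derived cell becomes populated.
    Returns (input_cells, worst_case_output_cells, growth_factor, cgf).
    """
    base, total = cell_counts(dims)
    input_cells = round(base * density)
    if input_cells == 0:
        raise ValueError("density too low: no input cells to grow from")
    # Stored input cells plus every derived cell in the cube.
    worst_case_output = input_cells + (total - base)
    growth = worst_case_output / input_cells
    # Assumed definition: per-dimension (compound) growth factor.
    cgf = growth ** (1 / len(dims))
    return input_cells, worst_case_output, growth, cgf


if __name__ == "__main__":
    # Hypothetical model: (base members, derived members) per dimension.
    dims = [(12, 5), (100, 20), (50, 10), (10, 3)]
    inp, out, growth, cgf = explosion_estimate(dims, density=0.01)
    print(f"input cells:       {inp:,}")
    print(f"worst-case output: {out:,}")
    print(f"growth factor:     {growth:.1f}x")
    print(f"CGF per dimension: {cgf:.2f}")
```

Lowering `density` in this sketch raises the growth factor, because the same derived cells are measured against fewer input cells; that is the arithmetic behind the warning about sparsely populated dimensions. Multiplying the worst-case cell count by an assumed bytes-per-cell figure gives a first-order memory estimate.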
Don’t ever assume that a multidimensional model will align with or conform neatly to an organization’s core business values on an inter-departmental basis; different business units look at the world from completely divergent perspectives. The information artifacts—an OLAP cube’s facts and dimensions—that will drive sales reporting are not always going to be held sacred by the accounting department. OLAP is about satisfying a large set of business units and business questions with a diverse (and numerous) set of targeted reports.