Predictive Analytics Takes Center Stage with PMML

by William Laurent, William Laurent, Inc., Monday, December 6, 2010

Business Intelligence (BI) is by nature a discovery process. It is about uncovering previously hidden trends, behaviors, and meanings in enterprise data, and then distributing this new, actionable knowledge. As BI continues its relentless spread into every corporate business segment and industry vertical, predictive analytics and modeling have assumed an increasingly visible and essential role. The practical applications for predictive analytics are nearly limitless. They range from forecasting consumer behavior to estimating the probability of a terrorist attack, and everything in between. When the power of predictive analytics is effectively harnessed, even the toughest business questions can be answered with greater accuracy and confidence. As a bonus, when future outcomes become more predictable, risks that were formerly invisible are uncovered and clearly understood. This strategic benefit of predictive analytics, the discovery and mitigation of threats, is a vital component of predictive modeling and analysis that is often overlooked.

Although the models used by predictive analytics applications differ from the typical analytic models found in BI engines, both rely on the same data. The implementation roadblocks that apply to traditional BI platforms (such as extracting, transforming, and integrating terabytes of enterprise data) also pertain to predictive analytics. Consequently, a robust BI architecture is an essential prerequisite that should be in place before rolling up one’s sleeves and tackling a predictive analytics solution. As with all varieties of BI systems, the core operational and technological processes of predictive analytics do not change, from either a logical or a physical perspective. From a bird’s-eye view, data identification, model creation and verification, and the distribution of intelligence look much the same.

The Data Mining Group (better known as DMG) has helped propel the predictive analytics juggernaut by introducing an XML-based markup language called the Predictive Model Markup Language (PMML). PMML provides a quick, effective, and flexible way to define predictive models in a vendor-independent manner. A model built in one BI or statistical tool can be exported as PMML and consumed by another, so PMML lets these applications “talk” to one another and share their models. It has also reduced vendor incompatibility issues, thereby improving the flow of forward-thinking analytic information between applications. This conforms to one of the dominant trends in BI as a whole: the gravitation toward open models and code bases and away from vendor-centric, proprietary data mining elements. PMML can be parsed with any standard XML parser, which can read information such as file headers, data transformations, data dictionaries/semantics, and domains. As with all XML-driven file standards, a model expressed in PMML can be administered and maintained with a variety of inexpensive or free XML viewers and editors.
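To illustrate the point that a stock XML parser is enough to read PMML, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The PMML fragment, the field names (`age`, `plan`, `churn`), and the helper `describe_pmml` are hypothetical and exist only for illustration. The element and attribute names (`Header`, `DataDictionary`, `DataField`, `Interval`, `Value`) and the PMML 4.0 namespace follow the DMG schema. The sketch reads only the header and data dictionary and ignores transformations and model elements.

```python
import xml.etree.ElementTree as ET

# PMML 4.0 documents declare this default namespace; ElementTree reports tags
# as "{namespace}LocalName", so a prefix map is passed to find()/findall().
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}

# A small, hand-written (hypothetical) PMML fragment for illustration only.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="Example" description="Hypothetical churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="18" rightMargin="99"/>
    </DataField>
    <DataField name="plan" optype="categorical" dataType="string">
      <Value value="basic"/>
      <Value value="premium"/>
    </DataField>
    <DataField name="churn" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
</PMML>"""


def describe_pmml(text):
    """Return the header description and a summary of each DataField."""
    root = ET.fromstring(text)
    header = root.find("pmml:Header", NS)
    fields = []
    for field in root.findall("pmml:DataDictionary/pmml:DataField", NS):
        interval = field.find("pmml:Interval", NS)
        if interval is not None:
            # Continuous fields: the domain is a numeric interval.
            domain = (float(interval.get("leftMargin")),
                      float(interval.get("rightMargin")))
        else:
            # Categorical fields: the domain is the list of allowed values.
            domain = [v.get("value") for v in field.findall("pmml:Value", NS)]
        fields.append({
            "name": field.get("name"),
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            "domain": domain,
        })
    return header.get("description"), fields


description, fields = describe_pmml(PMML_DOC)
print(description)
for f in fields:
    print(f["name"], f["optype"], f["dataType"], f["domain"])
```

Nothing here is PMML-specific tooling. The same approach works with any XML parser, which is the portability argument made above.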

The PMML specification has now evolved to version 4.0 (details on its general structure can be found on DMG’s website at http://www.dmg.org/v4-0-1/GeneralStructure.html). The specification has improved greatly since its first version, which provided a small set of DTDs specifying the elements and attributes needed for regression and decision tree models. As the number of modeling techniques supported by PMML has increased, so too has the number of vendors that support the standard. Tools such as Statistica and the R language can export models to PMML with little effort. PMML now supports most of the important statistical modeling techniques, including:

  • Decision Trees
  • Support Vector Machines
  • Linear and Logistic Regression
  • Clustering
  • Sequences
  • Text Models
  • Time Series
  • Neural Networks
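As a concrete example of one of the model types listed above, the following minimal sketch scores a linear regression stored as a PMML `RegressionModel`. The model itself (a house-price formula with the fields `sqft` and `bedrooms`) and the helper `score_linear_regression` are hypothetical. The evaluator is deliberately simplified. It computes only intercept + Σ coefficient × x^exponent over `NumericPredictor` terms. It assumes `functionName="regression"` with no normalization, and it does not handle categorical predictors, interaction terms, or missing values. A real PMML consumer implements much more of the specification.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_0"}

# Hypothetical model: price = 50000 + 120 * sqft + 8000 * bedrooms
PMML_REGRESSION = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical house-price model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="sqft" optype="continuous" dataType="double"/>
    <DataField name="bedrooms" optype="continuous" dataType="double"/>
    <DataField name="price" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" modelName="price_model">
    <MiningSchema>
      <MiningField name="sqft"/>
      <MiningField name="bedrooms"/>
      <MiningField name="price" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="50000">
      <NumericPredictor name="sqft" exponent="1" coefficient="120"/>
      <NumericPredictor name="bedrooms" exponent="1" coefficient="8000"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_linear_regression(pmml_text, record):
    """Evaluate the first RegressionTable as intercept + sum(coef * x**exponent).

    Simplified sketch: only NumericPredictor elements are read; any
    CategoricalPredictor or PredictorTerm elements are silently skipped.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept", "0"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        coefficient = float(pred.get("coefficient"))
        result += coefficient * record[pred.get("name")] ** exponent
    return result


price = score_linear_regression(PMML_REGRESSION, {"sqft": 1500, "bedrooms": 3})
print(price)  # 50000 + 120*1500 + 8000*3 = 254000.0
```

Because the coefficients live in the PMML file rather than in the scoring code, the same file could in principle be produced by one tool (for example, a statistics package) and evaluated by another.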

About the Author

William Laurent is one of the world's leading experts in information strategy and governance. For 20 years, he has advised numerous businesses and governments on technology strategy, performance management, and best practices—across all market sectors. William currently runs an independent consulting company that bears his name. In addition, he frequently teaches classes, publishes books and magazine articles, and lectures on various technology and business topics worldwide. As Senior Contributing Author for Dashboard Insight, he would enjoy your comments at wlaurent@williamlaurent.com

Copyright 2010 - Dashboard Insight - All Rights Reserved.
