Organizations embedding analytics into applications typically follow a process of first creating a prototype model, then incorporating that model into a production application. In the prototyping stage, modelers explore data and work with different algorithms to determine the best way to answer a question or gather insight from the data. In a production deployment, a developer will move the model into the dashboard where it can be leveraged by multiple users. If you are creating advanced analytic components for dashboard applications, you should expect to follow the same process.
Typically, modelers and developers use different tools. For prototyping, popular options include commercial tools such as Microsoft Excel or open source alternatives such as FreeMat, GSL, Octave and R. While excellent for prototyping, these tools are not always optimal for creating models that will become components of production dashboards. Some of them are limited in their scalability or deployment options, or cannot efficiently process very large data sets. As a result, many prototype models are rewritten in development languages and platforms like C/C++, Java or .NET when they are ready to be included as components of a production dashboard application.
This article explores analytic modeling and production deployment, explains how they differ fundamentally in their requirements, goals and tools, and proposes several simple measures for a high-productivity development process that simplifies and connects the prototype and production development steps.
Prototyping And Production Goals
Modelers creating analytic prototypes to embed into dashboard applications typically focus on:
- Identifying the requirements for production analytics. The investigation often starts with some basic ideas of available techniques, but the actual requirements for a production deployment are usually not clear until real data has been collected and examined and the exploration of analytic techniques is complete.
- Proving that a given analytic approach addresses the identified goals for the project. For example, the goal might be to create a dashboard component that displays the sales forecast for multiple product lines based on historical sales. Different analytic approaches are tested to identify which approach will deliver the most accurate forecast.
- Identifying any performance and scalability issues that are important to consider in production deployment. Is the production dashboard web-based? In what language is the dashboard written?
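The forecasting comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two candidate methods (a naive forecast and a moving average), the mean absolute error metric, the function names and the sales figures are all hypothetical stand-ins, not methods prescribed by this article.

```python
from statistics import mean

def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def moving_average_forecast(history, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    avg = mean(history[-window:])
    return [avg] * horizon

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def compare_approaches(series, holdout=3):
    """Score each candidate on the last `holdout` points, which the models never see."""
    train, test = series[:-holdout], series[-holdout:]
    candidates = {
        "naive": naive_forecast,
        "moving_average": moving_average_forecast,
    }
    return {
        name: mean_absolute_error(test, forecast(train, holdout))
        for name, forecast in candidates.items()
    }

# Hypothetical monthly unit sales for one product line.
sales = [100, 104, 98, 110, 107, 111, 115, 109, 118]
scores = compare_approaches(sales)
best = min(scores, key=scores.get)  # approach with the lowest holdout error
```

Holding back the most recent observations gives each approach the same unseen data to predict, so the error scores are directly comparable. A real prototype would likely test more candidates and more than one holdout period.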
As the modeler moves his or her prototype into production, or hands the model off to a development or implementation team, a different set of steps needs to be taken:
- Integrating algorithms, statistics or business logic into the dashboard used within a group, a department or across an organization. The dashboard will have a user interface, possibly web-based, and is often designed for use by non-analytic experts to perform repeated tasks or make actionable decisions on new data.
- Using the code to operate on compute-intensive problems which might involve large data sets, or running compute-intensive simulations using developed models.
- Batch processing of data to perform frequent analysis or analysis on many data sets, for example forecasting sales of many products based on a common forecast model or categorization of new data as it becomes available.
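As a rough sketch of the batch scenario above, the following applies one common forecast model to many products in a single pass. The model (a mean of recent observations), the function names and the data are assumptions made for illustration only.

```python
from statistics import mean

def forecast_next(history, window=3):
    """Placeholder common model: mean of the most recent `window` observations."""
    if len(history) < window:
        raise ValueError(f"need at least {window} observations, got {len(history)}")
    return mean(history[-window:])

def run_batch(sales_by_product):
    """Apply the same model to every product; record failures instead of stopping."""
    forecasts, failures = {}, {}
    for product, history in sales_by_product.items():
        try:
            forecasts[product] = forecast_next(history)
        except ValueError as exc:
            failures[product] = str(exc)
    return forecasts, failures

# Hypothetical sales histories keyed by product name.
batch = {"widgets": [10, 12, 14], "gadgets": [5, 7, 6, 8], "gizmos": [3]}
forecasts, failures = run_batch(batch)
```

Collecting failures per product, rather than aborting on the first bad data set, lets a scheduled run finish and report which products need attention.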
Note that there are important challenges in putting analytic code into production that distinguish production deployment from the activities and concerns in the prototype stage. It is generally risky to simply deploy prototype code directly into a production environment. Some concerns for deployment include:
- Improving application performance, often by writing compute-intensive analytics in lower-level languages like C.
- On-demand or scheduled collection, cleansing and filtering of data, which is usually done by hand in the prototype. This data collection and processing may be spread across different stages of a production application: in a database, during ETL (Extract, Transform, Load) and in the analytic code itself.
- Robust error handling and reporting. It is especially important to trap and report analytic anomalies or errors rather than return possibly corrupt results.
- Testing and quality control of analytic accuracy to make certain that the results in production match those seen in prototyping.
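To illustrate the point about trapping analytic anomalies, here is a minimal sketch that raises an explicit error instead of returning an undefined or non-finite number. The `AnalyticError` class and the `growth_rate` calculation are hypothetical examples, not part of any particular library.

```python
import math

class AnalyticError(Exception):
    """Raised when an analytic result cannot be trusted."""

def growth_rate(previous, current):
    """Period-over-period growth, refusing to return undefined or non-finite values."""
    if previous == 0:
        raise AnalyticError("previous period is zero; growth rate is undefined")
    rate = (current - previous) / previous
    if not math.isfinite(rate):
        # Catches NaN or infinite inputs that would otherwise flow into the dashboard.
        raise AnalyticError(
            f"non-finite growth rate from inputs {previous!r}, {current!r}"
        )
    return rate
```

The calling code can then catch `AnalyticError` and show a clear message to the dashboard user, rather than charting a value such as NaN as if it were a valid result.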
Having the right numerical tools to achieve these production goals is important, and parity between the analytic tools used in prototyping and those used in production is a key consideration.
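When the production code is a rewrite of the prototype, one way to check that results still agree is to compare production output against reference values saved from the prototype. The sketch below assumes both sets of results are available as dictionaries keyed by product name; the function name, tolerances and figures are illustrative assumptions.

```python
import math

def check_parity(reference, production, rel_tol=1e-9, abs_tol=1e-12):
    """Return the keys whose production value is missing or differs from the reference."""
    mismatches = []
    for key, expected in reference.items():
        actual = production.get(key)
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append(key)
    return mismatches

# Hypothetical forecasts saved from the prototype and recomputed in production.
prototype_results = {"widgets": 12.0, "gadgets": 7.0, "gizmos": 3.5}
production_results = {"widgets": 12.0, "gadgets": 7.0000001}
mismatches = check_parity(prototype_results, production_results)
```

Comparing within a tolerance, rather than testing for exact equality, allows for small floating-point differences between numerical libraries while still flagging real discrepancies. The right tolerance depends on the analysis and is a judgment call.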
Prototype Modelers And Their Needs
Prototype modelers are typically analysts who explore data and algorithms to achieve desired numerical results. They are often not trained as software developers. Instead, they are usually domain experts: statisticians, mathematicians, business intelligence experts, financial quantitative analysts, scientists or engineers.
Modelers often use off-the-shelf tools designed for flexibility and rapid development. Production deployment concerns matter less than the flexibility to easily manipulate, filter and transform data, apply and customize numerical analysis, and create intermediate charts and tables of results. When prototype results are satisfactory, the code is often turned over to a programming staff who must find a way to replicate the numerical methods in a production dashboard using different tools, because the prototype tools are often incompatible with or ill-suited to the dashboard application. Many modelers prefer an interactive command-line environment for prototyping; others prefer a more formal integrated development environment with powerful tools for composing code, such as syntax highlighting, command completion, refactoring and formatting. Debugging tools for inspecting variables within code are a valuable part of these environments.