Unstructured Data 101
Defining unstructured data and its importance to organizations

by Lyndsay Wise, Monday, June 11, 2007

Introduction

With the introduction of BI search tools, the importance of leveraging unstructured data has taken center stage within the world of business intelligence and performance management. Although the concept of search is not new, the use of search tools to bring BI and BPM-related data to the masses within organizations and the ability to search using text strings has expanded the way organizations can access and analyze their data. In addition to financial and static data, the importance of text or audio-based data analysis is growing. General estimates put the levels of unstructured data within organizations at 80 to 85 percent. Conservative estimates, such as those expressed by TDWI, put unstructured and semi-structured data at 53 percent of an organization’s overall data. Either way, the concept of leveraging unstructured data within the organization is compelling.

This article is the first in a series exploring the growing role and importance of unstructured data within the world of BI and BPM. It serves as a base for future articles by defining unstructured data, describing the use of search and text analytics, and explaining the importance of unstructured data to organizations. Areas to be discussed in subsequent articles include:

  • Practical applications of unstructured data,
  • How to leverage unstructured data through search and text analytics,
  • What factors organizations should consider when implementing search or text analytics, and
  • Vendor offerings and key differentiators within BI, BPM, and text analytics.

Unstructured data defined

Understanding what unstructured data is and how it is used enables organizations to move beyond traditional data analysis. Structured data is identified as the data stored in databases (i.e., in DBMSs), whereas unstructured data is the information stored in word processing documents, e-mail, audio, PowerPoint presentations, etc. Within the category of unstructured data there exists the concept of semi-structured data. Semi-structured data includes zipped files, TCP/IP packets and RSS feeds. For the purposes of this series of articles, both types of data will be classified as unstructured as they are handled similarly when being transformed into a structured data set used for analysis.
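To make the idea of transforming semi-structured data into a structured data set more concrete, here is a minimal sketch in Python that flattens an RSS feed into rows resembling a database table. The feed content, the function name rss_to_rows, and the choice of output columns are all assumptions made for illustration; they are not taken from any particular BI product.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet, invented for this illustration.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Product News</title>
    <item>
      <title>New dashboard release</title>
      <pubDate>Mon, 11 Jun 2007 09:00:00 GMT</pubDate>
      <description>Version 2 adds search across saved reports.</description>
    </item>
    <item>
      <title>Maintenance window</title>
      <pubDate>Tue, 12 Jun 2007 18:30:00 GMT</pubDate>
      <description>Reporting will be offline for one hour.</description>
    </item>
  </channel>
</rss>"""


def rss_to_rows(rss_text):
    """Flatten each <item> in an RSS feed into a dict with fixed columns."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            # findtext returns the default when a child element is missing
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return rows


rows = rss_to_rows(RSS_SAMPLE)
for row in rows:
    print(row)
```

Each resulting row has the same fixed set of fields, which is what allows it to be loaded into a table and analyzed alongside structured data.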

Data used for BI and performance management are generally structured. Text fields such as product descriptions and customer addresses are unstructured in content, but because they are stored in a structured way they are easier to handle. The difficulty lies in collecting and analyzing text, audio, and visual data sets that do not reside in databases, and in transforming these forms of data into a structured format that the organization can use. This is where search and text analytics come in.

The world of BI assumes that people know their data. Although this may be true of IT, business units generally know what information they require but do not know where that data resides or how to turn it into actionable results. Search allows end users to use text strings to identify what they are seeking, eliminating the need for intimate knowledge of organizational data structures. For end users looking to identify sales trends over time or sales by region, a simple search of pre-canned reports allows quick and easy access to data. Text analytics is not as simple. It involves identifying patterns within the data and developing analyses that reveal performance gaps, trends, competitive benchmarks, product issues, etc. Where search involves looking for information that is already available, text analytics uses unstructured data as a springboard for analysis that can be tied to organizational performance. Both approaches are essential to the discussion of unstructured data and contribute to the way that data is currently leveraged within organizations.
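The search side of this distinction can be sketched in a few lines of Python. The example below assumes a small, hypothetical catalog of pre-canned reports (the REPORTS list and the search_reports function are invented for illustration) and ranks reports by how many of the user's search terms appear in the title or description. Commercial BI search tools are considerably more sophisticated; this only shows the basic idea of finding a report by text string rather than by knowledge of the underlying data structures.

```python
import re

# Hypothetical catalog of pre-canned reports, invented for this illustration.
REPORTS = [
    {"id": 1, "title": "Sales by Region",
     "description": "Quarterly sales totals for each sales region."},
    {"id": 2, "title": "Sales Trends Over Time",
     "description": "Monthly sales figures for the past three years."},
    {"id": 3, "title": "Call Center Complaints",
     "description": "Customer complaints logged by the call center."},
]


def search_reports(query, reports):
    """Return report titles ranked by the number of query terms they contain."""
    terms = query.lower().split()
    scored = []
    for report in reports:
        text = (report["title"] + " " + report["description"]).lower()
        words = set(re.findall(r"[a-z0-9]+", text))
        score = sum(1 for term in terms if term in words)
        if score:
            scored.append((score, report))
    # Highest score first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report["title"] for _, report in scored]


print(search_reports("sales region", REPORTS))
```

The user types "sales region" and gets back the matching reports without needing to know which tables or cubes hold the data.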

Importance of unstructured data

Because the category of unstructured data is broad and encompasses many types of information, it may be challenging for organizations to identify the actual importance of utilizing unstructured data to help drive organizational performance. The use of unstructured data goes beyond the ability to search for reports, documents, PowerPoint presentations and e-mail. Identifying customer complaints, benchmarking marketing campaigns and identifying insurance claims fraud are just a few of the areas that can tie the analysis of unstructured data to organizational profit. If general estimates of organizational unstructured data are correct (between 53 and 85 percent), then organizations that only capture structured data for analysis are missing potential opportunities for performance optimization, as they are utilizing less than half (between 15 and 47 percent) of their information resources.

Business examples of how the analysis of unstructured data benefits organizations abound. Call centers provide a good example of how unstructured data can be leveraged to improve internal processes and performance. Customer complaints may be monitored, but the type of complaint and levels of customer satisfaction may not be. Freeform text fields within CRM applications can provide decision makers the information they require to identify trends in customer dissatisfaction and recurring issues. These can then be used to enhance the overall customer experience, thereby increasing satisfaction and reducing customer churn rates. In general, freeform text within any organizational software application can be analyzed to identify trend-based information to help identify areas for improvement.
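As a rough illustration of how freeform CRM text might be turned into trend information, the following Python sketch tags call center notes with complaint categories using simple keyword matching and then counts how often each category occurs. The notes, the category names, and the keyword lists are all invented for this example, and keyword matching is a deliberately simple stand-in for the pattern identification that text analytics tools perform.

```python
from collections import Counter

# Hypothetical complaint categories and trigger keywords (assumptions).
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "billing"],
    "delivery": ["late", "delivery", "shipping", "arrived"],
    "product quality": ["broken", "defective", "faulty"],
}

# Hypothetical freeform notes, as an agent might type into a CRM text field.
NOTES = [
    "Customer was charged twice and wants a refund.",
    "Package arrived late, second delivery problem this month.",
    "Unit was broken on arrival.",
    "Asked about invoice date; also reports late shipping.",
]


def categorize(note):
    """Return every category whose keywords appear in the note."""
    text = note.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        # Substring match: crude, and it can over-match (e.g. "late" in "related").
        if any(keyword in text for keyword in keywords)
    ]


def count_categories(notes):
    """Count how many notes mention each complaint category."""
    counts = Counter()
    for note in notes:
        counts.update(categorize(note))
    return counts


counts = count_categories(NOTES)
for category, total in counts.most_common():
    print(category, total)
```

Run over notes collected week by week, counts like these would give decision makers a view of which types of complaint are recurring or growing, rather than only how many complaints were logged.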

Conclusion

The importance of unstructured data to the world of BI and BPM should not be underestimated. As organizations compete to maintain competitive advantage, the way they leverage their data becomes key. Identifying customer complaints, detecting quality issues within manufacturing, and benchmarking against competitors’ marketing campaigns are just a few of the applications of unstructured data analysis, and they mark the start of a transition towards analysis tools optimized to help drive organizational performance.

Copyright 2007 - Dashboard Insight - All rights reserved

About the Author

Lyndsay Wise is a senior research analyst for the business intelligence and business performance management space. For more than seven years, she has assisted clients in business systems analysis, software selection and implementation of enterprise applications. She is a monthly columnist for DMReview and writes reviews of leading technologies, products and vendors in business intelligence, data integration, business performance management and customer data integration.
