As more channels emerge for collecting electronic data, the ability to analyze traditional databases together with unstructured, text-based sources is fast becoming essential.
I remember my fourth-grade math teacher telling me that it’s impossible to add apples and oranges. She was wrong.
Now there’s a way to throw apples and oranges into an equation without making a mess. The newest text mining technology enables decision makers to crunch words and numbers at the same time, yielding practical knowledge that can be leveraged for business growth.
Today, business information is gathered through multiple channels, and it arrives in a wide variety of forms. Transforming this torrent into manageable streams of business intelligence requires working with a mix of organized, labeled data (structured data) and new records that typically wait for manual processing before they can be grouped by topic (unstructured data).
The difficulties involved with processing disparate forms of data aren’t likely to go away anytime soon. In fact, as more organizations become aware of the benefits that continuous innovation and agility offer, the desire to create growth strategies will certainly intensify. The ever-growing size of data collections can be thought of as the fuel for fact-based decisions. When a company goes the extra mile to conduct high-quality analyses, it’s like pouring in a fuel additive: the result is turbocharged insight for developing business strategies.
Crunching text and traditional data together enables organizations to leap beyond standard goal-oriented search methodologies. With integrated text mining, nimble organizations leverage the power of electronic information to reveal gaps in the market and take action long before their competitors are even aware of the opportunities.
American Honda, Dreyfus, Sub-Zero Freezer Co./Wolf Appliance Co., the City of Turin Tourism Office, the Australian Taxation Office and HP are just a few of the farsighted organizations that already have begun to discover the value of deploying advanced text mining capabilities.
Thanks to highly efficient systems for data storage and the reduced cost of memory, most organizations have amassed huge repositories of data. Despite advances in processing speeds and analytic techniques, these repositories sit largely untouched and unexplored, like vast undiscovered continents. Some of the databases contain feedback from customers reflecting their desires, opinions and interests. Unlike natural resources such as oil or gold, however, buried information is relatively easy to extract.
This buried treasure is often captured as words or text in a variety of languages and sentence structures. Before the advent of text mining technologies, deciphering patterns or trends hidden in this sort of nuanced information would have required a team of trained linguists. Converting it into any sort of usable business intelligence was considered a Herculean task.
Today, global organizations such as large manufacturing companies with far‑flung customer relationship management teams are faced with a steady deluge of text from e-mail, customer surveys, warranty claims forms, call reports, technician reports and dealer feedback. The database for HP’s call center alone contains 300,000 records and grows daily.
“In the past, we lacked the ability to make much sense of the influx of data,” says Randy Collica, Senior Business Analyst at HP. “With the volume of notes we have, we could not physically assign someone to read each data record and manually transform the freeform text comment string to structured fields so we could proceed in our data mining and statistical analysis project. There’s just no way any one person can do that.”
Collica and his team now use text mining to uncover underlying themes or concepts contained in large document collections. They also use text mining to automatically group documents into topical clusters and classify documents into predefined categories. By integrating text data with structured data, Collica and his colleagues are able to enrich their predictive modeling efforts and formulate business decisions on the basis of insightful customer information, instead of relying on “gut instinct” from their more experienced employees.
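To make the idea of integrating text with structured data concrete, here is a minimal Python sketch. It is not HP's method or any vendor's product: the concept dictionary, the field names and the sample records are all invented for illustration, and simple keyword matching stands in for the far richer linguistic processing a real text mining tool would apply. It only shows the general shape of the step: free-form notes are turned into structured flags that sit alongside the existing fields.

```python
import re
from collections import Counter

# Hypothetical concept dictionary: concept name -> trigger words.
# A real text mining tool would discover or refine these; here they are hand-picked.
CONCEPTS = {
    "battery": {"battery", "charge", "charging"},
    "shipping": {"shipping", "delivery", "late"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def text_features(note):
    """Turn one free-form note into 0/1 flags, one per concept."""
    words = set(tokenize(note))
    return {
        f"mentions_{name}": int(bool(words & triggers))
        for name, triggers in CONCEPTS.items()
    }


def enrich(records):
    """Return copies of the structured records with concept flags added."""
    return [{**rec, **text_features(rec.get("note", ""))} for rec in records]


# Invented call-center records: structured fields plus a free-text note.
records = [
    {"case_id": 1, "region": "East", "call_minutes": 12,
     "note": "Battery will not hold a charge."},
    {"case_id": 2, "region": "West", "call_minutes": 4,
     "note": "Delivery was late again"},
    {"case_id": 3, "region": "East", "call_minutes": 7,
     "note": "Asked about warranty terms"},
]

enriched = enrich(records)

# The new flags can now be analyzed like any other structured field,
# for example by counting battery complaints per region.
battery_by_region = Counter(
    rec["region"] for rec in enriched if rec["mentions_battery"]
)
```

Once the notes have been reduced to flags like these, they can feed the same reports and predictive models that already use the structured fields.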
Before the team turned to text mining, the job of product classification was tedious, difficult and time-consuming. Collica and his colleagues had to pore through vast amounts of information. Their work was often hampered by data they didn’t understand.
“We’re not product experts,” says Collica. “Before, it would take me several hours to research new products coming in just to know what bucket to categorize them in; but now it’s all automated, so now I can easily take the new SKU numbers and analyze them. Text mining classifies them for me, so I don’t spend hours doing unnecessary research, and now I do it all on a monthly basis in about 20 minutes.”
By adding text to existing data mining investigations, companies can fold in behavior indicators that answer the difficult “why” questions buried in miscellaneous data. Those answers complement the “what” answers traditionally found by comparing current values with historical files contained in large document collections.