
Text Analytics for Smarter Search

by Cailyn Clark, Business Opportunity Manager, http://www.sas.com

Tuesday, December 21, 2010

Informed decision makers are good decision makers. Sadly, many are not fully using the knowledge within the organization because they can’t find it. Traditional enterprise search and content management systems attempt to solve the problem of locating information, but it is only when information access is coupled with knowledge management and search that we achieve enterprise “findability.”

Smarter search has never been more important. When people find the information they need, they can do their jobs. When they find the information they need quickly, they increase their productivity.

Bringing Together Content Management Systems, Search and Metadata

Many organizations have already implemented content management or search tools. So why is finding content so difficult and time-consuming? Within content management repositories there are no organizational search capabilities. Without search, there is nothing to drill down on or refine, so ultimately, these tools provide no additional contextual information. Enterprise content is thus separated into specialized silos that are scattered across most organizations and seldom properly managed, even with content management tools. These silos of unshared information are waiting to be found.

Manual tagging efforts to improve findability are labor-intensive, error-prone and redundant. Even when manual tagging happens, it is inconsistent: what I put in one category, you may put in another, and we may both overlook a possible third.

Organizations should consider using metadata to achieve a semantic approach to search. The volume of unstructured content is growing. IDC estimates that roughly 80 percent of the information within an organization is text.1 What critical nuggets are buried in those masses of unstructured text? How can we ensure that the content being produced can be found and reused? At M2010, in his keynote address, John Elder described text as low-hanging fruit in his work, saying, “Text has proved more valuable than all the structured data combined.”

It’s Semantics

Content is key to an organization’s understanding of customers, research, products and effectiveness. Having the capability to drive search with semantic technology makes stored text more available and usable.

Semantics is all about words, and metadata defines which words are meaningful in which context. The associated technologies add descriptors to information contained in documents. During searches, the descriptors ensure more relevant results. They also create facets or dimensions for drill-down capabilities, enabling users to further refine their searches and see the relationships between different terms. To go beyond the title, date and other high-level metadata available in search and content management systems, organizations apply metadata that defines keywords and other more relevant descriptors, thereby making their content truly functional. Without metadata, content is unmanaged, and content left unmanaged will lose value and context.

Improving Search

Your search is only as good as your metadata. Search relies on metadata (categories, entities) to improve document ranking. Once you provide rich search navigation, a user can run a faceted search to explore at both the item and collection level and to eliminate large numbers of irrelevant documents. But what drives faceted navigation is the existence of a complete set of metadata defining those relationships. Organizations create this searchable content by using automatic classification to annotate documents with categories and concepts before adding them to the index. The metadata also makes frequently executed searches and filters much faster, improving response times. The success of an enterprise search deployment relies on the automatic creation of metadata.
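To make the idea of faceted navigation concrete, here is a minimal Python sketch. It assumes each document already carries "category" and "entity" metadata; the documents, the field names and the helper functions filter_docs and facet_counts are hypothetical illustrations, not part of any particular search product.

```python
from collections import Counter

# Hypothetical documents. The "category" and "entity" fields stand in for
# metadata that an automatic classifier would attach before indexing.
DOCS = [
    {"id": 1, "title": "Q3 churn report", "category": "customers", "entity": "Acme"},
    {"id": 2, "title": "Acme contract renewal", "category": "contracts", "entity": "Acme"},
    {"id": 3, "title": "Support ticket trends", "category": "customers", "entity": "Globex"},
    {"id": 4, "title": "Globex pricing memo", "category": "contracts", "entity": "Globex"},
]


def filter_docs(docs, **selected):
    """Keep only documents whose metadata matches every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


def facet_counts(docs, field):
    """Count how many documents fall under each value of one facet."""
    return Counter(d[field] for d in docs)


# Drill down on one facet, then show what the remaining facet offers.
hits = filter_docs(DOCS, category="customers")
print([d["id"] for d in hits])             # [1, 3]
print(dict(facet_counts(hits, "entity")))  # {'Acme': 1, 'Globex': 1}
```

Each facet selection removes the documents that do not match, and the counts tell the user which refinements are still available. Without the metadata fields there would be nothing to filter or count on.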

Improving Content Management

As with search tools, content management systems (CMS) do not provide keywords associated with the content of the documents that they are designed to manage. Metadata is the key to driving these keywords. How? By defining efficient content taxonomy structures that specify keywords, organizations are able to more easily find the relevant content stored in a CMS, resulting in improved access to information and less time spent searching through it. A browsable directory of categories and related facets permits the users of these systems to rapidly scan the categories in SharePoint, for example. Metadata is defined from the taxonomy and used to score archives. It is also automatically generated and used to tag and categorize any new content, whether it arrives in real time or in bulk information downloads to SharePoint or another CMS.

Metadata Magic through Text Analytics

Text analytics extracts relevant information from unstructured documents and analyzes, mines and structures that text to improve findability. Text analytics technology looks for sentiment, patterns and relationships. More relevant search is facilitated by creating taxonomies for content, achieved by content categorization technologies that define and manage the relationships between all relevant terms and characteristics of the text collections you have. In other words, content categorization creates and manages taxonomies. Taxonomy management is used to automatically map documents to one or more topics by looking at the context to understand the meaning and find these terms within the text. As a result, these applications generate metadata regarding the terms that have been found and extracted from documents.
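As a rough illustration of mapping documents to topics, the Python sketch below tags text against a small taxonomy. It assumes plain term matching, which is far simpler than the context-aware categorization described above; the taxonomy contents and the categorize function are hypothetical.

```python
import re

# Hypothetical taxonomy: each topic lists the terms that signal it.
TAXONOMY = {
    "billing": ["invoice", "refund", "payment"],
    "shipping": ["delivery", "courier", "tracking number"],
}


def categorize(text, taxonomy=TAXONOMY):
    """Return {topic: sorted matched terms} for every topic found in the text."""
    lowered = text.lower()
    metadata = {}
    for topic, terms in taxonomy.items():
        # Whole-word (or whole-phrase) match, so "refund" does not hit "refunded".
        found = [t for t in terms
                 if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
        if found:
            metadata[topic] = sorted(found)
    return metadata


doc = "The courier lost my parcel, so I want a refund on the invoice."
print(categorize(doc))
# {'billing': ['invoice', 'refund'], 'shipping': ['courier']}
```

The returned dictionary is the generated metadata: it records which topics the document belongs to and which terms were found to justify each one. Because the same taxonomy is applied to every document, the tags stay consistent in a way manual tagging does not.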

Sentiment analysis generates metadata that defines the polarity, or extent of negative and positive opinions, expressed in the text. Data, especially text, is often subjective and dependent on an individual’s perspective – so an additional benefit of categorization, taxonomy management and sentiment analysis is the consistency and standardization that we have come to rely on technology to accomplish. To function effectively, and to continuously improve, organizations need an integrated semantic platform – one that provides a complete range of metadata applications – to best extend the value that can be derived from search and content management.
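Polarity metadata can be sketched in a few lines of Python. This example assumes a tiny hand-made word list and a simple counting formula; both are hypothetical stand-ins for the much richer models a real sentiment engine would use.

```python
import re

# Hypothetical opinion lexicon, far smaller than anything used in practice.
POSITIVE = {"great", "helpful", "fast", "reliable"}
NEGATIVE = {"slow", "broken", "confusing", "late"}


def polarity(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative).

    Returns 0.0 when the text contains no words from the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_metadata(text):
    """Package the score and a coarse label as metadata for a document."""
    score = polarity(text)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"polarity": score, "label": label}


print(sentiment_metadata("Support was fast and helpful, but the portal is slow."))
```

The sample sentence contains two positive words and one negative word, so it scores one third and is labeled positive. Applying one fixed scoring rule to every document is what gives the consistency that individual readers, each with their own perspective, cannot.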

The creation and maintenance of the right metadata improves user engagement – just ask media and publishing organizations that have achieved real-time results. Automatic categorization, metadata generation and taxonomy management enable you to build a semantic search system where relevant documents replace imprecise keywords.

Informed decision makers are good decision makers, and the faster they can get the information they need, the more productive they will be. It’s time that content management systems, search and metadata work together to open the door to real-time, real-world business insights for every knowledge worker in the organization.

Author Biography

Cailyn Clark is a Business Opportunity Manager focusing on SAS Text Analytics. Her main areas of expertise are in developing and managing Text Analytics relationships between SAS and current or prospective customers. Before joining SAS in 2008, she worked in business development at Teragram Corporation.
Clark earned a BA degree in Business from Johnson and Wales University in the US. She also studied at Johnson and Wales University, Göteborg, Sweden and The University of Queensland, Australia.


1. IDC. Text Analytics: Software’s Missing Piece? Susan Feldman and Hadley Reynolds. Doc. # 220911, Dec. 2009.
