Among its many other benefits, Hadoop serves as a staging ground for discovery. This is not by accident, but by design: the Hadoop ecosystem provides the ability to scalably store and then search all data – structured and unstructured alike – enabling business analysts to explore and discover within a single environment.
We have identified four forms of discovery, which fall into two categories: traditional and advanced (or new). Traditional forms of discovery include commonplace, structured BI-discovery tools, like spreadsheets and basic visualizations. Advanced forms of discovery, by contrast, leverage multi-faceted search-mode capabilities and innovations in advanced visualization to support new kinds of data discovery.
Traditional Forms of Discovery
Both spreadsheets and basic visualizations – such as graphs and percentage of whole (pie) charts – are traditional forms of discovery.
Spreadsheets (like Microsoft Excel) remain the most popular business analytics paradigm for working with data, due in part to their widespread, long-standing availability and user familiarity. With a wide range of analysis and reporting capabilities, spreadsheets can also be a powerful analytic tool in the hands of an experienced user. For example, Excel 2013 can hold more than 1 million rows (1,048,576) and over 16,000 columns (16,384) of data.1
With spreadsheets, the real value is in providing access to data for the user to manipulate locally. With the data already organized neatly into rows and columns, an analyst can slice and dice it by sorting, filtering, pivoting, or building formulas, from very simple to very complex, directly into the spreadsheet. Analysts can discover new insights simply by reorganizing the data.
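As a rough illustration of the slice-and-dice operations described above, the following Python sketch sorts, filters, and pivots a small table, then adds one formula-style calculation. The sales rows and field names (`region`, `product`, `units`) are hypothetical and exist only for this example; a spreadsheet user would perform the same steps through menus and cell formulas rather than code.

```python
from collections import defaultdict

# Hypothetical sales rows, laid out the way a spreadsheet would hold them.
rows = [
    {"region": "East", "product": "Widget", "units": 120},
    {"region": "West", "product": "Widget", "units": 80},
    {"region": "East", "product": "Gadget", "units": 45},
    {"region": "West", "product": "Gadget", "units": 150},
    {"region": "East", "product": "Widget", "units": 30},
]

# Sort: largest orders first.
by_units = sorted(rows, key=lambda r: r["units"], reverse=True)

# Filter: keep only the East region.
east_only = [r for r in rows if r["region"] == "East"]

# Pivot: regions down the side, products across the top, summed units in the cells.
pivot = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivot[r["region"]][r["product"]] += r["units"]

# A formula-style derived value: each region's share of all units sold.
total_units = sum(r["units"] for r in rows)
region_share = {
    region: sum(products.values()) / total_units
    for region, products in pivot.items()
}
```

None of these steps changes the underlying data. Each one only reorganizes it, which is the sense in which reorganizing a spreadsheet can surface something new.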
Basic visualizations, such as graphs or charts (including those embedded in dashboards) – whether generated through Excel or not – provide visual representations of data that allow analysts to discover insights that might not be as easily perceived in a plain text format.
Basic visualizations, then, are an effective means of describing, exploring, or summarizing data because a visual image can simplify complex information and help to highlight – or discover – patterns and trends in the data. They work just as well for presenting large amounts of data as for presenting smaller datasets.
Advanced Forms of Discovery
Hadoop has extended the traditional forms of discovery into forms that can search through multiple kinds of data within one environment. The two remaining forms of data discovery are “newer,” or what we call analytic, forms of discovery.
Multi-Faceted, Search Mode
Multi-faceted (or “search-mode”) discovery allows analysts to mine data for insights without discriminating between structured and unstructured data. Analysts can access data in documents, emails, images, wikis, social data, and so on in the manner of a search engine (like Google, Yahoo!, or Bing), iterating back and forth as needed and drilling down into the available data to discover new insights. IBM Watson, for example, is a search-mode form of discovery, capable of answering questions posed in everyday language.
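To make the search-then-drill-down loop concrete, here is a minimal Python sketch of faceted search over a mixed corpus. It is an illustration under stated assumptions, not a description of how Hadoop search tools or IBM Watson work internally: the sample documents, the field names (`type`, `year`, `text`), and the helper functions `search` and `facet_counts` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mixed corpus: each item has free text plus a few structured fields.
documents = [
    {"id": 1, "type": "email", "year": 2013,
     "text": "Customer churn rose after the pricing change"},
    {"id": 2, "type": "wiki", "year": 2012,
     "text": "Pricing policy and discount guidelines"},
    {"id": 3, "type": "social", "year": 2013,
     "text": "Love the new pricing, hate the churn of support staff"},
    {"id": 4, "type": "email", "year": 2013,
     "text": "Quarterly revenue summary attached"},
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: word -> set of ids of the documents that contain it.
index = defaultdict(set)
for doc in documents:
    for word in tokenize(doc["text"]):
        index[word].add(doc["id"])

by_id = {doc["id"]: doc for doc in documents}

def search(query, **facets):
    """Return documents containing every query word and matching every facet filter."""
    words = tokenize(query)
    if not words:
        return []
    ids = set.intersection(*(index.get(w, set()) for w in words))
    hits = [by_id[i] for i in sorted(ids)]
    return [d for d in hits if all(d.get(k) == v for k, v in facets.items())]

def facet_counts(hits, field):
    """Count hits per value of a structured field, to suggest the next drill-down."""
    return Counter(d[field] for d in hits)

# Start broad, inspect the facets, then drill down.
broad = search("pricing")
counts = facet_counts(broad, "type")
narrow = search("pricing churn", year=2013)
```

The free-text query and the structured filters go through the same call, so the analyst does not need to know in advance whether an answer lives in an email body or in a metadata field.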
Advanced Visualizations
Finally, advanced visualizations are tools for visual discovery that allow analysts to experiment with big data to uncover insights in a totally new way. With advanced visualizations, analysts can visualize clusters or aggregate data; they can also experiment with data through iteration, looking for correlations or predictors that lead to new analytic models.
The inclusion of visual cues – such as intelligent icons and heat maps – is an emerging technique in advanced visual discovery that leverages principles and best practices from the cognitive sciences and visual design. These advanced visualizations can also complement or supplement traditional forms of discovery, giving analysts the opportunity to compare results across forms and potentially discover even more insights, or to gain a more complete view of the data.
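The idea of scanning data for correlations and presenting the result as a heat map can be sketched in a few lines of Python. This is only a toy illustration: the column names and values are invented, the 0.7 threshold in `shade` is an arbitrary assumption, and a real visual discovery tool would render colors rather than text symbols. A strong correlation is also only a candidate for further study, not a confirmed predictor.

```python
import math

# Hypothetical measurements; each list is one column of the same five observations.
columns = {
    "ad_spend": [10, 20, 30, 40, 50],
    "site_visits": [110, 190, 310, 420, 480],
    "returns": [9, 7, 6, 4, 2],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Pairwise correlation matrix over every pair of columns.
names = list(columns)
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def shade(r):
    """Map a correlation to a coarse visual cue, as a heat-map cell would."""
    if r >= 0.7:
        return "++"
    if r <= -0.7:
        return "--"
    return " ."

# One row of cues per column, in the same order as `names`.
heat_map = [[shade(matrix[a][b]) for b in names] for a in names]
for name, cells in zip(names, heat_map):
    print(f"{name:<12}", " ".join(cells))
```

Reducing each number to a visual cue is what lets an analyst take in the whole matrix at a glance and decide which pair of variables to investigate next.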
1 Excel Specifications and Limits. http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HA103980614.aspx
See the original article on Inside Analysis (re-posted here with the author's permission).
About The Author
Lindy Ryan is the Research Director for Radiant Advisors’ Data Discovery and Visualization practice and leads research and analyst activities at the confluence of data discovery, visualization, and data science from a business needs perspective. She also retains the role of Editor in Chief of RediscoveringBI Magazine. As Radiant Advisors’ Editor in Chief for three years, Lindy participated in in-depth discussions and analysis with industry thought leaders and vendors while maturing her position and perspectives in the BI industry.
Lindy has a B.S. in Business Administration and an M.A. in Organizational Leadership. She is currently a doctoral candidate, pursuing a PhD in Organizational Leadership and Strategy. Her dissertation research focuses on addressing the technical, ethical, and cultural impacts that have already arisen and will continue to arise in a rapidly expanding big data culture.