After IBM researchers delivered the first data warehouse in the late 1980s, businesses looked forward to finally being able to store critical data in easy-to-find, centralized locations. Employees at all levels could tap that rich data to make decisions based on concrete, analytical facts, instead of gathering scattered information from different sources or using plain intuition.
Like many sweeping technology promises, the vision sounded grand, but for many companies it did not become reality during the 1990s. The problem was never a lack of capability in the technology. Rather, big commercial data warehouses were so expensive that they largely remained the luxury of very big organizations with the budgets to buy the systems and the staff to implement and maintain them. Beyond the steep cost, critics claimed that some of these data warehouses delivered big IT headaches with little return on investment.
Data warehousing, however, is changing quickly to meet the demands of companies with large volumes of data that require fast answers to complex, unpredictable questions. What’s providing those answers today – in a more affordable, simpler way – is the two-word IT revolution called “open source,” which supplies the building blocks required to create a whole new kind of data warehouse.
There are many benefits to an open source data warehouse. It costs less to support and maintain because the products are more affordable than commercially licensed ones. It’s also relatively easy, when hiring, to find the skills required to deploy an open source data warehouse – so you won’t have to scour the industry for staff with specialized IT expertise. In addition, rather than going through a lengthy and expensive trial process, you get immediate, free-of-charge access for evaluation through a simple download.
Best of all, your company won’t be locked into a costly proprietary software upgrade path.
The Road To Open Source
Open source isn’t new, of course.
When the Internet took flight in the mid-1990s, Linux helped propel a free software movement that today supports everything from operating systems to application servers, middleware and databases.
Now, companies that have traditionally relied on commercial databases are turning to open source. Walk into many Fortune 500 firms and you will increasingly find open source installed alongside traditional commercial databases. Indeed, one study of 226 members of the Independent Oracle Users Group (IOUG) found that 35 percent of these commercial users had also installed an open source database such as MySQL.
The use of open source DBMS engines has also spiked in recent years, according to market researcher Gartner Group. The analyst firm found that 47 percent of the companies it surveyed have already adopted an open source database, and another 19 percent are considering investing in one within the next 12 months.
The Warehouse Problem Solved
So why is open source a particularly smart strategy for the data warehouse? Given enough time and money, corporate IT departments can develop a system perfectly designed to answer any question quickly – that is, as long as they know the question in advance. The problem is that business people cannot know ahead of time all of the questions they will need answered in the future.
Plus, many companies are using traditional, proprietary databases that aren’t designed to handle complex analytic queries against billions of rows of data. Answering even simple questions typically requires time-consuming retooling: creating indexes, partitioning the data and re-indexing the database.
Against this backdrop, it is only natural that the flexibility of open source would make its way into the data warehouse market.
The movement started with vendors building proprietary data warehouse products based on open source databases such as MySQL, PostgreSQL and Ingres. Development of open source databases progressed into full-fledged open source data warehouse solutions and communities built around those solutions. Our open source community (www.infobright.org) provides one such resource, alongside a host of other open source developer/user business intelligence communities including those of Talend, Jaspersoft and Pentaho.
Today, even the extract, transform and load (ETL) tools that support database management systems – offered by vendors like Pentaho, Talend and Octopus – are going open source. (About 11 percent of the companies Gartner Group recently surveyed are using open source ETL tools, with another 16 percent considering such tools over the coming months.)
Despite the success of open source, companies still debate its merits. But in a market where IT budgets are shrinking and the demand for information is increasing, the argument for open source in the data warehouse is pretty straightforward. That argument is also growing stronger, thanks to the open source community.