Big Data is not only an IT challenge of harnessing the growing volume of data across multiple enterprise applications for improved reporting and analytics. It is more often discussed in the context of managing and retaining critical data for much longer periods, and of scaling enterprise systems to accommodate future growth as cost-efficiently as possible.
For a large enterprise, Big Data may mean petabytes or more, while for a small or mid-size enterprise, data volumes that grow into tens of terabytes can already become problematic. So far, much of the IT focus on data retention has been on unstructured data sets, which include e-mails, file-based systems, and audio and image files. Whilst these are important, the most critical enterprise data assets live in your “run-the-business” applications and therefore consist of structured or semi-structured data, which typically resides in traditional RDBMS repositories or is integrated into a data warehouse for reporting and BI. The reality, driven by more stringent legislation, stricter governance and demand for extended on-demand access to historical data, is that structured data retention is fast becoming the #1 imperative for businesses worldwide. The key signs that you need a dedicated solution for Big Data retention are outlined below:
10. You agonize over when to keep or purge data
Traditional RDBMS and analytics systems lack features for enforcing the retention or expiry policies dictated by industry regulations or business-process data governance. Having to balance the ongoing cost of storing growing volumes of data against the penalties for non-compliance, and against the increased risk exposure of holding data beyond its legal expiry timeframe, is a constant reminder that you need a dedicated solution to better manage and store data long term.
An online data retention (OLDR) solution can complement your existing OLTP and OLAP systems. It reduces the physical storage needed for structured data (by a ratio of 40 to 1 or more) through data value de-duplication, and it provides built-in, configurable rules to enforce retention policies, so you can rest easy and stop counting storage arrays.
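To make the two mechanisms above more concrete, here is a minimal Python sketch, assuming a simple in-memory column store: each distinct value is stored once per column and every row holds only an integer reference to it (dictionary encoding, one common form of value de-duplication), and a retention rule flags rows whose date falls outside a configured period. The names `DedupColumn`, `DedupTable` and `expired_rows`, the sample trade data and the seven-year period are hypothetical illustrations, not any OLDR product's API.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DedupColumn:
    """A column stored with value-level de-duplication (dictionary encoding)."""

    values: list = field(default_factory=list)  # each distinct value, stored once
    index: dict = field(default_factory=dict)   # value -> position in `values`
    refs: list = field(default_factory=list)    # one integer reference per row

    def append(self, value):
        # Store a value only the first time it appears; later rows just
        # point at the existing copy.
        if value not in self.index:
            self.index[value] = len(self.values)
            self.values.append(value)
        self.refs.append(self.index[value])

    def get(self, row):
        return self.values[self.refs[row]]


class DedupTable:
    """Hypothetical append-only table built from de-duplicated columns."""

    def __init__(self, column_names):
        self.columns = {name: DedupColumn() for name in column_names}
        self.row_count = 0

    def insert(self, row):
        for name, column in self.columns.items():
            column.append(row[name])
        self.row_count += 1

    def reduction_ratio(self):
        # Values a plain row store would keep, divided by distinct values kept
        # here. Deliberately rough: it ignores the space used by `refs`.
        raw = self.row_count * len(self.columns)
        stored = sum(len(c.values) for c in self.columns.values())
        return raw / stored


def expired_rows(table, date_column, retention, today):
    """Hypothetical retention rule: row numbers whose date is past `retention`."""
    cutoff = today - retention
    column = table.columns[date_column]
    return [row for row in range(table.row_count) if column.get(row) < cutoff]


if __name__ == "__main__":
    trades = DedupTable(["trade_date", "symbol", "side"])
    for trade_date in (date(2005, 6, 1), date(2012, 6, 1), date(2019, 6, 1)):
        for i in range(100):
            trades.insert({
                "trade_date": trade_date,
                "symbol": ("IBM", "MSFT")[i % 2],
                "side": "BUY" if i % 3 else "SELL",
            })
    print(f"reduction ratio: {trades.reduction_ratio():.0f} to 1")
    seven_years = timedelta(days=7 * 365)
    expired = expired_rows(trades, "trade_date", seven_years, date(2020, 1, 1))
    print(len(expired), "rows past retention")
```

Because the sample data holds only seven distinct values across 900 cells, its ratio comes out far higher than 40 to 1; real reduction ratios depend entirely on how repetitive the data is.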
9. Your data volumes and growth rates exceed comprehension
Records of human-generated activity, such as every medical interaction, stock trade, call data record or webpage click, and direct “machine-generated” data, such as IT log files, barcode scans, RFID tag reads, GPS location entries, industrial automation control and environmental sensor outputs, are examples of data sets that do not change once they have been “transacted.” The update capabilities of traditional systems are overkill for this type of data, and having to continuously add expensive hardware and memory to keep up can be a major IT headache.
An OLDR solution can be your primary repository for data that becomes historical as soon as it is created, specializing in ingesting and storing billions of records per day on low-cost commodity servers, so you can avoid serious hardware headaches.
8. Your production database arteries are clogged
If your production application has been diagnosed with chronic “performance pains,” it is likely that the relational database holding your data is suffering from excessive volumes, resulting in unhealthily enlarged indexes. Studies show that a significant proportion (up to 90 percent) of a production application’s data set doesn’t require ongoing updates and is therefore static in nature. A best-practice approach to solving this is to put your production system on a data “diet.”
An OLDR solution can be a complementary repository that holds virtually unlimited volumes of historical data while keeping it accessible on demand, allowing you to run a slimmed-down production database and application that stays healthy.
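One common way to put a production database on such a diet is to move rows older than a cutoff date into an archive table and expose a view over both, so historical records remain queryable on demand. The sketch below shows that pattern with Python's built-in `sqlite3` module standing in for both the production RDBMS and the retention repository; the helper `put_on_data_diet`, the `orders` table and the cutoff date are hypothetical, and a dedicated OLDR solution would hold the archive in a separate system rather than in the same database.

```python
import sqlite3


def put_on_data_diet(conn, table, archive, date_column, cutoff):
    """Hypothetical helper: move rows older than `cutoff` from `table` to `archive`.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input. `cutoff` is an ISO-8601 date
    string, which compares correctly as text.
    """
    # Empty archive table with the same columns as the production table.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {archive} AS SELECT * FROM {table} WHERE 0"
    )
    with conn:  # the INSERT and DELETE commit together, or roll back together
        conn.execute(
            f"INSERT INTO {archive} SELECT * FROM {table} WHERE {date_column} < ?",
            (cutoff,),
        )
        moved = conn.execute(
            f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,)
        ).rowcount
    # A view keeps archived rows queryable alongside current ones.
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {table}_all AS "
        f"SELECT * FROM {table} UNION ALL SELECT * FROM {archive}"
    )
    return moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2009-03-01", 10.0), (2, "2011-07-15", 20.0), (3, "2015-02-01", 30.0)],
    )
    conn.commit()
    moved = put_on_data_diet(conn, "orders", "orders_archive", "order_date", "2012-01-01")
    print(moved, "rows archived")
```

Wrapping the copy and the delete in one transaction means a failure part-way through cannot leave a row both archived and still in production, or missing from both.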
7. You’re addicted to hardware purchases
You wish your local hardware shop carried fibre-connected SANs in 12-petabyte packs, and you’ve probably added more memory than you can remember. If this sounds familiar, you may be addicted to using hardware to compensate for the deteriorating performance that growing data volumes cause in your critical production OLTP and OLAP systems. If only you could get this growing enterprise carbon footprint under control.
A purpose-built, dedicated repository that holds large data volumes on low-cost servers allows you to defer costly hardware or memory upgrades for your production systems, helping you kick your addiction to ongoing hardware purchases.
6. You have more DB admin specialists than end-users
While this is a highly unlikely IT doomsday scenario, it reflects a trend toward ever more specialized administrators being needed to support, tune, back up, migrate and manage systems across a wide range of heterogeneous repositories that are likely doubling in size each year. Aside from the shrinking windows available for backing up large volumes and migrating to new application versions, and the arcane processes for moving data offline to tape, the cost of highly skilled DBAs constitutes a major portion of the total cost of ownership (TCO) per terabyte of data retained.
A dedicated data retention solution is a low- to zero-administration data repository, allowing you to allocate next to no specialized resources to Big Data retention.