The relentless growth of data remains one of IT’s biggest challenges, with annual growth rates approaching 70 percent in some organizations. A good deal of this growth is the result of too many systems creating redundant copies of data for multiple purposes.
Various studies indicate that companies worldwide are spending about $50 billion every year on storage hardware and software to manage unnecessary “copy data.” Even with the deployment of deduplication and compression technologies, it is estimated that 85 percent of storage hardware purchases and 65 percent of storage software purchases are devoted to managing excess duplicate data.
Copy data virtualization (CDV) solutions are designed to address the cost and complexity issues created by redundant data. Introduced in 2009 by Actifio, CDV extends virtualization principles to data management, replacing physical copies of data stores with virtual copies to dramatically reduce storage footprints.
While a certain level of data redundancy is necessary to ensure operational resilience, unplanned redundancy can create significant burdens. The problem occurs as application infrastructures evolve to serve multiple business units. Each of these siloed applications requires its own backup, mirroring, replication, thin provisioning and snapshot functions. According to one study, a single production application such as Oracle, Exchange or SAP can create up to 120 excess copies of production data, even with deduplication technologies deployed within individual silos.
Why Deduplication Isn’t Enough
There are important distinctions between deduplication and data virtualization. For one, deduplication is an I/O-intensive operation, so it is typically reserved for backup and archived data. Production data can’t spare that kind of real-time computational resource without significant performance degradation. In addition, deduplicated data is usually written to disk using backup software, which means it must be restored to a host in its native format before it can be accessed again.
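To see why deduplication carries this overhead, consider a minimal sketch of content-addressed chunking, the general technique behind most dedup systems (this is an illustration of the concept, not Actifio’s or any specific vendor’s implementation): every block must be hashed and looked up before it is written, which is exactly the per-write computational cost production workloads can’t afford.

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: identical chunks are physically kept once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}          # hash -> chunk bytes (physical storage)

    def write(self, data):
        """Split data into fixed-size chunks; return a 'recipe' of hashes."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            # Hashing every chunk on the write path is the I/O/CPU cost
            # that makes inline dedup impractical for production data.
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # store only if unseen
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from its chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)

store = ChunkStore()
r1 = store.write(b"AAAABBBBAAAA")   # 'AAAA' appears twice in one file
r2 = store.write(b"AAAACCCC")       # 'AAAA' is shared with the first write
assert store.read(r1) == b"AAAABBBBAAAA"
assert len(store.chunks) == 3       # only AAAA, BBBB, CCCC stored physically
```

Note also that `read()` must rehydrate the data from chunks before a host can use it, which mirrors the restore step the paragraph above describes.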
CDV solutions such as Actifio’s Virtual Data Pipeline (VDP) capture data in its native format from production application sources in physical or virtual environments, on-premises or in the cloud. The platform creates a single “golden master” copy that is compatible with any storage infrastructure. Rather than writing copy data to a physical drive, Actifio creates an image or pointer that links back to a central archive of the original data.
This allows systems to access a virtual copy of application data from any point in time without having to touch the golden master. Accessing these virtual copies does not generate redundant physical copies or take up additional storage infrastructure. The golden master is only updated with changing blocks through an “incremental forever” model.
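The pointer-based model described above can be sketched in a few lines. The following is a simplified illustration of the general idea, assuming nothing about Actifio’s actual internals: each capture stores only changed blocks (“incremental forever”), and each point-in-time copy is just a set of pointers into the blocks already stored.

```python
class GoldenMaster:
    """Toy model of virtual copies over an incremental-forever block archive.

    Each capture stores only the blocks that changed; a snapshot is a
    mapping of block numbers to capture indices, so a 'copy' costs
    pointers, not duplicated data.
    """

    def __init__(self):
        self.captures = []        # list of {block: data} incremental deltas
        self.snapshots = {}       # name -> {block: capture index}

    def capture(self, changed_blocks):
        """Record only changed blocks; return the current block->capture view."""
        self.captures.append(dict(changed_blocks))
        view = {}
        for idx, delta in enumerate(self.captures):
            for block in delta:
                view[block] = idx   # latest capture wins for each block
        return view

    def snapshot(self, name, view):
        """A virtual copy: pointers into existing captures, no data copied."""
        self.snapshots[name] = dict(view)

    def read(self, name):
        """Materialize a point-in-time view on demand, in native block form."""
        view = self.snapshots[name]
        return {block: self.captures[idx][block] for block, idx in view.items()}

gm = GoldenMaster()
v1 = gm.capture({0: b"boot", 1: b"data-v1"})
gm.snapshot("monday", v1)
v2 = gm.capture({1: b"data-v2"})           # only the changed block is stored
gm.snapshot("tuesday", v2)
assert gm.read("monday")[1] == b"data-v1"  # point-in-time view is preserved
assert gm.read("tuesday")[1] == b"data-v2"
assert sum(len(c) for c in gm.captures) == 3  # three blocks ever written
```

Reading the “monday” snapshot never touches or duplicates the blocks themselves; it only follows pointers, which is why accessing a virtual copy consumes no additional storage.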
Improving Data Governance
Copy data virtualization solutions such as Actifio are fast becoming essential elements of a strong data governance program. By decoupling data from physical storage, much the same way a hypervisor decouples compute from physical servers, CDV makes the process of managing, accessing and protecting data faster, simpler and more efficient. It also supports governance and regulatory compliance for sensitive data, and it safeguards against potential data leaks simply by limiting the number of data sources.
The need for strong data governance is being amplified as data analytics, mobile computing and Internet of Things initiatives drive continued worldwide data growth. Organizations simply can’t afford to waste money and resources storing and managing redundant data that has little or no value.
Of growing concern is the effect that poor data quality has on the decision-making process. Organizations today are looking for ways to use predictive analytics tools to evaluate large amounts of data in search of patterns and insights that can be used to guide corporate decisions and policies. In our next post, we’ll discuss how Actifio can accelerate data access and analytics to improve data-driven decision making.