When you stop and look at the actual numbers, the sheer amount of data accumulating as technology moves forward is staggering. According to IDC’s Worldwide Global DataSphere Forecast 2021-2025, business and consumer data has been growing at a compound annual growth rate of 23 percent since last year. Meanwhile, data created in the cloud is growing at 36 percent annually, and data collected at the edge is growing at 33 percent annually and will make up 22 percent of the total global datasphere by 2025.
As the volume and complexity of data grow, so too does the task of making this information compute-ready and ripe for analysis. Compounding this challenge is the fact that, while many companies are collecting more data, less attention has been paid to developing effective data management processes that unify the data and make it easily accessible and actionable.
The traditional approach to data management is based on the Extract, Transform, Load (ETL) method—extracting information from multiple sources, transforming it into a specific format via an ETL server, and then loading it into a data warehouse for analysis. This process is typically complex and compute-intensive and eats up a significant amount of time in the form of Input/Output (I/O) activity, string processing, and data parsing.
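To make the three ETL stages concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two inline sources, the field names, and the use of an in-memory SQLite database as a stand-in for a data warehouse. The point is only that the parsing and type conversion happen in the ETL process itself, before anything reaches the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs, or databases.
CSV_SOURCE = "order_id,amount,currency\n1,19.99,usd\n2,5.00,USD\n"
JSON_SOURCE = '[{"id": 3, "total": "42.50", "cur": "eur"}]'


def extract():
    """Extract: pull raw records from each source in its native format."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: parse strings and unify both sources into one schema.

    This step runs on the ETL server, which is where the string
    processing and parsing cost described above is paid.
    """
    unified = []
    for row in csv_rows:
        unified.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    for row in json_rows:
        unified.append((int(row["id"]), float(row["total"]), row["cur"].upper()))
    return unified


def load(rows, conn):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(transform(*extract()), conn)
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    for order in run_etl():
        print(order)
```

In this sketch the warehouse only ever sees clean rows, so every new source or format change means more work in the `transform` step.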
A recent CIO article argues for a better framework: reorder the acronym to ELT. The process still begins with data extraction, but the information is then loaded into specific data repositories, which transform it into a more useful and relevant form. With this approach, the data transformation duties occur in individual cloud-based data warehouses rather than relying on a single ETL engine or server to transform all structured and unstructured information.
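The ELT ordering can be sketched in Python as follows. This is an illustrative assumption, not the method described in the CIO article: the sample rows, the table names, and the in-memory SQLite database standing in for a cloud data warehouse are all hypothetical. Raw values are loaded untouched as text, and the transformation is expressed in SQL so that it runs inside the repository.

```python
import sqlite3

# Hypothetical raw records as they might arrive from three sources:
# every value is still an unparsed string.
RAW_ROWS = [
    ("crm", "1", "19.99", "usd"),
    ("web", "2", "5.00", " USD "),
    ("pos", "3", "42.50", "eur"),
]


def run_elt():
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

    # Load: store the extracted data as-is, with no parsing on the client side.
    conn.execute(
        "CREATE TABLE raw_orders "
        "(source TEXT, order_id TEXT, amount TEXT, currency TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ROWS)

    # Transform: the warehouse engine does the casting and cleanup in SQL.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL)      AS amount,
               UPPER(TRIM(currency))     AS currency
        FROM raw_orders
        """
    )
    conn.commit()

    return conn.execute(
        "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    for order in run_elt():
        print(order)
```

Because `raw_orders` is kept alongside the transformed table, a changed business rule can be handled by rerunning the SQL rather than extracting the data again.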
In addition to reducing I/O time and speeding up parsing, this approach lays the groundwork for a more comprehensive data management strategy that shrinks the time between data acquisition and delivery of insights. According to the CIO piece, key considerations include:
- The implementation of foundation layer capabilities, including connectors, event streaming, source writebacks, and MapReduce.
- A discovery and self-learning engine that can crawl and retrieve data from various sources, continuously adapting to business needs and ingesting the right amount of compute-ready data.
- The inclusion of data structure and persistence abstraction to comply with data privacy regulations.
With overall data volume, the number of data sources, and data complexity only poised to grow, there’s no time like the present to reengineer your data management strategy to reflect the modern data landscape.
To learn more about how best to rethink your data management strategy, check out the full CIO article.