Developing Software for Fast Data environments
Why software design has to adapt to Fast Data environments
Fast Data is the result of an increasingly fast-paced reality. Global machine-to-machine (M2M) communications totaled 0.3 petabytes per month in 2014, and this figure is projected to grow well beyond 4 petabytes per month by 2020 [1], with the number of connections increasing likewise [2]. Mobile data traffic grew more than 7% and speed more than 20% in 2015; by 2020 traffic is expected to reach 30 exabytes per month, with the average mobile data transfer speed well above 3 Mbps [3]. Will this growth continue in the future? With only 0.6% of potential devices that could be connected actually communicating [4], the only possible answer is a resounding ‘yes’.
Seizing the Fast Data opportunity
Can a standard software design approach cope with Fast Data? What happens when traditional software design and architectural paradigms are used to digest this information feast? Choking on it is the most likely result. Fast Data environments impose very demanding volume and time constraints that rule out traditional approaches to application design and system interaction. To make things worse, although storage, processing capacity and technologies keep evolving, in most situations it is still necessary to use existing systems that were originally intended for processing large data volumes and that come with slow, complex analytical processes.
The solution to this design challenge is to put value, in this case data, at the heart of the design: focus on specialized blocks (data ingestion, integration, processing, storage and visualization) that manage the information streams as efficiently as possible, and find a way to speed up and orchestrate the communication between them.
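As a rough illustration of the specialized blocks named above, the following minimal sketch chains ingestion, integration, processing and storage as Python generators, so that each event flows through every block as soon as it arrives. All function names, the 'device,value' input format and the threshold rule are assumptions made for illustration; they are not taken from any particular product.

```python
from typing import Dict, Iterable, Iterator, List


def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Ingestion block: parse raw 'device,value' lines into events (assumed format)."""
    for line in raw_lines:
        device, value = line.split(",")
        yield {"device": device.strip(), "value": float(value)}


def integrate(events: Iterable[dict], units: Dict[str, str]) -> Iterator[dict]:
    """Integration block: enrich each event with reference data."""
    for event in events:
        yield {**event, "unit": units.get(event["device"], "unknown")}


def process(events: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Processing block: keep only events whose value exceeds the threshold."""
    for event in events:
        if event["value"] > threshold:
            yield event


def store(events: Iterable[dict], sink: List[dict]) -> int:
    """Storage block: append events to a sink and return how many were stored."""
    stored = 0
    for event in events:
        sink.append(event)
        stored += 1
    return stored


def run_blocks(raw_lines, units, threshold, sink) -> int:
    """Orchestrate the blocks; generators pass events along one at a time."""
    return store(process(integrate(ingest(raw_lines), units), threshold), sink)
```

Because every block is a generator, no stage waits for the previous one to finish a full batch, which is the property the paragraph above asks of the communication between blocks.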
Example of a layered architecture and commercial solutions available
Unfortunately, and in spite of vendors' efforts to present a comprehensive package, there is no silver-bullet solution that can be readily implemented. Instead, a stack of technologies has to be chosen and integrated. Given the options on the menu, the choice depends very much on the final use cases and business requirements to be covered.
Information flow between systems with data streams
But designing the architecture and selecting the systems to implement it is just one part of the problem. The real challenge in Fast Data is information flow.
In classical design approaches, data is transferred between components, systems communicate through secure but time-consuming protocols, and storage and analysis usually rely on ETL processes that run in a later phase. For Fast Data solutions composed of several functional layers built with a stack of integrated technologies, this approach is impractical.
The current trend points towards software applications built on distributed stream data platforms capable of per-event transactions and streaming aggregations, which can scale across multiple connected systems and machines to speed up basic data integration and analysis.
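To make the idea of a streaming aggregation concrete, here is a minimal single-process sketch of a tumbling-window sum that updates its state once per event. It assumes events arrive in timestamp order as (timestamp, key, value) tuples; the function name and the event shape are hypothetical, and a real stream platform would add distribution, fault tolerance and handling of late events.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, key, value), assumed shape


def tumbling_window_sums(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, float]]]:
    """Yield (window_start, {key: sum}) whenever a window closes.

    Assumes events arrive in non-decreasing timestamp order.
    """
    current_start = None
    sums: Dict[str, float] = defaultdict(float)
    for timestamp, key, value in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window: emit the finished one.
            yield current_start, dict(sums)
            current_start = start
            sums = defaultdict(float)
        sums[key] += value  # per-event update of the running aggregate
    if current_start is not None:
        yield current_start, dict(sums)  # flush the last open window
```

The aggregate is kept in memory and updated on every event, so a result is available as soon as a window closes rather than after a later batch job.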
However, designers will often face situations in which they have to integrate the capabilities of disparate systems, some intended for high-speed operation and others for large-volume data processing on legacy platforms. Connectivity between them can be solved with two different strategies: by feeding the data that arrives at high-speed systems into volume-specialized ones, or by sending analytic models from back-end systems to high-speed, real-time processing front-ends.
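The two strategies can be sketched in a few lines. In the first, a hypothetical `MicroBatchForwarder` buffers events on the high-speed side and hands them to a bulk loader of the volume-oriented system. In the second, a back end fits a very simple model (a mean and standard deviation threshold, chosen here only for illustration), serializes it as JSON, and a hypothetical `FrontEndScorer` applies it to each incoming event. Every name and the model itself are assumptions, not a description of any specific platform.

```python
import json
import statistics
from typing import Callable, List


class MicroBatchForwarder:
    """Strategy 1: forward fast-arriving events to a volume system in small batches."""

    def __init__(self, bulk_load: Callable[[list], None], batch_size: int):
        self.bulk_load = bulk_load  # stand-in for the back end's bulk loader
        self.batch_size = batch_size
        self.buffer: list = []

    def on_event(self, event) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.bulk_load(self.buffer)
            self.buffer = []  # start a fresh buffer; the sent batch is left untouched


def train_model(history: List[float], k: float = 3.0) -> str:
    """Strategy 2, back end: fit a threshold model on historical data and serialize it."""
    model = {
        "mean": statistics.mean(history),
        "std": statistics.pstdev(history),
        "k": k,
    }
    return json.dumps(model)


class FrontEndScorer:
    """Strategy 2, front end: score each event with the model shipped by the back end."""

    def __init__(self, model_json: str):
        self.load(model_json)

    def load(self, model_json: str) -> None:
        params = json.loads(model_json)
        self.mean, self.std, self.k = params["mean"], params["std"], params["k"]

    def is_anomaly(self, value: float) -> bool:
        return abs(value - self.mean) > self.k * self.std
```

In the second strategy only the small serialized model crosses the boundary between systems, while the heavy historical data stays in the back end.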
Either way, a practical solution is to design pipelines between systems and so avoid costly and slow ETL processes. A process pipeline that connects systems efficiently (storage pools, high-performance BI, and streaming) can maximize their strengths and capabilities while limiting the impact on operational performance. However, pipelines come at the expense of complexity and risk. If implementing pipelines is not an option, another way to expedite data processing is to use ELT operations, which load the data onto distributed clusters and transform it there, leveraging the clusters' processing capabilities.
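The difference between ETL and ELT mentioned above can be shown with a small sketch. An in-memory SQLite database stands in for the distributed cluster, which is purely an assumption to keep the example runnable; the table names, the sample rows and the crude "starts with a digit" filter are likewise hypothetical.

```python
import sqlite3

# Hypothetical raw feed: (device, reading as text); one reading is malformed.
RAW = [("sensor-1", "21.5"), ("sensor-2", "bad"), ("sensor-1", "22.5")]


def etl(conn: sqlite3.Connection, rows) -> None:
    """ETL: transform in the application first, then load only the clean rows."""
    clean = []
    for device, reading in rows:
        try:
            clean.append((device, float(reading)))
        except ValueError:
            continue  # drop malformed readings before loading
    conn.execute("CREATE TABLE readings (device TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", clean)


def elt(conn: sqlite3.Connection, rows) -> None:
    """ELT: load the raw rows as they are, then transform inside the data store."""
    conn.execute("CREATE TABLE raw_readings (device TEXT, reading TEXT)")
    conn.executemany("INSERT INTO raw_readings VALUES (?, ?)", rows)
    # The store's own engine does the cleaning (crude filter: starts with a digit).
    conn.execute(
        "CREATE TABLE readings AS "
        "SELECT device, CAST(reading AS REAL) AS value "
        "FROM raw_readings WHERE reading GLOB '[0-9]*'"
    )


def average_by_device(conn: sqlite3.Connection):
    """Same analytical query, whichever way the data was prepared."""
    query = "SELECT device, AVG(value) FROM readings GROUP BY device ORDER BY device"
    return conn.execute(query).fetchall()
```

Both paths end with the same `readings` table; what changes is where the transformation work runs, which is why ELT can benefit from the processing capacity of the target cluster.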
It is not a framework but a moving picture
Architecture, systems and information flow are the foundation of Fast Data’s software design approach, and the guidelines above highlight just some of the main ideas on these topics. But as with any revolution, and we are certainly living through one with Digital and Data, expect new strategies, designs and products to unfold and increase the speed and efficiency of today’s solutions, driven by the pressing need created by the unprecedented amount of data and real-time interactions that will shape the digital future.