The convergence of High Performance Computing and Big Data
In the decades since the invention of the computer, High Performance Computing (HPC) has contributed significantly to our quality of life – driving scientific innovation, enhancing engineering design and consumer goods manufacturing, as well as strengthening national and international security. This has been recognised and emphasised by both government and industry, with major ongoing investments in areas encompassing weather forecasting, scientific research and development as well as drug design and healthcare outcomes.
More recently, with the explosion of data generated by the growth of the internet, businesses and other organisations have recognised that data represents a gold mine of competitive possibility. Companies like Uber, Airbnb and Expedia may be the poster children for Big Data Analytics (BDA), but at the same time more traditional organisations, from advertising to weather forecasting, are looking to Big Data solutions as they try to find new ways to interact with customers and to create new business models and opportunities.
Are HPC and BDA really that different?
In many ways, the case for convergence between HPC and BDA is obvious: many people in the HPC world can’t quite understand what all the Big Data fuss is about. They feel they have been doing BDA for years. A typical operational weather forecasting centre ingests, digests and produces many terabytes of data per day – that is hardcore Big Data!
HPC has long been working on exactly the capabilities that BDA businesses need in order to deal with large volumes of data: how to scale algorithms, run parallel processes efficiently, automate data management, build high-performance networks and, above all, how to do all of this affordably.
The difference between HPC and BDA
But of course, it isn’t quite as simple as that. BDA deals with new types of data, very different from weather forecasts or engineering models. Data such as sentiment analysis of social media, or connectedness as a tool for law enforcement or security agencies, present new challenges and require new analytical methods. Questions such as which data is real and which is bogus, and how much weight the statistics should carry, make Big Data analytics more of a software challenge than anything else. Most tasks, whether they are deemed HPC or BDA, are part of a wider workflow. For example, to get value from HPC, it is not enough to run a weather forecasting model. You have to collect the data, massage it into a form the model can accept, run the model, interpret the results, and finally present them in a form that people can use: umbrella, or no umbrella?
Without doubt, the convergence of HPC and BDA is happening through the combination of techniques and methods from one workflow with those of the other. Improving shampoo formulation (an HPC task) using sentiment analysis of customer feedback (a BDA task), or using HPC to automate image analysis within a time-critical window in order to locate a person of interest from Big Data gathered on their movements, are both examples of convergence.
As HPC and BDA become increasingly ubiquitous and connected, we expect this type of synergy to deliver results that neither discipline could produce alone. We expect the boundaries to continue to blur as we strive towards better products, services and outcomes.
This article is part of the Atos Digital Vision for Supercomputing & Big Data opinion paper. The challenge for any organisation is how to turn data into tangible advantage. Becoming truly data-driven is perhaps our most definitive step into the digital age. In our Digital Vision for Supercomputing & Big Data, we explore the implications for organisations and what lies ahead.