Unleash the performance of your HPC applications
For many years, steady improvements in processors delivered regular performance gains with no effort from developers. Today, the growing number of compute cores in CPUs, accelerators and co-processors means that real optimization work is needed to get maximum performance.
Atos’s Center for Excellence in Parallel Programming (CEPP), operated in partnership with Intel and NVIDIA, helps you get optimal performance and maximum energy efficiency for your applications in the context of manycore technologies. The CEPP’s experts can advise you and help you analyze, optimize and port your codes. This includes, for example:
- Proofs of Concept (PoCs) to demonstrate performance gains,
- workshops that give you the opportunity to exchange ideas with experts and get started with the porting, optimization and acceleration of your simulations,
- application and solution benchmarks,
- tailored training,
- access to specific compute resources,
- a Fast Start program to ensure your applications make the most of your Bull supercomputers from day one (porting, optimizing and configuring your applications well ahead of system delivery).
CEPP in action: code modernization with the French HPC community
GENCI is the French organization in charge of developing the use of numerical simulation in order to improve the competitiveness of French research and industry. In 2016, GENCI launched a technology watch project to help French HPC developers move to new architectures by evaluating the productivity gains and the porting effort in many domains (weather forecasting, CFD, high-energy physics, chemistry…).
For this purpose, Atos delivered one of the very first Bull Sequana X1000 systems to CINES in Montpellier at the end of 2016. The system is composed of 48 compute nodes equipped with Intel Knights Landing (KNL) processors (68 cores).
Atos’s Center for Excellence in Parallel Programming was involved in profiling and optimizing each application for the Intel KNL architecture. The resulting performance figures were presented at many workshops and conferences, including the Intel Xeon Phi User Group (IXPUG).
CEPP in action: code optimisation for SKA-France
The SKA (Square Kilometre Array) will be the largest radio telescope ever built and will produce science that changes our understanding of the universe. The SKA will be located in Australia and in Africa. The project involves 100 organisations across about 20 countries. SKA-France is the national coordination body for the industrial, technical and scientific activities that prepare the SKA project in France. Its work to optimise new algorithms for radio astronomy through collaboration between researchers and HPC companies is producing interesting results. For the first experiments, Atos’s CEPP worked on the calibration and imaging code "DDFacet" by Cyril Tasse (OBSPM) with two different compiler suites: GNU and Intel. The aim was to dive into the DDFacet software stack in order to understand the different processing phases and measure each phase's share of the total execution time. These experiments have highlighted potential improvements, which will be investigated through closer collaboration with the developers.
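As an illustration of the kind of measurement described above, here is a minimal Python sketch that times named processing phases and reports each phase's share of the total run time. It is not taken from DDFacet or from the CEPP's tooling: the `PhaseTimer` class, the phase names and the `busy_work` stand-in workload are all hypothetical, and a real study would typically rely on a dedicated profiler.

```python
import time
from collections import OrderedDict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = OrderedDict()

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Repeated phases with the same name are summed.
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0.0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


def busy_work(n):
    """Stand-in workload (hypothetical); replace with a real processing phase."""
    return sum(i * i for i in range(n))


timer = PhaseTimer()
# Phase names below are placeholders, not the actual DDFacet phases.
with timer.phase("read_data"):
    busy_work(20_000)
with timer.phase("calibration"):
    busy_work(200_000)
with timer.phase("imaging"):
    busy_work(100_000)

# Print phases from most to least expensive.
for name, share in sorted(timer.shares().items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {timer.totals[name]:8.4f} s  {100 * share:5.1f} %")
```

Sorting by share puts the most expensive phase first, which is usually where optimization effort pays off most.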
CEPP in action: GPU accelerated implementation of NCI calculations using promolecular density
This scientific paper was produced by Atos’s CEPP and the University of Reims Champagne-Ardenne, a long-time partner and customer of Atos in HPC.
The NCI (noncovalent interaction) approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive for describing ligand–protein binding. A custom implementation of NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three versions of the code is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings.
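To give a feel for why this kind of calculation parallelizes well, here is a minimal Python sketch, not the paper's CUDA code. It assumes that the quantity of interest is the reduced density gradient, s = |∇ρ| / (2 (3π²)^(1/3) ρ^(4/3)), which NCI analyses commonly use, and it models the promolecular density as a sum of one exponential per atom. The atom type "X" and the values in `ATOM_PARAMS` are hypothetical placeholders; real promolecular densities use fitted per-element parameters.

```python
import math

# Hypothetical single-exponential parameters (amplitude c, decay length z),
# in arbitrary units; not fitted values for any real element.
ATOM_PARAMS = {"X": (1.0, 0.5)}


def density_and_gradient(point, atoms):
    """Return (rho, grad_rho) at `point` for atoms given as (symbol, (x, y, z))."""
    rho = 0.0
    grad = [0.0, 0.0, 0.0]
    for symbol, centre in atoms:
        c, z = ATOM_PARAMS[symbol]
        diff = [p - q for p, q in zip(point, centre)]
        d = math.sqrt(sum(v * v for v in diff))
        term = c * math.exp(-d / z)
        rho += term
        if d > 0.0:  # the gradient has a cusp exactly at a nucleus; skip that term
            for k in range(3):
                grad[k] -= term / z * diff[k] / d
    return rho, grad


def reduced_density_gradient(point, atoms):
    """Reduced density gradient s at one grid point."""
    rho, grad = density_and_gradient(point, atoms)
    norm = math.sqrt(sum(g * g for g in grad))
    return norm / (2.0 * (3.0 * math.pi ** 2) ** (1.0 / 3.0) * rho ** (4.0 / 3.0))


# Two identical hypothetical atoms placed symmetrically on the x axis.
atoms = [("X", (-1.0, 0.0, 0.0)), ("X", (1.0, 0.0, 0.0))]
# Three grid points on the x axis: x = -0.5, 0.0 and 0.5.
grid = [(x * 0.5, 0.0, 0.0) for x in range(-1, 2)]

# Each grid point is evaluated independently of all the others.
values = [reduced_density_gradient(p, atoms) for p in grid]
for p, s in zip(grid, values):
    print(p, s)
```

Because each grid point only reads the atom list and writes its own result, the loop over `grid` has no dependencies between iterations. On a GPU this pattern could plausibly map to one thread per grid point, although the sketch makes no claim about how the paper's implementation is organized.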