Codex Datalake Engine

The datalake is the answer to data's highest stakes

Digital transformation is a business requirement for organizations. Data is critical in executing this transformation, enabling innovation and competitive differentiation. The results for data-driven organizations are better decision-making abilities with an immediate positive impact.

“Though 85% of companies are trying to be data-driven, only 37% of that number say they’ve been successful”

As of this moment, only 0.5% of all accessible data is analyzed and used. The potential is huge!

Growing data volumes within companies create new needs:

  • Query interactive dashboards in real time from multiple sources
  • Guarantee a flexible platform that follows business growth and gradual data growth
  • Make real-time decisions based on accurate data
  • Ensure the highest security and guarantee data sovereignty

These needs raise new questions and challenges in the use of datalakes:

  • Who in my organization is using data and how?
  • Is my data secure and can I trust it?
  • How do we handle large quantities of structured and unstructured data?

Data-driven digital transformation requires robust and suitable data architecture with strong governance not only to maintain security but also to enable activities such as analytics, enterprise artificial intelligence and interconnected objects.

This is where Codex Datalake Engine offers the solution, meeting your trust and compliance needs and leaving you free to get maximum value from your data.

Codex Datalake Engine is a well-defined, new-generation datalake solution. It is an end-to-end data management and security platform that enables organizations to build a highly scalable, easy-to-use and cost-effective datalake, either hybrid or on-premise only. It is certified by Cloudera.

Download Codex Datalake Engine position paper

Enabling visibility and control of data

Chief data officers and data stewards have the following concerns:

How is data being used?

How do you efficiently manage the lifecycle of your data?

How do you ensure data regulations are met?

How can I seamlessly implement a datalake without having all the required skills?

How can data be optimized?

How do you overcome data silos?

How can I guarantee a high ROI?

Compliance groups track and protect access to sensitive data. Their primary task is to always be prepared for an audit: tracking who is accessing data, what data they’re accessing and what it is being used for.

Their job is to ensure that sensitive data is well governed, protected and in line with the General Data Protection Regulation (GDPR) within the European Union and the European Economic Area.

Data scientists, AI developers and BI users need to find the data that matters the most for their business. They want to be able to explore data, trust what they find, visualize relationships between data sets and make the most of their data.

Codex Datalake Engine

Codex Datalake Engine is a preconfigured, scalable, easy-to-use and fully virtualized appliance. It is cost-effective with minimal administration needs or energy usage. As a result, organizations spend less time installing, tuning, operating, troubleshooting, patching, upgrading, and dealing with integration, adoption of technologies and scale-related issues.

Codex Datalake Engine is delivered as an appliance that features BullSequana S and SA20, two of the most agile, scalable and open servers. With their dynamic reconfiguration capabilities, BullSequana S, SA20 and SA20G servers combine exceptional performance with unprecedented levels of agility and efficiency.

BullSequana S

The BullSequana S server range features from 2 to 32 Intel® Xeon® Scalable processors in a single server.

Codex Datalake Engine for development

  • For development and test purposes, Codex Datalake Engine is bundled into a single compact server, scaling from BullSequana S200 (2 CPUs) hosting 5 TB of user data, up to BullSequana S800 (8 CPUs) hosting 20 TB to 50 TB of user data.

Codex Datalake Engine for production

  • For production, the configuration starts with 3 BullSequana S200 servers and can be extended up to 12 BullSequana S200 servers, with 50 TB to 350 TB of user data. For heavy workloads, Codex Datalake Engine leverages the BullSequana S400: the configuration starts with 6 BullSequana S400 servers and can support several PB of user data with 60 TB upgrade increments.
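
As a rough illustration of what 60 TB upgrade increments mean for capacity planning, the sketch below counts how many increments are needed to grow from a current capacity to a target capacity. Only the 60 TB increment size comes from the text above. The function name and the capacities in the example are hypothetical, and this is not an official sizing tool.

```python
import math

# Increment size stated for the BullSequana S400 production configuration.
UPGRADE_INCREMENT_TB = 60


def upgrade_increments_needed(current_tb: float, target_tb: float,
                              increment_tb: int = UPGRADE_INCREMENT_TB) -> int:
    """Return how many fixed-size upgrade increments cover the capacity gap.

    Hypothetical helper for illustration: capacity can only grow in whole
    increments, so the gap is rounded up to the next increment.
    """
    if current_tb < 0 or target_tb < 0:
        raise ValueError("capacities must be non-negative")
    gap_tb = target_tb - current_tb
    if gap_tb <= 0:
        return 0  # the current capacity already covers the target
    return math.ceil(gap_tb / increment_tb)


if __name__ == "__main__":
    # Assumed example: growing from 400 TB to 1,000 TB of user data.
    steps = upgrade_increments_needed(400, 1000)
    print(f"{steps} increments of {UPGRADE_INCREMENT_TB} TB")
```

Because the gap is rounded up, a target that falls between two increments results in slightly more capacity than requested.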

BullSequana SA20 & SA20G

With their dynamic reconfiguration capabilities, BullSequana SA20 and SA20G servers combine exceptional performance with a limited carbon footprint.

BullSequana SA20 and SA20G rack servers powered by 2nd generation AMD EPYC™ processors bring a cost-effective balance of performance and virtualization. By offering up to 100 TB in a 2U form factor, with best-in-class SAS/SATA and ultra-fast NVMe drives, they provide optimal performance for on-premise or hybrid datalake architectures. In addition, this outstanding density makes it possible to minimize the footprint and improve energy efficiency.

BullSequana SA20: best-in-class server for storage

  • Storage on multiple hard disks allows data redundancy and performance improvement. RAID technology provides the right balance between reliability, availability, performance, capacity – and cost!
  • Up to 2 TB of RAM and up to 100 TB of storage, powered by 2 CPUs

BullSequana SA20G: best-in-class server for AI inference

  • Offers powerful AMD CPUs and up to 3 A100 GPUs, combining ML and deep learning capabilities in a single platform.
  • Enables Spark 3.0 applications to run on the server thanks to GPU enablement, accelerated Spark components and an end-to-end integrated data pipeline for CPU/GPU.
  • Offers flexible provisioning of GPU resources so that data scientists can integrate deep learning capabilities into the Cloudera datalake.

Atos & Cloudera

Atos is Cloudera’s EMEA Partner of the Year 2020

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data. Cloudera products and solutions enable organizations to deploy and manage Apache Hadoop and related projects, manipulate and analyze data, and keep that data secure and protected. Codex Datalake Engine is Cloudera certified. It is the result of a joint effort between Atos and Cloudera to achieve a solid and trustworthy data architecture and deliver the most complete, secure, industrial and qualified datalake solution on the market.

Migration to CDP

As Codex Datalake Engine is Cloudera certified, Atos offers a migration accelerator towards CDP for a seamless implementation. Existing hardware can be repurposed by gradually migrating data and workloads to the new cluster.

Why Atos?

  • Big Data expert team with proven migration methodologies
  • Strategic partnership with Cloudera (Partner Award 2019/2020)
  • Codex Datalake Engine is certified as an appliance by Cloudera
  • Native compliance with the reference architecture recommended by Cloudera, which ultimately means a high-performance platform
  • Highest level of Cloudera certification for security (level 3)
  • Global Atos coverage with end-to-end services (Edge, Data pipeline, AI, consulting, delivery, managed services…)

Towards a hybrid datalake architecture

Atos has developed a platform approach based on horizontal hybrid cloud enablement. It provides a technology-agnostic and cloud-agnostic enterprise data approach, so that functionality can run anywhere: in the cloud, in the data center or at the edge. Connecting these environments makes it possible to realize pervasive end-to-end business use cases that bring insights, guidance and assistance to where the business end user needs them.

Codex Datalake Engine can be the foundation for a hybrid datalake:

  • Integrated with cloud platforms
  • Highly scalable to grow with your business
  • A granular architecture to support heterogeneous workloads (data intensive, data & compute intensive, non compute intensive & compute intensive)
  • A single point of contact

Download our vision paper "Atos - Cloudera: Together towards hybrid cloud"

Key advantages:

  • Time saving for implementation
  • Combine on-premise high capacities with cloud platforms & private cloud
  • High ROI for mid-long term projects
  • Open platform
  • Prevents vendor lock-in
  • Lets you take advantage of an ecosystem of additional tools and capabilities
  • Build capabilities next to legacy applications and platforms
  • Maintained GPU cost

Codex Datalake Engine key characteristics

Data quality

As a consistent data management and security appliance, Codex Datalake Engine provides a data governance solution and manages the complete data lifecycle: data ingestion, data cleansing, data blending, data discovery, audit, data lineage and policy enforcement. It improves and ensures data quality and makes data trustworthy.


Organizations are very concerned with data security and protection. With Codex Datalake Engine, sensitive data can remain on-premise in both hybrid and on-premise datalake architectures. Organizations keep full control over their data and its lifecycle as well as full control over the infrastructure, the applications and the operations, enabling more compatibility and thus minimizing the risk of failure. The complete virtualization minimizes configuration costs and avoids extra cost when introducing new applications.

Data encryption & key management

Data encryption provides a critical layer of protection against potential threats. Encryption and key management are also required for meeting key compliance initiatives and ensuring the integrity of enterprise data. Codex Datalake Engine transparently encrypts and secures data without requiring changes to applications and keeps the performance lag of the encryption or decryption process minimal. It uses a trustee server as an enterprise-grade virtual safe-deposit box to store and manage cryptographic keys.

Optimized datalake

Full power in a virtualized and cost-effective design. All components of Codex Datalake Engine are integrated into a compact, cost optimized platform. For this, Codex Datalake Engine takes advantage of virtualization, which allows it to optimize the hardware sizing needed for each software component, instead of using oversized separate machines.


Codex Datalake Engine is the foundation for developing Big Data and AI applications as the business grows. By leveraging all available data across the organization, it gives a reliable and comprehensive view of data. Analytics and AI developers have full visibility and control over all data.