Navigating the intricate landscape of Privacy-enhancing technologies (PETs)

By Barbara Couée and Philippe Bodden

Some of us may feel overwhelmed by the multitude of concepts, technologies, and paradigms falling under the banner of “privacy-enhancing technologies.” The overarching goal is to enhance privacy in data collection, processing, analysis, and sharing. You must prioritize the security and privacy of your data, regardless of where it is processed or with whom you are collaborating.

To achieve this, PETs bring several approaches to the table:

  • Data transformation can be achieved through synthetic data generation (SD) or data perturbation (DP); a brief sketch follows this list.
  • Data-in-use security can be ensured either by encrypting the data end-to-end (including in use, with HE) or by making the environment in which it is processed secure and private.
  • Data exploitation necessitates privacy-preserving mechanisms for sharing data, building analytics, and collaboratively leveraging the results.
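
As a minimal illustration of the data-transformation approach, the sketch below (assuming only numpy, with hypothetical values) contrasts toy synthetic data generation with toy data perturbation; real SD generators model joint distributions, and real DP calibrates noise to a formal privacy budget.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical sensitive dataset: one row per individual (age, income).
real = np.array([[34, 52000], [41, 61000], [29, 47000], [55, 83000]], dtype=float)

# Toy synthetic data generation (SD): fit independent Gaussians per column
# and sample entirely artificial records from them.
mean, std = real.mean(axis=0), real.std(axis=0)
synthetic = rng.normal(loc=mean, scale=std, size=(10, real.shape[1]))

# Toy data perturbation: add random noise directly to the real records.
perturbed = real + rng.normal(scale=0.05 * std, size=real.shape)

print(synthetic.round(1))
print(perturbed.round(1))
```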

 

Figure: Privacy-enhancing technologies approaches

Each technology contributes a crucial component to your privacy strategy tailored to your business needs.

 

Transitioning from traditional privacy-enhancing methods to “advanced” PETs entails a significant shift in paradigms.

Preserving privacy extends beyond safeguarding the confidentiality of personal information. It also encompasses the integrity and availability of personal data, inherited from classic cybersecurity, as well as GDPR principles (e.g., data minimization) and NIST privacy engineering objectives (e.g., dissociability).

Classical techniques include traditional encryption and de-identification through data generalization, perturbation, or tokenization. “Advanced” PETs embrace a zero-trust approach, elevating privacy principles like limitation, minimization, proportionality, confidentiality, and transparency by inherently:

  • limiting data collection (synthetic data),
  • converting the ‘where’ into ‘whom’ (data in use security/end-to-end encryption/homomorphic encryption (HE)),
  • integrating the concept of purpose into processing operations (predefined operations/Secure Multi-Party Computation (SMPC)),
  • never trusting anybody (secret sharing/resistance to collusion/SMPC; see the sketch after this list),
  • getting closer to where the data lies (Federated Learning with Secure Aggregation),
  • granting minimal access with inherent policies (attribute-based encryption (ABE)).
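
To make the “never trusting anybody” principle concrete, here is a minimal additive secret-sharing sketch (a toy illustration of the SMPC idea, not a hardened protocol): each party splits its private value into random shares, so that no incomplete coalition learns anything, yet all parties together can still compute a joint sum.

```python
import secrets

P = 2**61 - 1  # public prime modulus; all arithmetic is done mod P

def share(value, n_parties):
    """Split `value` into n additive shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three data controllers, each with a private input.
private_inputs = [120, 300, 45]
n = len(private_inputs)

# Each controller shares its input; party j keeps the j-th share of every input.
all_shares = [share(v, n) for v in private_inputs]
partial_sums = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]

# Only the recombination of all partial sums reveals the total.
total = sum(partial_sums) % P
print(total)  # 465, without any single party seeing the others' inputs
```

Collusion resistance here is information-theoretic up to n−1 colluding parties; production SMPC frameworks add authenticated shares to resist actively malicious participants.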

So, where should I begin?

The first level in your decision tree can be approached through three fundamental questions. The combination of your answers will shape a first-level strategy:

What do I want to do?
(what do I need to protect in terms of privacy, and to what extent?)

What are the threats and risks to the data I want to mitigate?
(what is my ecosystem and the associated threat model?)

What are my challenges and constraints?
(what are the regulatory, organizational, operational, technical constraints I need to comply with?)

By aligning these questions with the purpose, strengths, and weaknesses of each technology, you may build a central data pooling platform based on trusted enclaves and homomorphic encryption (e.g., for anti-financial crime) or opt for a distributed model enabling each participant to conduct analytics on distributed data sets (e.g., for transnational statistical analysis).

Navigating PETs at a glance

Attempting a one-to-one mapping between each question and a single technology may be tempting as a simple framework of thought. This would be a mistake for several reasons, which make the intellectual exercise all the more interesting.

Let’s provide some guidance to distinguish between the dozens of vendors and solutions on the market, many of which combine classical and advanced PETs.

The Devil is in the details

Each technology has several variants.

Let’s consider an example related to the goal of ‘sharing granular data.’

  • It is often stated that differential privacy involves a trade-off between accuracy and privacy due to the introduction of noise in the data. However, it is essential to distinguish between data collected after their owners have applied input privacy measures (so that analytics lose accuracy at sharing time) and data centralized and processed at a granular level, with output privacy applied to AI/ML (so that the balance between accuracy and privacy is steered in the final output; a minimal sketch follows below). Depending on your specific needs, you may opt for output differential privacy over homomorphic encryption, as it is more mature and easier to implement.
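
As a minimal sketch of output differential privacy (assuming numpy and a hypothetical counting query), Laplace noise calibrated to the query's sensitivity is added to the aggregate result, so the accuracy/privacy balance is steered by the budget epsilon rather than by perturbing every record at collection time.

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(values, threshold, epsilon):
    """Differentially private count of values above a threshold (Laplace mechanism).
    The sensitivity of a counting query is 1: adding or removing one
    individual changes the true count by at most 1."""
    true_count = int(np.sum(np.asarray(values) > threshold))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

transactions = [120, 9800, 40, 15000, 220, 7600]
print(dp_count(transactions, threshold=5000, epsilon=0.5))  # noisier, more private
print(dp_count(transactions, threshold=5000, epsilon=5.0))  # closer to the true value 3
```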

 

  • Fully, partially, and somewhat homomorphic encryption schemes must be distinguished. Full-fledged Fully Homomorphic Encryption (FHE) is primarily suited for end-to-end protection of data confidentiality in one-to-one client/server scenarios where various arithmetic or Boolean circuit operations (i.e., mathematical functions) are required. FHE is resource-intensive in terms of CPU consumption and data size increase, does not guarantee data integrity, and is not well suited for multi-party interactions, although extensions for multi-party FHE do exist. In many cases, less demanding forms of homomorphic encryption are more efficient and can be effectively combined with other PETs (a sketch of an additively homomorphic scheme follows below).
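
To illustrate the difference with FHE, the sketch below implements a toy, textbook Paillier scheme, which is only additively (partially) homomorphic: ciphertexts can be added without decryption, but arbitrary circuits are out of reach. The tiny hard-coded primes make it utterly insecure; it only demonstrates the homomorphic property.

```python
import secrets
from math import gcd, lcm

# Toy Paillier key generation with small, INSECURE primes (illustration only).
p, q = 1009, 1013
n = p * q
n2 = n * n
lam = lcm(p - 1, q - 1)
mu = pow(lam, -1, n)        # valid because the generator g is chosen as n + 1

def encrypt(m):
    r = secrets.randbelow(n - 2) + 1
    while gcd(r, n) != 1:   # r must be coprime with n
        r = secrets.randbelow(n - 2) + 1
    # g^m with g = n + 1 simplifies to 1 + m*n (mod n^2)
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

c1, c2 = encrypt(42), encrypt(58)
c_sum = (c1 * c2) % n2      # multiplying ciphertexts adds the plaintexts
print(decrypt(c_sum))       # 100, computed without ever decrypting c1 or c2
```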

PETs: combinations of high-level protocols and low-level primitives

General-purpose PETs are based on high-level PET protocols and technological tracks (e.g., FHE, SMPC, FE, TEE, DP…), offering a range of high-level privacy functions (e.g., confidentiality, integrity…).

Other, more specific PETs are dedicated to particular use cases, such as Federated Learning for Machine Learning, or to targeted problems like PSI (Private Set Intersection, which determines whether a specific item belongs to another party's list without revealing the rest of it) or Shamir’s Secret Sharing (which splits a secret between several stakeholders; see the sketch below), both of which can be viewed as specific forms of SMPC.
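
A minimal sketch of Shamir’s Secret Sharing over a prime field (toy parameters, no hardening): the secret is the constant term of a random polynomial, each stakeholder receives one evaluation point, and any t of the n shares reconstruct the secret by Lagrange interpolation, while fewer than t reveal nothing.

```python
import secrets

P = 2**61 - 1  # public prime modulus

def split(secret, n, t):
    """Split `secret` into n shares with reconstruction threshold t."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 using any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(123456789, n=5, t=3)
print(reconstruct(shares[:3]))   # 123456789, recovered from the first 3 shares
print(reconstruct(shares[2:]))   # 123456789, recovered from the last 3 shares
```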

Most of these high-level advanced PET protocols are constructed using low-level cryptographic primitives, with a high-level advanced PET protocol (e.g., SMPC) relying on some primitives that may be part of another technological track (e.g., FHE) for certain processing tasks.

This can be likened to the high-level SSL/TLS protocol, which functions as a workflow by combining primitives like ECDSA, RSA, AES, SHA-2…

What has to be protected and what threats exist

Preserving privacy also depends on the types of threats and threat actors (such as malicious business or technical stakeholders, or collusion between business stakeholders), as well as on attacks specific to the use case itself (e.g., for ML: reconstruction attacks, model inversion attacks, membership inference attacks).

For instance, SMPC avoids collusion between parties by design; however, several underlying cryptographic protocols exist, with varying benefits and resistance to attacks depending on the adversarial profile.

Some privacy use cases to illustrate the complexity

Multiple PET functionalities can be rendered by some high-level PET protocols but not by others. This means that certain use cases often require multiple advanced PET protocols (e.g., SMPC and TEE), in combination with more traditional cybersecurity technologies like IAM or PKI.

De-identification

De-identification in a Confidential Data Pooling platform, from multiple Data Controllers via a trusted intermediary to analysts receiving output data.

    • The challenge is to pool sensitive transaction data from multiple Data Controllers for the purpose of fraud management, while maintaining compliance with GDPR, protecting confidentiality and integrity among all business and technical stakeholders, and keeping performance under control.
    • De-identification of personal data can mostly be achieved through traditional methods, except for identifiers that must be anonymized consistently across multiple Data Controllers. In such cases, a double layer of homomorphic encryption could provide consistent de-identified tokens (a sketch of the classical shared-key alternative follows below).
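
For comparison, the classical way to obtain consistent de-identified tokens is a keyed hash (HMAC) applied identically by every Data Controller, as in the hedged sketch below (key and identifier are hypothetical); its drawback is that the same secret must be shared by all controllers, which is the constraint the homomorphic-encryption layering aims to remove.

```python
import hmac, hashlib

# Hypothetical secret shared by all Data Controllers (the weak point of the
# classical approach: whoever holds it can re-link or re-compute the tokens).
SHARED_KEY = b"rotate-me-and-store-me-in-an-hsm"

def pseudonymize(identifier: str) -> str:
    """Deterministic, keyed token for a personal identifier."""
    return hmac.new(SHARED_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# The same IBAN submitted by two different controllers maps to the same token,
# so pooled records can still be joined on the de-identified value.
print(pseudonymize("FR76 3000 6000 0112 3456 7890 189"))
print(pseudonymize("FR76 3000 6000 0112 3456 7890 189"))
```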

Multi-party secure private computing

Multi-party secure private computing for processing confidential sets of identifiable data.

    • At the technical level, all internal flows between nodes are protected at transport level through mutual TLS. Each node is equipped with a secure TEE enclave to safeguard against technical stakeholders, ensuring both confidentiality and integrity. TEE provides local confidentiality and integrity on the hardware platform but doesn’t extend beyond its limits.
    • SMPC ensures that in a distributed architecture with multiple stakeholders:
      • Input data remains known only to their respective data controllers
      • Output results are only provided to intended parties
      • A version of SMPC that is resistant to malicious users (including elements of HE) offers business-level protection of personal data in terms of both confidentiality and integrity.
    • A service mesh federation (part of a microservice- and container-based architecture) provides an automated and secure way to monitor and operate the information system, including an internal PKI. This enhances the solution’s availability and the segregation of duties for operational tasks.

Privacy-preserving machine learning with de-identified data

Privacy-preserving machine learning with de-identified data in a data hub.

    • DP can be injected at diverse stages of Machine Learning (e.g., input perturbation, algorithm perturbation…) depending on the type of ML, with DP providing a certain privacy budget (limiting the volume and nature of requests).
      • Since DP introduces perturbations at various steps in the workflow, it’s effective only in a context where individual data can no longer be exploited.
    • To prevent any stakeholder (Data Controller, Data User and Intermediate) from accessing full data in clear, additive noise required by DP on the Data User node can be efficiently added within the Intermediate node through partial HE, enforced on the various end-to-end flows from multiple Data Controllers to Data User Node.
    • DR (Dimension Reduction) techniques like PCA (Principal Component Analysis) or DCA (Discriminant Component Analysis) can be employed both for Machine Learning and for privacy (Compressive Privacy) by reducing the identifiability of personal data (a minimal sketch follows this list):
      • In combination with DP,
      • In combination with SMPC to protect data shared among the various business stakeholders.
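
A minimal numpy sketch of the two ingredients above, with toy data and parameters: Gaussian input perturbation followed by PCA-based dimension reduction, which keeps the components useful for learning while discarding detail that helps re-identify individuals. A real pipeline would calibrate the noise to a privacy budget.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical de-identified feature matrix: one row per individual, 6 features.
X = rng.normal(size=(200, 6))

# 1. Input perturbation: add Gaussian noise before any training step.
X_noisy = X + rng.normal(scale=0.1, size=X.shape)

# 2. Dimension reduction (PCA via SVD): keep only the top-2 principal components.
X_centered = X_noisy - X_noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:2].T   # 200 x 2 matrix handed to the ML model

print(X_reduced.shape)
```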

 

 

Common security measures

For all mentioned use cases above, it’s highly recommended to implement common security measures:

    • At the technical level, all internal flows between nodes should be protected for confidentiality and integrity at the transport level by mutual TLS (a minimal configuration sketch follows this list).
    • Even in cases where it is not strictly required, leveraging secure TEE enclaves can protect against technical stakeholders.
    • Strict segregation should be maintained between the various business stakeholders.
    • An IAM system should be implemented to handle authentication and authorization for both business and technical users, ensuring authenticity.
    • Protecting private keys and other secrets is essential to keep cryptography and PETs meaningful, via Hardware Security Modules (HSMs) for the most sensitive secrets, or alternatively via container-based security vaults or secure TEE enclaves.
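
As an illustration of the first measure, here is a minimal Python sketch of a mutually authenticated TLS server context (file paths and port are hypothetical; in the architectures above this is usually handled by the service mesh rather than hand-written code).

```python
import socket
import ssl

# Server-side context that requires a client certificate (mutual TLS).
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="node.crt", keyfile="node.key")   # this node's identity
ctx.load_verify_locations(cafile="internal-pki-ca.pem")        # internal PKI root
ctx.verify_mode = ssl.CERT_REQUIRED                            # reject unauthenticated peers

with socket.create_server(("0.0.0.0", 8443)) as srv:
    with ctx.wrap_socket(srv, server_side=True) as tls_srv:
        conn, addr = tls_srv.accept()   # handshake fails without a valid client cert
        print("mutually authenticated connection from", addr)
        conn.close()
```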

How Eviden can help

Based on the chosen use case and technologies, the implementation project varies, ranging from simple to highly transformative projects across applications, AI, data, and infrastructure.

Moreover, customers are facing a landscape of vendors, including middleware providers, verticalized AI specialists, PET pure players or hyperscalers. Most of them provide hybrid solutions based on their core competencies.

Ultimately, the right solution is often a combination of technologies or technology variants that needs to be identified based on several variables.

There is a risk in opting for a one-size-fits-all solution from a single vendor, which may not ultimately be the best choice.

Enterprises planning to embark on a PET project should bring together business teams, application owners, data analysts, CISO teams, crypto experts, and internal IT.

Eviden has a global and extended ecosystem of PET vendors, as well as long-standing cryptographic expertise used to develop highly certified products and implement customer projects at national and transnational levels. Eviden can support PET projects as an agnostic third-party integrator, with a straightforward and unbiased approach to solutions to ensure the highest efficiency when investing in privacy.

About the authors

Barbara Couée
Digital Security Portfolio Manager

Barbara Couée is Digital Security Portfolio Manager at Eviden.

Barbara has been working with Atos-Eviden since 2006. After several years managing large, multi-year programs for critical Industries & Defense, she was appointed to the Defense entity’s mission-critical systems unit (MCS-C2I) to support cross-functional topics and strategy definition at the global business line level. Since the creation of Atos-Eviden Digital Security in 2021, she has been managing a wide portfolio combining critical systems and cybersecurity products and services.

Barbara holds a Business School diploma from ESCP-EAP. She was selected for the “high potential program” in 2010 and for the “Atos Excellence program” with Polytechnique in 2018 (certification in business transformation), and she is now a member of the Scientific Community.

Philippe Bodden
Enterprise & Security Architect

Philippe Bodden is Enterprise & Security Architect at Eviden.

Philippe has been a pivotal figure at Eviden and its predecessors since 1995. Over the course of his career, he has excelled in projects spanning the financial sector and national and European public institutions.

Within the Atos Expert Community, Philippe is recognized as a Senior Expert, and he leads the cryptography group in the cybersecurity domain. His core areas of focus encompass cybersecurity, cryptography, data privacy, Privacy Enhancing Technologies (PET), security architecture, risk management, as well as application, middleware, and technical architecture.

Philippe’s extensive list of certifications includes SABSA, CISSP, CSSP, TOGAF, COBIT, CIPP/E, and CIPT.