How quickly data moves between clouds, data centers and jurisdictions is abundantly clear, and one of privacy professionals' tasks is to keep pace with the technology behind that movement.
In this data-driven economy, privacy pros, architects, data scientists, engineers, researchers, regulators and industry groups should focus their attention on technologies that protect privacy and support security principles without sacrificing the utility and functionality of the data: so-called privacy-enhancing technologies, or PETs.
This topic has become a global trend, with increased attention from regulators and public authorities worldwide. Recently, the principle of privacy by design and by default, enshrined in the EU General Data Protection Regulation, was recognized as an ISO standard. On 31 Jan., the International Organization for Standardization published ISO 31700, “Consumer protection — Privacy by design for consumer goods and services.” It features 30 requirements for embedding data privacy into consumer products and services.
From the perspective of a lawyer who has worked in the privacy domain for several years, PETs are an interesting landscape to explore, full of potential but not exempt from challenges or from legal and practical considerations in day-to-day operations.
Two sides of the same coin
PETs are not a new concept. Some, like differential privacy, are market-ready, while others, like homomorphic encryption and secure multiparty computation, are still rarely used in practice because they are expensive and require experts to implement. Other solutions, such as secure enclaves, sit in the middle, drawing attention as cloud providers add support for them. Synthetic data has received remarkable attention lately, in the context of OpenAI’s ChatGPT, for training and validating artificial intelligence systems.
When a company decides to invest in one of these solutions, there are several factors to consider, including the type and volume of data to be processed, the expected outcome, implementation effort and cost, the number of parties providing input to the computation, and the maturity of these tools for the given use case.
Each of these PETs presents different challenges and vulnerabilities, irrespective of the cost and the expertise required for the implementation. It is worth analyzing some of these solutions.
Differential privacy is achieved by injecting statistical noise into a data set. The noise protects individuals' privacy while the data still provides useful aggregate information, without divulging personal data. This solution has been implemented in statistics and analytics. However, there are some concerns about output accuracy, linked to factors such as the volume of data in the data set, the amount of information released and the number of queries made against that pool of data.
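To make the mechanism concrete, here is a minimal Python sketch of the classic Laplace mechanism applied to a mean query. The data set, bounds and epsilon value are invented for illustration:

```python
import numpy as np

def private_mean(values, epsilon, lower, upper):
    """Differentially private mean using the Laplace mechanism."""
    clipped = np.clip(values, lower, upper)       # bound each record's influence
    sensitivity = (upper - lower) / len(clipped)  # max change one record can cause
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Invented sample data: ages of eight individuals.
ages = np.array([34, 29, 47, 52, 38, 41, 63, 25])
print(private_mean(ages, epsilon=1.0, lower=0, upper=100))
```

A smaller epsilon means more noise and stronger privacy, which is precisely the accuracy trade-off described above.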
Homomorphic encryption allows computational operations on encrypted data without decrypting it. Using this solution, data is encrypted at rest, in transit and in use, and only the party providing the data holds the key to decrypt the output. This solution is not exempt from limitations, due to its high computational cost, the specialized knowledge required and the fact that the majority of homomorphic encryption schemes provide input privacy for only a single party, because there is only one decryption key.
The fully homomorphic encryption solution has been tested in some use cases, like improving collaboration to combat financial crime and, in the payment card industry, fighting attacks by RAM-scraping malware against merchants' point-of-sale systems.
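For intuition about how computing on ciphertexts is possible at all, here is a deliberately insecure toy sketch in Python. It uses unpadded “textbook” RSA, which happens to be multiplicatively homomorphic; production schemes such as Paillier or fully homomorphic encryption are far more sophisticated, so treat this purely as an illustration:

```python
# Toy demonstration of the homomorphic property using unpadded "textbook" RSA,
# which is multiplicatively homomorphic: Enc(a) * Enc(b) mod n = Enc(a * b).
# The tiny key below (p=61, q=53) offers no real security.
n, e, d = 3233, 17, 2753

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
c = (encrypt(a) * encrypt(b)) % n  # multiplication performed on ciphertexts only
print(decrypt(c))                  # prints 42, i.e., a * b
```

Only the holder of the private key d can decrypt the result, which also illustrates the single-decryption-key limitation mentioned above.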
With the attention created by ChatGPT, and the privacy concerns linked to the use of generative AI, it is worth mentioning synthetic data as a way to work around the data privacy and security challenges raised by AI tools. Synthetic data is a powerful tool in the development and testing of AI: it can be artificially produced by a generative model to mimic real data sets with the same statistical properties as the original, enabling companies to create large amounts of training data.
However, when used for training AI systems, synthetic data does not overcome two main concerns: bias in the source data and the risk of reidentification.
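As a rough sketch of the idea, the following Python snippet fits a deliberately simple generative model, a multivariate Gaussian, to an invented two-column data set and samples a synthetic table with similar statistics. Real-world synthetic data pipelines rely on far richer generative models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for a real tabular data set (columns: age, income).
real = rng.multivariate_normal(
    mean=[40, 55_000],
    cov=[[90, 12_000], [12_000, 4e7]],
    size=1_000,
)

# Fit a simple generative model: estimate mean and covariance, then sample.
mu, sigma = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, sigma, size=1_000)

# The synthetic table mirrors the statistics of the original,
# but none of its rows corresponds to a real record.
print(real.mean(axis=0).round(1), synthetic.mean(axis=0).round(1))
print(np.corrcoef(real, rowvar=False)[0, 1].round(3),
      np.corrcoef(synthetic, rowvar=False)[0, 1].round(3))
```

Note that the very fidelity that makes the output useful is also why bias in the source data carries over into the synthetic set.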
Conclusion
Reaching a legal assessment of PETs is complex due to the lack of regulation, of guidance supporting the deployment of new technologies, of business cases for adopting PETs and of expertise in cryptographic techniques, a gap that can lead to mistakes during the implementation phase.
However, a wide variety of initiatives on PETs are ongoing throughout the world, with the aim of promoting innovation through research and technology development, regulatory sandboxes and use cases to show how PETs can enhance businesses.
In exploring some of the initiatives underway, it is worth mentioning that the Royal Society in the U.K. issued an exhaustive report, “From privacy to partnership: the role of Privacy Enhancing Technologies in data governance and collaborative analysis.” Its purpose is to evaluate “new approaches to data protection and collaboration, encouraging further research in — and testing of — PETs in various scenarios.”
In Singapore, the Infocomm Media Development Authority, in collaboration with the Personal Data Protection Commission, launched Singapore’s first PET Sandbox on 20 July 2022. The sandbox gives companies that wish to experiment with PETs a testing ground to work with PET solution providers, develop use cases and pilot the technologies.
In July 2022, the U.K. and the U.S. launched a set of prize challenges to drive innovation in PETs that reduce financial crime and respond to public health emergencies. The goal of the initiative was to give innovators from academia, institutions, industry and the public the opportunity to design technical solutions. In the first phase of the competition, teams submitted white papers describing their approaches to privacy-preserving data analytics. In the second phase, they focused on solution development and submitted code for testing on a common platform. In the third phase, independent “red teams” executed privacy attacks on the solutions developed in the second phase. The winning teams were selected based on the results of those attacks and an evaluation by a panel of PETs experts from government, academia and industry.
In February 2022, the U.K. Department for Business, Energy and Industrial Strategy created a project called “PETs for Public Good.” As part of the project, the U.K. Information Commissioner’s Office ran a series of workshops with organizations in the health sector, academics and privacy experts, focused on how PETs can facilitate data sharing in health care and on testing these technologies.
I trust regulators will publish official guidance and codes of conduct on the use of PETs, clarify how these technologies can help enable and satisfy regulatory compliance, define a standard approach to assessing the adequacy of PETs for a given use case, and take a clear position on the definitions of deidentification, anonymization and pseudonymization of data. The latter represents one of the main challenges for lawyers and technical teams, compounded by the fact that the terminology is often inconsistent across jurisdictions.
After the cloud era and all the challenges posed by using the cloud, I expect large companies will start to evaluate the use of PETs in secure cloud infrastructures, while considering the risk of reidentification and reverse engineering.