
On 18 December 2024, the European Data Protection Board ("EDPB") issued its hotly anticipated opinion on the use of personal data in the development and deployment phases of AI models (the "Opinion") (see press release here). The Opinion follows a request from the Irish Data Protection Commission ("DPC") in September 2024, made with a view to achieving greater regulatory harmonisation across the EU on the use of AI.

Key takeaways

The Opinion sits against the backdrop of an ever-increasing uptake of AI in our day-to-day lives, as well as close scrutiny from national supervisory authorities ("SAs"), particularly around the use of personal data to train AI models (especially large language models). 2024 saw multiple complaints by privacy rights groups on the topic, as well as investigations launched by SAs against the likes of Google, Meta and X (refer to our Data Wrap entries here, here and here).

The Opinion provides some helpful guidance on the use of personal data in the development and deployment of AI models, although much of it is unsurprising and, in a number of key areas, the answer depends on the facts, requiring SAs to consider each issue on a case-by-case basis. This may be a consequence of the varied nature of the AI models to which the guidance relates, as well as the way the DPC framed its questions to the EDPB.

The Opinion also seems to set relatively high thresholds for controllers developing and deploying AI models, particularly in satisfying the "necessity" limb of the three-step test for relying on legitimate interests as a lawful basis for processing personal data.

In addition, the EDPB uses the Opinion to flag the need for controllers that deploy AI models to conduct an appropriate assessment of whether the AI model was developed lawfully (including whether the processing in the development phase was subject to a finding of non-compliance with the EU GDPR, particularly where that finding was made by an SA or a court). This point will be especially relevant to the due diligence process where a controller wishes to deploy an AI model (such as a large language model) procured from a third party.

The Opinion clarifies that there are certain areas it does not cover as well. These include some of the more challenging aspects of AI development, such as the processing of special categories of data and purpose limitation. It will be interesting to see whether these are covered in any subsequent guidance from the EDPB.

A deeper dive

In line with the DPC's request, the Opinion provides guidance on the following key issues:

1. Anonymous AI models – is personal data processed in an AI model?

The Opinion notes that whether an AI model is "anonymous" or not should be assessed on a case-by-case basis by the relevant SA.

The EDPB sets out a relatively high threshold for anonymity; for an AI model to be considered anonymous, the EDPB proposes that both the likelihood of:

  • direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to develop the model; and
  • obtaining personal data from queries (whether intentionally or not),

should be insignificant – taking into account "all the means reasonably likely to be used" by the controller or another person. The EDPB considers that AI models are very likely to require a thorough evaluation of the risks of identification, and it will be interesting to see how this threshold in the Opinion reads, for example, alongside Recital 26 of the EU GDPR, which addresses anonymisation and pseudonymisation.

To assist with this case-by-case assessment, the Opinion sets out examples of elements that SAs may consider when assessing a controller's claim of anonymity. These include the approach adopted by the controller during the design phase to prevent or limit the collection of personal data used for training and to reduce identifiability (for example, through selecting the source(s) of training data, data preparation and minimisation, effective governance at the design stage, ongoing testing, and documentation demonstrating anonymity). The Opinion emphasises the importance of this documentation where an SA is evaluating anonymity.

The EDPB Work Programme for 2024 – 2025 also notes that the EDPB plans to issue guidelines on "anonymisation, pseudonymisation and data scraping in the context of generative AI."

2. Legitimate interest: Can a data controller rely on legitimate interests as a lawful basis to develop and deploy an AI Model?

Again, this question needs to be considered on a case-by-case basis. The Opinion re-confirms the position in the EDPB's 23 May 2024 Report of work undertaken by the ChatGPT Taskforce, which suggested that "legitimate interests" might be a possible lawful basis for processing personal data in the context of data scraping.

The Opinion also provides some general considerations for SAs to take into account when assessing whether controllers can rely on legitimate interests as an appropriate lawful basis. In particular, the EDPB reinforces that the three-step test for legitimate interests should continue to be used when establishing a lawful basis for processing data to develop or deploy AI models, and provides further guidance on how the test might apply. See our blog here for a summary of the test.

  • Legitimacy test: An interest may be regarded as legitimate if it is (i) lawful; (ii) clearly and precisely articulated; and (iii) real and present, not speculative. The Opinion gives the use of an AI conversational agent to assist users or the use of AI to improve cyber threat detection as examples of legitimate interests, subject to the other two limbs of the test.
  • Necessity test: In an AI context, the intended volume of personal data involved may need to be assessed to consider whether the processing is proportionate to the legitimate interest pursued (in light of the data minimisation principle), as well as whether there are less intrusive alternatives for achieving that interest. This assessment should take into account the broader context of the processing, such as whether the controller has a direct relationship with the data subjects (first-party data) or not (third-party data). It will be interesting to see whether the "necessity" limb can be satisfied, for example, in the context of data scraping from the internet where less intrusive alternatives are available to achieve the same result.
  • Balancing test: The Opinion provides an overview of the elements that SAs may take into account when evaluating whether the interests of a controller or a third party are overridden by the interests, fundamental rights and freedoms of data subjects. The EDPB provides guidance to help SAs gauge the impact on data subjects when processing data in an AI context. For example, it sets out criteria to be considered when assessing whether data subjects may reasonably expect certain uses of their personal data (e.g. the nature of the data processed by the model (including whether the personal data is publicly available), the context of the processing, the possible further consequences of processing, the nature of the services, and the data subject's awareness that their data is online). The EDPB also provides examples of measures to mitigate the impact on data subjects, including specific technical measures in the context of data scraping (such as excluding certain data content from publication, excluding certain sources from data collection, or imposing time limits on collection).

The EDPB clarifies that the development and deployment phases involve distinct processing activities requiring separate lawful bases and should also be assessed on a case-by-case basis. This applies whether or not the same or different controllers are involved in each phase. 

3. Unlawfully processed personal data

The DPC's query which the Opinion seeks to answer on this topic relates to the scenario where "an AI model has been found to have been created, updated or developed using unlawfully processed personal data, what is the impact of this on the lawfulness of the continued or subsequent processing or operation of the AI model, either on its own or as part of an AI system?"

Three different scenarios are considered in the Opinion. Of particular note is the scenario where personal data is retained in the AI model (i.e. the model does not satisfy the anonymisation threshold referenced in point 1 above) and is processed by another controller deploying the model.

In terms of the lawfulness of the continued or subsequent processing by the AI model, the Opinion states that SAs should take into account whether the controller deploying the model conducted an appropriate assessment to ascertain that the AI model was not developed by unlawfully processing personal data (as part of its accountability obligations to demonstrate compliance with Article 5(1)(a) and Article 6, EU GDPR).

The assessment should take into account non-exhaustive criteria such as the source of the personal data (e.g. whether the data originated from a personal data breach) and whether the processing in the development phase was subject to a finding of an infringement of the EU GDPR, particularly where that finding was made by an SA or a court – such that the controller deploying the model could not ignore that the initial processing was unlawful. The depth and level of detail of the assessment should be commensurate with the type and degree of risk raised by the processing in the deployment phase.

Where legitimate interests are relied on as the lawful basis for subsequent processing, the Opinion clarifies that the fact that the initial processing was unlawful should be taken into account in the legitimate interest assessment (e.g. with regard to the potential risks for data subjects whose personal data were unlawfully processed to develop the model or the fact that data subjects may not expect such subsequent processing). Different aspects, either of a technical nature (e.g. existence of filters or access limitations during development of the model) or of a legal nature (e.g. nature and severity of the unlawfulness of the initial processing) need to be given due consideration within the balancing test.

SAs may also impose corrective measures, including fines, temporary limitations on processing, and erasure of parts (or, in extreme cases, the entirety) of the datasets used to develop the AI model.

Implications for the DPC investigation into OpenAI

The Italian data protection authority (the "Garante") made headlines back in March 2023 when it introduced a temporary ban on the chatbot ChatGPT (which was subsequently lifted), and launched an investigation into the processing by its provider, OpenAI, for suspected breaches of the EU GDPR (including an inadequate legal basis for data processing and a lack of transparency) – refer to our Data Wrap entry here for further information.

December 2024 saw the investigation conclude with the Garante announcing a fine of €15 million and corrective measures imposed on OpenAI. Whilst the press release was published on 20 December 2024 (shortly after the EDPB Opinion was released), the decision was determined on 2 November 2024 and so pre-dated the Opinion. In parallel, the Garante referred the case to the DPC, as the lead supervisory authority, given that OpenAI subsequently established its European headquarters in Ireland. Given the "case-by-case" basis for interpreting the Opinion and some of the higher thresholds it sets out, it will be interesting to see how the DPC interprets the Opinion going forward when deciding this OpenAI case involving a large language model.
Key contacts

Miriam Everett – Partner, Global Head of Data Protection and Privacy, London
Claire Wiseman – Knowledge Lawyer, London
Duc Tran – Of Counsel, London
Sara Lee – Associate, London