Amendments have been made to the UK's Data (Use and Access) Bill which place stringent requirements on operators of web crawlers and AI developers to be transparent about their identity, their purpose and their use of copyright-protected materials.
The amendments were made by a cross-bench peer, Baroness Kidron, as the Bill passed through the House of Lords. They are directed at operators of web crawlers and general-purpose artificial intelligence models (not further defined) whose services "have links with the United Kingdom within the meaning of section 4(5) of the Online Safety Act 2023". That section defines such links as arising where (a) the service has a significant number of UK users, or (b) UK users form one of the target markets for the service (or the only target market).
Such operators would be required to comply with UK copyright law, regardless of the jurisdiction in which the copyright-relevant acts relating to the pre-training, development and operation of those web crawlers and general-purpose AI models take place. They would also need to provide transparency (on request) as to the copyright works used at every stage of development (including pre-training and training), to disclose the identity of each crawler and its purpose, and to operate distinct crawlers for different purposes. They must not penalise copyright holders who choose to deny scraping for AI by downranking their content in, or removing their content from, a search engine.
Peter Dalton, partner in the HSF IP and Cyber team, commented:
"The proposed amendments to the Data (Use and Access) Bill would undoubtedly make it easier for copyright owners to identify the works that are being used to train AI engines, which has historically been difficult due to lack of transparency. Although these amendments from the House of Lords provide an interesting framework within which to balance the rights of creators and developers, it seems likely that we will need to await the outcome of the current consultation on Copyright and AI for the UK government to finalise its approach."
The current draft of the Bill is accessible here. Sections 134 to 138 are those amendments referred to above and cover Compliance with UK copyright law by operators of web crawlers and general-purpose AI models (s.134); Transparency of crawler identity, purpose, and segmentation (s.135); Transparency of copyrighted works scraped (s.136); Enforcement (s.137); and Technical solutions (s.138). Details of the amendments are discussed below.
Pre-empting the conclusions of the UK government's Consultation on Copyright and AI?
In parallel, the UK government is consulting on Copyright and AI (see our blog post here), where many of the same issues of copyright infringement and transparency are being considered. Previous UK government consultations have found it extremely difficult to reach a compromise between the rights (and encouragement) of AI developers and the rights (and protection) of creators. The last attempt to establish a text and data mining exception failed, for example. The current consultation closes on 25 February but there is no set date for it to report. Although this amendment to the Bill takes a firm stance on the protection of copyright works which might be accessed or copied by web crawlers or in AI training, it therefore seems unlikely that the government would want to introduce such changes before the consultation has reached its conclusions.
Separately, see also our post on the recent decision in the Getty v Stability AI case in relation to representative actions by copyright owners regarding the use of their works in the training of AI models.
The detail of the amendments to the Bill
Information on the works used for training etc must be available on an easily accessible platform and updated at the same time as any change, and there must be an identifiable point of contact for these matters. These requirements would apply to the entire lifecycle of a general-purpose AI model, including but not limited to:
- pre-training and training,
- fine tuning,
- grounding and retrieval-augmented generation, and
- the collection of data for the said purposes.
The transparency provisions require the operators of web crawlers and general-purpose artificial intelligence (AI) models whose services have links with the UK (as defined above) to disclose information regarding the identity of crawlers used by them or by third parties on their behalf, including but not limited to:
- the name of the crawler,
- the legal entity responsible for the crawler,
- the specific purposes for which each crawler is used,
- the legal entities to which operators provide data scraped by the crawlers they operate, and
- a single point of contact to enable copyright owners to communicate with them and to lodge complaints about the use of their copyrighted works.
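For operators considering how such a disclosure might be kept in practice, the items listed above could be held as a simple structured record. The following is a minimal sketch only: the Bill does not prescribe any format, and the class name `CrawlerDisclosure`, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CrawlerDisclosure:
    """Hypothetical record of the crawler identity items listed above.

    The field names are illustrative assumptions, not wording from the Bill.
    """
    crawler_name: str
    responsible_entity: str
    purposes: list[str]
    data_recipients: list[str] = field(default_factory=list)
    contact_point: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of any disclosure items left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Example values are invented for illustration.
record = CrawlerDisclosure(
    crawler_name="ExampleAIBot",
    responsible_entity="Example AI Ltd",
    purposes=["general-purpose AI model pre-training"],
    data_recipients=[],
    contact_point="copyright@example.com",
)

print(json.dumps(asdict(record), indent=2))
print("Items still to complete:", record.missing_fields())
```

A check such as `missing_fields` is one possible way of spotting an incomplete disclosure before publication. What counts as sufficient disclosure would depend on the regulations eventually made.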
In addition, the amendments require the Secretary of State to make provision requiring operators of web crawlers and general-purpose AI models to deploy distinct crawlers for different purposes, including but not limited to:
- web indexing for search engine results pages,
- general-purpose AI model pre-training, and
- retrieval-augmented generation.
That provision must also ensure that the exclusion of a crawler by a copyright owner does not negatively impact the findability of the copyright owner's content in a search engine.
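To illustrate why separate crawlers matter, the sketch below shows how a copyright owner could admit a search indexing crawler while excluding an AI training crawler, using a robots.txt file read with Python's standard `urllib.robotparser` module. This is an assumption for illustration: the amendments do not name robots.txt or any other mechanism, and the crawler names `ExampleSearchBot` and `ExampleAITrainingBot` are hypothetical. Such an exclusion is only possible where the operator uses a distinct, identifiable crawler for each purpose.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a copyright owner: search indexing
# is allowed, AI training is refused. Crawler names are invented.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleAITrainingBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Report whether the parsed robots.txt rules let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


article = "https://example.com/article"
print("Search crawler allowed:", may_fetch("ExampleSearchBot", article))
print("AI training crawler allowed:", may_fetch("ExampleAITrainingBot", article))
```

If one crawler served both purposes, the owner could not refuse AI training without also dropping out of search results, which is the outcome the amendments seek to prevent.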
In relation to scraped works, the transparency requirements oblige operators to disclose information regarding text and data used in the pre-training, training and fine-tuning of general-purpose AI models, including but not limited to:
- the URLs accessed by crawlers deployed by them or by third parties on their behalf or from whom they have obtained text or data,
- the text and data used for the pre-training, training and fine-tuning, including the type and provenance of the text and data and the means by which it was obtained,
- information that can be used to identify individual works, and
- the timeframe of data collection.
This disclosure must be updated on a monthly basis. Its form and manner of publication would be prescribed by regulations made by the Secretary of State, so as to ensure that the information is accessible to copyright owners upon request.
These duties would be enforced by the Information Commissioner (as established under the Data Protection Act 2018). The Secretary of State would give the Commissioner power to require a relevant operator, by written notice (an “information notice”), to provide information that the Commissioner reasonably requires for the purposes of investigating a suspected failure to comply with the duties, and power to issue enforcement notices.
Lastly, the amendments require the Secretary of State to conduct a review of the technical solutions that may be adopted by copyright owners and by the operators of web crawlers and general-purpose artificial intelligence (AI) models whose services have links with the United Kingdom (as defined above) to prevent and to identify the unauthorised scraping or other unauthorised use of copyright owners’ text and data. Within 18 months of the Data (Use and Access) Act being passed, the Secretary of State would be required to report on the findings of that review and to issue guidance as to the technical solutions to be adopted, together with other recommendations for the protection of the interests of copyright owners.
The provision for all of this would be made by the Secretary of State via statutory instruments within 6 months of the passing of the Bill. However, Lord Patrick Vallance (Minister of State for Science, Research and Innovation) commented that these amendments were not needed, as the current proposals would not impact on a copyright owner's current ability to take action if its works were used by an AI.