23 Apr The Legal Landscape of Data Scraping: the difficult balance between rights and interests at stake
Authors: Jacopo Dirutigliano, Andrea Strippoli, Miriam Andrea Fadda
Data scraping, the automated extraction of information from HTML code, has become a ubiquitous tool in the digital age. It enables the collection and aggregation of data from various online sources, facilitating tasks such as comparison shopping. One of its primary applications is search aggregation: platforms like Skyscanner and Booking.com rely on scraping software to compile and present search results from multiple online travel agencies. While these services provide value to users by simplifying the search process, they often face opposition from data holders who consider scraping a violation of their rights.
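To make the notion of "automated extraction of information from HTML code" concrete, the following is a minimal sketch in Python using only the standard library. It is purely illustrative: the markup, the class names `offer`, `route` and `price`, and the helper `extract_offers` are hypothetical and do not reflect how any platform named above actually works. The example parses a local string and makes no network request.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for one travel agency's results page.
SAMPLE_HTML = """
<ul>
  <li class="offer"><span class="route">Rome-Paris</span><span class="price">89.00</span></li>
  <li class="offer"><span class="route">Rome-Berlin</span><span class="price">74.50</span></li>
</ul>
"""


class OfferParser(HTMLParser):
    """Collects (route, price) pairs from <span class="route"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._field = None   # name of the span currently being read, if any
        self._current = {}   # fields gathered for the list item in progress

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            css_class = dict(attrs).get("class")
            if css_class in ("route", "price"):
                self._field = css_class

    def handle_data(self, data):
        # Text is kept only while inside one of the two spans of interest.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Record the offer only if both fields were found in this item.
            if "route" in self._current and "price" in self._current:
                self.offers.append(
                    (self._current["route"], float(self._current["price"]))
                )
            self._current = {}


def extract_offers(html):
    """Return the list of (route, price) tuples found in the given HTML string."""
    parser = OfferParser()
    parser.feed(html)
    return parser.offers


offers = extract_offers(SAMPLE_HTML)
cheapest = min(offers, key=lambda offer: offer[1])
```

An aggregator would repeat this kind of extraction across many sources and merge the results, which is where the tension with data holders described here arises.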
Indeed, this activity touches on intellectual property, contractual and non-contractual liability, data protection and unfair competition. Consequently, the legitimacy of data scraping depends on several overlapping legal considerations, and ultimately on the jurisdiction and on whether publicly available information is protected by local data regulations. In the field of intellectual property, for instance, the Court of Justice of the European Union highlighted the contentious nature of scraping in the Innoweb ruling[1], labeling it as “nearly parasitic” and suggesting potential infringement of database rights. Subsequent rulings, such as Ryanair v. Viaggiare[2], have provided more nuanced perspectives, acknowledging that scraping datasets is not inherently illegal, except for potential violations of intellectual property rights.
Legal Issues
As noted above, data scraping raises several legal issues. The main ones are outlined below.
Contractual obligations
Data holders’ claims against web scrapers often involve contractual obligations.
A significant challenge in assessing the legality of data scraping lies in interpreting website terms of service (ToS) and whether they constitute enforceable contracts. Most scraping activities fall under browsewrap[3] agreements, raising questions about their enforceability. Courts have grappled with the issue of whether scrapers can be held liable for violating ToS to which they never explicitly agreed, highlighting the complexities of regulating online behavior.
Intellectual Property and database protection
Database protection further complicates the legal landscape, particularly in the EU. EU law grants sui generis protection to database makers who have made a substantial investment in obtaining, verifying, or presenting the contents of a database.
Directive 96/9/EC[4] grants exclusive rights to database makers, allowing them to charge for database use and select licensees. Scraping may nonetheless be lawful under exceptions such as those for temporary copies and for text and data mining. Legal disputes therefore often turn on whether a scraped website constitutes a protected database, with courts assessing the substantiality of both the investment and the extraction.
Data Protection
European jurisdictions adopt a stringent approach to data protection, especially concerning personal data. While copyright law in Europe provides certain exceptions for web scraping (under directives such as the Directive on Copyright in the Digital Single Market[5] and the Directive on open data and the re-use of public sector information[6]), the General Data Protection Regulation (GDPR)[7] imposes significant constraints on scraping activities involving personal data. Indeed, publicly accessible data is still subject to data protection law. Non-compliance with these regulations can lead to hefty fines, as demonstrated by the Clearview AI case, where the company was found to have scraped billions of images for its facial recognition system.
In this case, the French data protection authority (CNIL)[8], like other authorities[9], ordered Clearview AI to cease collecting and using people’s data online for the development of its facial recognition software, as the company had neither consent for the collection nor a legitimate interest in conducting it, especially considering the particularly invasive nature of the process. Clearview’s violation of privacy regulations was deemed particularly serious, as it was established that data subjects could not reasonably have expected their facial images to be collected online.
In another case, the Italian Data Protection Authority sanctioned La Prima S.r.l.[10], a real estate agency, after one of its employees consulted a public real estate register to identify property owners and then contacted them on LinkedIn. Despite the company’s argument that the owner’s LinkedIn profile was public, the Authority found that the employee acted with the objective of facilitating property sales. That objective was at odds with the intended purposes of both sources, since the register exists to confirm property ownership and LinkedIn to foster connections among individuals sharing similar professions, and the processing therefore infringed Article 5(1)(b) of the GDPR.
In the UK, the ICO fined Digital Growth Experts Limited[11] for sending unsolicited text messages promoting a hand sanitizing product without valid consent. The company used data scraped from its director’s online marketplace account and had not obtained consent from the data subjects.
Conclusions
In conclusion, the legality of data scraping is a multifaceted issue shaped by both legal and ethical considerations. While scraping offers significant benefits in terms of data aggregation and analysis, it also raises complex questions regarding intellectual property, privacy, and competition. Striking a balance between innovation and safeguarding individual rights remains a paramount challenge for policymakers and legal practitioners in the digital era.
[1] Judgment of the CJEU of 19 December 2013, Case C‑202/12, Innoweb BV v Wegener ICT Media BV and Wegener Mediaventions BV, ECLI:EU:C:2013:850
[2] Italian Court of Cassation, Judgment n. 2289/2290 of 18 December 2018.
[3] An agreement where the user agrees to the contract by browsing the website.
[4] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases
[5] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC
[6] Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information
[7] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)
[8] Commission Nationale de l’Informatique et des Libertés (CNIL), Restricted Committee Deliberation No. SAN-2022-019 of 17 October 2022, available at: https://www.cnil.fr/en/facial-recognition-20-million-euros-penalty-against-clearview-ai
[9] Garante per la Protezione dei dati personali, injunction against Clearview AI, measure No. 50, February 10, 2022, doc. no. 9751362, available at: https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/9751362
[10] Italian Data Protection Authority, injunction order against La Prima S.r.l. dated September 16, 2021, Measure No. 316, web doc. no. 9705632, available at https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/9705632
[11] Information Commissioner’s Office (ICO), Monetary Penalty Notice of 24 September 2020, available at: https://ico.org.uk/media/action-weve-taken/mpns/2618330/dgel-mpn-20200922.pdf