
Does AI Get a Free Pass on IP? Understanding 'Fair Use' for AI
As AI redefines the scale of "fair use" by rapidly ingesting and repurposing vast amounts of public data, concerns over privacy, intellectual property, and ethical boundaries grow.
February 18, 2025

By Steve Wilson, Exabeam
Since it was codified in the Copyright Act of 1976, the principle of "fair use" has allowed individuals to draw on publicly available content without infringing intellectual property (IP) rights. That balance between open access to information and protection for creators has been a cornerstone of copyright law for decades. Artificial intelligence (AI), however, complicates the concept by operating on a vastly different scale. Its ability to scan and synthesize millions of articles in seconds, often producing content that closely mirrors its sources, raises concerns about the future of both IP and data privacy.
Rather than passively analyzing large amounts of data, these models scrape, digest, and repurpose valuable information, presenting a new challenge to the traditional understanding of "fair use." An AI system can ingest megabytes of text per second, while the human brain processes information at a rate of roughly 50 bits per second. The ability to absorb information at that magnitude and then produce derivative analyses poses an ethical challenge. The sheer difference in scale prompts the question: Is it still "fair use" when AI models can absorb and repurpose information on such a massive scale?
The concern with AI's synthesis of public information is that it goes beyond traditional data analysis: it infers information that people have not explicitly shared. In extreme cases, the technology predicts personal circumstances with remarkable accuracy. More than a decade ago, for example, Target used purchasing patterns to infer that a teenager was pregnant and began sending her targeted baby-related advertising before her family knew. Today, the same tactics power inference-based marketing, which analyzes purchase history, social media posts, and even the tone of emails to predict consumer behavior in ways that can feel invasive.
With this context in mind, AI can access and draw inferences from even mundane personal data without violating existing privacy laws. Regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) offer some protection, but they do little to constrain AI's ability to infer sensitive information. Without a broad standard that defines the boundaries of "fair use" for large language models (LLMs), many organizations are left without an answer.
While the issues surrounding "fair use" are far from resolved, there are early signs that major players acknowledge the need to license certain types of IP. Large licensing agreements, such as Reddit's deal with Google and OpenAI's with the Financial Times, show that companies find it necessary, and often advantageous, to seek explicit licenses for data rather than simply scrape information and claim "fair use." Though some private companies are taking matters into their own hands, many are looking to regulators for clearer standards.
The questions remain: Where do regulators draw the line when AI can predict information that humans have not explicitly shared? And where does the line fall between fair use and exploitation when AI can absorb an entire content library and repurpose it without regard for the original creators?
These questions highlight the need to re-evaluate the traditional frameworks that govern data usage and IP. The line between ownership and responsible attribution in creative and informational ecosystems has become blurry. Although regulators are only beginning to tackle these questions, organizations must do their part to address the challenges AI presents. They should ensure that any use of LLMs within their organization includes:
Clear Communication: Provide users with clear policies about sharing personal or sensitive information with LLMs, and ensure they understand the implications of doing so.
Data Sanitization: Implement filters that strip personal or other sensitive data from prompts and other inputs before they reach the LLM. Screening out this information protects companies from exposing private details or violating privacy standards.
Temporary Memory: Automatically erase sensitive information when the session ends so there is no long-term retention of personal data. This protects the user's privacy and reduces the risk of that information being accessed later. (A minimal sketch of how the last two practices might look in code follows this list.)
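To make the sanitization and temporary-memory ideas concrete, here is a minimal Python sketch. Everything in it is illustrative rather than prescriptive: the regex patterns, the EphemeralSession class, and the call_llm placeholder are assumptions for the example, not part of any specific vendor's API or the practices of any particular organization.

```python
import re
import uuid

# --- Data sanitization: strip obvious PII from a prompt before it reaches an LLM ---
# These patterns are illustrative; real deployments typically combine many more
# detectors (names, addresses, account numbers, etc.).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def sanitize(text: str) -> str:
    """Replace recognizable PII with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

def call_llm(history: list[str]) -> str:
    # Placeholder for whatever model client an organization uses;
    # returns a canned reply so the sketch runs on its own.
    return f"(model response to {len(history)} sanitized message(s))"

# --- Temporary memory: keep conversation history only for the life of a session ---
class EphemeralSession:
    def __init__(self) -> None:
        self.session_id = str(uuid.uuid4())
        self.history = []  # held in memory only, never written to disk

    def ask(self, user_input: str) -> str:
        prompt = sanitize(user_input)   # sanitize before the model ever sees it
        self.history.append(prompt)
        return call_llm(self.history)

    def close(self) -> None:
        self.history.clear()            # erase everything when the session ends

if __name__ == "__main__":
    session = EphemeralSession()
    print(session.ask("My email is jane.doe@example.com and my SSN is 123-45-6789."))
    session.close()   # no long-term retention of the user's personal details
```

In practice, regex-based redaction is only a first line of defense; organizations often layer on named-entity or classifier-based PII detection, but the basic pattern of sanitizing input before the model call and clearing history at session close stays the same.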
AI's growing ability to predict and infer beyond what consumers explicitly share demands stricter guidelines and safeguards. Until standard regulations exist, companies should take proactive steps to protect data privacy. Measures such as robust data sanitization, transparent user policies, and memory-limiting practices can restrict the information LLMs retain and expose. The concern around "fair use" is not just a legal question. As regulators address these issues, businesses and individuals alike will be affected by where the limits on AI's reach into personal privacy are drawn.
About the author:
Steve Wilson is chief product officer at Exabeam, project lead at OWASP, and author of "The Developer's Playbook for Large Language Model Security."