Does AI-Assisted Coding Violate Open Source Licenses?

AI has thrown a wrench in traditional understandings of open source software licensing — and developers should pay attention, whether or not they use open source software in the conventional sense.

The reason is that AI-powered code generation tools, like GitHub Copilot and Amazon CodeWhisperer, are raising complex legal questions about what counts as open source license infringement. The answers to those questions may turn out to be benign, or they may land developers who use AI-assisted code generators in legal hot water.

Here's what developers need to know about the potential licensing implications of AI-assisted coding, and how to make informed decisions about risks associated with these tools.

Copilot, AI-Assisted Coding, and Open Source Licenses

The cause for potential concern surrounding AI-assisted coding and open source licensing infringement is simple enough: AI-assisted code generators like Copilot were trained by parsing millions of lines of open source code, and they use that code as the basis for the code they write.

As a result, it's plausible to argue, as some folks have, that AI-assisted coding tools infringe on open source software licenses: because the tools analyze open source code to generate their own code, AI-generated code could be considered a "derivative work" of open source codebases. Under the terms of many open source licenses, particularly copyleft licenses such as the GPL, this would require the auto-generated code to be distributed under the same terms (such as the requirement that its source remain publicly available) as the original open source code on which the AI code generators were trained.

The fact that most AI-assisted coding tools never asked for or received permission from open source developers to train on their code also complicates matters, although legally, that's probably not as important as the argument that AI-generated code counts as a derivative work of open source projects.

At least one developer has already launched a campaign aiming to investigate GitHub, which owns Copilot, for "violating its legal duties to open-source authors and end users." That campaign has spawned a class-action lawsuit "on behalf of a proposed class of possibly millions of GitHub users … challenging the legality of GitHub Copilot." The suit targets not just Microsoft (which owns GitHub and Copilot) but also OpenAI, whose AI engine powers Copilot.

Is AI-Generated Code Really Illegal?

Parties claiming that AI-assisted coding tools have broken the law, or at least violated licenses, seem to face an uphill battle in the courtroom.

Probably the highest hurdle for them to clear stems from the fact that tools like Copilot typically don't copy open source code verbatim. They generate their own, original code. They analyze code written by other people to do so, but the code they produce is their own.

In this sense, AI-generated code doesn't seem to be all that different from code that human programmers write after looking at other people's code and using it to guide their own work. To my knowledge, no one has ever argued that a developer who reads publicly available code written by other developers has violated anyone's rights or license. To succeed, a lawsuit would have to show that an AI-powered tool parsing public code repositories is different from a human reading publicly available code, and that seems hard to do.

An Open Question

For now, the issue of whether tools like Copilot violate open source licenses, and whether developers who use Copilot are bound by the terms of the licenses associated with the code on which Copilot trained, remains an open question. But it has important ramifications for the future of both AI-assisted coding and open source licensing.

If a court were to decide that AI-assisted coding violates open source licensing terms, it would threaten to shut down the nascent AI-generated coding industry before it really has a chance to take off.

It would also set a precedent that open source licensing terms extend much further than most people previously imagined. It would establish a much more expansive definition of "derivative work" in this context, and it might make some developers (and businesses) think harder about when to use open source code, and which specific open source licenses to use or to avoid.

Fear, Uncertainty, and Doubt: Open Source Edition

I'm no lawyer, but it seems unlikely to me that courts would actually find AI-assisted coding tools to be in violation of open source licenses.

I also have a hunch that a lot of the legal saber-rattling surrounding this issue reflects a desire by some developers to throw shade on AI-assisted coding (and possibly on Microsoft, the parent company of GitHub) more than it reflects genuine concern over licensing terms and developer rights. After all, you'd think that if these folks were worried about the legality of AI-assisted coding in general, they'd also be investigating or suing companies like Amazon, whose own AI-assisted development tool, CodeWhisperer, was also trained in part on open source codebases.

But they're not. They're singling out Microsoft and its partners.

Nonetheless, there's a chance that the fear, uncertainty, and doubt raised by this debate will itself lead many developers to shy away from AI-assisted coding. In that respect, the campaign against tools like Copilot may succeed, even if it ends up having no legal leg to stand on.

That's ironic, because Microsoft was long accused of using this very strategy, starting back in the 1990s, to discourage businesses from using Linux, including through its reported support for legally dubious litigation over Unix licenses. More than two decades later, Microsoft has declared its love for Linux, and it no longer spreads fear, uncertainty, and doubt about open source software. But at least a few open source developers are now arguably wielding this tactic against an AI-assisted coding tool owned by Microsoft.

As the French say: Plus ça change, plus c'est la même chose.

Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.
