Could AI Tech Programs ChatGPT, Copilot Enable IP Theft?

While the development of AI-based programs such as ChatGPT, Stability AI, and GitHub Copilot could provide substantial benefits for organizations — and individual developers — these applications could present a myriad of thorny legal and regulatory issues for businesses.

The large language models (LLMs) at the core of ChatGPT, Copilot, and Codex were mostly trained on open source code available on the internet, which is covered by a variety of licensing agreements.

This was mostly done as a proof of concept around the technology. While the data used to train an LLM makes a huge difference in the way the model performs, that is not the only factor.

The architecture of the LLM, the number of reinforcements done after the foundational model is trained, the prompts used in the training process, and the architecture of the prompts themselves play an important role in the way the model performs.

"The challenge we face with the current implementation is that the models may use the code from its training data set as-is, or may combine them in statistical ways that are not easy for the end user to decipher where it was generated," explained Sreekar Krishna, leader of artificial intelligence at advisory firm KPMG.

Such a methodology poses an intellectual property challenge because it exposes institutions to potential lawsuits, especially if there is no governance around how AI-generated code is accepted into production code.

"That said, we are at the early stages of generative AI," he said. "The IP issue could be easily surmounted by institutions developing an LLM that uses code that is appropriately licensed from various sources."

This would allow any output generated by the system to be used in production systems without licensing or IP challenges.
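As a rough illustration of what "appropriately licensed" could mean in practice, the sketch below filters a candidate training corpus against a license allowlist. It is a minimal sketch, not a description of how any vendor builds its models: the `SourceFile` type, the `filter_training_corpus` function, the sample paths, and the choice of approved licenses are all hypothetical, and whether a given license actually permits model training is a legal question this code does not answer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical allowlist of SPDX license identifiers an institution's
# legal team might approve. The selection here is an assumption.
APPROVED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


@dataclass(frozen=True)
class SourceFile:
    path: str
    license_id: Optional[str]  # SPDX identifier, or None when unknown


def filter_training_corpus(
    files: List[SourceFile],
) -> Tuple[List[SourceFile], List[SourceFile]]:
    """Split files into (accepted, rejected) using the allowlist.

    Files with an unknown license are rejected rather than assumed safe.
    """
    accepted: List[SourceFile] = []
    rejected: List[SourceFile] = []
    for f in files:
        if f.license_id in APPROVED_LICENSES:
            accepted.append(f)
        else:
            rejected.append(f)
    return accepted, rejected


# Made-up sample corpus for illustration only.
corpus = [
    SourceFile("utils/strings.py", "MIT"),
    SourceFile("vendor/parser.c", "GPL-3.0-only"),
    SourceFile("scripts/build.sh", None),
    SourceFile("core/http.go", "Apache-2.0"),
]

accepted, rejected = filter_training_corpus(corpus)
accepted_paths = [f.path for f in accepted]
rejected_paths = [f.path for f in rejected]
```

Rejecting files with no known license, instead of letting them through, reflects the cautious stance described above: anything whose provenance cannot be established stays out of the training set.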

Institutions should also take care to ensure they have full ownership of any code they generate through AI for further use, with no IP ties to the product or company that generated it.

Understanding Licensing and Content Parameters

Jeff Pollard, vice president and principal analyst at research firm Forrester, said he divides the topic into two issues when discussing it with organizations: The first point concerns the information and content input into an AI-aided program.

"Companies need to understand the licensing terms of the solutions they leverage based on the data they put into the system and how providers secure that information," he said.

The second point revolves around the information and content produced by the AI-aided program.

"The terms, conditions, and ownership rights to the material produced by the AI-aided program and what licensing and rights customers have to that information, and whether their information could be provided to a different client without their consent," he explained.

Krishna added that technologies including Copilot and Codex will evolve to become more integrated into the day-to-day tools used by employees of an institution.

"Just the way we all come to expect spell checking and grammar correction to be part of all text interface — document creation, social media posts, or text messaging apps," he said, "in the same way, we should see that prompt-based outputs will become part of people's productivity tools."

Krishna expects online and desktop-based tools to allow people to prompt first before they engage in actual knowledge work.

"Be it starting a new article, email, or slide deck, we expect that people won't have to start with a blank page," he noted. "They would have access to templates now that are customized for the particular task."

Similar Concerns Arose Over Open Source, Cloud

There have been similar IP concerns around other technologies in the past, including worries over open source software.

"Perhaps the closest comparison to this is a recent one — cloud," Pollard said. "Many of the hyperscalers take extensive effort to limit access to the information customers upload to the cloud — in part to protect themselves from that data."

He explained that a starting point for organizations to approach the issue of AI and potential intellectual property issues is knowing whether you are building or buying AI-aided programs.

"If you are building an AI-aided program internally, then you'll have a better understanding of what data is harvested, what data is produced, and have better visibility into the underlying models and data flows," Pollard said. "But that is not going to be the most common scenario."

The more likely situation is that your firm will be buying AI-aided programs, or that AI will be added to existing applications as a feature in the future.

In that case, the evaluation will require a cross-functional group of stakeholders spanning cybersecurity, third-party risk management, legal, procurement, and development.

"You'll need multiple perspectives that can provide insights into the use cases for the AI-aided application, your current cybersecurity practices, risk tolerance, third-party risk management governance, customer contracts, legal implications, and more," Pollard noted. "This is going to take a substantial amount of time to really sort through."

Krishna predicts IP issues will remain a concern for the near future, but if the market shows promise for these models, it is very likely there will be advances that help deliver models with fewer or no IP concerns attached to them.

"This is not new — we have seen this with open source technologies. When open source technologies first came in, there were no standard licensing terms around them," he noted. "But over time, many different licensing models evolved, like MIT, GPL, and so on. Similarly, we expect that AI model-based licensing might be in the near future."

Additional Scrutiny Over AI Expected

The use of AI, whether generative or not, is likely to come under increasing scrutiny from regulators.

"The reason for this is the cloudiness that exists around certain types of algorithms and how they arrive at their decisions," Krishna said. "From a moral and ethical front, it is hard for institutions to lean purely on algorithms without having some human oversight on the outcomes. If we continue down the road of incorporating critical decision-making algorithms, it will be essential that they be regulated and monitored appropriately."

As these tools get integrated into the line-of-business applications organizations depend on, they will be met with interest, then skepticism, then reliance, Pollard added.

As adoption grows, the potential for data or sensitive intellectual property to leak from one client to the other increases at the same rate.

Pollard said that, as with every other new technology, it will take time to sort out the security controls and governance needed to prevent creative people from discovering unintended consequences.

"This is what makes emerging technology interesting," he said. "We don't know what other legal and regulatory liabilities AI-based generative platforms pose, but we can guarantee that things we never imagined will occur as adoption increases and litigation winds its way through court systems."

About the author

Nathan Eddy is a freelance writer for ITPro Today. He has written for Popular Mechanics, Sales & Marketing Management Magazine, FierceMarkets, and CRN, among others. In 2012 he made his first documentary film, The Absent Column. He currently lives in Berlin.
