How GitHub Copilot Could Get Microsoft Into a Copyright Storm


An anonymous reader quotes a Register report: GitHub Copilot, a programming suggestion tool trained on public source code from the internet, was caught generating what appears to be copyrighted code, prompting an attorney to investigate a possible claim of copyright infringement. On Monday, Matthew Butterick, an attorney, designer, and developer, announced that he is working with the Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work, drawn from its training data, to suggest code snippets to users?

Butterick has been criticizing Copilot since its launch. In June, he published a blog post arguing that "any code generated by Copilot may contain license or intellectual property violations" and should therefore be avoided. That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely because Microsoft and GitHub released Copilot without addressing concerns about how its machine-learning model handled the requirements of the various open source licenses.

Copilot's ability to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, discovered that Copilot, when prompted, reproduced his copyrighted sparse-matrix transposition code. Asked for comment, Davis said he would rather wait to hear from GitHub and its parent Microsoft about his concerns. In an email to The Register, Butterick said there had been a strong response to news of his investigation. "It's clear that many developers have been concerned about what Copilot means for open source," he wrote. "We're hearing a lot of stories. Our experience with Copilot has been similar to what others have found: it's not hard to get Copilot to emit verbatim code from identifiable open source repositories. As we expand our investigation, we expect to see more examples. But keep in mind that verbatim copying is just one of many problems presented by Copilot. For example, a software author's copyright in their code can be infringed without textual copying. Additionally, most open source code is covered by a license, which imposes additional legal requirements. Has Copilot met these requirements? We are investigating all such issues." GitHub's documentation for Copilot warns that the output may contain "unwanted patterns" and puts the responsibility for intellectual property infringement on the Copilot user, the report notes.

Bradley Kuhn of the Software Freedom Conservancy is less willing to set aside how Copilot handles software licensing. "What Microsoft's GitHub did in this process is absolutely unconscionable," he said. "Without discussion, consent, or engagement with the FOSS community, they have declared that they know better than the courts and our laws what is or is not permitted under a FOSS license. They have disregarded the license grant of all FOSS licenses, and, more importantly, the more freedom-protecting requirements of copyleft licenses."

Brett Becker, assistant professor at University College Dublin in Ireland, told The Register in an email: "AI-assisted programming tools are not going away and will continue to evolve. Where these tools fit into the current landscape of programming practices, law, and community standards is only just beginning to be explored, and will also continue to evolve." He added: "An interesting question is: what will be the main drivers of this evolution? Will these tools fundamentally change future practices, laws, and community standards, or will our practices, laws, and community standards prove resilient and drive the evolution of these tools?"