How GitHub Copilot Could Steer Microsoft Into a Copyright Storm

An anonymous reader quotes a report from the Register: GitHub Copilot — a programming auto-suggestion tool trained from public source code on the internet — has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim. On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people’s copyrighted work — pulled from the training data — to suggest code snippets to users?

Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that “any code generated by Copilot may contain lurking license or IP violations,” and thus should be avoided. That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.

Copilot’s capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code. Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns.

In an email to The Register, Butterick indicated there’s been a strong response to news of his investigation. “Clearly, many developers have been worried about what Copilot means for open source,” he wrote. “We’re hearing lots of stories. Our experience with Copilot has been similar to what others have found — that it’s not difficult to induce Copilot to emit verbatim code from identifiable open source repositories. As we expand our investigation, we expect to see more examples.

“But keep in mind that verbatim copying is just one of many issues presented by Copilot. For instance, a software author’s copyright in their code can be violated without verbatim copying. Also, most open-source code is covered by a license, which imposes additional legal requirements. Has Copilot met these requirements? We’re looking at all these issues.”

GitHub’s documentation for Copilot warns that the output may contain “undesirable patterns” and puts the onus of intellectual property infringement on the user of Copilot, notes the report.

Bradley Kuhn of the Software Freedom Conservancy is less willing to set aside how Copilot deals with software licenses. “What Microsoft’s GitHub has done in this process is absolutely unconscionable,” he said. “Without discussion, consent, or engagement with the FOSS community, they have declared that they know better than the courts and our laws about what is or is not permissible under a FOSS license. They have completely ignored the attribution clauses of all FOSS licenses, and, more importantly, the more freedom-protecting requirements of copyleft licenses.”

Brett Becker, assistant professor at University College Dublin in Ireland, told The Register in an email, “AI-assisted programming tools are not going to go away and will continue to evolve. Where these tools fit into the current landscape of programming practices, law, and community norms is only just beginning to be explored and will also continue to evolve.” He added: “An interesting question is: what will emerge as the main drivers of this evolution? Will these tools fundamentally alter future practices, law, and community norms — or will our practices, law and community norms prove resilient and drive the evolution of these tools?”

Read more of this story at Slashdot.

Rust 1.63 Released, Adding Scoped Threads

This week the Rust team announced the release of Rust 1.63.

One notable update? Adding scoped threads to the standard library:

Rust code could launch new threads with std::thread::spawn since 1.0, but this function bounds its closure with 'static. Roughly, this means that threads currently must have ownership of any arguments passed into their closure; you can’t pass borrowed data into a thread. In cases where the threads are expected to exit by the end of the function (by being join()’d), this isn’t strictly necessary and can require workarounds like placing the data in an Arc.

Now, with 1.63.0, the standard library is adding scoped threads, which allow spawning a thread borrowing from the local stack frame. The std::thread::scope API provides the necessary guarantee that any spawned threads will have exited before the scope itself returns, which allows for safely borrowing data.
The official Rust RFC book says “The main drawback is that scoped threads make the standard library a little bit bigger,” but calls it “a very common and useful utility … great for learning, testing, and exploratory programming.”

“Every person learning Rust will at some point encounter interaction of borrowing and threads. There’s a very important lesson to be taught that threads can in fact borrow local variables, but the standard library [didn’t] reflect this.” And otherwise, “Implementing scoped threads is very tricky to get right so it’s good to have a reliable solution provided by the standard library.”
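
To make the change concrete, here is a short sketch of the new API in use (our own illustration, in the spirit of the standard library’s documentation, not code from the announcement): two scoped threads borrowing data straight from main’s stack frame, something thread::spawn’s 'static bound would reject.

    use std::thread;

    fn main() {
        // Data on this function's stack frame: exactly what spawn's
        // 'static requirement would normally rule out borrowing.
        let message = String::from("hello from a borrowed String");
        let mut total = 0;

        thread::scope(|s| {
            s.spawn(|| {
                // An immutable borrow of `message` is fine inside the scope.
                println!("{message}");
            });
            s.spawn(|| {
                // A mutable borrow works too, since no other thread
                // touches `total`.
                total += message.len();
            });
        }); // scope() only returns after both threads have been joined

        println!("total = {total}");
    }

Because scope() guarantees every thread spawned inside it is joined before it returns, the borrow checker can prove the borrows never outlive the borrowed locals.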

Read more of this story at Slashdot.

Vim 9.0 Released

After many years of gradual improvement, Vim now takes a big step with a major release. Besides many small additions, the spotlight is on a new incarnation of the Vim script language: Vim9 script. Why Vim9 script:
A new script language, what is that needed for? Vim script has been growing over time, while preserving backwards compatibility. That means bad choices from the past often can’t be changed and compatibility with Vi restricts possible solutions. Execution is quite slow: each line is parsed every time it is executed.

The main goal of Vim9 script is to drastically improve performance. This is accomplished by compiling commands into instructions that can be efficiently executed. An increase in execution speed of 10 to 100 times can be expected. A secondary goal is to avoid Vim-specific constructs and get closer to commonly used programming languages, such as JavaScript, TypeScript and Java.

The performance improvements can only be achieved by not being 100% backwards compatible. For example, making function arguments available by creating an “a:” dictionary involves quite a lot of overhead. In a Vim9 function this dictionary is not available. Other differences are more subtle, such as how errors are handled. For those with a large collection of legacy scripts: Not to worry! They will keep working as before. There are no plans to drop support for legacy script. No drama like with the deprecation of Python 2.
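
The a: dictionary difference is easiest to see side by side. The sketch below is ours rather than the Vim team’s: the same trivial function written once as a legacy function and once as a compiled Vim9 :def function (both forms can live in a single legacy script file).

    " Legacy Vim script: arguments are exposed through the a: dictionary,
    " which is built from scratch on every call.
    function! LegacyAdd(x, y)
      return a:x + a:y
    endfunction

    " Vim9 script: arguments are typed, referenced directly by name, and
    " the body is compiled into instructions instead of being re-parsed
    " on every execution.
    def Add(x: number, y: number): number
      return x + y
    enddef

    echo LegacyAdd(1, 2)
    echo Add(3, 4)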

Read more of this story at Slashdot.

Researchers Claim Travis CI API Leaks ‘Tens of Thousands’ of User Tokens

Ars Technica describes Travis CI as “a service that helps open source developers write and test software.” They also wrote Monday that it’s “leaking thousands of authentication tokens and other security-sensitive secrets.

“Many of these leaks allow hackers to access the private accounts of developers on Github, Docker, AWS, and other code repositories, security experts said in a new report.”

The availability of the third-party developer credentials from Travis CI has been an ongoing problem since at least 2015. At that time, security vulnerability service HackerOne reported that a GitHub account it used had been compromised when the service exposed an access token for one of the HackerOne developers. A similar leak presented itself again in 2019 and again last year.

The tokens give anyone with access to them the ability to read or modify the code stored in repositories that distribute an untold number of ongoing software applications and code libraries. The ability to gain unauthorized access to such projects opens the possibility of supply chain attacks, in which threat actors slip malicious code into legitimate software before it’s distributed to users. The attackers can leverage their ability to tamper with the app to target huge numbers of projects that rely on it in production servers.

Despite this being a known security concern, the leaks have continued, researchers in the Nautilus team at the Aqua Security firm are reporting. Two batches of data the researchers accessed using the Travis CI programming interface yielded 4.28 million and 770 million logs from 2013 through May 2022. After sampling a small percentage of the data, the researchers found what they believe are 73,000 tokens, secrets, and various credentials.

“These access keys and credentials are linked to popular cloud service providers, including GitHub, AWS, and Docker Hub,” Aqua Security said. “Attackers can use this sensitive data to initiate massive cyberattacks and to move laterally in the cloud. Anyone who has ever used Travis CI is potentially exposed, so we recommend rotating your keys immediately.”

Read more of this story at Slashdot.

Museum Restores 21 Rare Videos from Legendary 1976 Computing Conference

At Silicon Valley’s Computer History Museum, the senior curator just announced the results of a multi-year recovery and restoration process: making available 21 never-before-seen video recordings of a legendary 1976 conference:

For five summer days in 1976, the first generation of computer rock stars had its own Woodstock. Coming from around the world, dozens of computing’s top engineers, scientists, and software pioneers got together to reflect upon the first 25 years of their discipline in the warm, sunny (and perhaps a bit unsettling) climes of the Los Alamos National Laboratories, birthplace of the atomic bomb.

Among the speakers:

– A young Donald Knuth on the early history of programming languages

– FORTRAN designer John Backus on programming in America in the 1950s — some personal perspectives

– Harvard’s Richard Milton Bloch (who worked with Grace Hopper in 1944)

– Mathematician/nuclear physicist Stanisław M. Ulam on the interaction of mathematics and computing

– Edsger W. Dijkstra on “a programmer’s early memories.”

The Computer History Museum teases some highlights:

Typical of computers of this generation, the 1946 ENIAC, the earliest American large-scale electronic computer, had to be left powered up 24 hours a day to keep its 18,000 vacuum tubes healthy. Turning them on and off, like a light bulb, shortened their life dramatically. ENIAC co-inventor John Mauchly discusses this serious issue….
The Los Alamos peak moment was the brilliant lecture on the British WW II Colossus computing engines by computer scientist and historian of computing Brian Randell. Colossus machines were special-purpose computers used to decipher messages of the German High Command in WW II. Based in southern England at Bletchley Park, these giant codebreaking machines regularly provided life-saving intelligence to the allies. Their existence was a closely-held secret during the war and for decades after. Randell’s lecture was — excuse me — a bombshell, one which prompted an immediate re-assessment of the entire history of computing. Observes conference attendee (and inventor of ASCII) IBM’s Bob Bemer, “On stage came Prof. Brian Randell, asking if anyone had ever wondered what Alan Turing had done during World War II? From there he went on to tell the story of Colossus — that day at Los Alamos was close to the first time the British Official Secrets Act had permitted any disclosures. I have heard the expression many times about jaws dropping, but I had really never seen it happen before.”

Publishing these original primary sources for the first time is part of CHM’s mission to not only preserve computing history but to make it come alive. We hope you will enjoy seeing and hearing from these early pioneers of computing.

Read more of this story at Slashdot.

Should IT Professionals Be Liable for Ransomware Attacks?

Denmark-based Poul-Henning Kamp describes himself as the “author of a lot of FreeBSD, most of Varnish and tons of other Open Source Software.” And he shares this message in June’s Communications of the ACM.

“The software industry is still the problem.”
If any science fiction author, famous or obscure, had submitted a story where the plot was “modern IT is a bunch of crap that organized crime exploits for extortion,” it would have gotten nowhere, because (A) that is just not credible, and (B) yawn!

And yet, here we are…. As I write this, 200-plus corporations, including many retail chains, have inoperative IT because extortionists found a hole in some niche, third-party software product most of us have never heard of.

But he’s also proposing a solution.
In Denmark, 129 jobs are regulated by law. There are good and obvious reasons why it is illegal for any random Ken, Brian, or Dennis to install toilets or natural-gas furnaces, perform brain surgery, or certify a building is strong enough to be left outside during winter. It may be less obvious why the state cares who runs pet shops, inseminates cattle, or performs zoological taxidermy, but if you read the applicable laws, you will learn that animal welfare and protection of endangered species have many and obscure corner cases.

Notably absent, as in totally absent, on that list are any and all jobs related to IT: IT architecture, computers, computer networks, computer security, or protection of privacy in computer systems. People who have been legally barred and delicensed from every other possible trade — be it for incompetence, fraud, or both — are entirely free to enter the IT profession and become responsible for the IT architecture or cybersecurity of the IT system that controls nearly half the hydrocarbons to the Eastern Seaboard of the U.S….

With respect to gas, water, electricity, sewers, or building stability, the regulations do not care if a company is hundreds of years old or just started this morning, the rules are always the same: Stuff should just work, and only people who are licensed — because they know how to — are allowed to make it work, and they can be sued if they fail to do so.

The time is way overdue for IT engineers to be subject to professional liability, like almost every other engineering profession. Before you tell me that is impossible, please study how the very same thing happened with electricity, planes, cranes, trains, ships, automobiles, lifts, food processing, buildings, and, for that matter, driving a car.

As with software product liability, the astute reader is apt to exclaim, “This will be the end of IT as we know it!” Again, my considered response is, “Yes, please, that is precisely my point!”

Read more of this story at Slashdot.

Is GitHub Suspending the Accounts of Russian Developers at Sanctioned Companies?

“Russian software developers are reporting that their GitHub accounts are being suspended without warning if they work for or previously worked for companies under U.S. sanctions,” writes Bleeping Computer:

According to Russian media outlets, the ban wave began on April 13 and didn’t discriminate between companies and individuals. For example, the GitHub accounts of Sberbank Technology, Sberbank AI Lab, and the Alfa Bank Laboratory had their code repositories initially disabled and are now removed from the platform…. Personal accounts suspended on GitHub have their content wiped while all repositories become immediately out of reach, and the same applies to issues and pull requests.

Habr.com [a Russian collaborative blog about IT] reports that some Russian developers contacted GitHub about the suspension and received an email titled ‘GitHub and Trade Controls’ explaining that their accounts were disabled due to US sanctions. The email links to a GitHub page laying out the company’s policies on sanctions and trade controls, which also describes how a user can appeal a suspension. The appeal form requires the individual to certify that they do not use their GitHub account on behalf of a sanctioned entity. One developer posted to Twitter saying he was able to get the suspension lifted after filling out the form, and that it had been triggered by his previous employer being under sanctions.
A GitHub blog post in March had promised to ensure the availability of open source services “to all, including developers in Russia.” So Bleeping Computer contacted a GitHub spokesperson, who explained this weekend that while GitHub may be required to restrict some users to comply with U.S. laws, “We examine government sanctions thoroughly to be certain that users and customers are not impacted beyond what is required by law.”
According to this, the suspended private accounts belong to users who are affiliated with, collaborating with, or working for sanctioned entities. However, even some developers who merely used to work for a sanctioned company appear to have been suspended by mistake.

This means that Russian users, in general, can suddenly find their projects wiped and accounts suspended, even if those projects have nothing to do with the sanctioned entities.

Read more of this story at Slashdot.

‘Biggest Change Ever’ to Go Brings Generics, Native Fuzzing, and a Performance Boost

“Supporting generics has been Go’s most often requested feature, and we’re proud to deliver the generic support that the majority of users need today,” the Go blog announced this week. *

It’s part of what Go’s development team is calling the “biggest change ever to the language”.
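
To show what that looks like in practice, here is a small sketch of 1.18-style generics (our own illustration, not code from the Go blog); the Number constraint is something we define ourselves:

    package main

    import "fmt"

    // Number is a constraint we define for this sketch: the set of types
    // whose underlying type is one of these numeric types.
    type Number interface {
        ~int | ~int64 | ~float64
    }

    // Sum is a single generic function that works for any slice whose
    // element type satisfies Number.
    func Sum[T Number](values []T) T {
        var total T
        for _, v := range values {
            total += v
        }
        return total
    }

    func main() {
        fmt.Println(Sum([]int{1, 2, 3}))           // type argument inferred as int
        fmt.Println(Sum([]float64{0.5, 1.5, 2.0})) // and here as float64
    }

The compiler infers the type argument at each call site, so both calls share the single generic definition.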

SiliconANGLE writes that “Right out of the gate, Go 1.18 is getting a CPU speed performance boost of up to 20% for Apple M1, ARM64 and PowerPC64 chips. This is all from an expansion of Go 1.17’s calling conventions for the application binary interface on these processor architectures.”

And Go 1.18 also introduces native support for fuzz testing, making Go the first major programming language to do so, writes ZDNet:

As Google explains, fuzz testing or ‘fuzzing’ is a means of testing the vulnerability of a piece of software by throwing arbitrary or invalid data at it to expose bugs and unknown errors. This adds an additional layer of security to Go’s code that will keep it protected as its functionality evolves — crucial as attacks on software continue to escalate both in frequency and complexity. “At Google we are committed to securing the online infrastructure and applications the world depends upon,” said Eric Brewer, VP of infrastructure at Google….
While other languages support fuzzing, Go is the first major programming language to incorporate it into its core toolchain, meaning — unlike other languages — third-party support integrations aren’t required.
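
As a sketch of what a native fuzz target looks like (our own example, assuming it lives in a file such as reverse_test.go; neither the function nor the file name comes from the articles), the testing package’s F type drives a callback with generated inputs:

    package reverse

    import (
        "testing"
        "unicode/utf8"
    )

    // Reverse flips a string byte by byte, which is subtly wrong for
    // multi-byte UTF-8 input: a good target for the fuzzer to break.
    func Reverse(s string) string {
        b := []byte(s)
        for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
            b[i], b[j] = b[j], b[i]
        }
        return string(b)
    }

    // FuzzReverse is picked up by `go test -fuzz=FuzzReverse`.
    func FuzzReverse(f *testing.F) {
        f.Add("hello") // seed corpus entry
        f.Fuzz(func(t *testing.T, s string) {
            if out := Reverse(Reverse(s)); out != s {
                t.Errorf("double reverse of %q gave %q", s, out)
            }
            if utf8.ValidString(s) && !utf8.ValidString(Reverse(s)) {
                t.Errorf("Reverse produced invalid UTF-8 from %q", s)
            }
        })
    }

The fuzzing engine mutates the seed corpus, and any generated input that makes the callback fail is saved so the failure can be replayed later as a regular test case.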

Google is emphasizing Go’s security features — and its widespread adoption. ZDNet writes:

Google created Go in 2007, designing it specifically to help software engineers build secure, open-source enterprise applications for modern, multi-core computing systems. More than three-quarters of Cloud Native Computing Foundation projects, including Kubernetes and Istio, are written in Go, says Google. [Also Docker and etcd.] According to data from Stack Overflow, some 10% of developers are writing in Go worldwide, and there are signs that more recruiters are seeking out Go coders in their search for tech talent…. “Although we have a dedicated Go team at Google, we welcome a significant amount of contributions from our community. It’s a shared effort, and with their updates we’re helping our community achieve Go’s long-term vision.”
Or, as the Go blog says:

We want to thank every Go user who filed a bug, sent in a change, wrote a tutorial, or helped in any way to make Go 1.18 a reality. We couldn’t do it without you. Thank you.

Enjoy Go 1.18!

* Supporting generics “includes major — but fully backward-compatible — changes to the language,” the release notes explain, though they also add a few cautionary notes:

These new language changes required a large amount of new code that has not had significant testing in production settings. That will only happen as more people write and use generic code. We believe that this feature is well implemented and high quality. However, unlike most aspects of Go, we can’t back up that belief with real world experience. Therefore, while we encourage the use of generics where it makes sense, please use appropriate caution when deploying generic code in production.

While we believe that the new language features are well designed and clearly specified, it is possible that we have made mistakes…. it is possible that there will be code using generics that will work with the 1.18 release but break in later releases. We do not plan or expect to make any such change. However, breaking 1.18 programs in future releases may become necessary for reasons that we cannot today foresee. We will minimize any such breakage as much as possible, but we can’t guarantee that the breakage will be zero.

Read more of this story at Slashdot.

Researchers Release ‘PolyCoder’, the First Open-Source Code-Generating AI Model

“Code generation AI — AI systems that can write in different programming languages given a prompt — promise to cut development costs while allowing coders to focus on creative, less repetitive tasks,” writes VentureBeat.

“But while research labs like OpenAI and Alphabet-backed DeepMind have developed powerful code-generating AI, many of the most capable systems aren’t available in open source.”

For example, the training data for OpenAI’s Codex, which powers GitHub’s Copilot feature, hasn’t been made publicly available, preventing researchers from fine-tuning the AI model or studying aspects of it such as interpretability.

To remedy this, researchers at Carnegie Mellon University — Frank Xu, Uri Alon, Graham Neubig, and Vincent Hellendoorn — developed PolyCoder, a model based on OpenAI’s GPT-2 language model that was trained on a database of 249 gigabytes of code across 12 programming languages. While PolyCoder doesn’t match the performance of top code generators in every task, the researchers claim that PolyCoder is able to write in C with greater accuracy than all known models, including Codex….

“Large tech companies aren’t publicly releasing their models, which is really holding back scientific research and democratization of such large language models of code,” the researchers said. “To some extent, we hope that our open-sourcing efforts will convince others to do the same. But the bigger picture is that the community should be able to train these models themselves. Our model pushed the limit of what you can train on a single server — anything bigger requires a cluster of servers, which dramatically increases the cost.”

Read more of this story at Slashdot.