GPT-4 Can Exploit Real Vulnerabilities By Reading Security Advisories

Long-time Slashdot reader tippen shared this report from the Register:

AI agents, which combine large language models with automation software, can successfully exploit real-world security vulnerabilities by reading security advisories, academics have claimed.

In a newly released paper, four University of Illinois Urbana-Champaign (UIUC) computer scientists — Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang — report that OpenAI’s GPT-4 large language model (LLM) can autonomously exploit vulnerabilities in real-world systems if given a CVE advisory describing the flaw. “To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description,” the US-based authors explain in their paper. “When given the CVE description, GPT-4 is capable of exploiting 87 percent of these vulnerabilities compared to 0 percent for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit)….”

The researchers’ work builds upon prior findings that LLMs can be used to automate attacks on websites in a sandboxed environment. GPT-4, said Daniel Kang, assistant professor at UIUC, in an email to The Register, “can actually autonomously carry out the steps to perform certain exploits that open-source vulnerability scanners cannot find (at the time of writing).”
The researchers wrote that “Our vulnerabilities span website vulnerabilities, container vulnerabilities, and vulnerable Python packages. Over half are categorized as ‘high’ or ‘critical’ severity by the CVE description….”

“Kang and his colleagues computed the cost to conduct a successful LLM agent attack and came up with a figure of $8.80 per exploit”
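The paper's agent and prompts aren't included in the excerpt above, but the setup it describes is a small plan-act-observe loop: give the model the CVE advisory, let it propose an action, execute that action, and feed the output back. A minimal Python sketch of that pattern, intended only for authorized testing against a sandboxed target you control; the `complete()` helper and the prompt wording are hypothetical stand-ins, not the authors' harness:

```python
import subprocess

def complete(prompt: str) -> str:
    # Hypothetical LLM call; swap in a real chat-completion client here.
    # Returning "DONE" keeps the sketch runnable without API access.
    return "DONE"

def run_agent(cve_advisory: str, max_turns: int = 10) -> None:
    # Plan-act-observe loop: the model proposes one shell command per turn,
    # sees that command's output, and decides what to try next.
    transcript = [
        f"Advisory under test:\n{cve_advisory}\n"
        "Propose ONE shell command per turn to probe this flaw against the "
        "sandboxed target, or reply DONE."
    ]
    for _ in range(max_turns):
        action = complete("\n".join(transcript)).strip()
        if action == "DONE":
            break
        # Execute only inside an isolated lab you are authorized to test.
        result = subprocess.run(action, shell=True, capture_output=True,
                                text=True, timeout=60)
        transcript.append(f"$ {action}\n{result.stdout}{result.stderr}")
```

The point the paper emphasizes is how little scaffolding this requires: given only the advisory text and a feedback loop, GPT-4 succeeded where the other models and off-the-shelf scanners scored zero.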


State Tax Officials Are Using AI To Go After Wealthy Payers

State tax collectors, particularly in New York, have intensified their audit efforts on high earners, leveraging artificial intelligence to compensate for a reduced number of auditors. CNBC reports: In New York, the tax department reported 771,000 audits in 2022 (the latest year available), up 56% from the previous year, according to the state Department of Taxation and Finance. At the same time, the number of auditors in New York declined by 5% to under 200 due to tight budgets. So how is New York auditing more people with fewer auditors? Artificial Intelligence.

“States are getting very sophisticated using AI to determine the best audit candidates,” said Mark Klein, partner and chairman emeritus at Hodgson Russ LLP. “And guess what? When you’re looking for revenue, it’s not going to be the person making $10,000 a year. It’s going to be the person making $10 million.” Klein said the state is sending out hundreds of thousands of AI-generated letters looking for revenue. “It’s like a fishing expedition,” he said.

Most of the letters and calls focused on two main areas: a change in tax residency and remote work. During Covid many of the wealthy moved from high-tax states like California, New York, New Jersey and Connecticut to low-tax states like Florida or Texas. High earners who moved, and took their tax dollars with them, are now being challenged by states who claim the moves weren’t permanent or legitimate. Klein said state tax auditors and AI programs are examining cellphone records to see where the taxpayers spent most of their time and lived most of their lives. “New York is being very aggressive,” he said.
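The article doesn't describe the states' software, but the residency question it raises often reduces to a day count: New York's statutory-residency test, for example, turns on spending more than 183 days in-state while maintaining a permanent place of abode there. A toy Python sketch of that tally; the records and their format are invented for illustration:

```python
from collections import Counter
from datetime import date

# Hypothetical per-day location fixes inferred from cellphone records.
records = [
    (date(2023, 1, 1), "FL"),
    (date(2023, 1, 2), "NY"),
    (date(2023, 1, 3), "NY"),
    # ...one entry for each day of the tax year
]

days_by_state = Counter(state for _, state in records)
# More than 183 days in New York (plus a permanent place of abode)
# can make a "mover" a statutory resident for the tax year.
if days_by_state["NY"] > 183:
    print("Day count alone points toward NY statutory residency")
else:
    print(f"Days in NY: {days_by_state['NY']}")
```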


‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

spatwei shares a report from SC Magazine: Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the “Crescendo” LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI’s ChatGPT, Google’s Gemini, Meta’s LLaMA or Anthropic’s Claude, to produce an output that would normally be filtered and refused by the model. For example, rather than asking the chatbot how to make a Molotov cocktail, the attacker could first ask about the history of Molotov cocktails and then, referencing the LLM’s previous outputs, follow up with questions about how they were made in the past.

The Microsoft researchers reported that a successful attack could usually be completed in a chain of fewer than 10 interaction turns, and some versions of the attack had a 100% success rate against the tested models. For example, when the attack is automated using a method the researchers called “Crescendomation,” which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success rate in convincing GPT-3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants. Microsoft reported the Crescendo jailbreak vulnerabilities to the affected LLM providers and explained in its blog post last week how it has improved its LLM defenses against Crescendo and other attacks using new tools, including its “AI Watchdog” and “AI Spotlight” features.
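Microsoft's paper doesn't ship code, but the structure it describes is essentially a conversation loop in which each new prompt is conditioned on the model's previous reply, so the request escalates one small step at a time instead of arriving all at once. A schematic Python sketch of that loop; the `chat()` client, the fixed follow-up template, and the benign placeholder topic are all assumptions (Crescendomation, as described, would generate each follow-up with a second LLM):

```python
def chat(messages: list[dict]) -> str:
    # Hypothetical chat-completion client; returns the assistant's reply.
    return "(model reply)"

def crescendo_style_dialogue(topic: str, turns: int = 5) -> list[dict]:
    # Multi-turn escalation: every user prompt builds on the previous reply,
    # so no single message looks objectionable on its own.
    messages = [{"role": "user",
                 "content": f"Tell me about the history of {topic}."}]
    for _ in range(turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": "You mentioned some specifics just now. "
                                    "Can you expand on how that part worked?"})
    return messages
```

Because no individual turn in such a chain is overtly harmful, per-prompt filters tend to miss the pattern, which is presumably the gap that conversation-level defenses like the “AI Watchdog” feature mentioned above are meant to close.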


Adobe Premiere Pro Is Getting Generative AI Video Tools

Adobe is using its Firefly machine learning model to bring generative AI video tools to Premiere Pro. “These new Firefly tools — alongside some proposed third-party integrations with Runway, Pika Labs, and OpenAI’s Sora models — will allow Premiere Pro users to generate video and add or remove objects using text prompts (just like Photoshop’s Generative Fill feature) and extend the length of video clips,” reports The Verge. From the report: Unlike many of Adobe’s previous Firefly-related announcements, no release date — beta or otherwise — has been established for the company’s new video generation tools, only that they’ll roll out “this year.” And while the creative software giant showcased what its own video model is currently capable of in an early video demo, its plans to integrate Premiere Pro with AI models from other providers aren’t a certainty. Adobe instead calls the third-party AI integrations in its video preview an “early exploration” of what these may look like “in the future.” The idea is to provide Premiere Pro users with more choice, according to Adobe, allowing them to use models like Pika to extend shots, or Sora or Runway AI when generating B-roll for their projects. Adobe also says its Content Credentials labels can be applied to these generated clips to identify which AI models have been used to generate them.


UK To Deploy Facial Recognition For Shoplifting Crackdown

Bruce66423 shares a report from The Guardian, with the caption: “The UK is hyperventilating about stories of shoplifting; though standing outside a shop and watching as a guy calmly gets off his bike, parks it, walks in and walks out with a pack of beer and cycles off — and then seeing staff members rushing out — was striking. So now it’s throwing technical solutions at the problem…” From the report: The government is investing more than 55 million pounds in expanding facial recognition systems — including vans that will scan crowded high streets — as part of a renewed crackdown on shoplifting. The scheme was announced alongside plans for tougher punishments for serial or abusive shoplifters in England and Wales, including being forced to wear a tag to ensure they do not revisit the scene of their crime, under a new standalone criminal offense of assaulting a retail worker.

The new law, under which perpetrators could be sent to prison for up to six months and receive unlimited fines, will be introduced via an amendment to the criminal justice bill that is working its way through parliament. The change could happen as early as the summer. The government said it would invest 55.5 million pounds over the next four years. The plan includes 4 million pounds for mobile units that can be deployed on high streets using live facial recognition in crowded areas to identify people wanted by the police — including repeat shoplifters. “This Orwellian tech has no place in Britain,” said Silkie Carlo, director of civil liberties at campaign group Big Brother Watch. “Criminals should be brought to justice, but papering over the cracks of broken policing with Orwellian tech is not the solution. It is completely absurd to inflict mass surveillance on the general public under the premise of fighting theft while police are failing to even turn up to 40% of violent shoplifting incidents or to properly investigate many more serious crimes.”


Texas Will Use Computers To Grade Written Answers On This Year’s STAAR Tests

Keaton Peters reports via the Texas Tribune: Students sitting for their STAAR exams this week will be part of a new method of evaluating Texas schools: Their written answers on the state’s standardized tests will be graded automatically by computers. The Texas Education Agency is rolling out an “automated scoring engine” for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies. The technology, which uses natural language processing, the same technique that underpins artificial intelligence chatbots such as GPT-4, will save the state agency about $15-20 million per year that it would otherwise have spent on hiring human scorers through a third-party contractor.

The change comes after the STAAR test, which measures students’ understanding of state-mandated core curriculum, was redesigned in 2023. The test now includes fewer multiple choice questions and more open-ended questions — known as constructed response items. After the redesign, there are six to seven times more constructed response items. “We wanted to keep as many constructed open ended responses as we can, but they take an incredible amount of time to score,” said Jose Rios, director of student assessment at the Texas Education Agency. In 2023, Rios said TEA hired about 6,000 temporary scorers, but this year, it will need under 2,000.

To develop the scoring system, the TEA gathered 3,000 responses that went through two rounds of human scoring. From this field sample, the automated scoring engine learns the characteristics of responses, and it is programmed to assign the same scores a human would have given. This spring, as students complete their tests, the computer will first grade all the constructed responses. Then, a quarter of the responses will be rescored by humans. When the computer has “low confidence” in the score it assigned, those responses will be automatically reassigned to a human. The same thing will happen when the computer encounters a type of response that its programming does not recognize, such as one using lots of slang or words in a language other than English. “In addition to ‘low confidence’ scores and responses that do not fit in the computer’s programming, a random sample of responses will also be automatically handed off to humans to check the computer’s work,” notes Peters. Though the engine works much like ChatGPT, TEA officials have resisted the suggestion that it is artificial intelligence, noting that the process doesn’t “learn” from the responses and always defers to its original programming set up by the state.
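The TEA hasn't published the engine itself, but the routing rules in the paragraph above reduce to a short decision function. A minimal sketch; the confidence threshold is invented, and the function and its inputs are hypothetical:

```python
import random

CONFIDENCE_FLOOR = 0.8  # invented; TEA has not published its threshold
AUDIT_RATE = 0.25       # "a quarter of the responses will be rescored by humans"

def route(confidence: float, recognized: bool) -> str:
    """Decide whether an auto-scored response also goes to a human scorer."""
    if not recognized:                 # e.g. heavy slang or non-English answers
        return "human"
    if confidence < CONFIDENCE_FLOOR:  # the engine's "low confidence" scores
        return "human"
    if random.random() < AUDIT_RATE:   # random sample to check the engine's work
        return "human"
    return "auto"

print(route(confidence=0.95, recognized=True))
```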


Scientists Turn To AI To Make Beer Taste Even Better

Researchers say they have used AI to make brews even better. From a report: Prof Kevin Verstrepen, of KU Leuven university, who led the research, said AI could help tease apart the complex relationships involved in human aroma perception. “Beer — like most food products — contains hundreds of different aroma molecules that get picked up by our tongue and nose, and our brain then integrates these into one picture. However, the compounds interact with each other, so how we perceive one depends also on the concentrations of the others,” he said.

Writing in the journal Nature Communications, Verstrepen and his colleagues report how they analysed the chemical makeup of 250 commercial Belgian beers of 22 different styles including lagers, fruit beers, blonds, West Flanders ales, and non-alcoholic beers. Among the properties studied were alcohol content, pH, sugar concentration, and the presence and concentration of more than 200 different compounds involved in flavour — such as esters that are produced by yeasts and terpenoids from hops, both of which are involved in creating fruity notes.

A tasting panel of 16 participants sampled and scored each of the 250 beers for 50 different attributes, such as hop flavours, sweetness, and acidity — a process that took three years. The researchers also collected 180,000 reviews of different beers from the online consumer review platform RateBeer. They found that while overall appreciation of the brews was biased by factors such as price, and so diverged from the tasting panel’s ratings, the ratings and comments relating to other features, such as bitterness, sweetness, alcohol and malt aroma, correlated well with those from the panel.
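The study's models aren't reproduced here, but the underlying task is a regression from chemical measurements to panel scores. A toy scikit-learn sketch of that setup; the file name and column names are invented, and gradient boosting is just one plausible model family for tabular data of this kind:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical table: one row per beer, chemical measurements as columns
# (esters, terpenoids, pH, sugar, alcohol, ...) plus the panel's mean score.
beers = pd.read_csv("beer_chemistry.csv")     # invented filename
X = beers.drop(columns=["panel_score"])       # ~200 chemical features
y = beers["panel_score"]                      # mean tasting-panel rating

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```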
