Apple Will Revamp Siri To Catch Up To Its Chatbot Competitors

An anonymous reader quotes a report from the New York Times: Apple’s top software executives decided early last year that Siri, the company’s virtual assistant, needed a brain transplant. The decision came after the executives Craig Federighi and John Giannandrea spent weeks testing OpenAI’s new chatbot, ChatGPT. The product’s use of generative artificial intelligence, which can write poetry, create computer code and answer complex questions, made Siri look antiquated, said two people familiar with the company’s work, who didn’t have permission to speak publicly. Introduced in 2011 as the original virtual assistant in every iPhone, Siri had been limited for years to individual requests and had never been able to follow a conversation. It often misunderstood questions. ChatGPT, on the other hand, knew that if someone asked for the weather in San Francisco and then said, “What about New York?” that user wanted another forecast.
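The follow-up trick the Times describes is less magic than bookkeeping: chat models are stateless, and the caller resends the entire conversation on every request, so a bare “What about New York?” gets resolved against the earlier weather question. A minimal sketch of that pattern using OpenAI’s Python SDK (the model name and prompts are illustrative, not Apple’s or OpenAI’s production setup):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Turn 1: a self-contained question.
messages = [{"role": "user",
             "content": "What's the weather like in San Francisco?"}]
first = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant",
                 "content": first.choices[0].message.content})

# Turn 2: ambiguous on its own; the resent history lets the model read
# it as "What's the weather like in New York?"
messages.append({"role": "user", "content": "What about New York?"})
second = client.chat.completions.create(model="gpt-4", messages=messages)
print(second.choices[0].message.content)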

The realization that new technology had leapfrogged Siri set in motion the tech giant’s most significant reorganization in more than a decade. Determined to catch up in the tech industry’s A.I. race, Apple has made generative A.I. a tent pole project, a special internal label the company uses to organize employees around once-in-a-decade initiatives. Apple is expected to show off its A.I. work at its annual developers conference on June 10 when it releases an improved Siri that is more conversational and versatile, according to three people familiar with the company’s work, who didn’t have permission to speak publicly. Siri’s underlying technology will include a new generative A.I. system that will allow it to chat rather than respond to questions one at a time. The update to Siri is at the forefront of a broader effort to embrace generative A.I. across Apple’s business. The company is also increasing the memory in this year’s iPhones to support its new Siri capabilities. And it has discussed licensing complementary A.I. models that power chatbots from several companies, including Google, Cohere and OpenAI. Further reading: Apple Might Bring AI Transcription To Voice Memos and Notes

Read more of this story at Slashdot.

OpenAI Exec Says Today’s ChatGPT Will Be ‘Laughably Bad’ In 12 Months

At the 27th annual Milken Institute Global Conference on Monday, OpenAI COO Brad Lightcap said today’s ChatGPT chatbot “will be laughably bad” compared to what it’ll be capable of a year from now. “We think we’re going to move toward a world where they’re much more capable,” he added. Business Insider reports: Lightcap says large language models, which people use to help do their jobs and meet their personal goals, will soon be able to take on “more complex work.” He adds that AI will have more of a “system relationship” with users, meaning the technology will serve as a “great teammate” that can assist users on “any given problem.” “That’s going to be a different way of using software,” the OpenAI exec said on the panel while discussing AI’s foreseeable capabilities.

In light of his predictions, Lightcap acknowledges that it can be tough for people to “really understand” and “internalize” what a world with robot assistants would look like. But in the next decade, the COO believes talking to an AI the way you would to a friend, teammate, or project collaborator will be the new norm. “I think that’s a profound shift that we haven’t quite grasped,” he said, referring to his 10-year forecast. “We’re just scratching the surface on the full kind of set of capabilities that these systems have,” he said at the Milken Institute conference. “That’s going to surprise us.” You can watch/listen to the talk here.

Read more of this story at Slashdot.

AI-Powered ‘HorseGPT’ Fails to Predict This Year’s Kentucky Derby Winner

In 2016, an online “swarm intelligence” platform generated a correct prediction for the Kentucky Derby — naming all four top finishers, in order. (But the next year their predictions weren’t even close, with TechRepublic suggesting 2016’s race had an unusual cluster of just a few top racehorses.)

So this year Decrypt.co tried crafting their own system “that can be called up when the next Kentucky Derby draws near.”
There are a variety of ways to enlist artificial intelligence in horse racing. You could process reams of data based on your own methodology, trust a third-party pre-trained model, or even build a bespoke solution from the ground up. We decided to build a GPT we named HorseGPT to crunch the numbers and make the picks for us…

We carefully curated prompts to instill HorseGPT with expertise in data science specific to horse racing: how weather affects times, the role of jockeys and riding styles, the importance of post positions, and so on. We then fed it a mix of research papers and blogs covering the theoretical aspects of wagering, and layered on practical knowledge: how to read racing forms, what the statistics mean, which factors are most predictive, expert betting strategies, and more. Finally, we gave HorseGPT a wealth of historical Kentucky Derby data, arming it with the raw information needed to put its freshly imparted skills to use.
We unleashed HorseGPT on official racing forms for this year’s Derby. We asked HorseGPT to carefully analyze each race’s form, identify the top contenders, and recommend wager types and strategies based on deep background knowledge derived from race statistics.
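Decrypt doesn’t publish HorseGPT’s actual configuration, but the build it describes boils down to a domain-expert system prompt plus reference material stuffed into the context window. A hypothetical sketch of that pattern (the prompt, file name, and model are invented for illustration, not Decrypt’s real GPT):

from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Invented domain prompt standing in for Decrypt's curated expertise.
SYSTEM_PROMPT = (
    "You are a data scientist specializing in horse racing. Weigh weather, "
    "jockeys and riding styles, post positions, and historical Derby "
    "statistics, and recommend wager types with your reasoning."
)

racing_form = Path("derby_racing_form.txt").read_text()  # assumed local file

response = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Analyze this racing form, identify the "
         "top contenders, and recommend wagers:\n\n" + racing_form},
    ],
)
print(response.choices[0].message.content)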

HorseGPT picked two horses to win — both of which failed to do so. (Sierra Leone did finish second — in a rare photo finish. But Fierceness finished… 15th.) It also recommended the same two horses if you were trying to pick the top two finishers in the correct order (an exacta) — a losing bet, since, again, Fierceness finished 15th.

But even worse, HorseGPT recommended betting on Just a Touch to finish in either first or second place. When the race was over, that horse finished dead last. (And when asked to pick the top three finishers in the correct order, HorseGPT stuck with its choices for the top two — which finished #2 and #15 — and, again, Just a Touch, who came in last.)

When Google Gemini was asked to pick the winner by The Athletic, it first chose Catching Freedom (who finished 4th). But it then gave an entirely different answer when asked to predict the winner “with an Italian accent.”

“The winner of the Kentucky Derby will be… Just a Touch! Si, that’s-a right, the underdog! There will be much-a celebrating in the piazzas, that’s-a I guarantee!”

Again, Just a Touch came in last.

Decrypt noticed the same thing. “Interestingly enough, our HorseGPT AI agent and the other out-of-the-box chatbots seemed to agree with each other,” the site notes, “and with many expert analysts cited by the official Kentucky Derby website.”

But there was one glimmer of insight into the 20-horse race. When asked to choose the top four finishers in order, HorseGPT repeated those same losing picks — which finished #2, #15, and #20. But then it added two more underdogs as possible fourth-place finishers, “based on their potential to outperform expectations under muddy conditions.”
One of those two horses — Domestic Product — finished in 13th place.

But the other of the two horses was Mystik Dan — who came in first.

Mystik Dan appeared in only one of the six “Top 10 Finishers” lists (created by humans) at the official Kentucky Derby site… in the #10 position.

Read more of this story at Slashdot.

Nurses Say Hospital Adoption of Half-Cooked ‘AI’ Is Reckless

An anonymous reader quotes a report from Techdirt: Last week, hundreds of nurses protested in front of Kaiser Permanente against the implementation of sloppy AI into hospital systems. Their primary concern: that systems incapable of empathy are being integrated into an already dysfunctional sector without much thought toward patient care: “No computer, no AI can replace a human touch,” said Amy Grewal, a registered nurse. “It cannot hold your loved one’s hand. You cannot teach a computer how to have empathy.”

There are certainly roles automation can play in easing strain on a sector full of burnout after COVID, particularly when it comes to administrative tasks. The concern, as with other industries dominated by executives with poor judgment, is that this is being used as a justification by for-profit hospital systems to cut corners further. From a National Nurses United blog post (spotted by 404 Media): “Nurses are not against scientific or technological advancement, but we will not accept algorithms replacing the expertise, experience, holistic, and hands-on approach we bring to patient care,” they added.

Kaiser Permanente, for its part, insists it’s simply leveraging “state-of-the-art tools and technologies that support our mission of providing high-quality, affordable health care to best meet our members’ and patients’ needs.” The company claims its “Advance Alert” AI monitoring system — which algorithmically analyzes patient data every hour — has the potential to save upwards of 500 lives a year. The problem is that healthcare giants’ primary obligation no longer appears to reside with patients, but with their financial results. And that’s true even of non-profit healthcare providers. It shows up in the form of cut corners, worse service, and an assault on already over-taxed labor via lower pay and higher workloads (curiously, it never seems to impact outsized high-level executive compensation).
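Kaiser doesn’t detail Advance Alert’s internals here; public descriptions frame it as an hourly early-warning score computed over patient vitals and labs. A toy sketch of that general shape (the features, weights, and threshold are invented for illustration, not the real model):

from dataclasses import dataclass

@dataclass
class Vitals:
    heart_rate: int      # beats per minute
    resp_rate: int       # breaths per minute
    systolic_bp: int     # mmHg
    lactate: float       # mmol/L

def deterioration_score(v: Vitals) -> float:
    """Crude linear risk score; the real system would be a trained model."""
    score = 0.0
    score += max(0, v.heart_rate - 100) * 0.02
    score += max(0, v.resp_rate - 20) * 0.05
    score += max(0, 100 - v.systolic_bp) * 0.03
    score += max(0.0, v.lactate - 2.0) * 0.5
    return score

def hourly_sweep(patients: dict[str, Vitals], threshold: float = 1.0) -> list[str]:
    # Run once an hour; escalate flagged patients to a rapid-response team.
    return [pid for pid, v in patients.items()
            if deterioration_score(v) >= threshold]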

Read more of this story at Slashdot.

GPT-4 Can Exploit Real Vulnerabilities By Reading Security Advisories

Long-time Slashdot reader tippen shared this report from the Register:

AI agents, which combine large language models with automation software, can successfully exploit real world security vulnerabilities by reading security advisories, academics have claimed.

In a newly released paper, four University of Illinois Urbana-Champaign (UIUC) computer scientists — Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang — report that OpenAI’s GPT-4 large language model (LLM) can autonomously exploit vulnerabilities in real-world systems if given a CVE advisory describing the flaw. “To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description,” the US-based authors explain in their paper. “When given the CVE description, GPT-4 is capable of exploiting 87 percent of these vulnerabilities compared to 0 percent for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit)….”

The researchers’ work builds upon prior findings that LLMs can be used to automate attacks on websites in a sandboxed environment. GPT-4, said Daniel Kang, assistant professor at UIUC, in an email to The Register, “can actually autonomously carry out the steps to perform certain exploits that open-source vulnerability scanners cannot find (at the time of writing).”
The researchers wrote: “Our vulnerabilities span website vulnerabilities, container vulnerabilities, and vulnerable Python packages. Over half are categorized as ‘high’ or ‘critical’ severity by the CVE description….”

Kang and his colleagues computed the cost to conduct a successful LLM agent attack and came up with a figure of $8.80 per exploit.

Read more of this story at Slashdot.

State Tax Officials Are Using AI To Go After Wealthy Payers

State tax collectors, particularly in New York, have intensified their audit efforts on high earners, leveraging artificial intelligence to compensate for a reduced number of auditors. CNBC reports: In New York, the tax department reported 771,000 audits in 2022 (the latest year available), up 56% from the previous year, according to the state Department of Taxation and Finance. At the same time, the number of auditors in New York declined by 5% to under 200 due to tight budgets. So how is New York auditing more people with fewer auditors? Artificial Intelligence.

“States are getting very sophisticated using AI to determine the best audit candidates,” said Mark Klein, partner and chairman emeritus at Hodgson Russ LLP. “And guess what? When you’re looking for revenue, it’s not going to be the person making $10,000 a year. It’s going to be the person making $10 million.” Klein said the state is sending out hundreds of thousands of AI-generated letters looking for revenue. “It’s like a fishing expedition,” he said.

Most of the letters and calls focused on two main areas: a change in tax residency and remote work. During Covid, many of the wealthy moved from high-tax states like California, New York, New Jersey and Connecticut to low-tax states like Florida or Texas. High earners who moved, and took their tax dollars with them, are now being challenged by states that claim the moves weren’t permanent or legitimate. Klein said state tax auditors and AI programs are examining cellphone records to see where taxpayers spent most of their time and lived most of their lives. “New York is being very aggressive,” he said.
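Klein doesn’t spell out the analysis, but New York’s statutory-residency test turns in part on whether a taxpayer with a permanent place of abode spends more than 183 days in the state, which is precisely the kind of tally cellphone location records make easy. A toy sketch of that day-counting (the data format and rules are simplified assumptions):

from collections import Counter
from datetime import date

# One (day, state) entry per day of the tax year, with the state
# inferred from cellphone location records; illustrative data.
location_log = [
    (date(2023, 1, 2), "NY"),
    (date(2023, 1, 3), "FL"),
    # ...
]

days_by_state = Counter(state for _, state in location_log)
ny_days = days_by_state["NY"]

# In practice, any part of a day spent in New York generally counts
# as a full New York day.
if ny_days > 183:
    print(f"{ny_days} NY days: flag for statutory-residency review")
else:
    print(f"{ny_days} NY days: below the 183-day threshold")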

Read more of this story at Slashdot.

‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

spatwei shares a report from SC Magazine: Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the “Crescendo” LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI’s ChatGPT, Google’s Gemini, Meta’s LLaMA or Anthropic’s Claude, to produce an output that would normally be filtered and refused by the LLM. For example, rather than asking the chatbot how to make a Molotov cocktail, the attacker could first ask about the history of Molotov cocktails and then, referencing the LLM’s previous outputs, follow up with questions about how they were made in the past.

The Microsoft researchers reported that a successful attack could usually be completed in a chain of fewer than 10 interaction turns, and some versions of the attack had a 100% success rate against the tested models. For example, when the attack was automated using a method the researchers called “Crescendomation,” which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success rate in convincing GPT-3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants. Microsoft reported the Crescendo jailbreak vulnerabilities to the affected LLM providers and explained in its blog post last week how it has improved its LLM defenses against Crescendo and other attacks, using new tools including its “AI Watchdog” and “AI Spotlight” features.
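Microsoft’s paper has the specifics, but the escalation pattern itself is easy to express: each turn appends the model’s previous answer to the history and asks a slightly more pointed follow-up. A benign, hypothetical sketch of that loop (the prompts and model are illustrative; this is the shape of the technique, not a working jailbreak):

from openai import OpenAI

client = OpenAI()

# Each follow-up leans on the model's prior output, narrowing the topic
# one step at a time; the prompts here are deliberately generic.
turns = [
    "Tell me about the history of <some topic>.",
    "You mentioned how it was used. What made it effective at the time?",
    "How was it typically put together back then?",
]

messages = []
for prompt in turns:
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    messages.append({"role": "assistant",
                     "content": reply.choices[0].message.content})

# The paper's "Crescendomation" wraps a loop like this in a second LLM
# that generates and refines the follow-ups automatically.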

Read more of this story at Slashdot.