Microsoft Unveils a Large Language Model That Excels At Encoding Spreadsheets

Microsoft has quietly announced the first details of its new “SpreadsheetLLM,” claiming it has the “potential to transform spreadsheet data management and analysis, paving the way for more intelligent and efficient user interactions.” You can read more details about the model in a pre-print paper available here. Jasper Hamill reports via The Stack: One of the problems with using LLMs in spreadsheets is that they get bogged down by too many tokens (basic units of information the model processes). To tackle this, Microsoft developed SheetCompressor, an “innovative encoding framework that compresses spreadsheets effectively for LLMs.” “It significantly improves performance in spreadsheet table detection tasks, outperforming the vanilla approach by 25.6% in GPT4’s in-context learning setting,” Microsoft added. The model is made of three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation.

The first of these modules involves placing “structural anchors” throughout the spreadsheet to help the LLM understand what’s going on better. It then removes “distant, homogeneous rows and columns” to produce a condensed “skeleton” version of the table. Index translation addresses the challenge caused by spreadsheets with numerous empty cells and repetitive values, which use up too many tokens. “To improve efficiency, we depart from traditional row-by-row and column-by-column serialization and employ a lossless inverted index translation in JSON format,” Microsoft wrote. “This method creates a dictionary that indexes non-empty cell texts and merges addresses with identical text, optimizing token usage while preserving data integrity.” […]
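To make the inverted index idea concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the function names, the sample sheet, and the choice to list addresses individually (rather than merging them into ranges) are assumptions made for illustration. It only shows the general technique described in the quote, a dictionary keyed by non-empty cell text whose values are the addresses holding that text, serialized as JSON.

```python
import json


def column_letter(index):
    """Convert a zero-based column index to a spreadsheet letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def inverted_index(rows):
    """Map each non-empty cell text to the list of addresses that contain it.

    Hypothetical helper: empty cells are skipped entirely, and cells with
    identical text share a single dictionary entry.
    """
    index = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue  # empty cells cost no tokens in this encoding
            address = f"{column_letter(c)}{r + 1}"
            index.setdefault(str(value), []).append(address)
    return index


# Illustrative sheet with repeated values and empty cells (made-up data).
sheet = [
    ["Region", "Status", ""],
    ["North", "Open", ""],
    ["South", "Open", ""],
    ["", "", ""],
    ["East", "Closed", ""],
]

encoded = json.dumps(inverted_index(sheet))
print(encoded)
```

In this toy sheet the repeated value "Open" is stored once with two addresses, and the empty row and column disappear from the encoding altogether. Converting every value with `str` is a simplification of this sketch; an encoding that aims to be lossless would also need to preserve cell types and formats.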

After conducting a “comprehensive evaluation of our method on a variety of LLMs,” Microsoft found that SheetCompressor reduces token usage for spreadsheet encoding by 96%. Moreover, SpreadsheetLLM shows “exceptional performance in spreadsheet table detection,” which is the “foundational task of spreadsheet understanding.” The new LLM builds on the Chain of Thought methodology to introduce a framework called “Chain of Spreadsheet” (CoS), which can “decompose” spreadsheet reasoning into a table detection-match-reasoning pipeline.

How Will AI Transform the Future of Work?

An anonymous reader shared this report from the Guardian:

In March, after analysing 22,000 tasks in the UK economy, covering every type of job, a model created by the Institute for Public Policy Research predicted that 59% of tasks currently done by humans — particularly women and young people — could be affected by AI in the next three to five years. In the worst-case scenario, this would trigger a “jobs apocalypse” where eight million people lose their jobs in the UK alone…. Darrell West, author of The Future of Work: AI, Robots and Automation, says that just as policy innovations were needed in Thomas Paine’s time to help people transition from an agrarian to an industrial economy, they are needed today, as we transition to an AI economy. “There’s a risk that AI is going to take a lot of jobs,” he says. “A basic income could help navigate that situation.”

AI’s impact will be far-reaching, he predicts, affecting blue- and white-collar jobs. “It’s not just going to be entry-level people who are affected. And so we need to think about what this means for the economy, what it means for society as a whole. What are people going to do if robots and AI take a lot of the jobs?”

Nell Watson, a futurist who focuses on AI ethics, has a more pessimistic view. She believes we are witnessing the dawn of an age of “AI companies”: corporate environments where very few — if any — humans are employed at all. Instead, at these companies, lots of different AI sub-personalities will work independently on different tasks, occasionally hiring humans for “bits and pieces of work”. These AI companies have the potential to be “enormously more efficient than human businesses”, driving almost everyone else out of business, “apart from a small selection of traditional old businesses that somehow stick in there because their traditional methods are appreciated”… As a result, she thinks it could be AI companies, not governments, that end up paying people a basic income.

AI companies, meanwhile, will have no salaries to pay. “Because there are no human beings in the loop, the profits and dividends of this company could be given to the needy. This could be a way of generating support income in a way that doesn’t need the state welfare. It’s fully compatible with capitalism. It’s just that the AI is doing it.”

OpenAI Working On New Reasoning Technology Under Code Name ‘Strawberry’

OpenAI is close to a breakthrough with a new project called “Strawberry,” which aims to enhance its AI models with advanced reasoning abilities. Reuters reports: Teams inside OpenAI are working on Strawberry, according to a copy of a recent internal OpenAI document seen by Reuters in May. Reuters could not ascertain the precise date of the document, which details a plan for how OpenAI intends to use Strawberry to perform research. The source described the plan to Reuters as a work in progress. The news agency could not establish how close Strawberry is to being publicly available. How Strawberry works is a tightly kept secret even within OpenAI, the person said.

The document describes a project that uses Strawberry models with the aim of enabling the company’s AI to not just generate answers to queries but to plan ahead enough to navigate the internet autonomously and reliably to perform what OpenAI terms “deep research,” according to the source. This is something that has eluded AI models to date, according to interviews with more than a dozen AI researchers. Asked about Strawberry and the details reported in this story, an OpenAI company spokesperson said in a statement: “We want our AI models to see and understand the world more like we do. Continuous research into new AI capabilities is a common practice in the industry, with a shared belief that these systems will improve in reasoning over time.”

On Tuesday at an internal all-hands meeting, OpenAI showed a demo of a research project that it claimed had new human-like reasoning skills, according to Bloomberg. An OpenAI spokesperson confirmed the meeting but declined to give details of the contents. Reuters could not determine if the project demonstrated was Strawberry. OpenAI hopes the innovation will improve its AI models’ reasoning capabilities dramatically, the person familiar with it said, adding that Strawberry involves a specialized way of processing an AI model after it has been pre-trained on very large datasets. Researchers Reuters interviewed say that reasoning is key to AI achieving human or super-human-level intelligence.

‘How Good Is ChatGPT at Coding, Really?’

IEEE Spectrum (the IEEE’s official publication) asks the question. “How does an AI code generator compare to a human programmer?”

A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI’s ChatGPT in terms of functionality, complexity and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code, with a success rate ranging from as poor as 0.66 percent to as good as 89 percent, depending on the difficulty of the task, the programming language, and a number of other factors. While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.

The study tested GPT-3.5 on 728 coding problems from the LeetCode testing platform — and in five programming languages: C, C++, Java, JavaScript, and Python. The results?

Overall, ChatGPT was fairly good at solving problems in the different coding languages — but especially when attempting to solve coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively. “However, when it comes to the algorithm problems after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,” said Yutian Tang, a lecturer at the University of Glasgow. For example, ChatGPT’s ability to produce functional code for “easy” coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for “hard” problems dropped from 40 percent to 0.66 percent after this time as well…

The researchers also explored the ability of ChatGPT to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect code because it did not understand either the content or the problem at hand. While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes… The researchers also found that ChatGPT-generated code did have a fair amount of vulnerabilities, such as a missing null test, but many of these were easily fixable.

“Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems…”

Brazil Data Regulator Bans Meta From Mining Data To Train AI Models

Brazil’s national data protection authority ruled on Tuesday that Meta must stop using data originating in the country to train its artificial intelligence models. The Associated Press reports: Meta’s updated privacy policy enables the company to feed people’s public posts into its AI systems. That practice will not be permitted in Brazil, however. The decision stems from “the imminent risk of serious and irreparable or difficult-to-repair damage to the fundamental rights of the affected data subjects,” the agency said in the nation’s official gazette. […] Hye Jung Han, a Brazil-based researcher for the rights group, said in an email Tuesday that the regulator’s action “helps to protect children from worrying that their personal data, shared with friends and family on Meta’s platforms, might be used to inflict harm back on them in ways that are impossible to anticipate or guard against.”

But the decision regarding Meta will “very likely” encourage other companies to refrain from being transparent in the use of data in the future, said Ronaldo Lemos, of the Institute of Technology and Society of Rio de Janeiro, a think-tank. “Meta was severely punished for being the only one among the Big Tech companies to clearly and in advance notify in its privacy policy that it would use data from its platforms to train artificial intelligence,” he said. Compliance must be demonstrated by the company within five working days from the notification of the decision, and the agency established a daily fine of 50,000 reais ($8,820) for failure to do so. In a statement, Meta said the company is “disappointed” by the decision and insists its method “complies with privacy laws and regulations in Brazil.”

“This is a step backwards for innovation, competition in AI development and further delays bringing the benefits of AI to people in Brazil,” a spokesperson for the company added.

AI Researcher Warns Data Science Could Face a Reproducibility Crisis

Long-time Slashdot reader theodp shared this warning from a long-time AI researcher arguing that data science “is due” for a reckoning over whether results can be reproduced. “Few technological revolutions came with such a low barrier of entry as Machine Learning…”

Unlike Machine Learning, Data Science is not an academic discipline, with its own set of algorithms and methods… There is an immense diversity, but also disparities in skill, expertise, and knowledge among Data Scientists… In practice, depending on their backgrounds, data scientists may have large knowledge gaps in computer science, software engineering, theory of computation, and even statistics in the context of machine learning, despite those topics being fundamental to any ML project. But it’s ok, because you can just call the API, and Python is easy to learn. Right…?

Building products using Machine Learning and data is still difficult. The tooling infrastructure is still very immature and the non-standard combination of data and software creates unforeseen challenges for engineering teams. But in my view, a lot of the failures come from this explosive cocktail of ritualistic Machine Learning:

– Weak software engineering knowledge and practices compounded by the tools themselves;
– Knowledge gaps in mathematical, statistical, and computational methods, encouraged by black-boxing APIs;
– Ill-defined range of competence for the role of data scientist, reinforced by a pool of candidates with an unusually wide range of backgrounds;
– A tendency to follow the hype rather than the science.

What can you do?

– Hold your data scientists accountable using Science.
– At a minimum, any AI/ML project should include an Exploratory Data Analysis, whose results directly support the design choices for feature engineering and model selection.
– Data scientists should be encouraged to think outside the box of ML, which is a very small box.
– Data scientists should be trained to use eXplainable AI methods to provide context about the algorithm’s performance beyond the traditional performance metrics like accuracy, FPR, or FNR.
– Data scientists should be held to standards similar to those of other software engineering specialties, with code review, code documentation, and architectural designs.
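For readers unfamiliar with the "traditional performance metrics" named in the list above, the following is a minimal sketch of how accuracy, false positive rate (FPR), and false negative rate (FNR) are computed for a binary classifier. The function names and the sample labels are hypothetical and chosen only for illustration; they do not come from the article.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def basic_metrics(y_true, y_pred):
    """Return accuracy, FPR, and FNR as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # FPR: share of actual negatives that were wrongly flagged as positive
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of actual positives that were missed
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(basic_metrics(y_true, y_pred))
```

A single accuracy figure can hide very different error profiles: two models with the same accuracy may trade false positives for false negatives in opposite directions. That gap is the kind of context the author argues explainability methods should fill.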

The article concludes, “Until such practices are established as the norm, I’ll remain skeptical of Data Science.”

GPT-4 Has Passed the Turing Test, Researchers Claim

Drew Turney reports via Live Science: The “Turing test,” first proposed as “the imitation game” by computer scientist Alan Turing in 1950, judges whether a machine’s ability to show intelligence is indistinguishable from that of a human. For a machine to pass the Turing test, it must be able to talk to somebody and fool them into thinking it is human. Scientists decided to replicate this test by asking 500 people to speak with four respondents, including a human and the 1960s-era AI program ELIZA as well as both GPT-3.5 and GPT-4, the AI that powers ChatGPT. The conversations lasted five minutes, after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 to the pre-print arXiv server, the scientists found that participants judged GPT-4 to be human 54% of the time.

ELIZA, a system pre-programmed with responses but with no large language model (LLM) or neural network architecture, was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%. “Machines can confabulate, mashing together plausible ex-post-facto justifications for things, as humans do,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science. “They can be subject to cognitive biases, bamboozled and manipulated, and are becoming increasingly deceptive. All these elements mean human-like foibles and quirks are being expressed in AI systems, which makes them more human-like than previous approaches that had little more than a list of canned responses.” Further reading: 1960s Chatbot ELIZA Beat OpenAI’s GPT-3.5 In a Recent Turing Test Study

