According to a new study published in Nature, researchers found that training AI models using AI-generated datasets can lead to “model collapse,” where models produce increasingly nonsensical outputs over generations. “In one example, a model started with a text about European architecture in the Middle Ages and ended up — in the ninth generation — spouting nonsense about jackrabbits,” writes The Register’s Lindsay Clark. From the report: [W]ork led by Ilia Shumailov, Google DeepMind and Oxford post-doctoral researcher, found that an AI may fail to pick up less common lines of text, for example, in training datasets, which means subsequent models trained on the output cannot carry forward those nuances. Training new models on the output of earlier models in this way ends up in a recursive loop. In an accompanying article, Emily Wenger, assistant professor of electrical and computer engineering at Duke University, illustrated model collapse with the example of a system tasked with generating images of dogs. “The AI model will gravitate towards recreating the breeds of dog most common in its training data, so might over-represent the Golden Retriever compared with the Petit Basset Griffon Vendéen, given the relative prevalence of the two breeds,” she said.
“If subsequent models are trained on an AI-generated data set that over-represents Golden Retrievers, the problem is compounded. With enough cycles of over-represented Golden Retriever, the model will forget that obscure dog breeds such as Petit Basset Griffon Vendeen exist and generate pictures of just Golden Retrievers. Eventually, the model will collapse, rendering it unable to generate meaningful content.” While she concedes an over-representation of Golden Retrievers may be no bad thing, the process of collapse is a serious problem for meaningful representative output that includes less-common ideas and ways of writing. “This is the problem at the heart of model collapse,” she said.
Read more of this story at Slashdot.