The Data That Powers AI Is Disappearing Fast

An anonymous reader quotes a report from the New York Times: For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models. Now, that data is drying up. Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group. The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt. The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service. “We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.
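
As a rough illustration of how a robots.txt restriction works, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a crawler may fetch a page. The robots.txt contents, the `ExampleAIBot` user agent and the `example.com` URLs are hypothetical. The protocol is advisory, so it only restricts crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_crawl("ExampleAIBot", "https://example.com/articles/1"))  # False
print(may_crawl("SomeBrowser", "https://example.com/articles/1"))   # True
```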

Read more of this story at Slashdot.

Cloudflare Reports Almost 7% of Internet Traffic Is Malicious

In its latest State of Application Security Report, Cloudflare says 6.8% of traffic on the internet is malicious, “up a percentage point from last year’s study,” writes ZDNet’s Steven Vaughan-Nichols. “Cloudflare, the content delivery network and security services company, thinks the rise is due to wars and elections. For example, many attacks against Western-interest websites are coming from pro-Russian hacktivist groups such as REvil, KillNet, and Anonymous Sudan.” From the report: […] Distributed Denial of Service (DDoS) attacks continue to be cybercriminals’ weapon of choice, making up over 37% of all mitigated traffic. The scale of these attacks is staggering. In the first quarter of 2024 alone, Cloudflare blocked 4.5 million unique DDoS attacks. That total is nearly a third of all the DDoS attacks they mitigated the previous year. But it’s not just about the sheer volume of DDoS attacks. The sophistication of these attacks is increasing, too. Last August, Cloudflare mitigated a massive HTTP/2 Rapid Reset DDoS attack that peaked at 201 million requests per second (RPS). That number is three times bigger than any previously observed attack.
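
HTTP/2 Rapid Reset (CVE-2023-44487) works by having a client open streams and cancel them immediately with RST_STREAM frames, so the server keeps doing work per request while the client never hits its concurrent-stream limit. One commonly discussed countermeasure is to count how many resets a single connection sends within a short window and close connections that exceed a threshold. The sketch below is a toy version of that idea, not Cloudflare's actual mitigation; the 10-second window and 100-reset limit are hypothetical values.

```python
from collections import deque

class ResetRateLimiter:
    """Flag an HTTP/2 connection that cancels too many streams in a sliding time window.

    window_seconds and max_resets are hypothetical tuning values.
    """

    def __init__(self, window_seconds: float = 10.0, max_resets: int = 100):
        self.window_seconds = window_seconds
        self.max_resets = max_resets
        self.reset_times = deque()  # timestamps of recent RST_STREAM frames

    def on_rst_stream(self, now: float) -> bool:
        """Record a client RST_STREAM at time `now`; return True if the connection should be closed."""
        self.reset_times.append(now)
        # Drop resets that have fallen out of the window.
        while self.reset_times and now - self.reset_times[0] > self.window_seconds:
            self.reset_times.popleft()
        return len(self.reset_times) > self.max_resets


limiter = ResetRateLimiter(window_seconds=10.0, max_resets=100)
# Simulate an abusive client resetting 1,000 streams per second.
verdicts = [limiter.on_rst_stream(now=i / 1000) for i in range(200)]
print(verdicts.index(True))  # 100: the 101st reset trips the limit
```

A client that resets one stream per second never keeps more than 11 resets in the window, so it is never flagged; only bursts like the one simulated above are.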

The report also highlights the increased importance of application programming interface (API) security. With 60% of dynamic web traffic now API-related, these interfaces are a prime target for attackers. API traffic is growing twice as fast as traditional web traffic. What's worrying is that many organizations appear not even to be aware of a quarter of their API endpoints. Organizations that don't have a tight grip on their internet services or website APIs can't possibly protect themselves from attackers. Evidence suggests the average enterprise application now uses 47 third-party scripts and connects to nearly 50 third-party destinations. Do you know and trust these scripts and connections? You should — each script or connection is a potential security risk. For instance, the recent Polyfill.io JavaScript incident affected over 380,000 sites.
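
Subresource Integrity (SRI) is one standard way to limit third-party script risk of the Polyfill.io kind: the page pins a cryptographic hash of the script in the `integrity` attribute, and the browser refuses to run the file if the downloaded bytes no longer match. The sketch below computes an SRI value in Python; the script body and the `cdn.example.com` URL are placeholders. SRI suits files whose bytes never change; a service that tailors its response to each browser, as Polyfill.io did, is hard to pin this way.

```python
import base64
import hashlib

def sri_hash(script_bytes: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, script_bytes).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

# Placeholder script contents and URL, for illustration only.
script = b"console.log('hello');"
integrity = sri_hash(script)
print(f'<script src="https://cdn.example.com/lib.js" integrity="{integrity}" crossorigin="anonymous"></script>')
```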

Finally, about 38% of all HTTP requests processed by Cloudflare are classified as automated bot traffic. Some bots are good and perform a needed service, such as customer service chatbots, or are authorized search engine crawlers. However, as many as 93% of bots are potentially bad.
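
The two bot figures can be combined into a rough estimate, but only under an assumption the report excerpt does not state: that "93% of bots" also means 93% of bot requests. If that holds, potentially bad bots would account for roughly 35% of all requests Cloudflare processes.

```python
# Figures from the report excerpt above.
bot_share_of_requests = 0.38   # share of HTTP requests classified as automated
bad_share_of_bots = 0.93       # "as many as 93% of bots are potentially bad"

# Assumption (not stated in the report): the 93% applies to bot requests, not bot identities.
bad_bot_share_of_requests = bot_share_of_requests * bad_share_of_bots
print(f"{bad_bot_share_of_requests:.1%}")  # 35.3%
```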

Substack Rival Ghost Federates Its First Newsletter

After teasing support for the fediverse earlier this year, the newsletter platform and Substack rival Ghost has finally delivered. “Over the past few days, Ghost says it has achieved two major milestones in its move to become a federated service,” reports TechCrunch. “Of note, it has federated its own newsletter, making it the first federated Ghost instance on the internet.” From the report: Users can follow the newsletter through their preferred federated app at @index@activitypub.ghost.org, though the company warns there will be bugs and issues as it continues to work on the platform’s integration with ActivityPub, the protocol that powers Mastodon and other federated apps. “Having multiple Ghost instances in production successfully running ActivityPub is a huge milestone for us because it means that for the first time, we’re interacting with the wider fediverse. Not just theoretical local implementations and tests, but the real world wide social web,” the company shared in its announcement of the news.
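
On Mastodon and most other ActivityPub servers, a handle like @index@activitypub.ghost.org is resolved through WebFinger (RFC 7033): the client asks the handle's domain for `/.well-known/webfinger` with an `acct:` resource, then follows the returned link to the ActivityPub actor. The sketch below only builds that lookup URL. It assumes Ghost's instance answers WebFinger the way Mastodon-compatible servers generally do, which the announcement does not spell out.

```python
from urllib.parse import urlencode

def webfinger_url(handle: str) -> str:
    """Turn a fediverse handle such as '@user@example.social' into its WebFinger lookup URL."""
    user, _, domain = handle.lstrip("@").partition("@")
    if not user or not domain:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"

print(webfinger_url("@index@activitypub.ghost.org"))
# https://activitypub.ghost.org/.well-known/webfinger?resource=acct%3Aindex%40activitypub.ghost.org
```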

In addition, Ghost’s ActivityPub GitHub repository is now fully open source. That means those interested in tracking Ghost’s progress toward federation can follow its code changes in real time, and anyone else can learn from, modify, distribute or contribute to its work. Developers who want to collaborate with Ghost are also being invited to get involved following this move. By offering a federated version of the newsletter, readers will have more choices on how they want to subscribe. That is, instead of only being able to follow the newsletter via email or the web, they also can track it using RSS or ActivityPub-powered apps, like Mastodon and others. Ghost said it will also develop a way for sites with paid subscribers to manage access via ActivityPub, but that functionality hasn’t yet rolled out with this initial test.

MTV News Website Goes Dark, Archives Pulled Offline

MTVNews.com has been shut down, with more than two decades’ worth of content no longer available. “Content on its sister site, CMT.com, seems to have met a similar fate,” adds Variety. From the report: In 2023, MTV News was shuttered amid the financial woes of parent company Paramount Global. As of Monday, trying to access MTV News articles on mtvnews.com or mtv.com/news resulted in visitors being redirected to the main MTV website.

The now-unavailable content includes decades of music journalism comprising thousands of articles and interviews with countless major artists, dating back to the site’s launch in 1996. Perhaps the most significant loss is MTV News’ vast hip-hop-related archives, particularly its weekly “Mixtape Monday” column, which ran for nearly a decade in the 2000s and 2010s and featured interviews, reviews and more with many artists, producers and others early in their careers. “So, mtvnews.com no longer exists. Eight years of my life are gone without a trace,” Patrick Hosken, former music editor for MTV News, wrote on X. “All because it didn’t fit some executives’ bottom lines. Infuriating is too small a word.”

“sickening (derogatory) to see the entire @mtvnews archive wiped from the internet,” Crystal Bell, culture editor at Mashable and one-time entertainment director of MTV News, posted on X. “decades of music history gone… including some very early k-pop stories.”

“This is disgraceful. They’ve completely wiped the MTV News archive,” longtime Rolling Stone senior writer Brian Hiatt commented. “Decades of pop culture history research material gone, and why?”

The report notes that some MTV News articles may be available via internet archiving services like the Wayback Machine. However, older articles aren’t available.
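
For readers hunting for a lost article, the Internet Archive offers an availability endpoint, `https://archive.org/wayback/available?url=...`, that reports the closest archived snapshot of a page. The sketch below builds that query and extracts the snapshot URL from a response. The article path is hypothetical, and the sample JSON responses are hand-written in the endpoint's general shape, not real archive data.

```python
import json
from typing import Optional
from urllib.parse import urlencode

def availability_query(page_url: str) -> str:
    """Build a Wayback Machine availability-API query for page_url."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})

def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL from an availability-API response, or None."""
    closest = json.loads(response_text).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

# Hypothetical article path and hand-written sample responses, for illustration only.
print(availability_query("mtvnews.com/news/example-article"))
sample_hit = '{"archived_snapshots": {"closest": {"available": true, "status": "200", "timestamp": "20150101000000", "url": "http://web.archive.org/web/20150101000000/http://www.mtvnews.com/news/example-article"}}}'
sample_miss = '{"archived_snapshots": {}}'
print(closest_snapshot(sample_hit))
print(closest_snapshot(sample_miss))  # None: nothing archived
```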

Remote Amazon Tribe Connects To Internet, Gets Addicted To Porn and Social Media

The Marubo people, an isolated Indigenous tribe in the Amazon, have gained high-speed internet access through Elon Musk’s Starlink service, drastically altering their traditional way of life. While the internet has brought significant benefits like improved communication and emergency response, it has also introduced challenges such as social media addiction, exposure to inappropriate content, and cultural erosion. The New York Times reports: After only nine months with Starlink, the Marubo are already grappling with the same challenges that have racked American households for years: teenagers glued to phones; group chats full of gossip; addictive social networks; online strangers; violent video games; scams; misinformation; and minors watching pornography. Modern society has dealt with these issues over decades as the internet continued its relentless march. The Marubo and other Indigenous tribes, who have resisted modernity for generations, are now confronting the internet’s potential and peril all at once, while debating what it will mean for their identity and culture.

The internet was an immediate sensation. “It changed the routine so much that it was detrimental,” [admitted one Marubo leader, Enoque Marubo]. “In the village, if you don’t hunt, fish and plant, you don’t eat.” Leaders realized they needed limits. The internet would be switched on for only two hours in the morning, five hours in the evening, and all day Sunday. During those windows, many Marubo are crouched over or reclined in hammocks on their phones. They spend lots of time on WhatsApp. There, leaders coordinate between villages and alert the authorities to health issues and environmental destruction. Marubo teachers share lessons with students in different villages. And everyone is in much closer contact with faraway family and friends. To Enoque, the biggest benefit has been in emergencies. A venomous snake bite can require swift rescue by helicopter. Before the internet, the Marubo used amateur radio, relaying a message between several villages to reach the authorities. The internet made such calls instantaneous. “It’s already saved lives,” he said.

In April, seven months after Starlink’s arrival, more than 200 Marubo gathered in a village for meetings. Enoque brought a projector to show a video about bringing Starlink to the villages. As proceedings began, some leaders in the back of the audience spoke up. The internet should be turned off for the meetings, they said. “I don’t want people posting in the groups, taking my words out of context,” another said. During the meetings, teenagers swiped through Kwai, a Chinese-owned social network. Young boys watched videos of the Brazilian soccer star Neymar Jr. And two 15-year-old girls said they chatted with strangers on Instagram. One said she now dreamed of traveling the world, while the other wants to be a dentist in Sao Paulo. This new window to the outside world had left many in the tribe feeling torn. “Some young people maintain our traditions,” said TamaSay Marubo, 42, the tribe’s first woman leader. “Others just want to spend the whole afternoon on their phones.”

How Internet Pioneers Celebrated 50 Years of the Internet

Founded in 1963, the Institute of Electrical and Electronics Engineers held a special event Sunday that they said would be “inspiring engineering for the next 50 years.”

The event featured talks on the origins of the internet from 80-year-old “father of the internet” Vint Cerf, along with John Shoch (who helped develop the Ethernet and internetwork protocols at Xerox PARC), Judith Estrin (who worked with Cerf on the TCP project), and Robert Kahn (who with Cerf first proposed the IP and TCP protocols). Ethernet co-inventor Bob Metcalfe also spoke at the end of the event.

Long-time Slashdot reader repett0 was an onsite volunteer, and shares that “it was incredible to meet and greet such a wonderful mix of people making technology happen… [T]he event celebrated many key technologies and innovators from the past 50 years and considerations of what is to come in the next 50 years.”

Video streams are available and more are coming online (including interviews with key innovators, society leadership, and more). If you could not make this event, follow-on activities continue, including the People-Centered Internet Imagine Workshop where a mix of society is working together to consider how to improve humanity’s intersection with ever-expanding abilities thanks to technology.

They add that the event was made possible “through the collaboration of many professional computing societies” including the IEEE, People-Centered Internet, Google, Internet Society, IEEE Computer Society, GIANT Protocol, IEEE Foundation — and volunteers from the SF Bay Area ACM and Internet Society.

Novel Attack Against Virtually All VPN Apps Neuters Their Entire Purpose

Researchers have discovered a new attack that can force VPN applications to route traffic outside the encrypted tunnel, thereby exposing the user’s traffic to potential snooping or manipulation. This vulnerability, named TunnelVision, is found in almost all VPNs on non-Linux and non-Android systems. It’s believed that the vulnerability “may have been possible since 2002 and may already have been discovered and used in the wild since then,” reports Ars Technica. From the report: The effect of TunnelVision is “the victim’s traffic is now decloaked and being routed through the attacker directly,” a video demonstration explained. “The attacker can read, drop or modify the leaked traffic and the victim maintains their connection to both the VPN and the Internet.” The attack works by manipulating the DHCP server that allocates IP addresses to devices trying to connect to the local network. A setting known as option 121 allows the DHCP server to override default routing rules that send VPN traffic through a local IP address that initiates the encrypted tunnel. By using option 121 to route VPN traffic through the DHCP server, the attack diverts the data to the DHCP server itself. […]
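
Option 121 is the "classless static route" option defined in RFC 3442. Each route is packed as one byte of prefix length, then only as many bytes of the destination network as that prefix covers, then the four-byte router address. The sketch below encodes routes in that format. The gateway 192.168.1.66 and both destination networks are hypothetical, chosen only to show how a DHCP server can hand a client extra routes that point at a host of its choosing.

```python
import ipaddress

def encode_option_121(routes):
    """Pack (destination CIDR, router IP) pairs into an RFC 3442 option 121 payload."""
    payload = bytearray()
    for destination, router in routes:
        net = ipaddress.IPv4Network(destination)
        significant_octets = (net.prefixlen + 7) // 8  # only the bytes the prefix covers
        payload.append(net.prefixlen)
        payload += net.network_address.packed[:significant_octets]
        payload += ipaddress.IPv4Address(router).packed
    return bytes(payload)

# Hypothetical routes steering two destinations to a gateway at 192.168.1.66.
routes = [("10.17.0.0/16", "192.168.1.66"), ("203.0.113.7/32", "192.168.1.66")]
print(encode_option_121(routes).hex())  # 100a11c0a8014220cb007107c0a80142
```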

The attack can most effectively be carried out by a person who has administrative control over the network the target is connecting to. In that scenario, the attacker configures the DHCP server to use option 121. It’s also possible for people who can connect to the network as an unprivileged user to perform the attack by setting up their own rogue DHCP server. In either case, the attack allows some or all traffic to be routed outside the encrypted tunnel, while the VPN application reports that all data is being sent through the protected connection. Any traffic that’s diverted away from this tunnel will not be encrypted by the VPN, and the Internet IP address viewable by the remote user will belong to the network the VPN user is connected to, rather than one designated by the VPN app.
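
The diverted routes win because of longest-prefix matching: when several routes cover a destination, the operating system uses the most specific one. A common way for VPN clients to capture all traffic (OpenVPN's `redirect-gateway def1`, for example) is to install two /1 routes, 0.0.0.0/1 and 128.0.0.0/1, which beat the network's /0 default route; a DHCP-supplied route with a longer prefix beats those in turn. The toy routing table below assumes that /1 setup; the interface names and the 203.0.113.0/24 route are hypothetical.

```python
import ipaddress

# Toy routing table: (network, interface). Interface names and the /24 route are hypothetical.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "wlan0"),       # local network's default route
    (ipaddress.ip_network("0.0.0.0/1"), "tun0"),        # VPN's two /1 routes cover all of IPv4
    (ipaddress.ip_network("128.0.0.0/1"), "tun0"),
    (ipaddress.ip_network("203.0.113.0/24"), "wlan0"),  # extra route pushed via DHCP option 121
]

def egress_interface(destination: str) -> str:
    """Pick the interface of the longest (most specific) prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, iface) for net, iface in ROUTES if addr in net]
    _, iface = max(candidates, key=lambda route: route[0].prefixlen)
    return iface

print(egress_interface("198.51.100.20"))  # tun0: only the /0 and a /1 match, so the VPN wins
print(egress_interface("203.0.113.7"))    # wlan0: the /24 is more specific than the VPN's /1
```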

Interestingly, Android is the only operating system that fully immunizes VPN apps from the attack because it doesn’t implement option 121. For all other OSes, there are no complete fixes. When apps run on Linux there’s a setting that minimizes the effects, but even then TunnelVision can be used to exploit a side channel that can be used to de-anonymize destination traffic and perform targeted denial-of-service attacks. Network firewalls can also be configured to deny inbound and outbound traffic to and from the physical interface. This remedy is problematic for two reasons: (1) a VPN user connecting to an untrusted network has no ability to control the firewall and (2) it opens the same side channel present with the Linux mitigation. The most effective fixes are to run the VPN inside of a virtual machine whose network adapter isn’t in bridged mode or to connect the VPN to the Internet through the Wi-Fi network of a cellular device.

Humans Now Share the Web Equally With Bots, Report Warns

An anonymous reader quotes a report from The Independent, published last month: Humans now share the web equally with bots, according to a major new report — as some fear that the internet is dying. In recent months, the so-called “dead internet theory” has gained new popularity. It suggests that much of the content online is in fact automatically generated, and that the number of humans on the web is dwindling in comparison with bot accounts. Now a new report from cyber security company Imperva suggests that it is increasingly becoming true. Nearly half, 49.6 per cent, of all internet traffic came from bots last year, its “Bad Bot Report” indicates. That is up about 2 percentage points from the year before, and is the highest level since the report began in 2013. In some countries, the picture is worse. In Ireland, 71 per cent of internet traffic is automated, it said.

Some of that rise is the result of the adoption of generative artificial intelligence and large language models. Companies that build those systems use bots to scrape the internet and gather data that can then be used to train them. Some of those bots are becoming increasingly sophisticated, Imperva warned. More and more of them come from residential internet connections, which makes them look more legitimate. “Automated bots will soon surpass the proportion of internet traffic coming from humans, changing the way that organizations approach building and protecting their websites and applications,” said Nanhi Singh, general manager for application security at Imperva. “As more AI-enabled tools are introduced, bots will become omnipresent.”

Court Upholds New York Law That Says ISPs Must Offer $15 Broadband

The U.S. Court of Appeals for the 2nd Circuit overturned a prior district court decision, lifting the injunction that blocked New York’s law mandating that ISPs offer $15 broadband plans to low-income families. Ars Technica reports: The ruling is a loss for six trade groups that represent ISPs, although it isn’t clear right now whether the law will be enforced. For consumers who qualify for means-tested government benefits, the state law requires ISPs to offer “broadband at no more than $15 per month for service of 25Mbps, or $20 per month for high-speed service of 200Mbps,” the ruling noted. The law allows for price increases every few years and makes exemptions available to ISPs with fewer than 20,000 customers.

“First, the ABA is not field-preempted by the Communications Act of 1934 (as amended by the Telecommunications Act of 1996), because the Act does not establish a framework of rate regulation that is sufficiently comprehensive to imply that Congress intended to exclude the states from entering the field,” a panel of appeals court judges stated in a 2-1 opinion, referring to New York’s Affordable Broadband Act (ABA). Trade groups claimed the state law is preempted by former Federal Communications Commission Chairman Ajit Pai’s repeal of net neutrality rules. Pai’s repeal placed ISPs under the more forgiving Title I regulatory framework instead of the common-carrier framework in Title II of the Communications Act.

2nd Circuit judges did not find this argument convincing: “Second, the ABA is not conflict-preempted by the Federal Communications Commission’s 2018 order classifying broadband as an information service. That order stripped the agency of its authority to regulate the rates charged for broadband Internet, and a federal agency cannot exclude states from regulating in an area where the agency itself lacks regulatory authority. Accordingly, we REVERSE the judgment of the district court and VACATE the permanent injunction.”