The Weekly Weird #37
'Egregious' Anthropic, Kamala Harris fakes it, nuclear fusion requires mayonnaise, China gets charming, another 'biggest-ever' data breach
Welcome back, Weirders!
Once more we meet to cluck our tongues, shake our heads, and giggle at the gaggle of grotesqueries the past week has provided.
Quick mentions:
August 15 is the Berlin re-trial of
- I’ll update with news when I have it, but, if you’re inclined, make some noise wherever you can for a free thinker in need. The latest is that he’s been prevented from bringing a prepared statement to read into the record.
Next episode of the podcast drops this Sunday, August 18, so look out for it.
Onwards!
‘Egregious’ Anthropic
The AI firm Anthropic has been accused of “aggressively scraping data from websites to train its systems.”
The Financial Times reported on the firm’s alleged (mis)use of ‘crawlers’ to take data from websites to feed the insatiable maw of its Armageddon machine, er, to train its large language models.
Matt Barrie, the chief executive of Freelancer.com accused the San Francisco-based company of being “the most aggressive scraper by far” of his portal for freelancers, which has millions of daily visits.
[…]
Freelancer.com received 3.5mn visits from an Anthropic-linked web “crawler” in the space of four hours, according to data shared with the Financial Times. That makes Anthropic “probably about five times the volume of the number two” AI crawler, Barrie said.
[…]
“We had to block them because they don’t obey the rules of the internet,” Barrie said. “This is egregious scraping [which] makes the site slower for everyone operating on it and ultimately affects our revenue.”
If you’ve managed to stay out of the AI kerfuffle up until now, ‘crawling’ or ‘scraping’ is the practice of using automated scripts to take publicly available data from websites. Although some sites have terms that prohibit this for uses like training an AI, it is usually legal. That said, it is frowned upon.
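For the technically curious: a long-standing convention for this sort of etiquette is the robots.txt file, where a site owner states which bots may fetch which pages. Compliance is voluntary, and nothing technically stops a crawler from ignoring it. Below is a minimal sketch of how a polite crawler might check those rules using Python's standard urllib.robotparser module. The robots.txt contents, the bot names (ExampleAIBot, SomeOtherBot) and the example.com URLs are all made up for illustration, and none of it describes how any real company's crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this bot to request this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for bot in ("ExampleAIBot", "SomeOtherBot"):
        for url in ("https://example.com/guides/", "https://example.com/private/x"):
            print(bot, url, may_fetch(rules, bot, url))
    # Seconds a bot is asked to wait between requests (None if unspecified).
    print(rules.crawl_delay("SomeOtherBot"))
```

In this made-up file, ExampleAIBot is asked to stay away entirely, while every other bot may crawl anything outside /private/ as long as it waits ten seconds between requests.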
Kyle Wiens, chief executive of iFixit.com, said his electronic repairs site received 1mn hits from Anthropic bots in the space of 24 hours. “We have a load of alarms [for high traffic], people get woken up at 3am. This set off every alarm we have,” he said.
iFixit’s terms of service prohibited the use of its data for machine learning, said Wiens. “My first message to Anthropic is: if you’re using this to train your model, that’s illegal. My second is: this is not polite internet behaviour. Crawling is an etiquette thing.”
Scraping isn’t just an issue because of the way the content is used - in practical terms it can slow down how a website works for other users, and even raise costs for the website owner.
“AI crawlers have cost us a significant amount of money in bandwidth charges, and caused us to spend a large amount of time dealing with abuse,” wrote Eric Holscher, co-founder of document hosting website Read the Docs in a blog post on Thursday. “AI crawlers are acting in a way that is not respectful to the sites they are crawling, and that is going to cause a backlash against AI crawlers in general,” he added.
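To put the traffic figures quoted above in perspective, here is a quick back-of-the-envelope conversion into requests per second. It is a rough sketch that treats each reported visit or hit as a single request and assumes the traffic was spread evenly over each period, neither of which the sources claim, so read the results as averages only.

```python
def average_requests_per_second(total_requests: int, hours: float) -> float:
    """Average request rate, assuming traffic is spread evenly over the period."""
    return total_requests / (hours * 3600)


# Figures as quoted from the Financial Times above.
freelancer_rate = average_requests_per_second(3_500_000, 4)   # 3.5mn in four hours
ifixit_rate = average_requests_per_second(1_000_000, 24)      # 1mn in 24 hours

print(f"Freelancer.com: ~{freelancer_rate:.0f} requests per second")
print(f"iFixit: ~{ifixit_rate:.1f} requests per second")
```

On those assumptions, that is roughly 243 requests every second for Freelancer.com and about 12 every second for iFixit, sustained for hours.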
Besides the social and cultural implications of LLMs and AI cranking out useless mindpaste that clogs the pipework of the internet with inanity and lies, there are hard costs in terms of energy and bandwidth that make the process of training an AI seem increasingly parasitic. Not only is other people’s (often-copyrighted) work being appropriated so that university students can fake their way to a medical degree and then not know how to remove your spleen, AI is being trained a lot of the time on someone else’s dime. How long before the recipients of these ‘collect calls’ decide not to accept the charges?
The AI industry’s response to the rising backlash?
Kamala Harris Fakes It
Speaking of automated processes fraying the very fabric of information itself, Kamala Harris’s campaign got busted by Axios this week using Google to present real news stories in sponsored search results but with altered headlines that “make it appear as if the Guardian, Reuters, CBS News and other major publishers are on her side.”
Exhibit A from Axios:
Google’s excuse was that the alterations, which are advertising, not search results, “don’t violate its rules,” and that ‘because ads on Search are prominently labeled as “Sponsored,” they're “easily distinguishable from Search results.”’
As Axios pointed out:
The ads say that they are sponsored, but it's not immediately clear that the text that accompanies real news links is written by the campaigns and not by the media publication itself.
The campaign has been doing this at least since the beginning of this month, altering headlines from a championship roster of mainstream news outlets.
Since August 3rd, nearly a dozen news companies have been used in these types of search ads from the Harris campaign, Axios found.
Examples include The Independent UK, NPR, AP, The Guardian, USA Today, PBS, CNN, CBS News, Time and others, including local outlets like North Dakota radio station WDAY Radio.
The ads include links to real articles from the news outlets, but the headlines and supporting text have been altered to read as though the articles support the Harris campaign's objectives.
For example, an ad that ran alongside an article from The Guardian shows a headline that reads "VP Harris Fights Abortion Bans - Harris Defends Repro Freedom" and then includes supporting text underneath the headline that reads, "VP Harris is a champion for reproductive freedom and will stop Trump's abortion bans."
An ad featuring a link to an NPR story reads, "Harris Will Lower Health Costs," with supporting text that says, "Kamala Harris will lower the cost of high-quality affordable health care."
A punchline to this sordid little piece of politicking appears in a brief paragraph in the Axios article:
Facebook banned the ability for advertisers to edit text from Instant Article news links in their ads in 2017, citing its "continuing efforts to stop the spread of misinformation and false news."
Google is doing something even Facebook won’t do.
“Don’t be evil” indeed.
Nuclear Fusion Requires Mayonnaise
We can all agree that science is awesome, but every once in a while we get a taste of something even more delicious than the inherent awesomeness of figuring out how something works, proving it, and then doing it consistently for fun and profit.
Enter a humble dollop of mayo.
According to OilPrice, “Scientists are studying the condiment to understand the behavior of plasma in fusion reactions.”
The “Rayleigh-Taylor instability…is the interpenetration of materials when fluids with different densities collide,” and a key issue facing the people in white coats trying to build the sun in a box to give us all free energy forever is “the plasma state which results from rapid compression and heating of capsules filled with isotopes of hydrogen.”
The answer, which I’ll bet you didn’t have on your fusion bingo card, is mayonnaise.
OilPrice quotes “Arindam Banerjee, the Paul B. Reinhold Professor of Mechanical Engineering and Mechanics at Lehigh University and Chair of the MEM department in the P.C. Rossin College of Engineering and Applied Science”:
“We use mayonnaise because it behaves like a solid, but when subjected to a pressure gradient, it starts to flow,” he says.
Banerjee and team have been studying the properties of mayonnaise in relation to plasma characteristics for at least half a decade.
Their most recent research has been “trying to enhance the predictability of what would happen with those molten, high-temperature, high-pressure plasma capsules with these analog experiments of using mayonnaise in a rotating wheel.”
To every scientist who weathers the raised eyebrows when they explain that their research is in “the plasma capsule-mimicking properties of mayonnaise,” thank you for your service. It can’t be easy choosing a condiment over quantum.
We salute you.
China Gets Charming
The Network Contagion Research Institute at Rutgers University has just published a report called The CCP's Digital Charm Offensive: How TikTok's Search Algorithm and Pro-China Influence Networks Indoctrinate GenZ Users in the United States.
From the report’s Executive Summary:
[Our] findings present the following circumstantial evidence:
• Algorithmic Bias: TikTok's algorithms consistently amplify pro-CCP content and suppress anti-CCP narratives.
• Content Origination: Much of the pro-CCP content originates from state-linked entities, including media outlets and influencers.
• Psychological Impact: Survey data shows significant shifts in user attitudes towards China, especially among heavy TikTok users, indicating successful indoctrination.
These points collectively indicate a systematic manipulation of information, suggesting that propaganda produced by state actors and orchestrated through assets owned or influenced by them shapes user perceptions at a massive scale.
How adept at shaping perceptions is the algorithm in question?
The views-to-likes ratio for anti-China content on TikTok was 87% lower than pro-China content even though the content was liked nearly twice as much.
The key takeaway from the report (emphasis in the original):
TikTok amplifies frontier influencers (travel and lifestyle content accounts) and irrelevant or clickbait material, to crowd out discussion of CCP-driven ethnic genocide and human rights abuses on its platform.
[…]
A psychological survey of Americans (n=1214) shows that, among the platforms studied, TikTok screentime positively and uniquely predicted favorability towards China's human rights record. Notably, heavy users of TikTok (i.e., those with >3 hours of daily screentime) demonstrated a roughly 50% increase in pro-China attitudes compared to non-users. This suggests that TikTok's content may contribute to psychological manipulation of users, aligning with the CCP’s strategic objective of shaping favorable perceptions among young audiences.
In a line ripe with implications unexplored, the authors of the report add:
These findings underscore the urgent need for transparent regulation of social media algorithms, or even the creation of a public trust funded by the platforms themselves to safeguard democratic values and free will.
Now we just need to all agree on what democratic values are, exactly, and then find a way to empower the policing of the management of the visibility of speech without it impacting on what can be said…
No biggie, right?
The TikTok ‘crowding out’ system also sounds a little like Elon Musk’s “speech, not reach” caveat on X.
You will be allowed to push the sounds out of your windpipe, but the State will block the ears that might otherwise hear them, or drown out your sounds with a chorus of opposing voices that make what you say inaudible.
What a time to be alive.
Another ‘Biggest-Ever’ Data Breach
Nearly 3 billion (yeah, with a ‘b’) people have had their data hacked in yet another ‘biggest-ever’ privacy protection failure.
A recent class action lawsuit filed against Jerico Pictures Inc., a background check company that does business under the name National Public Data, claims that the company was breached by hackers earlier this year.
As a result, the lawsuit says, confidential data for 2.9 billion was exposed and stolen by a hacker group known as USDoD.
Okay, well at least National Public Data can notify the affected people so they can take steps like changing passwords, double-checking their security etc.
To quote Wayne Campbell, “As if!”
Making matters even worse, those affected by this cyberattack may not even know they could be involved. National Public Data reportedly gathers its data by scraping information about individuals from non-public sources without their knowledge or consent.
The exposed information contains varying details for nearly 3 billion people which include full names, former and current addresses, and Social Security numbers as well as personal data tied to family members and relatives who are both living and deceased.
When did this all go down?
This breach was previously unknown to the public. It's unclear when exactly the breach occurred. Named plaintiff Christopher Hofmann says he only became aware of the issue when an identity theft protection service notified him in July that his personal information had been compromised and leaked on the dark web.
The group [USDoD] posted a "National Public Data" database containing the leaked information on a dark web hacking forum in April, and sought $3.5 million from a potential buyer.
Just how big a deal is this?
With billions exposed, the National Public Data breach appears to be one of the biggest single data breaches ever, seemingly rivaled only by Yahoo's 2013 data breach which affected 3 billion accounts.
So, to recap, a shifty outfit sold background checks under a pseudonymous trading name by scraping personal data and compiling it without knowledge or consent, and then protected it so poorly that all the details got stolen and ended up for sale on the dark web for $3.5 million.
In the midst of a global movement to roll out digital identity, hacks like this are a sobering counterpoint that hopefully will give authorities pause before they ram through ill-conceived technological fixes for non-existent problems that ultimately make individuals less secure.
Back to Wayne for the required sarcastic response…
That’s it for this week’s Weird, everyone. Thank you as always for reading.
Outro music is Jane’s Addiction with Been Caught Stealing, dedicated to Anthropic and the rest of the AI industry.
Stay sane, friends.
What kind of frustration and failure brings a scientist to say, “You know, we haven’t tried using condiments!”🤣
Lol, they are upset that TikTok users are more pro-China.
I suppose because TikTok, although it censors some things, does not promote the US imperial agenda, which is to put down China as authoritarian.
Quantum theory is pseudoscience, thus mayo methods.
https://youtube.com/playlist?list=PLkdAkAC4ItcHNLDIK9ORydQl_Ik6GJ0bD