FaceDeer

@FaceDeer@fedia.io

Posts: 0
Comments: 3071
Joined: 2 yr. ago

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.
Spent many years on Reddit before joining the Threadiverse as well.

1mo ago

Whistleblower drops 'largest ever' ICE leak to unmask agents: 'The last straw'
FaceDeer @fedia.io 1mo ago
Those would be the Epstein Files. This is still nice to have out there, though. Governments should fear their people.

1mo ago

A Project to Poison LLM Crawlers

I have a sneaking suspicion that the vast majority of the people raging about AIs scraping their data are not raging about it being done inefficiently.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

You're thinking of "model decay", I take it? That's not really a thing in practice.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

Raw materials to inform the LLMs constructing the synthetic data, most likely. If you want it to be up to date on the news, you need to give it that news.

The point is not that the scraping doesn't happen, it's that the data is already being highly processed and filtered before it gets to the LLM training step. There's a ton of "poison" in that data naturally already. Early LLMs like GPT-3 just swallowed the poison and muddled on, but researchers have learned how much better LLMs can be when trained on cleaner data and so they already take steps to clean it up.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

I have no idea what "established means" would be. In the particular case of the Fediverse it seems impossible: you can just set up your own instance specifically intended for harvesting comments and use that. The Fediverse is designed specifically to publish its data for others to use in an open manner.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

Are you proposing flooding the Fediverse with fake bot comments in order to prevent the Fediverse from being flooded with fake bot comments? Or are you thinking more along the lines of that guy who keeps using "Þ" in place of "th"? Making the Fediverse too annoying to use for bot and human alike would be a fairly Pyrrhic victory, I would think.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

A basic Google search for "synthetic data llm training" will give you lots of hits describing how the process goes these days.

Take this as "defeatist" if you wish; as I said, it doesn't really matter. In the early days of LLMs, when ChatGPT first came out, the strategy for training these things was to just dump as much raw data onto them as possible and hope quantity allowed the LLM to figure something out from it. Since then it's been learned that quality is better than quantity, so training data is far more carefully curated these days. Not because there's "poison" in it, just because it results in better LLMs. Filtering out poison will happen as a side effect.

It's like trying to contaminate a city's water supply by peeing in the river upstream of the water treatment plant drawing from it. The water treatment plant is already dealing with all sorts of contaminants anyway.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

I think it's worthwhile to show people that views outside of their like-minded bubble exist. One of the nice things about the Fediverse over Reddit is that the upvote and downvote tallies are both shown, so we can see that opinions are not a monolith.

Also, the point of engaging in Internet debate is never to convince the person you're actually talking to. That almost never happens. The point of debate is to present convincing arguments for the less-committed casual readers who are lurking rather than participating directly.

1mo ago

A Project to Poison LLM Crawlers

FaceDeer @fedia.io 1mo ago

Doesn't work, but if it makes people feel better I suppose they can waste their resources doing this.

Modern LLMs aren't trained on just whatever raw data can be scraped off the web any more. They're trained with synthetic data that's prepared by other LLMs and carefully crafted and curated. Folks are still thinking GPT-3 is state of the art here.

1mo ago

[Meta] RE: Starfleet Academy

FaceDeer @fedia.io 1mo ago

even bad trek is more trek.

What this attitude tells executives is "guess we don't need to bother trying to make Trek good, they'll eat up whatever we shovel their way."

I'm a big fan of Star Trek. That's why I haven't watched anything other than Lower Decks since Discovery started airing.

1mo ago

‘The streets are full of blood’: Iranian protests gather momentum as regime cracks down

FaceDeer @fedia.io 1mo ago

Meanwhile Americans are moaning about how they can't protest because they need their jobs to provide their health insurance.

1mo ago

EU_irl

FaceDeer @fedia.io 1mo ago

Yeah, the list goes on and on.

1mo ago

EU_irl

FaceDeer @fedia.io 1mo ago

It's "I've taken Venezuela, now give me Greenland. And Cuba. And Panama. And Canada. And..."

1mo ago

A $400,000 payout after Maduro's capture is putting prediction markets in the spotlight

FaceDeer @fedia.io 1mo ago

Seems likely to have been the result of inside information, sure.

But that's the point of prediction markets, isn't it? Their purpose is to get people who have knowledge about a topic to put that knowledge out there to the public, with money put up against that to back their certainty. It's a way of surfacing that knowledge to the general public. People who don't have knowledge and are just making gambling bets based on their "feelings" are going to get screwed now and then. They probably shouldn't be doing that.

1mo ago

Nobel Institute rejects María Corina Machado’s offer to share peace prize with Trump

FaceDeer @fedia.io 1mo ago

Eh, I could see it as a reasonable deal from her perspective. If giving Trump a pointless shiny metal disk got him to actually back democratic reform in Venezuela, sure, give him the shiny metal disk. Whatever, it's just a thing.

Unfortunately it wouldn't work, though; Trump never sticks to his deals after he's got what he wanted out of them or when he sees an opportunity to squeeze even more out of you. So might as well keep the shiny metal disk.

1mo ago

Has Canada's government done anything concrete to reduce dependence on the US since Trump took office? Maybe even since the first term?

FaceDeer @fedia.io 1mo ago

Carney didn't "clap for the attack on Venezuela." He called for international law to be followed, which should be an obvious rebuke to anyone who isn't at a Trump level of understanding of how diplomacy is done.

1mo ago

Oh no! Linus doesn't know AI is useless!

FaceDeer @fedia.io 1mo ago

I've long found it funny how some people claim that generative AI produces terrible slop, and simultaneously that it's a huge threat to their jobs.

1mo ago

AI insiders seek to poison the data that feeds them

FaceDeer @fedia.io 1mo ago

People have been doing this to "protest" AI for years already. AI trainers already do extensive filtering and processing of their training data before they use it to train; the days of simply turning an AI loose on Common Crawl and hoping to get something out of that are long past. Most AIs these days train on synthetic data, which isn't even taken directly from the web.