Jerry on PieFed

@ Jerry @feddit.online

Posts
25
Comments
120
Joined
1 yr. ago

Just a techie guy running feddit.online to allow people to communicate, make friends and acquaintances. Odd coming from a happy introvert, right? (https://jerry.hear-me.blog/about)

I also own these publicly available applications:
Mastodon: https://hear-me.social/
Alternative Mastodon UI: https://phanpy.hear-me.social/
Peertube: https://my-sunshine.video/
Friendica: https://my-place.social/
Matrix: https://element.secure-channel.net/
XMPP/Jabber: https://between-us.online/
Bluesky PDS: https://blue-ocean.social/ (jerry.blue-ocean.social)
Mobilizon (Facebook Events Alt): https://my-group.events/
and more...

  • I asked Gemini Pro whether the thorn character messes up an LLM. If anyone is interested, here's the answer. The short answer is that it makes the text harder for humans to read, but LLMs are not bothered by it. So, those who use the thorn character for "th" in the belief that it breaks AI are not achieving anything and are just being annoying.

    Gemini Pro's response: (Note that as I write this in PieFed, the thorn character is being converted to "th", so you will have to convert it back in your brain.)

    The belief that using the character "þ" (Thorn) "messes with" LLM training has gained some traction in privacy-focused and "anti-AI" circles. Here is a breakdown of the technical reality behind that claim.

    1. The Theory: Disrupting Tokenization

    Large Language Models don't read words; they read tokens (numerical representations of character clusters).

    Common Tokens: In standard English, the word "the" is one of the most common tokens. It is highly optimized, and the model "understands" it perfectly.

    The "Thorn" Hack: If you replace "the" with "<thorn>e," you force the model's tokenizer to handle a rare character. Because <thorn> is uncommon in English training data, the tokenizer might break "<thorn>e" into two separate, rare tokens (the byte for <thorn> and the letter "e"). Proponents believe this makes the text "invisible" or "garbage" to the model, preventing it from learning from the content or correctly categorizing it.

    2. Does it actually work? (Mostly No)

    While the idea is clever, it is generally ineffective against modern, high-scale LLMs for several reasons:

    Multilingual Training: Most modern models are trained on massive multilingual datasets. Icelandic uses <thorn> as a standard letter. Therefore, the models already have <thorn> in their vocabulary and have seen it millions of times.

    Byte-Level Tokenization: Modern tokenizers (using Byte Pair Encoding or BPE) are designed to handle any Unicode character by falling back to its raw bytes. They don't "break" when they see an unfamiliar symbol; they just process it as a less-common sequence.
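
    To make the byte fallback concrete, here is a minimal sketch that only shows the raw UTF-8 bytes a byte-level tokenizer would have available. It is not any real tokenizer; the function name `utf8_byte_fallback` is made up for illustration.

```python
def utf8_byte_fallback(text: str) -> list[int]:
    """Return the raw UTF-8 byte values of the text.

    Hypothetical helper: a byte-level tokenizer can always represent
    a string as these values, even if it has no merged token for it.
    """
    return list(text.encode("utf-8"))


thorn_word = "\u00fee"  # the thorn character followed by "e"

print(utf8_byte_fallback("the"))       # [116, 104, 101]
print(utf8_byte_fallback(thorn_word))  # [195, 190, 101]
```

    The thorn character (U+00FE) simply becomes the two bytes 195 and 190, so the input is still fully representable. How a given model then merges those bytes into tokens depends on its own vocabulary.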

    Contextual Mapping: LLMs are excellent at pattern recognition. If a model sees a sentence like "<thorn>e cat sat on <thorn>e mat," it can easily infer from the context that "<thorn>e" serves the exact same linguistic function as "the."

    3. The "Glitch Token" Phenomenon

    There is a grain of truth to the "glitching" idea. In the past, certain rare characters or specific strings (often called "glitch tokens") were included in a model's vocabulary but appeared so rarely in the training data that their "embedding" (the model's understanding of them) was never properly trained.

    When a model encounters a true glitch token, it can sometimes "hallucinate" or produce nonsensical output. However, <thorn> is a standard UTF-8 character and is too common globally to act as a true glitch token for most general-purpose models.

    Summary

    Using <thorn> is more of a digital protest or a linguistic "speed bump" than a functional way to break an AI. It might make your text harder for a simple keyword-search bot to find, but for a modern LLM, it’s just a slightly "noisy" version of English that we can still read and understand quite easily.

  • startpage.com blocking VPN?

  • I deleted what I wrote before. If it federated, ignore it.

    Your browser is trying to find startpage.com on your local machine instead of the Internet.

    While on the VPN, open a command window and ping startpage.com. Does it return localhost or the real IP address? If it returns the real IP address, then the problem is related to the browser. Try another browser to see if it's Vivaldi-related.

    If it returns localhost, then maybe it's a setting in Proton VPN?
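
    If you'd rather script the check than read ping output, here is a rough sketch that does the same thing with the Python standard library. The helper names (`is_loopback`, `resolve`, `diagnose`) are made up for this example, and it only looks at what the operating system resolver returns, which may differ from what the browser uses.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """True if the address is a loopback address such as 127.0.0.1 or ::1."""
    return ipaddress.ip_address(address).is_loopback


def resolve(hostname: str) -> list[str]:
    """Ask the operating system resolver for every address of hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def diagnose(hostname: str) -> str:
    """Summarize whether hostname resolves to this machine or to the Internet."""
    addresses = resolve(hostname)
    if any(is_loopback(address) for address in addresses):
        return f"{hostname} resolves to localhost: {addresses}"
    return f"{hostname} resolves to a non-local address: {addresses}"
```

    Run `print(diagnose("startpage.com"))` once with the VPN connected and once without. A localhost result points at name resolution (a hosts file entry or a VPN setting), while a real address points back at the browser.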

    This is strange. Just try to find clues.

  • startpage.com blocking VPN?

  • I'd ask for a refund.

  • You don't send them the password. The password never leaves your device. The password is the decryption key to decrypt your encrypted private key, which is what they send to your device. This is why, for Proton Mail, and others that use this technique, it is imperative to have a strong password to protect your private key.

  • Proton stores your encrypted private key. An encrypted private key does not allow them to read your email or files.

    When you log into a new device: Proton sends the encrypted private key to your device. You type your password. Your device (not Proton's server) uses the password to decrypt the private key locally in your browser or app memory.

    That decrypted key is then used to decrypt your emails on your device. Proton Mail sends you just the encrypted text.
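
    The flow above can be sketched roughly as follows. This is a toy illustration only: it does not reproduce Proton's actual key format, algorithms, or login protocol, the cipher is a deliberately simple stand-in built from the standard library, and every name in it (`derive_key`, `toy_encrypt`, `toy_decrypt`, `server_record`) is made up for the example. Do not use it to protect real data.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key. Runs only on the device."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream and prepend an HMAC tag."""
    body = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return hmac.new(key, body, hashlib.sha256).digest() + body


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Check the tag, then undo the XOR. Fails if the password was wrong."""
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))


# Signup, on the device: only the salt and the encrypted key are uploaded.
password = "correct horse battery staple"
private_key = b"hypothetical private key bytes"
salt = os.urandom(16)
server_record = {
    "salt": salt,
    "encrypted_private_key": toy_encrypt(derive_key(password, salt), private_key),
}

# Login on a new device: the server sends server_record, the password is
# typed locally, and decryption happens on the device.
recovered = toy_decrypt(
    derive_key(password, server_record["salt"]),
    server_record["encrypted_private_key"],
)
```

    The server only ever holds `server_record`. Without the password it cannot derive the key, which is also why a weak password is the weak point of this design.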

    There is one potential security issue:

    Since Proton serves the website code (HTML/JavaScript) that performs the encryption, you have to trust that they serve you honest code. Proton could theoretically alter their website code to capture your password the next time you log in, and a government could theoretically force them to do so.

    However, this is a different threat than "they have the keys." Currently, they possess the keys only in a form they mathematically cannot unlock.

  • "From time to time, Proton may be legally compelled to disclose certain user information to Swiss authorities, as detailed in our Privacy Policy. This can happen if Swiss law is broken. As stated in our Privacy Policy, all emails, files and invites are encrypted and we have no means to decrypt them."

  • Seems that if the uninvited code never tried doing network scans, it may not have been detected for quite a while. Good thing the hacker was a bit greedy.

  • I read the post and understood the problem. Then I read Dan's reply. And he's fixing it. In fact, he's fixing it in the exact way the blog post says it should be fixed, "... but those filters should be manually triggered and always removable."

    Here's Dan's reply:

    "... I hear the community: you want text-only posts. We are going to build this as an opt-in feature. If you want microblogging, turn it on. If you prefer the classic media-only experience, nothing changes."

    Why was this blog post even written then? Now it seems personal.

  • One option is to switch to a PieFed instance running v1.3.6. The devs did something about this for PieFed users that should help.

    v1.3.6: Stop floods of spammy posts from very new accounts on lemmy.world. New accounts (< 24h old) from lemmy.world are now limited to 3 posts in their first day.

    v1.3.3: Automatically delete posts by very young accounts that self-delete.
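
    For a feel of what the v1.3.6 rule amounts to, here is a minimal sketch of a "3 posts in the first 24 hours" check. It is my own illustration based only on the release note, not PieFed's code, and the names (`may_post`, `NEW_ACCOUNT_WINDOW`, `MAX_POSTS_FOR_NEW_ACCOUNT`) are hypothetical.

```python
from datetime import datetime, timedelta

# Values taken from the release note; the constant names are made up.
NEW_ACCOUNT_WINDOW = timedelta(hours=24)
MAX_POSTS_FOR_NEW_ACCOUNT = 3


def may_post(account_created: datetime, posts_so_far: int, now: datetime) -> bool:
    """Allow a post unless the account is under 24h old and already has 3 posts."""
    if now - account_created >= NEW_ACCOUNT_WINDOW:
        return True  # older accounts are not limited by this rule
    return posts_so_far < MAX_POSTS_FOR_NEW_ACCOUNT


created = datetime(2025, 1, 1, 12, 0)
print(may_post(created, 2, created + timedelta(hours=1)))   # True
print(may_post(created, 3, created + timedelta(hours=1)))   # False
print(may_post(created, 3, created + timedelta(hours=25)))  # True
```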

    piefed.social and feddit.online are two that are running the latest code. I'm sure there are others.

    Yeah, that person is quite annoying.

  • my-sunshine.video is one with no restrictions and has open registration. Don't know about others.

  • How long has the site been down, as far as you know?

  • Yes, it claims to proxy the traffic, but then you have to connect to the fedibuzz relay server.

    It sounded from the post that there was an attempt to directly connect to a Mastodon server through the relay connection, which won't work.

  • A relay, as in the Mastodon relay feature? As in Administration->Relays? If so, does mastodonapp.uk have an active relay? It can't be used to connect to just any Mastodon server. It has to connect to a special relay server.

    Not sure whether this is helpful.

  • I hate these hasty generalizations about a group of people. It serves no purpose and gives less intelligent people a reason to be snarky against other people.

  • I'm now displaying it on feddit.online. And it's staying.

  • Proton Mail, Tuta mail, runbox.com, to name three.

  • Today I Learned (TIL) @lemmy.ca

    TIL the guillotine was named after a man who neither invented it nor believed in the death penalty

  • Today I Learned @lemmy.world

    TIL the Guillotine was named after a man who neither invented it nor believed in the death penalty

  • Cybersecurity @sh.itjust.works

    AI hacking. Downloading images can allow your computer to be hijacked

  • Cybersecurity @sh.itjust.works

    My Pixel 10 warned me 8 times in 30-minutes that there was a rogue connection made. Deeply concerning

  • Facepalm @lemmy.world

    Hackers got Clorox passwords by simply asking for them?

  • Facepalm @lemmy.wtf

    Chicago Sun-Times Ripped For AI Summer Reading List - Comic Sands

    www.comicsands.com/sun-times-ai-list
  • Facepalm @lemmy.world

    Chicago Sun-Times Ripped For AI Summer Reading List - Comic Sands

    www.comicsands.com/sun-times-ai-list
  • Privacy @lemmy.world

    Big win for States, and you, for stopping corporations who violate state privacy laws

  • Fediverse @lemmy.world

    Short video that shows what Friendica can do

  • Selfhosted @lemmy.world

    MySQL or PostgreSQL? Which is better for an Internet-facing application?

  • Community Promo @lemmy.ca

    Antisocial Media

  • memes @lemmy.world

    The elf in the swamp

  • Technology @lemmy.world

    Digital Ocean's long outage on 28-Nov-2024 caused by a Network Solutions blunder they [NS] didn't know how to fix

  • memes @lemmy.world

    Schrödinger's Supermarket

  • memes @lemmy.world

    It could be worse

  • memes @lemmy.world

    This Building Looks Like It Was Designed In Microsoft Word

  • memes @lemmy.world

    Different perspectives on dogs

  • memes @lemmy.world

    Depends what you base it on

  • memes @lemmy.world

    Schrödinger's Code