• altphoto@lemmy.today
    arrow-up
    1
    ·
    18 hours ago

    Maybe we should have two-factor authentication to read wikis. A person would have to come here and get public keys from other people. If you can get two public keys by proving you are human, you get a month of wiki access, which you keep only by continuing to prove you are human. If you turn out to be an AI, you're totally banned unless you get a public key in person at the local Walmart or Safeway.
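
    A minimal sketch of the access rules described above, just to make the idea concrete. It models only the bookkeeping (two distinct vouches, a month of access, a ban that is lifted in person) and does no real cryptography. The names `Reader`, `add_vouch`, `flag_as_bot` and `lift_ban_in_person` are hypothetical, "a month" is assumed to mean 30 days, and the in-person step is assumed to only lift the ban rather than grant access.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set

REQUIRED_VOUCHES = 2                # "2 public keys" from the comment
ACCESS_PERIOD = timedelta(days=30)  # assumption: "a month" taken as 30 days


@dataclass
class Reader:
    """Hypothetical record for one wiki reader."""
    vouchers: Set[str] = field(default_factory=set)  # IDs of keys that vouched
    access_until: Optional[datetime] = None
    banned: bool = False

    def add_vouch(self, voucher_key_id: str, now: datetime) -> None:
        # Banned readers cannot collect vouches online.
        if self.banned:
            return
        # A set means the same key vouching twice counts only once.
        self.vouchers.add(voucher_key_id)
        if len(self.vouchers) >= REQUIRED_VOUCHES:
            self.access_until = now + ACCESS_PERIOD
            self.vouchers.clear()  # the next period needs fresh vouches

    def flag_as_bot(self) -> None:
        # "If you prove to be AI then you're totally banned."
        self.banned = True
        self.access_until = None
        self.vouchers.clear()

    def lift_ban_in_person(self) -> None:
        # Assumption: the in-person key only removes the ban.
        self.banned = False

    def can_read(self, now: datetime) -> bool:
        return (
            not self.banned
            and self.access_until is not None
            and now < self.access_until
        )
```

    Whether a renewal should need the same number of vouches as the first grant is left open in the comment; the sketch simply reuses the same rule.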

  • poVoq@slrpnk.netM
    arrow-up
    21
    ·
    21 days ago

    Not only wikis, sadly. Anything with public-facing deep links that trigger extensive database operations is being hammered by these bots, and few servers can take the load.

  • thatsnothowyoudoit@lemmy.ca
    arrow-up
    12
    ·
    edit-2
    20 days ago

    We use NGINX’s 444 response A LOT.

    In coordination with careful rate-limiting, it’s been a dramatic improvement.

    The worst of the bots don't advertise their User-Agent (or worse, pretend to be a normal user while making hundreds of requests a second), but there's lots of low-hanging fruit.
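
    For readers unfamiliar with it: 444 is a non-standard NGINX status code that closes the connection without sending any response. The comment doesn't give the actual configuration, so the following is only a rough Python sketch of the decision logic, assuming a per-IP token bucket and treating a missing User-Agent as the low-hanging fruit. The class name `Gatekeeper`, the `RATE` and `BURST` values, and the `"drop"`/`"serve"` labels are all hypothetical; in NGINX itself this kind of thing is usually expressed with `limit_req_zone`, `limit_req` and `return 444`.

```python
from typing import Dict, Optional, Tuple

RATE = 5.0    # hypothetical: sustained requests per second per client
BURST = 10.0  # hypothetical: how many requests a client may burst


class Gatekeeper:
    """Decides whether to serve a request or silently drop it (444-style)."""

    def __init__(self, rate: float = RATE, burst: float = BURST) -> None:
        self.rate = rate
        self.burst = burst
        # ip -> (tokens left, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def decide(self, ip: str, user_agent: Optional[str], now: float) -> str:
        # Low-hanging fruit: clients that send no User-Agent at all.
        if not user_agent or not user_agent.strip():
            return "drop"

        # Token bucket: refill in proportion to elapsed time, capped at burst.
        tokens, last = self._buckets.get(ip, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)

        if tokens < 1.0:
            # Over the limit: close the connection without a response.
            self._buckets[ip] = (tokens, now)
            return "drop"

        self._buckets[ip] = (tokens - 1.0, now)
        return "serve"
```

    This does nothing about bots that fake a browser User-Agent and spread requests across many addresses; it only covers the easy cases the comment mentions.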

  • Tiresia@slrpnk.net
    arrow-up
    7
    ·
    21 days ago

    On the plus side, this isn't a problem with AI; it's a problem with AI companies having more investment money than they know what to do with. The moment the hype fades and they no longer want to hemorrhage money scraping every wiki on the internet thousands of times per day, this traffic will drop back to a far saner level.

    • poVoq@slrpnk.netM
      arrow-up
      3
      ·
      21 days ago

      It will probably go down, but the process itself is kind of unavoidable for training LLMs, so I doubt things will go back to how they were before.