There is a convention of posting links to archive sites instead of linking directly. This makes sense to me for various reasons.
Problem is that I have a hard time accessing the archive sites. It used to be mostly archive.is, but lately archive.org is unreliable too. It seems to be rate limiting and/or geoblocking. I have seen comments from numerous other people saying it is the same for them, but obviously it works fine for others. I usually just get some kind of error when trying to access an archived link, and I cannot create archive pages either.
I am not sure what to do about this. Sometimes I add both kinds of links (when they are working), but it’s a lot of work. If I really don’t want to link to a page, I put the URL in backticks so it displays as plain text (ex: `https://hexbear.net/c/news`). I don’t really think it is a fair expectation to do all that, and it’s annoying to read.
I think archive.org links are easier because you can at least derive the original URL from the archive URL. With an archive.is short link there is no way to know. If the person who posts it provides enough information (unedited title, publication), you might be able to web search for it, but often I’m not sure whether I am looking at the same page.
But I am not sure. Any bright ideas? I liked archiving pages, and ultimately I would prefer to get that working again, though I feel powerless on the matter.
In the sidebar of !news@hexbear.net:
- Archive sites: We highly encourage use of non-paywalled archive sites (i.e. archive.is, web.archive.org, ghostarchive.org) so that links are widely accessible to the community and so that reactionary sources don’t derive data/ad revenue from Hexbear users. If you see a link without an archive link, please archive it yourself and add it to the thread, ask the OP to fix it, or report to mods. Including text of articles in threads is welcome.
It is reflective of site-wide preferences.


I’ve always thought archive sites would be an interesting decentralization problem to solve. The real hurdles I think would be these:
As we’ve seen with archive.is, its maintainer was willing to alter the content of archives over a petty spat with some blogger. That’s not good for the trustworthiness of the archives themselves.
I could imagine an archive system being built using a combination of BitTorrent, cryptography, and ActivityPub.
If you’re on archiveSiteA and request a URL it doesn’t have, it can attempt to find it via federation with other servers. What the servers federate could be an RSS feed of archived URLs, their md5 hashes, and magnet links for downloading. If the page is found in those lists, the server could then pull the contents and serve it to you, or redirect you to the archive that has it.
If it can’t find it on the network, you then have the option to pull the site locally to the server.
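A very rough sketch of what that lookup flow could look like, just to make the idea concrete. Everything here is made up for illustration: the peer URLs, the `/feed.json` endpoint (a JSON list instead of RSS, for brevity), and SHA-256 in place of md5 (md5 is easy to forge, which matters if you care about tampering):

```python
import hashlib
import json
import urllib.request

# Hypothetical federated peers, each publishing a feed of archived pages.
# Assumed feed entry format: {"url": "...", "sha256": "...", "magnet": "magnet:?xt=urn:btih:..."}
PEERS = ["https://archive-peer-a.example", "https://archive-peer-b.example"]

def fetch_feed(peer: str) -> list[dict]:
    """Download a peer's feed of archived URLs (assumes a /feed.json endpoint)."""
    with urllib.request.urlopen(f"{peer}/feed.json", timeout=10) as resp:
        return json.load(resp)

def find_archive(url: str) -> dict | None:
    """Ask each federated peer whether it has already archived `url`."""
    for peer in PEERS:
        try:
            for entry in fetch_feed(peer):
                if entry["url"] == url:
                    return entry  # contains the magnet link and the expected hash
        except (OSError, ValueError):
            continue  # peer unreachable or feed malformed; try the next one
    return None

def verify(content: bytes, expected_sha256: str) -> bool:
    """Check the downloaded archive against the hash advertised in the feed."""
    return hashlib.sha256(content).hexdigest() == expected_sha256

if __name__ == "__main__":
    entry = find_archive("https://example.com/some-article")
    if entry:
        print("Found on the network:", entry["magnet"])
    else:
        print("Not found; this server would archive the page itself.")
```

The hash in the feed only proves the bytes you pulled over BitTorrent match what the peer advertised; it says nothing about whether that peer archived the page honestly, which is the next problem.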
The really hard part, I think, would be verification of the archives. The last thing you would want is someone on the network hosting archives of URLs that contain doctored information, or even malicious code that loads as part of the archive.
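One possible (and definitely incomplete) mitigation: have several unrelated servers each fetch and hash the same URL, and only treat an archive as trusted if enough of them saw byte-identical content. A sketch of that idea, with made-up function names, and glossing over the fact that dynamic pages rarely produce identical bytes twice:

```python
import hashlib
from collections import Counter

def independent_hashes(url: str, fetchers: list) -> list[str]:
    """Each fetcher is a callable run by a different, unrelated server that
    returns the raw bytes it got when it fetched `url` itself."""
    hashes = []
    for fetch in fetchers:
        try:
            hashes.append(hashlib.sha256(fetch(url)).hexdigest())
        except OSError:
            continue  # that server failed to fetch; skip it
    return hashes

def agreed_hash(hashes: list[str], quorum: int) -> str | None:
    """Accept an archive only if at least `quorum` independent servers
    agree on the same digest; otherwise treat it as unverified."""
    if not hashes:
        return None
    digest, count = Counter(hashes).most_common(1)[0]
    return digest if count >= quorum else None
```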
Probably plausible. I wonder if it would run into the problem many of these distributed projects have, where they immediately become giant CSAM hosting operations and nobody wants anything to do with them.