I'm not sure there are many options other than cynical doomerism for analysing this situation. My uneducated guess? They've probably already run out of real-world data, and are now forced to produce ridiculous amounts of LLM-generated data to try to keep the training process going.
Other alternatives I can think of:
- they might be creating multiple model versions and keeping them for iteration metrics
- they might need to ingest a lot more real-world data to continue. Since video has been such a focus as of late, maybe they're building huge video libraries for the models? Or maybe they're creating their own real-world data with high detail.
- my most doomer take is that this is the beginning of a vastly deeper authoritarian online state, where a LOT more data is getting collected from EVERYONE and being fed into both new training data for models and knowledge bases that give models context to work on top of.
We've known for a while that they're running out of training data, so it makes a lot of sense to either generate more data with models, create it at big scale without AI, or collect even more data in even more invasive ways from everyone online. There's literally no other reason I can think of to buy the entire 2026 stock of WD drives two months into the year.
Sony XM5 earbuds. The most annoying part of them is their feature that connects to multiple devices at once, so I end up fighting my phone (Graphene) or gaming PC (Bazzite) midway through a business call on my work laptop (believe it or not, also Bazzite).
So yeah, their only problem is that they work with everything and can prioritize sounds from other devices mid-call. You can just not connect them to everything at the same time, or turn off Bluetooth on the phone and so on while it's not in use.
I'm pretty sure you can just buy any device like that (so not Razer, and generally not gaming) and it'll just work on Linux. Gaming stuff in general usually has a hard time working even on Windows due to absolute dog-shit firmware & software implementations.