Yeah they make no sense for most devices. For high end gaming laptops I can understand it though - My laptop has a 300W brick that is as slim as I would bet they could get it at the time.
I've had good success on similar hardware (5070 + more RAM) with GLM-4.7-Flash using llama.cpp's --cpu-moe flag - I can get up to 150k context with it at 20-ish tok/sec. I've also found it a lot better for agentic use than GPT-OSS; it puts noticeably more depth into its reasoning, so while it spends more tokens, the end result seems worth it.
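For reference, a minimal llama.cpp invocation along those lines might look like this - the model filename is hypothetical and the exact flags you want will depend on your VRAM:

```shell
# --cpu-moe keeps the MoE expert weights in system RAM, so the dense
# layers and KV cache can stay on the GPU; -c sets the context window,
# -ngl offloads as many layers as possible. Model path is hypothetical.
llama-server -m glm-4.7-flash.gguf --cpu-moe -c 150000 -ngl 99
```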
I gave my agents a skill that has them cat from /dev/urandom (with the output corralled into text characters) any time they need to generate passwords for something. Even then, I've only ever had one agent need it, maybe twice.
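The skill itself boils down to a one-liner along these lines (the exact character set and length are assumptions - mine may differ):

```shell
# Read random bytes from /dev/urandom, keep only printable password
# characters, and take the first 24 of them as the password.
tr -dc 'A-Za-z0-9!@#%^&*_-' </dev/urandom | head -c 24; echo
```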
Interestingly, none of the official sources for the model weights gate the download behind a clickwrap that forces the user to read or agree to those terms first. There is precedent for such terms being unenforceable when the user isn't forced to agree to them.
There are lots of open source models you can download from Hugging Face, Ollama, and even GitHub without signing any contracts or terms of use: Gemma3, Llama, Ministral, GLM, OLMo, and a bajillion others. GLM-4.7-Flash is a very capable agentic model that can run at very usable speeds on commodity hardware - and none of what it generates is dictated by any agreements or policies agreed to anywhere.
Someone forgot a debounce