

For byte pair encoding (how those tokens get created), I think https://bpemb.h-its.org/ does a good job of giving an overview. After that, I'd say self-attention from 2017 is the seminal work that all of this is based on, and the most crucial to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation of the transformer architecture (what LLMs are) out there. Transformers are mostly stacks of self-attention layers.
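If it helps to see the attention math as code, here's a minimal numpy sketch of single-head scaled dot-product self-attention. The shapes and the random weight matrices are toy values I made up; real models learn Wq/Wk/Wv and stack many heads and layers:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token embeddings
    Q = X @ Wq  # queries
    K = X @ Wk  # keys
    V = X @ Wv  # values
    d_k = Q.shape[-1]
    # scaled dot-product attention from "Attention Is All You Need" (2017)
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) token affinities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

# toy example: 4 tokens, 8-dim embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-aware vector per token
```

The point being: every output row is a mix of all the value vectors, weighted by how much each token "attends" to the others, which is what lets the model pull in context from anywhere in the sequence.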
It does help if you know how matrix multiplications work, and how the backpropagation algorithm is used to train these things. I don't know of a good easy explanation off the top of my head, but https://xnought.github.io/backprop-explainer/ looks quite good.
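And if the link between matrix multiplications and backprop feels abstract, here's a tiny hand-rolled toy I made up for illustration (real frameworks compute these gradients automatically, and on much bigger matrices):

```python
import numpy as np

# fit y = 2x with a single weight, using hand-written backprop
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 2.0 * x

w = np.zeros((1, 1))  # the one "weight" we train
lr = 0.1              # learning rate

for step in range(50):
    y_hat = x @ w                     # forward pass: a matrix multiplication
    loss = ((y_hat - y) ** 2).mean()  # mean squared error
    # backward pass: the chain rule gives dloss/dw analytically
    grad_w = 2 * x.T @ (y_hat - y) / len(x)
    w -= lr * grad_w                  # gradient descent step

print(w)  # ~[[2.0]]: recovered the target weight
```

Training an LLM is conceptually this loop, just with billions of weights and the loss being "predict the next token".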
And that's kinda it: you make the transformers bigger, with more weights, bolt a lot of engineering on around them, like being able to run code and making inference more efficient, exploit thousands of poorly paid workers to fine-tune it with human feedback, and repeat that every 6-12 months forever so it can stay up to date.
I remember reading that hotel TVs are an option. They also have an ad platform, but one intended for the hotel owner to send ads from, not some third party. Not exactly dumb, but also not as bad as regular TVs.
And of course a beamer (projector) or PC monitor connected to some cheap small-form-factor PC is always an option, with Kodi or similar on it. I haven't owned a TV in like 10 years, just a small Linux PC with a beamer, plus a TV tuner card in the past (nowadays my ISP offers all the public channels over IPTV).