A possible hardware solution for ultra-fast (claimed 73x faster than an H200) self-hosted small models that is not dependent on RAM
dev.to
A 25-Person Startup Built a Chip That Only Runs One AI Model. It's 73 Times Faster Than Nvidia.

The approach hardwires model weights into transistors and uses an older 6 nm process. They are targeting 70B model sizes (presumably 16-bit) by year end. It should cost much less than a 140 GB card, but I don't know the details.
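The "140 GB" comparison follows from simple arithmetic on the stated target size. A quick sketch (assuming standard 2-byte FP16/BF16 weights; the vendor's actual memory layout is unknown):

```python
# Back-of-envelope: a 70B-parameter model at 16-bit precision
# needs roughly 140 GB just for the weights.
params = 70e9          # 70 billion parameters (target size from the post)
bytes_per_param = 2    # 16-bit (FP16/BF16) = 2 bytes per weight
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")  # → 140 GB
```

This is weights only; a conventional GPU deployment would also need memory for the KV cache and activations, which is part of why hardwiring weights into silicon changes the cost picture.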