• 2 Posts
  • 219 Comments
Joined 2 years ago
Cake day: March 3rd, 2024

  • it’s kind of frustrating to have to keep explaining to people how these models work, mostly because of how intensely oversold they are.

    on the one hand you have people who think it’s literally just a normal computer program doing database lookups with conditional logic and decision trees plus some sort of hand-wavy magic. it’s not.

    on the other hand you have people who think it’s a literal brain that can stub its toe and change the way it walks thereafter. it won’t.

    every attempt at “agent memory” or whatever has thus far been desperate bullshit. i don’t care how many markdown files and vector databases and prompt engineering hacks you implement; you’ll never change the fact that these models have limited context and frozen weights. reading a markdown file or querying a database is not “remembering”.
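    to be concrete about what those hacks amount to, here’s a minimal sketch (all names and the token count are hypothetical stand-ins) of what “agent memory” reduces to: retrieved text concatenated into a finite prompt, with the weights untouched:

    ```python
    # a sketch of what "agent memory" actually is: retrieved text pasted
    # into a finite prompt. everything that doesn't fit is invisible to
    # the model, and nothing here ever updates a weight.

    CONTEXT_LIMIT = 8192  # tokens the model can see at once (hypothetical)

    def naive_tokens(text: str) -> int:
        # rough stand-in for a real tokenizer: ~1 token per word
        return len(text.split())

    def build_prompt(memory_notes: list[str], user_message: str) -> str:
        # "remembering" is just concatenation until the window is full;
        # older notes are silently dropped
        prompt = user_message
        for note in reversed(memory_notes):
            if naive_tokens(prompt) + naive_tokens(note) > CONTEXT_LIMIT:
                break  # this memory is simply gone as far as the model is concerned
            prompt = note + "\n" + prompt
        return prompt
    ```

    the point: the markdown file or vector database lives outside the model; all the model ever sees is whatever slice of it fits in the window on this one call.
    
    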


  • i don’t think people in this forum would have disagreed with this move in 2018, as much as sentiments have changed. if you remove the political context and market moves from the equation, it is truly fascinating how these models work. GPT-2 was a crazy leap forward for language modeling, and the idea that a language model would be threatening middle class jobs wasn’t even on the table at that point. the idea that a pile of floating point numbers could write a React app is incredible, if politically fraught.

    also, it wasn’t clear back then what OpenAI would become. they were a non-profit, and as clear as our hindsight is today, this was before ChatGPT or any customer-facing products had come out of OpenAI.

    i can’t be the only nerd in the room who has been fascinated by AI since childhood, only to face a reality where it’s not what i imagined it would be.







  • thanks for clarifying. it was hard for me to dignify such a comment with a response.

    you’re also going to run into hardware acceleration issues trying to use Metal from inside a Linux kernel. i don’t really see a need to containerize these workloads these days anyway with tools like uv.

    it’s a big pain in my ass at times trying to do web dev work with an aarch64-darwin dev env vs the target x86_64-linux. adding in hardware acceleration issues just sounds painful.

    i also just personally don’t like containers. feels like a bludgeon of a solution.







  • so, it’s the same.

    saying “Linux does dynamic linking and Windows does static linking” is both false and a mischaracterization. Windows absolutely does dynamic linking with its dynamic-link libraries (.dll). how dependencies are linked is up to the developer and whatever platform constraints apply. one reason i like Rust is that it prefers static linking, and a lot of toolchains are moving in that direction. the reason Linux distros push people toward their internal package management tools (eg apt) is to have tighter control over dynamic linking.

    and we’re also glossing over Scoop and Chocolatey and winget and Docker.

    but that’s where you get to stuff like Flatpak and Snap and Nix that try to contain the dynamic dependencies.

    i don’t think downloading exes and just running them, hoping Windows has stuffed enough DLLs into the OS, is a better solution.
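    for what it’s worth, the symmetry is easy to demonstrate: both loaders resolve shared libraries at runtime. a quick sketch using python’s ctypes (these are real stdlib APIs; only the library names differ per platform):

    ```python
    # both platforms do dynamic linking; only the file format and the
    # library names differ. this loads the C math library at runtime --
    # the same mechanism the OS loader uses for a binary's declared
    # dynamic dependencies.
    import ctypes
    import ctypes.util
    import sys

    if sys.platform == "win32":
        libm = ctypes.CDLL("msvcrt")  # a .dll, resolved dynamically
    else:
        # a .so on Linux, a .dylib on macOS, also resolved dynamically
        libm = ctypes.CDLL(ctypes.util.find_library("m"))

    libm.sqrt.restype = ctypes.c_double
    libm.sqrt.argtypes = [ctypes.c_double]
    print(libm.sqrt(9.0))  # 3.0
    ```

    same call, same late binding, three different shared-library formats.
    
    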




  • honestly it’s hard to beat Macs these days in this space for two reasons:

    • unified memory means the CPU and GPU share one pool, so you don’t have to load up on RAM just to hold the model and then also shell out for a video card with barely enough VRAM to fit a basic language model
    • their supply chain is solid and has mostly avoided the constraints that other OEMs and parts manufacturers are struggling with

    pricing is tough. sure, crypto is on its way out, but GPUs are still the platform of choice for most neural net workloads (outside of SoCs like Apple M-series). i built a PC in late 2024, and it’s easily worth twice what i paid for it.




  • as someone who has been watching far too much Food Network on the treadmill: just give em some freakin time to cook. the best things i’ve made personally are low-and-slow, or from-scratch pasta, or slaw that sat in the fridge overnight. the 15–45 minute time frame has produced so many undercooked or otherwise mangled $80 steaks. like, even for a weeknight dinner i’m using things i marinated overnight or whatever. and in a kitchen setting you literally have all morning to prep, in addition to doing overnight prep or even coming in super early to start your fresh bread. the format precludes entire classes of dishes.