• sunzu2@thebrainbin.org · 1 day ago

      https://ollama.org/

      You can pick something that fits your GPU size. Works well on Apple silicon too. My favorites right now are the Qwen3 series. Probably the best performance for a local single GPU.

      Will work on CPU/RAM too, but slower.

      If you’ve got Linux, I would put it into a Docker container. Might be too much for the first try, though. There are easier options, I think.
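
      If you want to script against it, here is a minimal sketch of calling Ollama’s local REST API from Python (assuming the default port 11434 and that you’ve already run ollama pull qwen3):

          import requests  # pip install requests

          # Ask the local Ollama server for a completion. Assumes the
          # default port (11434) and that qwen3 has already been pulled.
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={"model": "qwen3", "prompt": "Why is the sky blue?", "stream": False},
              timeout=300,
          )
          resp.raise_for_status()
          print(resp.json()["response"])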

      • tormeh@discuss.tchncs.de · 8 hours ago

        Ollama is apparently going for lock-in and incompatibility. They’re forking llama.cpp for some reason, too. I’d use GPT4All or llama.cpp directly. They support Vulkan, too, so your GPU will just work.
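
        If you go the llama.cpp route, a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder for whatever GGUF file you download):

            from llama_cpp import Llama  # pip install llama-cpp-python

            # Load a GGUF model; n_gpu_layers=-1 offloads all layers to the
            # GPU and falls back to CPU if no supported GPU is available.
            llm = Llama(model_path="./qwen3-8b.gguf", n_gpu_layers=-1)

            out = llm("Q: Why is the sky blue? A:", max_tokens=128)
            print(out["choices"][0]["text"])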

      • venusaur@lemmy.world · 13 hours ago

        Hm, I’ll see if my laptop can handle it. Probably don’t have the patience or processing power.

      • Jakeroxs@sh.itjust.works · 1 day ago

        I use oobabooga; it has a few more options in the GGUF space than Ollama, but it’s not as easy to use IMO. It does support an OpenAI API connection though, so you can plug in other services to use it.
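
        Since it exposes an OpenAI-compatible API, a minimal sketch of pointing the standard OpenAI Python client at a local text-generation-webui instance (assuming it was started with --api on its default port 5000):

            from openai import OpenAI  # pip install openai

            # Point the stock OpenAI client at the local server; the key is
            # unused but the client requires a non-empty string.
            client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")

            reply = client.chat.completions.create(
                model="local",  # the server uses whatever model is loaded
                messages=[{"role": "user", "content": "Hello!"}],
            )
            print(reply.choices[0].message.content)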