Hello internet users. I have tried gpt4all and like it, but it is very slow on my laptop. I was wondering if anyone here knows of any solutions I could run on my server (debian 12, amd cpu, intel a380 gpu) through a web interface. Has anyone found any good way to do this?
Koboldcpp should allow you to run much larger models with a little bit of ram offloading. There’s a fork that supports rocm for AMD cards: https://github.com/YellowRoseCx/koboldcpp-rocm
Make sure to use quantized models for the best performace, q4k_M being the standard.