I’d like to self host a large language model, LLM.
I don’t mind if I need a GPU and all that, at least it will be running on my own hardware, and probably even cheaper than the $20 everyone is charging per month.
What LLMs are you self hosting? And what are you using to do it?
My (docker based) configuration:
Linux > Docker Container > Nvidia Runtime > Open WebUI > Ollama > Llama 3.1
Docker: https://docs.docker.com/engine/install/
Nvidia Runtime for docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Open WebUI: https://docs.openwebui.com/
Ollama: https://hub.docker.com/r/ollama/ollama