FitMyLLM — Independent benchmarks for self-hosted AI

adr1an@programming.dev · 21 days ago

SamuelEllis@lemmy.world · 2 days ago

Token throughput benchmarks are useful, but whether a model is viable to self-host often depends on memory bandwidth rather than raw compute, especially for quantized models, since generating each token at batch size 1 means reading essentially all of the weights from memory. Have you measured how different quantization levels affect inference latency on consumer-grade GPUs, alongside the reported tokens-per-second figures?
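
To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch, not a measurement. It assumes single-stream decoding where every weight is read once per token, and it ignores the KV cache, activations, quantization metadata (scales, zero points), and dequantization cost, so it gives only an optimistic ceiling. The function names and the 1000 GB/s bandwidth figure are made up for illustration and do not describe any particular GPU or anything FitMyLLM reports.

```python
def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (10^9 bytes), weights only."""
    return n_params_billion * bits_per_weight / 8.0


def decode_tokens_per_s_ceiling(
    n_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Optimistic tokens/s ceiling if decoding is purely bandwidth-bound.

    Assumes all weights are streamed from memory once per generated token.
    """
    return bandwidth_gb_s / model_size_gb(n_params_billion, bits_per_weight)


if __name__ == "__main__":
    HYPOTHETICAL_BANDWIDTH_GB_S = 1000.0  # assumed value, not a real card
    for bits in (16, 8, 4):
        size = model_size_gb(7, bits)
        ceiling = decode_tokens_per_s_ceiling(7, bits, HYPOTHETICAL_BANDWIDTH_GB_S)
        print(f"7B @ {bits}-bit: ~{size:.1f} GB weights, <= {ceiling:.0f} tok/s")
```

Under these assumptions the ceiling scales inversely with bits per weight, which is why a benchmark that reports only tokens per second without the quantization level and the card's memory bandwidth is hard to compare across setups.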

FitMyLLM — The data layer for self-hosted AI