Check what you can use and how many tokens per second you would get… It has examples for many models and quantization levels. Huge resource!