14 models
Groq
The fastest inference layer for open-weight models.
Groq runs popular open-weight models (Llama, Mixtral, Gemma) on its custom LPU (Language Processing Unit) silicon for industry-leading tokens-per-second throughput.
| Model | Context (tokens) | Max output (tokens) | $/M input tokens | $/M output tokens | Capabilities |
|---|---|---|---|---|---|
| gemma-7b-it | 8.2K | 8.2K | $0.050 | $0.080 | tools |
| llama-3.1-8b-instant | 128K | 8.2K | $0.050 | $0.080 | tools |
| llama-3.3-70b-versatile | 128K | 32.8K | $0.590 | $0.790 | tools |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131.1K | 8.2K | $0.200 | $0.600 | vision, tools, json |
| meta-llama/llama-4-scout-17b-16e-instruct | 131.1K | 8.2K | $0.110 | $0.340 | vision, tools, json |
| meta-llama/llama-guard-4-12b | 8.2K | 8.2K | $0.200 | $0.200 | — |
| moonshotai/kimi-k2-instruct-0905 | 262.1K | 16.4K | $1.00 | $3.00 | tools, json |
| openai/gpt-oss-120b | 131.1K | 32.8K | $0.150 | $0.600 | tools, json |
| openai/gpt-oss-20b | 131.1K | 32.8K | $0.075 | $0.300 | tools, json |
| openai/gpt-oss-safeguard-20b | 131.1K | 65.5K | $0.075 | $0.300 | tools, json |
| playai-tts | 10K | 10K | — | — | — |
| qwen/qwen3-32b | 131K | 131K | $0.290 | $0.590 | tools |
| whisper-large-v3 | — | — | — | — | — |
| whisper-large-v3-turbo | — | — | — | — | — |
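
The per-million-token prices in the table above can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the `PRICES` dictionary and the `estimate_cost` helper are hypothetical names, not part of any Groq SDK, and the figures are copied from three rows of the table, so they may be out of date.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
# Only a few rows are included; the dictionary name is a hypothetical choice.
PRICES = {
    "llama-3.1-8b-instant": (0.050, 0.080),
    "llama-3.3-70b-versatile": (0.590, 0.790),
    "openai/gpt-oss-120b": (0.150, 0.600),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    price_in, price_out = PRICES[model]
    # Each price is quoted per 1,000,000 tokens, so scale the total down.
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Example: 10,000 prompt tokens and 2,000 completion tokens.
cost = estimate_cost("llama-3.3-70b-versatile", 10_000, 2_000)
print(f"${cost:.6f}")
```

Rows whose prices are shown as "—" (the speech models) are left out of the dictionary, since the table gives no per-token price for them.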