Groq

14 models

The fastest inference layer for open-weight models.

Groq runs popular open-weight models (Llama, Mixtral, Gemma) on its custom LPU (Language Processing Unit) silicon, which is built for high tokens-per-second throughput.

| Model | Context (tokens) | Max output (tokens) | Input ($/M tokens) | Output ($/M tokens) | Capabilities |
| --- | --- | --- | --- | --- | --- |
| gemma-7b-it | 8.2K | 8.2K | $0.050 | $0.080 | tools |
| llama-3.1-8b-instant | 128K | 8.2K | $0.050 | $0.080 | tools |
| llama-3.3-70b-versatile | 128K | 32.8K | $0.590 | $0.790 | tools |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131.1K | 8.2K | $0.200 | $0.600 | vision, tools, json |
| meta-llama/llama-4-scout-17b-16e-instruct | 131.1K | 8.2K | $0.110 | $0.340 | vision, tools, json |
| meta-llama/llama-guard-4-12b | 8.2K | 8.2K | $0.200 | $0.200 | |
| moonshotai/kimi-k2-instruct-0905 | 262.1K | 16.4K | $1.00 | $3.00 | tools, json |
| openai/gpt-oss-120b | 131.1K | 32.8K | $0.150 | $0.600 | tools, json |
| openai/gpt-oss-20b | 131.1K | 32.8K | $0.075 | $0.300 | tools, json |
| openai/gpt-oss-safeguard-20b | 131.1K | 65.5K | $0.075 | $0.300 | tools, json |
| playai-tts | 10K | 10K | | | |
| qwen/qwen3-32b | 131K | 131K | $0.290 | $0.590 | tools |
| whisper-large-v3 | | | | | |
| whisper-large-v3-turbo | | | | | |
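
To make the per-million-token prices in the listing above concrete, here is a minimal sketch of a cost estimate. It assumes that "$/M" means US dollars per million tokens and that billing is linear in token count. The names `Pricing`, `PRICING`, and `estimate_cost_usd` are hypothetical helpers for illustration, not part of any Groq SDK, and only a few models from the listing are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in USD per one million tokens (assumed meaning of "$/M")."""
    usd_per_m_input: float
    usd_per_m_output: float


# A subset of the prices from the listing above, copied by hand.
PRICING = {
    "llama-3.1-8b-instant": Pricing(0.050, 0.080),
    "llama-3.3-70b-versatile": Pricing(0.590, 0.790),
    "openai/gpt-oss-120b": Pricing(0.150, 0.600),
    "moonshotai/kimi-k2-instruct-0905": Pricing(1.00, 3.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    price = PRICING[model]  # raises KeyError for models not in the subset
    total = (input_tokens * price.usd_per_m_input
             + output_tokens * price.usd_per_m_output)
    return total / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens and 2,000 completion tokens on llama-3.3-70b-versatile
    cost = estimate_cost_usd("llama-3.3-70b-versatile", 10_000, 2_000)
    print(f"${cost:.5f}")  # prints $0.00748
```

With these numbers, the same request on llama-3.1-8b-instant would cost roughly a tenth as much, which is the kind of comparison the price columns are meant to support.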