Groq

14 models

The fastest inference layer for open-weight models.

Groq runs popular open-weight models (Llama, Mixtral, Gemma) on custom LPU silicon for industry-leading tokens-per-second throughput.

Model	Context (tokens)	Max output (tokens)	Input ($/M tokens)	Output ($/M tokens)	Capabilities
gemma-7b-it	8.2K	8.2K	$0.050	$0.080	tools
llama-3.1-8b-instant	128K	8.2K	$0.050	$0.080	tools
llama-3.3-70b-versatile	128K	32.8K	$0.590	$0.790	tools
meta-llama/llama-4-maverick-17b-128e-instruct	131.1K	8.2K	$0.200	$0.600	vision, tools, json
meta-llama/llama-4-scout-17b-16e-instruct	131.1K	8.2K	$0.110	$0.340	vision, tools, json
meta-llama/llama-guard-4-12b	8.2K	8.2K	$0.200	$0.200	—
moonshotai/kimi-k2-instruct-0905	262.1K	16.4K	$1.00	$3.00	tools, json
openai/gpt-oss-120b	131.1K	32.8K	$0.150	$0.600	tools, json
openai/gpt-oss-20b	131.1K	32.8K	$0.075	$0.300	tools, json
openai/gpt-oss-safeguard-20b	131.1K	65.5K	$0.075	$0.300	tools, json
playai-tts	10K	10K	—	—	—
qwen/qwen3-32b	131K	131K	$0.290	$0.590	tools
whisper-large-v3	—	—	—	—	—
whisper-large-v3-turbo	—	—	—	—	—
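
To show how the two price columns combine, here is a minimal sketch that estimates the cost of a single request. It assumes the prices are USD per million tokens and that input and output tokens are billed separately. The `estimate_cost` helper and the `PRICES` dictionary are hypothetical names used only for illustration. The figures are copied from three rows of the table and may not match current pricing.

```python
# Prices in USD per million tokens, copied from the table above:
# (input price, output price). Assumption: "$/M" means per million tokens.
PRICES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "openai/gpt-oss-120b": (0.15, 0.60),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token completion.
cost = estimate_cost("llama-3.3-70b-versatile", 10_000, 2_000)
print(f"${cost:.6f}")  # prints $0.007480
```

Because output tokens cost more than input tokens for every priced model listed, long completions tend to dominate the bill even when prompts are large.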