name
system prompt
primary
fallback
Model Builder
Compose system prompts, context pairs, and a routing chain. Deploy in one click.
lla.ma is the operations layer between your app and your providers. Build agents visually, route across Claude, Gemini, Groq, and Ollama, and watch every call, all behind a single api.lla.ma/m/:slug.
4
providers
184ms
p50 latency
6μ¢
cost / 1k tok
zf-coach
zen-fitness
routing
gemini/gemini-1.5-flash → groq/llama-3.3-70b
1,284
runs · 24h
412ms
p95
3.4¢
cost · 24h
last run
Three surfaces. One workflow. Compose a system prompt, point it at a provider chain, watch the run land — and never glue a fallback together with try/catch again.
name
system prompt
primary
fallback
Compose system prompts, context pairs, and a routing chain. Deploy in one click.
run_01HX9F2KMR3
zf-coach
184ms
latency
284
in
312
out
0.012¢
cost
resolved prompt
Every call. Resolved prompt, provider used, tokens, latency, cost.
42,180
runs · 30d
18.4M
tokens · 30d
$11.92
cost · 30d
recent
Aggregated usage across projects, with budget caps and live counters.
Forget juggling Anthropic, Google, Groq, and Ollama SDKs in your app. Provider keys live server-side. Your client only ever sees lla.ma.
curl https://api.lla.ma/m/zf-coach \
  -H "Authorization: Bearer sk_llm_…" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "30 min, what should I do?"}],
    "context": { "steps_today": 8214, "sleep": 7.2 }
  }'

{
  "id": "run_01HX9F2KMR3",
  "content": "Take a brisk 20-min zone-2 walk, then…",
  "provider_used": "gemini",
  "model_used": "gemini-1.5-flash",
  "tokens_in": 284,
  "tokens_out": 312,
  "latency_ms": 184,
  "used_fallback": false
}

Set routing once. lla.ma picks the right model per request, retries the fallback on failure, and logs which one actually answered.
quality
Anthropic
Claude · Sonnet / Haiku
vision
Google
Gemini · Flash / Pro
speed
Groq
Llama · 3.3 70B
private
Ollama
Local · self-hosted
your app
POST /m/zf-coach
messages, context
lla.ma
router
auth · budget · prompt · route · log
gemini
gemini-1.5-flash
groq
llama-3.3
Primary → fallback in one config. If Gemini hiccups, Groq takes the call. lla.ma logs which one answered and how long it took.
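As a sketch, that single config could look like the JSON below; the routing shape and field names are illustrative, not a documented schema:

{
  "slug": "zf-coach",
  "routing": {
    "primary":  { "provider": "gemini", "model": "gemini-1.5-flash" },
    "fallback": { "provider": "groq", "model": "llama-3.3-70b" }
  }
}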
Toggle schedule on a model, drop in a cron string, and lla.ma fires it on its own. node-cron under the hood, env-guarded.
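A schedule block might look like this — the field names are hypothetical, but the cron string is standard five-field syntax (here, every day at 07:00), which node-cron accepts:

{
  "schedule": {
    "enabled": true,
    "cron": "0 7 * * *",
    "input": { "messages": [{ "role": "user", "content": "Plan today's workout." }] }
  }
}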
Read and write memory keys from any run. Persist preferences, conversation summaries, or anything else you'd shove in a JSON file.
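For instance, assuming a /memory/:key sub-resource on the model endpoint (the actual route may differ), a write and a read could look like:

curl -X PUT https://api.lla.ma/m/zf-coach/memory/preferences \
  -H "Authorization: Bearer sk_llm_…" \
  -H "Content-Type: application/json" \
  -d '{"units": "metric", "goal": "zone-2 base"}'

curl https://api.lla.ma/m/zf-coach/memory/preferences \
  -H "Authorization: Bearer sk_llm_…"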
Resolved prompt, provider used, tokens in/out, cost in microcents, latency, fallback status. No more guessing what the model saw.
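Fetching a past run might return a record like the one below. The GET route and exact field names are assumptions; cost is shown in microcents (12,000 μ¢ = 0.012¢) to match the run card above:

curl https://api.lla.ma/runs/run_01HX9F2KMR3 \
  -H "Authorization: Bearer sk_llm_…"

{
  "id": "run_01HX9F2KMR3",
  "resolved_prompt": "You are zf-coach…",
  "provider_used": "gemini",
  "model_used": "gemini-1.5-flash",
  "tokens_in": 284,
  "tokens_out": 312,
  "cost_microcents": 12000,
  "latency_ms": 184,
  "used_fallback": false
}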
Set a per-model budget. The executor pre-flights and 402s before the wire if you'd blow past it. Sleep peacefully on cron jobs.
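When the pre-flight estimate would push a model past its cap, the call is rejected before any provider is hit. An illustrative (not documented) error body:

HTTP/1.1 402 Payment Required

{
  "error": "budget_exceeded",
  "budget_cents": 500,
  "spent_cents": 498,
  "estimated_cents": 4
}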
Pass stream:true and get a real SSE stream — deltas, done, error frames. Fallback is pre-first-byte only, so the wire never lies.
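The delta, done, and error frame names come from the description above; the payload shapes here are illustrative:

curl -N https://api.lla.ma/m/zf-coach \
  -H "Authorization: Bearer sk_llm_…" \
  -H "Content-Type: application/json" \
  -d '{"stream": true, "messages": [{"role": "user", "content": "30 min, what should I do?"}]}'

event: delta
data: {"content": "Take a brisk"}

event: delta
data: {"content": " 20-min zone-2 walk, then…"}

event: done
data: {"id": "run_01HX9F2KMR3", "tokens_out": 312, "used_fallback": false}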
No per-token markup. No platform fee. Provider costs pass through at cost — you only pay lla.ma for the slot.
Free
Pre-built agents and one Custom Agent to kick the tires.
Developer
recommended
For one builder shipping agents into production apps.
Team
Shared projects, shared API keys, shared bills.
See the full breakdown on /signup.
Free to start. No card. Bring a system prompt — leave with a deployed endpoint.