00 · lla.ma · ops v0.1

Every model.
One endpoint.

lla.ma is the operations layer between your app and your providers. Build agents visually, route across Claude, Gemini, Groq and Ollama, and watch every call — all behind a single api.lla.ma/m/:slug.

4 providers · 184ms p50 latency · 6μ¢ cost / 1k tok

api.lla.ma/m/zf-coach

zf-coach

zen-fitness

deployed

routing

gemini/gemini-1.5-flash → groq/llama-3.3-70b

1,284 runs · 24h · 412ms p95 · 3.4¢ cost · 24h

last run

run_01HX9 · complete · 184ms · 312 tok
live

  • zf-coach · gemini · 184ms · 312 tok
  • sprks-matcher · groq · 92ms · 184 tok
  • zen-market-scanner · gemini · 246ms · 891 tok
  • zf-workout-gen · groq · 78ms · 412 tok
  • sprks-content-opt · gemini · 312ms · 1024 tok
  • zf-meditation-script · gemini · 198ms · 612 tok
  • zf-health-analyzer · groq · 114ms · 248 tok
  • sprks-moderator · groq · 64ms · 96 tok
  • zf-caption-optimizer · gemini · 156ms · 280 tok
01 · build

A studio for the people
who'd rather ship.

Three surfaces. One workflow. Compose a system prompt, point it at a provider chain, watch the run land — and never glue a fallback together with try/catch again.

lla.ma/models/zf-coach/edit

name

zf-coach

system prompt

You are a calm, evidence-based fitness coach for Zen Fitness members…

primary

gemini-1.5-flash

fallback

groq · llama-3.3
memory · schedule · stream
01

Model Builder

Compose system prompts, context pairs, and a routing chain. Deploy in one click.
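The editor fields above (name, system prompt, primary, fallback, toggles) suggest a config shape like the sketch below. This is a hypothetical illustration assembled from what the page shows; the actual schema lla.ma deploys is not documented here.

```typescript
// Hypothetical builder config, mirroring the editor fields shown above.
// Field names and nesting are assumptions, not lla.ma's documented schema.
const zfCoach = {
  name: "zf-coach",
  system: "You are a calm, evidence-based fitness coach for Zen Fitness members…",
  routing: {
    primary: "gemini/gemini-1.5-flash",
    fallback: "groq/llama-3.3-70b",
  },
  // toggles correspond to the memory / schedule / stream switches in the editor
  toggles: { memory: true, schedule: false, stream: true },
};
```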

lla.ma/models/zf-coach/runs

run_01HX9F2KMR3

zf-coach

complete

184ms

latency

284

in

312

out

0.012¢

cost

resolved prompt

<system> You are a calm, evidence-based fitness coach…
<context> steps_today=8214 · sleep=7.2h · zen_score=82
<user> Got 30 min before a meeting. What should I do?
02

Run Inspector

Every call. Resolved prompt, provider used, tokens, latency, cost.

lla.ma/usage

42,180

runs · 30d

18.4M

tokens · 30d

$11.92

cost · 30d

runs · 16d · ↑ 38%

recent

  • zf-coach · gemini · 184ms
  • sprks-matcher · groq · 92ms
  • zen-market-scanner · gemini · 246ms
  • zf-workout-gen · groq · 78ms
03

Live Telemetry

Aggregated usage across projects, with budget caps and live counters.

02 · endpoint

One URL that
replaces your inference stack.

Forget juggling Anthropic, Google, Groq, and Ollama SDKs in your app. Provider keys live server-side. Your client only ever sees lla.ma.

request · curl
curl https://api.lla.ma/m/zf-coach \
  -H "Authorization: Bearer sk_llm_…" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "30 min, what should I do?"}],
    "context": { "steps_today": 8214, "sleep": 7.2 }
  }'
response · 200 · json
{
  "id": "run_01HX9F2KMR3",
  "content": "Take a brisk 20-min zone-2 walk, then…",
  "provider_used": "gemini",
  "model_used": "gemini-1.5-flash",
  "tokens_in": 284,
  "tokens_out": 312,
  "latency_ms": 184,
  "used_fallback": false
}
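From the client side, the request and response shapes above are all you need. A minimal TypeScript sketch, assuming the endpoint and fields exactly as the curl example shows; the error handling is an assumption, not documented behavior:

```typescript
// Response shape taken from the JSON example above.
interface RunResponse {
  id: string;
  content: string;
  provider_used: string;
  model_used: string;
  tokens_in: number;
  tokens_out: number;
  latency_ms: number;
  used_fallback: boolean;
}

// Build the request body shown in the curl example: messages plus context.
function buildRequest(message: string, context: Record<string, unknown>) {
  return {
    messages: [{ role: "user", content: message }],
    context,
  };
}

// Call the /m/:slug endpoint. Throwing on non-2xx is our choice here,
// not something the page specifies.
async function callModel(
  slug: string,
  apiKey: string,
  message: string,
  context: Record<string, unknown> = {},
): Promise<RunResponse> {
  const res = await fetch(`https://api.lla.ma/m/${slug}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildRequest(message, context)),
  });
  if (!res.ok) throw new Error(`lla.ma returned ${res.status}`);
  return (await res.json()) as RunResponse;
}
```

The client never touches a provider SDK or key; it only ever sees the lla.ma URL and its own bearer token.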
03 · providers

Four providers.
Zero SDK juggling.

Set routing once. lla.ma picks the right model per request, retries the fallback on failure, and logs which one actually answered.

quality

Anthropic

Claude · Sonnet / Haiku

vision

Google

Gemini · Flash / Pro

speed

Groq

Llama · 3.3 70B

private

Ollama

Local · self-hosted

your app
POST /m/zf-coach · messages, context
↓
lla.ma router
auth · budget · prompt · route · log
↓
gemini · gemini-1.5-flash · primary
groq · llama-3.3 · fallback
04 · features

Everything you'd build
before the second sprint.

Routing

Multi-provider fallback.

Primary → fallback in one config. If Gemini hiccups, Groq takes the call. lla.ma logs which one answered and how long it took.
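The routing behavior described here can be pictured as a small helper. The provider call signature below is an assumption for illustration; only the behavior comes from the text: try the primary, retry the fallback on failure, record which one answered.

```typescript
// A provider is just something that takes a prompt and resolves to text.
// This signature is illustrative, not lla.ma's internal interface.
type ProviderCall = (prompt: string) => Promise<string>;

interface Provider {
  name: string;
  call: ProviderCall;
}

// Primary → fallback: if the primary throws, the fallback takes the call,
// and the result records who actually answered.
async function routeWithFallback(
  prompt: string,
  primary: Provider,
  fallback: Provider,
): Promise<{ content: string; provider_used: string; used_fallback: boolean }> {
  try {
    const content = await primary.call(prompt);
    return { content, provider_used: primary.name, used_fallback: false };
  } catch {
    const content = await fallback.call(prompt);
    return { content, provider_used: fallback.name, used_fallback: true };
  }
}
```

This is exactly the try/catch glue the builder spares you from writing, with `used_fallback` surfaced in the run log instead of lost in your app code.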

Scheduler

Cron-shaped agents.

Toggle schedule on a model, drop in a cron string, and lla.ma fires it on its own. node-cron under the hood, env-guarded.
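The cron string you drop in follows the standard five-field shape (minute, hour, day of month, month, day of week) that node-cron parses. A rough sanity check of that shape, purely illustrative; the real validation is node-cron's:

```typescript
// Loose check that a string looks like a five-field cron expression.
// node-cron does the real parsing; this only illustrates the format.
function looksLikeCron(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  // Each field: "*", a number, optionally with /step, -range, or comma lists.
  const field = /^(\*|\d+)(\/\d+)?(-\d+)?(,(\*|\d+)(\/\d+)?(-\d+)?)*$/;
  return fields.every((f) => field.test(f));
}
```

So `*/5 * * * *` fires every five minutes and `0 9 * * 1-5` at 09:00 on weekdays.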

Memory

Per-model key/value memory.

Read and write memory keys from any run. Persist preferences, conversation summaries, or anything else you'd shove in a JSON file.

Inspector

Every run, fully logged.

Resolved prompt, provider used, tokens in/out, cost in microcents, latency, fallback status. No more guessing what the model saw.

Budgets

Cost caps in microcents.

Set a per-model budget. The executor pre-flights every call and returns 402 before anything hits the wire if the run would blow past the cap. Sleep peacefully on cron jobs.

Streaming

SSE out of the box.

Pass stream:true and get a real SSE stream — deltas, done, error frames. Fallback is pre-first-byte only, so the wire never lies.
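On the wire, SSE frames are blank-line-separated blocks of `event:` and `data:` lines. A parser sketch under that standard framing; the exact event names (delta, done, error) and their payloads are assumptions based on the wording above:

```typescript
interface SseFrame {
  event: string;
  data: string;
}

// Split a raw SSE buffer into frames: frames are separated by a blank line,
// each with an optional "event:" line and one or more "data:" lines.
function parseSse(buffer: string): SseFrame[] {
  return buffer
    .split("\n\n")
    .filter((chunk) => chunk.trim().length > 0)
    .map((chunk) => {
      let event = "message"; // SSE default when no event: line is present
      const data: string[] = [];
      for (const line of chunk.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).trim());
      }
      return { event, data: data.join("\n") };
    });
}
```

Because fallback is pre-first-byte only, a stream that has started emitting delta frames is guaranteed to be from the provider that will finish it.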

05 · pricing

Pay for agents.
Not infrastructure.

No per-token markup. No platform fee. Provider costs pass through at cost — you only pay lla.ma for the slot.

Free

$0 · forever

Pre-built agents and one Custom Agent to kick the tires.

  • 1 Custom Agent
  • Pre-built agent library
  • 100k tokens / month
Get started

Developer

recommended
$20 / month

For one builder shipping agents into production apps.

  • 3 Custom Agents
  • Full builder + multi-provider routing
  • $12/mo per additional agent
Start building

Team

$99 / month

Shared projects, shared API keys, shared bills.

  • 15 Custom Agents
  • Shared projects
  • $9/mo per additional agent
Set up your team

See the full breakdown on /signup.

06 · ship

Build the model.
Skip the wiring.

Free to start. No card. Bring a system prompt — leave with a deployed endpoint.