00 · lla.ma · ops v0.1

Every model.
One endpoint.

lla.ma is the operations layer between your app and your providers. Build agents visually, route across Claude, Gemini, Groq and Ollama, and watch every call — all behind a single api.lla.ma/m/:slug.

4 providers · 184ms p50 latency · 6μ¢ cost / 1k tok

api.lla.ma/m/zf-coach

zf-coach

zen-fitness

deployed

routing

gemini/gemini-1.5-flash → groq/llama-3.3-70b

1,284 runs · 24h · 412ms p95 · 3.4¢ cost · 24h

last run

run_01HX9 · complete · 184ms · 312 tok
live

  • zf-coach · gemini · 184ms · 312 tok
  • sprks-matcher · groq · 92ms · 184 tok
  • zen-market-scanner · gemini · 246ms · 891 tok
  • zf-workout-gen · groq · 78ms · 412 tok
  • sprks-content-opt · gemini · 312ms · 1024 tok
  • zf-meditation-script · gemini · 198ms · 612 tok
  • zf-health-analyzer · groq · 114ms · 248 tok
  • sprks-moderator · groq · 64ms · 96 tok
  • zf-caption-optimizer · gemini · 156ms · 280 tok
01 · build

A studio for the people
who'd rather ship.

Three surfaces. One workflow. Compose a system prompt, point it at a provider chain, watch the run land — and never glue a fallback together with try/catch again.

lla.ma/models/zf-coach/edit

name

zf-coach

system prompt

You are a calm, evidence-based fitness coach for Zen Fitness members…

primary

gemini-1.5-flash

fallback

groq · llama-3.3
memory · schedule · stream
01

Model Builder

Compose system prompts, context pairs, and a routing chain. Deploy in one click.
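The editor fields above (name, system prompt, primary, fallback, toggles) suggest a config shape like the sketch below. This is a hypothetical illustration assembled from what the page shows; the actual schema lla.ma deploys is not documented here.

```typescript
// Hypothetical builder config, mirroring the editor fields shown above.
// Field names and nesting are assumptions, not lla.ma's documented schema.
const zfCoach = {
  name: "zf-coach",
  system: "You are a calm, evidence-based fitness coach for Zen Fitness members…",
  routing: {
    primary: "gemini/gemini-1.5-flash",
    fallback: "groq/llama-3.3-70b",
  },
  // toggles correspond to the memory / schedule / stream switches in the editor
  toggles: { memory: true, schedule: false, stream: true },
};
```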

lla.ma/models/zf-coach/runs

run_01HX9F2KMR3

zf-coach

complete

184ms

latency

284

in

312

out

0.012¢

cost

resolved prompt

<system> You are a calm, evidence-based fitness coach…
<context> steps_today=8214 · sleep=7.2h · zen_score=82
<user> Got 30 min before a meeting. What should I do?
02

Run Inspector

Every call. Resolved prompt, provider used, tokens, latency, cost.

lla.ma/usage

42,180

runs · 30d

18.4M

tokens · 30d

$11.92

cost · 30d

runs · 16d · ↑ 38%

recent

  • zf-coach · gemini · 184ms
  • sprks-matcher · groq · 92ms
  • zen-market-scanner · gemini · 246ms
  • zf-workout-gen · groq · 78ms
03

Live Telemetry

Aggregated usage across projects, with budget caps and live counters.

02 · endpoint

One URL that
replaces your inference stack.

Forget juggling Anthropic, Google, Groq, and Ollama SDKs in your app. Provider keys live server-side. Your client only ever sees lla.ma.

request · curl
curl https://api.lla.ma/m/zf-coach \
  -H "Authorization: Bearer sk_llm_…" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "30 min, what should I do?"}],
    "context": { "steps_today": 8214, "sleep": 7.2 }
  }'
response · 200 · json
{
  "id": "run_01HX9F2KMR3",
  "content": "Take a brisk 20-min zone-2 walk, then…",
  "provider_used": "gemini",
  "model_used": "gemini-1.5-flash",
  "tokens_in": 284,
  "tokens_out": 312,
  "latency_ms": 184,
  "used_fallback": false
}
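From the client side, the request and response shapes above are all you need. A minimal TypeScript sketch, assuming the endpoint and fields exactly as the curl example shows; the error handling is an assumption, not documented behavior:

```typescript
// Response shape taken from the JSON example above.
interface RunResponse {
  id: string;
  content: string;
  provider_used: string;
  model_used: string;
  tokens_in: number;
  tokens_out: number;
  latency_ms: number;
  used_fallback: boolean;
}

// Build the request body shown in the curl example: messages plus context.
function buildRequest(message: string, context: Record<string, unknown>) {
  return {
    messages: [{ role: "user", content: message }],
    context,
  };
}

// Call the /m/:slug endpoint. Throwing on non-2xx is our choice here,
// not something the page specifies.
async function callModel(
  slug: string,
  apiKey: string,
  message: string,
  context: Record<string, unknown> = {},
): Promise<RunResponse> {
  const res = await fetch(`https://api.lla.ma/m/${slug}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildRequest(message, context)),
  });
  if (!res.ok) throw new Error(`lla.ma returned ${res.status}`);
  return (await res.json()) as RunResponse;
}
```

The client never touches a provider SDK or key; it only ever sees the lla.ma URL and its own bearer token.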
03 · providers

Four providers.
Zero SDK juggling.

Set routing once. lla.ma picks the right model per request, retries the fallback on failure, and logs which one actually answered.

quality

Anthropic

Claude · Sonnet / Haiku

vision

Google

Gemini · Flash / Pro

speed

Groq

Llama · 3.3 70B

private

Ollama

Local · self-hosted

your app
POST /m/zf-coach · messages, context
↓
lla.ma router
auth · budget · prompt · route · log
↓
gemini · gemini-1.5-flash · primary
groq · llama-3.3 · fallback
04 · features

Everything you'd build
before the second sprint.

Routing

Multi-provider fallback.

Primary → fallback in one config. If Gemini hiccups, Groq takes the call. lla.ma logs which one answered and how long it took.
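The routing behavior described here can be pictured as a small helper. The provider call signature below is an assumption for illustration; only the behavior comes from the text: try the primary, retry the fallback on failure, record which one answered.

```typescript
// A provider is just something that takes a prompt and resolves to text.
// This signature is illustrative, not lla.ma's internal interface.
type ProviderCall = (prompt: string) => Promise<string>;

interface Provider {
  name: string;
  call: ProviderCall;
}

// Primary → fallback: if the primary throws, the fallback takes the call,
// and the result records who actually answered.
async function routeWithFallback(
  prompt: string,
  primary: Provider,
  fallback: Provider,
): Promise<{ content: string; provider_used: string; used_fallback: boolean }> {
  try {
    const content = await primary.call(prompt);
    return { content, provider_used: primary.name, used_fallback: false };
  } catch {
    const content = await fallback.call(prompt);
    return { content, provider_used: fallback.name, used_fallback: true };
  }
}
```

This is exactly the try/catch glue the builder spares you from writing, with `used_fallback` surfaced in the run log instead of lost in your app code.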

Scheduler

Cron-shaped agents.

Toggle schedule on a model, drop in a cron string, and lla.ma fires it on its own. node-cron under the hood, env-guarded.
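The cron string you drop in follows the standard five-field shape (minute, hour, day of month, month, day of week) that node-cron parses. A rough sanity check of that shape, purely illustrative; the real validation is node-cron's:

```typescript
// Loose check that a string looks like a five-field cron expression.
// node-cron does the real parsing; this only illustrates the format.
function looksLikeCron(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  // Each field: "*", a number, optionally with /step, -range, or comma lists.
  const field = /^(\*|\d+)(\/\d+)?(-\d+)?(,(\*|\d+)(\/\d+)?(-\d+)?)*$/;
  return fields.every((f) => field.test(f));
}
```

So `*/5 * * * *` fires every five minutes and `0 9 * * 1-5` at 09:00 on weekdays.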

Memory

Per-model key/value memory.

Read and write memory keys from any run. Persist preferences, conversation summaries, or anything else you'd shove in a JSON file.

Inspector

Every run, fully logged.

Resolved prompt, provider used, tokens in/out, cost in microcents, latency, fallback status. No more guessing what the model saw.

Budgets

Cost caps in microcents.

Set a per-model budget. The executor pre-flights every call and returns 402 before anything hits the wire if the run would blow past the cap. Sleep peacefully on cron jobs.

Streaming

SSE out of the box.

Pass stream:true and get a real SSE stream — deltas, done, error frames. Fallback is pre-first-byte only, so the wire never lies.
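On the wire, SSE frames are blank-line-separated blocks of `event:` and `data:` lines. A parser sketch under that standard framing; the exact event names (delta, done, error) and their payloads are assumptions based on the wording above:

```typescript
interface SseFrame {
  event: string;
  data: string;
}

// Split a raw SSE buffer into frames: frames are separated by a blank line,
// each with an optional "event:" line and one or more "data:" lines.
function parseSse(buffer: string): SseFrame[] {
  return buffer
    .split("\n\n")
    .filter((chunk) => chunk.trim().length > 0)
    .map((chunk) => {
      let event = "message"; // SSE default when no event: line is present
      const data: string[] = [];
      for (const line of chunk.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).trim());
      }
      return { event, data: data.join("\n") };
    });
}
```

Because fallback is pre-first-byte only, a stream that has started emitting delta frames is guaranteed to be from the provider that will finish it.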

05 · pricing

Pay for agents.
Not infrastructure.

No per-token markup. No platform fee. Provider costs pass through at cost — you only pay lla.ma for the slot.

Free

$0 · forever

Pre-built agents and one Custom Agent to kick the tires.

  • 1 Custom Agent
  • Pre-built agent library
  • 100k tokens / month
Get started

Developer

recommended
$20 / month

For one builder shipping agents into production apps.

  • 3 Custom Agents
  • Full builder + multi-provider routing
  • $12/mo per additional agent
Start building

Team

$99 / month

Shared projects, shared API keys, shared bills.

  • 15 Custom Agents
  • Shared projects
  • $9/mo per additional agent
Set up your team

See the full breakdown on /signup.

06 · ship

Build the model.
Skip the wiring.

Free to start. No card. Bring a system prompt — leave with a deployed endpoint.