apollo-street/api92/100
watch . +1
Last 13 runs
identitymodel / context / goal matched
mcp.configdrift from clean-local
latency.tool+312ms over 5-day median
baseline.
Installs in seconds
Probe your agent in seconds to track real-life drift over time. Keep Hermes and OpenClaw dialed, and know before issues reach the work.
Installs in seconds
$ brew install trackbaseline/tap/baseline
$ baseline run
-> run_8f2c . score 92 . 11 ok / 3 watch / 0 fail
# watch: mcp.openclaw.config drifted from baseline_clean-local
# watch: latency.tool_probe +312ms over 5-day median
Today . across your workstations
apollo-street/api92/100
watch . +1
Last 13 runs
identitymodel / context / goal matched
mcp.configdrift from clean-local
latency.tool+312ms over 5-day median
apollo-street/web97/100
ok . +2
Last 13 runs
identitystable for 14 runs
memorycontext survives across calls
safety.scrubberno leaks detected
apollo-street/infra78/100
fail . -9
Last 13 runs
memorycontext lost mid-session
variance2 of 5 prompts diverged
repo.workspacedirty: 11 unstaged files
Run Baseline daily
Learn about latency, drift, and memory issues in real time, before they impact your work.
The default set
Model, provider, context window, primary goal.
v0.1Workspace path, clean/dirty state, branch.
v0.1MCP server reachable, allowed tool surface declared.
v0.1Carried context survives between calls. Same answer twice.
v0.1P50 and P95 against the 5-day local median.
v0.1Five identical prompts. Five identical answers.
v0.1Scrubber catches paths, secrets, prompt fragments.
v0.1Repository conventions: tabs, naming, file layout.
v0.1Reports any tool / MCP / repo / config change since Good.
v0.1Two-plus-two. Date. The smoke test for a broken model.
v0.1Answer only the word. Answer only the number.
v0.1Redacted summaries verify against original on push.
v0.1Tool calls actually fire. Names match the declared surface.
v0.1A second session in the same workspace agrees with the first.
v0.1Four commands
The local loop. Seven MCP tools on top for agents that want to call Baseline themselves. No web app required to get value; the cloud surface is for history, alerts, and team-visible evidence after the workstation is already producing redacted runs.
01$ baseline setup
02$ baseline run --mode fast
03$ baseline accept run_8f2c --label clean-local
04$ baseline compare



Pricing
The CLI and MCP are free. Pro adds redacted run history, lifecycle email, and private probes. Team adds shared workstations and routed alerts.
The whole product
$0
When history matters
$39/mo / workstation
When alerts need to route
$129/mo / team
Field notes
The five-minute review ritual before a run becomes the standard your workstation compares against.
Read → 2026 . 04 . 28 MCP drift looks like nothing, until it costs a day.Three real incidents from operator pilots. None of them showed up in trace dashboards.
Read → 2026 . 04 . 09 The case against a leaderboard.Why Baseline does not score models against each other, and what it scores instead.
Read →Install the CLI. Run the check once. Accept the run that earns it. That is the first hour with Baseline.