Deliberate SDK
Record every option your agent considered, rejected, and executed — one JSONL decision record per fork, written before tools run. Traces tell you what ran; Deliberate tells you what else was on the table, why path A won, and whether anyone should have blocked it. These docs mirror the SDK README.
Availability
Public on npm · early access. The SDK is published to public npm — npm install deliberate-sdk and wrap your agent loop. Design partners additionally get hands-on pilot support shaping the schema and adapter priorities — join the pilot.
What ships today (v0.2.0): the core JSONL recorder; adapters for OpenAI Agents and LangGraph; an MCP / IDE-host adapter; a framework-agnostic proxy layer; policy gates with preset policy packs; human-approval workflows (inline and durable) with automatic pending records; a replay CLI; and signed, tamper-evident auditor export packs.
Install
Node >= 20. Adapters pull in their framework as an optional peer dependency: the OpenAI Agents adapter needs @openai/agents and the LangGraph adapter needs @langchain/core. The core SDK, policy packs, proxy, and MCP host have no framework dependency.
npm install deliberate-sdk
Published on public npm (v0.2.0). Open source, MIT-licensed.
| Import | What it gives you |
|---|---|
| deliberate-sdk | Core: startRun, the policy engine, export packs (now signed), redaction |
| deliberate-sdk/openai-agents | OpenAI Agents adapter — deliberation capture, policy gates, approvals |
| deliberate-sdk/langgraph | LangGraph adapter — same story for LangChain/LangGraph tools |
| deliberate-sdk/mcp | MCP / IDE-host adapter for Cursor/Windsurf-style orchestration |
| deliberate-sdk/proxy | Framework-agnostic proxy + HTTP middleware that wraps the agent loop |
| deliberate-sdk/policies | Preset policy packs for the common incident classes |
Quickstart (manual API)
Works with any agent loop — record forks wherever decisions happen.
import { startRun } from "deliberate-sdk";
const run = startRun({ task: "unblock CI deploy on main" });
run.recordDecision({
chosen: { action: "execute_sql_update(prod.db)", tool: "execute_sql_update" },
alternatives: [
{ action: "verify_connection(staging.db)", rejected_reason: "Staging schema mismatch assumed", confidence: 0.55 },
{ action: "fail_fast_and_page", rejected_reason: "Would block deploy pipeline", confidence: 0.48 },
],
reasoning: "Direct SQL chosen despite low confidence; matches migration pattern on line 412.",
confidence: { kind: "self_report", value: 0.41 },
safety: { irreversible: true },
outcome_status: "blocked",
outcome_summary: "Blocked pending human approval",
});
run.end();
// -> .deliberate/runs/run_<id>.jsonl
When the outcome isn't known yet, open the fork first and complete it after execution. run.end() flushes anything still pending with status unknown, so evidence is never dropped, even when your loop throws.
const pending = run.openDecision({ chosen, alternatives, reasoning });
// ... execute the action ...
pending.complete({ outcome_status: "success", outcome_summary: "2 rows updated" });
OpenAI Agents adapter
The model doesn't expose rejected alternatives before a tool call, so capture is prompted structured output: the agent is instructed to call a record_decision tool before consequential actions. Hooks correlate each deliberation with the execution that follows and back-fill the outcome.
import { Agent, Runner } from "@openai/agents";
import { startRun } from "deliberate-sdk";
import {
attachDeliberateHooks,
createDeliberateSession,
DELIBERATION_INSTRUCTIONS,
recordDecisionTool,
} from "deliberate-sdk/openai-agents";
const run = startRun({ task: "unblock CI deploy on main" });
const session = createDeliberateSession(run);
const agent = new Agent({
name: "deploy-agent",
instructions: `${yourInstructions}\n\n${DELIBERATION_INSTRUCTIONS}`,
tools: [recordDecisionTool(session), ...yourTools],
});
const runner = new Runner();
attachDeliberateHooks(runner, session);
const result = await runner.run(agent, taskPrompt);
session.end({ outcome_status: "success", outcome_summary: String(result.finalOutput) });
Tool executions with no preceding deliberation are still recorded and flagged decision_type: "untracked_execution", so coverage gaps are visible, not silent.
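One way to picture the correlation step: keep the latest unconsumed deliberation per tool, and when an execution arrives with none waiting, emit a record flagged as untracked. The sketch below is a toy model under stated assumptions, not the adapter's implementation. `Correlator`, `Deliberation`, `ExecutionRecord`, and the `"decision"` type value are hypothetical; only `decision_type: "untracked_execution"` comes from the text above.

```typescript
// Toy model of deliberation/execution correlation. All names are hypothetical.
type Deliberation = { tool: string; reasoning: string };
type ExecutionRecord = {
  tool: string;
  reasoning: string | null;
  // "decision" is an assumed label for a tracked fork.
  decision_type: "decision" | "untracked_execution";
};

class Correlator {
  private pending = new Map<string, Deliberation>();

  // Called when the agent records a decision before acting.
  deliberate(d: Deliberation): void {
    this.pending.set(d.tool, d);
  }

  // Called when a tool actually runs; consumes the matching deliberation, if any.
  executed(tool: string): ExecutionRecord {
    const d = this.pending.get(tool);
    if (d === undefined) {
      // No deliberation preceded this call: record it anyway and flag the gap.
      return { tool, reasoning: null, decision_type: "untracked_execution" };
    }
    this.pending.delete(tool);
    return { tool, reasoning: d.reasoning, decision_type: "decision" };
  }
}
```

Calling `executed` twice for the same tool after a single `deliberate` yields one tracked record followed by one untracked record, which is the coverage gap the flag is meant to surface.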
LangGraph adapter
The LangGraph adapter (deliberate-sdk/langgraph) mirrors the OpenAI Agents one, so the same governance setup applies whether you run LangGraph or OpenAI Agents in production. Capture is again prompted structured output (a record_decision tool), and tool execution is gated and recorded by wrapping your LangChain tools with guardTools before handing them to a ToolNode.
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { startRun } from "deliberate-sdk";
import {
createDeliberateSession,
DELIBERATION_INSTRUCTIONS,
guardTools,
recordDecisionTool,
} from "deliberate-sdk/langgraph";
const run = startRun({ task: "ship schema migration" });
const session = createDeliberateSession(run);
const tools = [
recordDecisionTool(session),
...guardTools(yourTools, { policy, session, assignee: "@oncall" }),
];
const toolNode = new ToolNode(tools);
// ... build your graph with toolNode, prompting the model with DELIBERATION_INSTRUCTIONS ...
session.end({ outcome_status: "success" });
A deny verdict blocks the tool and records a blocked fork; require_approval emits a pending record routed to the assignee and either resolves via an in-process onApproval handler or holds the call (fail safe). For a durable, cross-process pause, use LangGraph's native interrupt() plus a checkpointer with session.recordPendingApproval(...). The same safety signals described under Policy gates are stamped automatically. @langchain/core is an optional peer dependency.
MCP / IDE host
deliberate-sdk/mcp lets an MCP server — or an IDE agent host like Cursor / Windsurf — record decisions and enforce policy gates around tool calls using the same core primitives. It is duck-typed (no MCP SDK dependency) and returns results in the MCP CallToolResult shape, so denied / held calls become { isError: true }.
import { startRun, definePolicy } from "deliberate-sdk";
import { recommendedPolicies } from "deliberate-sdk/policies";
import { createMcpHost } from "deliberate-sdk/mcp";
const host = createMcpHost(startRun({ task: "fix failing deploy" }), {
policy: definePolicy(recommendedPolicies({ allow: ["src/"] })),
assignee: "@oncall",
onApproval: async (call) => ({ approved: await askHuman(call) }),
});
// Register the record_decision tool, then wrap each handler:
server.registerTool(host.recordDecisionTool());
server.tool("run_shell", schema, host.wrapTool("run_shell", runShell));
Proxy layer
deliberate-sdk/proxy wraps the agent loop without deep in-process integration, via two seams: wrap any tool/effect executor you control, or drop the same proxy in front of a tool-execution HTTP endpoint.
import { createProxy, createPolicyMiddleware } from "deliberate-sdk/proxy";
const proxy = createProxy(run, { policy, assignee: "@oncall", onApproval });
// 1) Wrap any tool/effect executor you control:
const result = await proxy.call(
{ name: "execute_sql_update", arguments: { database: "prod.db" } },
() => realTool(args),
);
// 2) Or drop the same proxy in front of a tool-execution HTTP endpoint:
app.use("/tools", express.json(), createPolicyMiddleware(proxy)); // 403 deny / 409 held / forward
The proxy intercepts explicit tool-call descriptors ({ name | tool, arguments }), not free-form LLM traffic. allow runs and records the outcome; deny records a blocked fork and throws PolicyDeniedError (HTTP 403); require_approval records a pending fork and either resolves via onApproval or throws ApprovalRequiredError (HTTP 409). The recorded JSONL is identical to the in-process adapters.
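The verdict-to-response mapping described above can be summarized as a small pure function. This is an illustrative sketch, not the middleware's code: `gateOutcome` and its parameter names are hypothetical, and it simplifies by treating "no approval handler" and "handler did not approve" the same way.

```typescript
type Verdict = "allow" | "deny" | "require_approval";

// Returns the HTTP status for a stopped call, or "forward" when the call may proceed.
// approvedByHandler: true only if an approval handler resolved the call in favor.
function gateOutcome(verdict: Verdict, approvedByHandler: boolean): 403 | 409 | "forward" {
  if (verdict === "deny") return 403; // blocked fork
  if (verdict === "require_approval" && !approvedByHandler) return 409; // held, pending fork
  return "forward"; // allowed, or approved inline
}
```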
Policy gates
Block a tool call before it runs based on a rule. Wrap your tools with guardTools — a denied call never reaches the real tool; the model receives a blocked message, and the attempt is recorded as a blocked fork with the rule in safety.policy_violations.
import { definePolicy, guardTools } from "deliberate-sdk/openai-agents";
const policy = definePolicy([
{
id: "prod-write-requires-approval",
when: (ctx) => ctx.toolName === "execute_sql_update" && ctx.arguments.includes("prod"),
effect: "deny", // "allow" | "require_approval" | "deny"
reason: "Writes to prod require human approval",
},
]);
const agent = new Agent({
name: "deploy-agent",
instructions: `${yourInstructions}\n\n${DELIBERATION_INSTRUCTIONS}`,
tools: [recordDecisionTool(session), ...guardTools(yourTools, { policy, session })],
});
Rules are evaluated in order; the first whose when matches decides the verdict. Guarded forks are also auto-stamped with safety.scope_check (pass/fail of the gate) and safety.credential_access when a literal credential is detected. The policy engine (definePolicy, evaluatePolicy) is framework-agnostic and exported from the package root.
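First-match evaluation means rule order is part of the policy. The sketch below is a minimal, dependency-free stand-in for that behavior, not the SDK's evaluatePolicy: `firstMatch`, `Rule`, `ToolContext`, and the two example rules are hypothetical, and falling back to allow when nothing matches is an assumption.

```typescript
type Effect = "allow" | "require_approval" | "deny";
type ToolContext = { toolName: string; arguments: string };
type Rule = { id: string; when: (ctx: ToolContext) => boolean; effect: Effect; reason: string };
type Verdict = { effect: Effect; ruleId: string | null; reason: string | null };

// Walk the rules in order; the first rule whose predicate matches decides.
function firstMatch(rules: Rule[], ctx: ToolContext): Verdict {
  for (const rule of rules) {
    if (rule.when(ctx)) {
      return { effect: rule.effect, ruleId: rule.id, reason: rule.reason };
    }
  }
  // Assumption: with no matching rule, the call is allowed.
  return { effect: "allow", ruleId: null, reason: null };
}

// Most severe rule first, so it wins when both predicates match.
const rules: Rule[] = [
  { id: "no-drop", when: (c) => /drop\s+database/i.test(c.arguments), effect: "deny", reason: "Destructive SQL" },
  { id: "prod-write", when: (c) => c.arguments.includes("prod"), effect: "require_approval", reason: "Touches prod" },
];
```

An argument string such as "DROP DATABASE prod" matches both rules; because "no-drop" is listed first, the verdict is deny rather than require_approval.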
Preset policy packs
Instead of hand-writing predicates for the incident classes that keep recurring, compose the reusable packs from deliberate-sdk/policies. Each factory returns a single PolicyRule, so they drop straight into definePolicy — covering credential reads, path allowlists, production writes, and destructive infra (terraform destroy, aws delete, Railway volumeDelete, rm -rf, DROP DATABASE).
import { definePolicy } from "deliberate-sdk";
import {
credentialReadGuard, // hold reads of .env / credentials / *.pem / ~/.aws/…
pathAllowlist, // hold file access outside the task scope
productionWriteGate, // hold writes that touch prod (execute_sql_update(prod.db))
destructiveInfraGuard, // deny terraform destroy, aws delete, Railway volumeDelete, rm -rf, DROP DATABASE…
recommendedPolicies, // a sensible default stack of all four, most-severe first
} from "deliberate-sdk/policies";
const policy = definePolicy([
destructiveInfraGuard(), // effect: "deny" by default
credentialReadGuard(), // effect: "require_approval"
pathAllowlist({ allow: ["src/", "docs/"] }), // effect: "require_approval"
productionWriteGate(), // effect: "require_approval"
]);
// Or the whole stack in one call:
const defaultPolicy = definePolicy(recommendedPolicies({ allow: ["src/"] }));
The predicates inspect the tool name and the raw argument string (and parsed args), so they work for typed tools and generic shell tools alike. Every effect is configurable: for example, destructiveInfraGuard({ effect: "require_approval" }) holds rather than denies.
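Matching on the raw argument string is what lets one predicate cover both a typed SQL tool and a generic shell tool. The following is a rough sketch of that idea only: `looksDestructive` and the pattern list are hypothetical, and the packs' actual patterns are not shown in these docs.

```typescript
// Hypothetical patterns for illustration; not the pack's real list.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\bterraform\s+destroy\b/i,
  /\brm\s+-rf\b/i,
  /\bdrop\s+database\b/i,
];

// Scan the tool name plus the raw (unparsed) argument string, so the check
// does not depend on any particular argument schema.
function looksDestructive(toolName: string, rawArguments: string): boolean {
  const haystack = `${toolName} ${rawArguments}`;
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(haystack));
}
```

For example, `looksDestructive("run_shell", '{"command":"rm -rf build"}')` and `looksDestructive("execute_sql", "DROP DATABASE users")` both match, even though the two tools shape their arguments differently.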
Approval workflows
A require_approval verdict pauses the run before the tool executes and asks a human. Drive the pause/resume loop with runWithApproval.
import { runWithApproval, guardTools } from "deliberate-sdk/openai-agents";
const result = await runWithApproval(new Runner(), agent, taskPrompt, {
session,
policy, // lets the pending record carry the rule's reason + id
assignee: "@oncall", // who the pending decision is routed to
onApproval: async ({ toolName, arguments: args }) => {
const ok = await askHuman(`Allow ${toolName}(${args})?`);
return ok
? { approved: true, approver: "alice@team.dev" }
: { approved: false, reason: "Inside the change freeze" };
},
});
Approved calls execute and are recorded with human_approval: { state: "approved" }; rejected calls never run and are recorded as blocked forks. Leave an interruption unhandled and the run simply pauses (fail safe).
Automatic pending records
The moment a require_approval gate fires, before any human decides, a pending record is stamped into the JSONL automatically: decision_type: "approval_pending", human_approval: { state: "pending", assignee, reason, blocker }, outcome blocked. That is exactly the “@oncall pending” state the replay UI shows, and because JSONL is append-only it survives even if the process dies while a human deliberates. Pending emission is consistent across runWithApproval, persistForApproval, the proxy, and every adapter; session.recordPendingApproval(...) is exposed if you drive approvals yourself.
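The durability argument rests on append-only writes: the pending line is on disk before anyone decides, and a later resolution is a new line rather than an edit. A minimal sketch using Node's fs module follows. The helper names, the temp-file location, and the `"approval_resolved"` follow-up record are assumptions for illustration; the pending record's fields follow the description above, with blocker omitted.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type PendingRecord = {
  decision_type: "approval_pending";
  human_approval: { state: "pending"; assignee: string; reason: string };
  outcome_status: "blocked";
};

// Append one JSON object as one line; existing lines are never rewritten.
function appendRecord(file: string, record: object): void {
  appendFileSync(file, JSON.stringify(record) + "\n");
}

function readRecords(file: string): unknown[] {
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Demo in a throwaway directory (not the SDK's .deliberate/runs layout).
const file = join(mkdtempSync(join(tmpdir(), "deliberate-sketch-")), "run_demo.jsonl");
const pendingRecord: PendingRecord = {
  decision_type: "approval_pending",
  human_approval: { state: "pending", assignee: "@oncall", reason: "Writes to prod require human approval" },
  outcome_status: "blocked",
};
appendRecord(file, pendingRecord); // on disk before any human decides
// Hypothetical follow-up line written once the human responds.
appendRecord(file, { decision_type: "approval_resolved", human_approval: { state: "approved" } });
```

If the process exits between the two appends, the file still contains the pending line, which is the property the section relies on.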
Durable approval (survives the process)
When approval can't happen inline, persist the paused run and resume it later — in another process, after a human decides out of band.
import { FileApprovalStore } from "deliberate-sdk";
import { persistForApproval, resumeFromApproval } from "deliberate-sdk/openai-agents";
const store = new FileApprovalStore(); // .deliberate/pending/<runId>.json
// Process A — pause and persist, then exit:
const result = await new Runner().run(agent, taskPrompt);
if (result.interruptions.length) {
// Stamps the pending record(s) into the JSONL, then persists the paused state.
await persistForApproval(result, { store, run, agentName: agent.name, session, assignee: "@oncall", policy });
return;
}
// Process B — later, after a human decides:
await resumeFromApproval(agent, run.runId, {
store,
runner: new Runner(),
session,
onApproval: async (req) => ({ approved: await waitForHumanDecision(req) }),
});
CLI replay
npx deliberate runs # list runs in .deliberate/runs
npx deliberate replay run_8842.jsonl # fork-by-fork overview
npx deliberate replay run_8842.jsonl --fork 3 # chosen vs rejected, reasoning, gates
npx deliberate validate run_8842.jsonl # schema check + tier warnings

run_8842 · unblock CI deploy on main
commit a4f91c2
#1 SUCCESS conf 0.90 read_ci_logs(job=4417)
#2 SUCCESS conf 0.62 inspect_schema(prod.db, table=users)
#3 BLOCKED conf 0.41 execute_sql_update(prod.db)
Auditor export pack
Bundle your raw runs into a single, self-contained, tamper-evident pack you can hand to an auditor — no SDK or CLI required to read it.
npx deliberate export # pack every run in .deliberate/runs
npx deliberate export .deliberate/runs --out audit-2026-q2 \
--status blocked --since 2026-04-01T00:00:00Z # scope the selection
npx deliberate verify audit-2026-q2 # re-check every SHA-256 (exit 1 on mismatch)
The pack is a plain directory:
audit-2026-q2/
manifest.json # validated summary, per-run integrity hashes, validation report
CHECKSUMS.txt # SHA-256 of every file — works with sha256sum -c / shasum -a 256 -c
signature.json # (when signed) detached ed25519 signature over CHECKSUMS.txt
README.txt # plain-language explanation + verification steps
runs/*.jsonl # canonical evidence, copied byte-for-byte
replay/index.html + replay/<run>.html # human-readable views, open in any browser
manifest.summary rolls up the governance signals an auditor scans for (blocked count, policy violations, approvals, credential-access count, commits/branches, and the time range), and each run discloses its validation health. Build packs programmatically too:
import { buildExportPack, writeExportPack, verifyExportPack } from "deliberate-sdk";
const pack = buildExportPack({ runs }); // runs: { file, raw, records, validation }[]
writeExportPack(pack, "audit-2026-q2");
const { ok, mismatched } = verifyExportPack("audit-2026-q2");
Integrity here is tamper-evident: the checksums reveal whether any file changed since export. Add a signature (below) for cryptographic provenance.
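Because CHECKSUMS.txt uses the layout sha256sum reads, the integrity check can be reproduced without the SDK. The sketch below shows the idea with Node's crypto module over in-memory files. `buildChecksums` and `findMismatches` are hypothetical helpers rather than SDK exports, and the two-space "<hex>  <path>" line format is assumed from the sha256sum compatibility noted above.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// files: relative path -> contents. One "<hex>  <path>" line per file, sorted by path.
function buildChecksums(files: Map<string, string>): string {
  const paths = Array.from(files.keys()).sort();
  return paths.map((p) => `${sha256Hex(files.get(p) ?? "")}  ${p}`).join("\n") + "\n";
}

// Returns every listed path whose current contents no longer match (or are missing).
function findMismatches(checksums: string, files: Map<string, string>): string[] {
  const mismatched: string[] = [];
  for (const line of checksums.split("\n")) {
    if (line.length === 0) continue;
    const sep = line.indexOf("  ");
    const expected = line.slice(0, sep);
    const path = line.slice(sep + 2);
    const actual = files.get(path);
    if (actual === undefined || sha256Hex(actual) !== expected) {
      mismatched.push(path);
    }
  }
  return mismatched;
}
```

Note the limit of this check: anyone who can edit a file can also regenerate the checksum list, which is why the signature exists as a separate layer.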
Signed export packs
Packs can be cryptographically signed (ed25519) so an auditor can prove a pack was produced by a key you control — not just that it is internally consistent. The signature is detached (signature.json) and covers CHECKSUMS.txt, which hashes every other file including the manifest, so one signature attests to the whole pack.
npx deliberate keygen --out signer # ed25519 key pair -> signer.key + signer.pub
npx deliberate export --sign signer.key --key-id ci@deliberate.dev --out audit-2026-q2
npx deliberate verify audit-2026-q2 --pubkey signer.pub # integrity + authenticity (exit 1 on failure)
verify --pubkey fails unless the pack is signed by exactly that key. Programmatically:
import { buildExportPack, generateSigningKeyPair, signExportPack, verifyExportPack } from "deliberate-sdk";
const { privateKey, publicKey } = generateSigningKeyPair();
const pack = buildExportPack({ runs, sign: { privateKey, keyId: "ci@deliberate.dev" } });
// or sign an already-built pack: signExportPack(pack, { privateKey })
const { ok, signature } = verifyExportPack("audit-2026-q2", { expectedPublicKey: publicKey });
// ok === true && signature.valid && signature.trusted
ed25519 signatures are deterministic, so a fixed key, clock, and inputs produce byte-identical output. Tampering with any evidence file breaks the SHA-256 check; tampering with CHECKSUMS.txt breaks the signature.
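The two properties claimed here, determinism and detection of a modified CHECKSUMS.txt, can be observed directly with Node's built-in crypto module. This sketch signs a stand-in checksum payload; it illustrates plain ed25519 behavior and makes no assumption about the layout of signature.json. The payload contents and variable names are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

const { privateKey, publicKey } = generateKeyPairSync("ed25519");

// Stand-in for the bytes of CHECKSUMS.txt (contents are illustrative).
const checksums = Buffer.from("0f3a  runs/run_demo.jsonl\n");

// ed25519 takes no separate digest algorithm, so the first argument is null.
const sigA = sign(null, checksums, privateKey);
const sigB = sign(null, checksums, privateKey);

// Same key and same bytes give the same signature.
const deterministic = sigA.equals(sigB);

// The signature verifies against the original bytes...
const valid = verify(null, checksums, publicKey, sigA);

// ...and fails once the signed bytes change.
const tampered = Buffer.from("0f3a  runs/run_other.jsonl\n");
const validAfterTamper = verify(null, tampered, publicKey, sigA);
```

Since the signed payload already contains a hash of every other file, verifying this one signature plus the checksums covers the whole pack.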
Decision record schema
One JSONL file per run, one line per fork. Exported as zod schemas (decisionRecordSchema) and TypeScript types. Every record carries a schema_version (currently "0.1") so downstream consumers can migrate across schema changes; readers tolerate its absence in older files. See a full record on the homepage.
| Tier | Fields |
|---|---|
| Required — identity | decision_id · run_id · timestamp · task |
| Required — decision | reasoning · chosen · alternatives[] |
| Outcome | commit · outcome_summary · outcome_status |
| Recommended | confidence · safety · human_approval |
| Provenance | git · ci |
| Optional | decision_type · parent_decision_id · context_refs |
confidence is typed and sourced (kind, value, signals) — stored as reported for triage, never presented as a calibrated probability.
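For consumers that only need a quick sanity check on the two required tiers, a dependency-free pass over each JSONL line is enough. The sketch below is a stand-in for illustration, not a replacement for the exported zod schema (decisionRecordSchema). `missingRequired` is a hypothetical helper, and it checks only for field presence, not field shape.

```typescript
// Field names taken from the "Required" rows of the table above.
const REQUIRED_FIELDS = [
  "decision_id",
  "run_id",
  "timestamp",
  "task",
  "reasoning",
  "chosen",
  "alternatives",
] as const;

// Parses one JSONL line and lists which required fields are absent.
function missingRequired(line: string): string[] {
  const record = JSON.parse(line) as Record<string, unknown>;
  return REQUIRED_FIELDS.filter((field) => record[field] === undefined);
}
```

A record that omits reasoning, for instance, comes back as `["reasoning"]`, while optional tiers such as confidence or git never appear in the result.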
Want access?
The SDK is public on npm; during early access, design partners additionally get hands-on pilot support. Tell us your stack and we'll reach out as pilot slots open.
Become a design partner