Integration docs

Deliberate SDK

Capture structured forks — chosen path, ruled-out alternatives, and reasoning — one JSONL decision record per fork. Strict mode enforces capture before guarded actions run. Traces tell you what ran; Deliberate tells you what else was on the table, why path A won, and whether anyone should have blocked it. These docs mirror the SDK README.

Availability

Public on npm · early access. The SDK is published to public npm — npm install deliberate-sdk and wrap your agent loop. Design partners additionally get hands-on pilot support shaping the schema and adapter priorities — join the pilot.

What ships today (v0.4.0):

Everything from v0.3: the core JSONL recorder; adapters for OpenAI Agents and LangGraph; an MCP / IDE-host adapter; a framework-agnostic proxy layer; optional strict decision capture (capture-before-action, enforced); policy gates with preset policy packs; human-approval workflows (inline and durable) with automatic pending records; a replay CLI (doctor, serve, HTML replay); and signed, tamper-evident auditor export packs.

New in v0.4: createDeliberateAgent (one-call integration), passive capture (chosen.arguments, stream-observed forks), trace helpers, and a deliberate-sdk/testing subpath for unit tests.

Install

Node >= 20. Adapters pull in their framework as an optional peer dependency: the OpenAI Agents adapter needs @openai/agents and the LangGraph adapter needs @langchain/core. The core SDK, policy packs, proxy, and MCP host have no framework dependency.

bash

npm install deliberate-sdk

Published on public npm (v0.4.0). Open source, MIT-licensed.

Import	What it gives you
deliberate-sdk	Core: startRun, policy engine, signed export packs, trace helpers, recommendedProductionConfig
deliberate-sdk/openai-agents	OpenAI Agents — createDeliberateAgent, deliberation capture, stream capture, policy gates, approvals
deliberate-sdk/langgraph	LangGraph adapter — same story for LangChain/LangGraph tools
deliberate-sdk/mcp	MCP / IDE-host adapter for Cursor/Windsurf-style orchestration
deliberate-sdk/proxy	Framework-agnostic proxy + HTTP middleware that wraps the agent loop
deliberate-sdk/policies	Preset policy packs for the common incident classes
deliberate-sdk/testing	Dev helpers: createTestRun, simulateToolCall, readRecords

Quickstart (manual API)

Works with any agent loop — record forks wherever decisions happen.

typescript

import { startRun } from "deliberate-sdk";

const run = startRun({ task: "unblock CI deploy on main" });

run.recordDecision({
  chosen: { action: "execute_sql_update(prod.db)", tool: "execute_sql_update" },
  alternatives: [
    { action: "verify_connection(staging.db)", rejected_reason: "Staging schema mismatch assumed", confidence: 0.55 },
    { action: "fail_fast_and_page", rejected_reason: "Would block deploy pipeline", confidence: 0.48 },
  ],
  reasoning: "Direct SQL chosen despite low confidence; matches migration pattern on line 412.",
  confidence: { kind: "self_report", value: 0.41 },
  safety: { irreversible: true },
  outcome_status: "blocked",
  outcome_summary: "Blocked pending human approval",
});

run.end();
// -> .deliberate/runs/run_<id>.jsonl

When the outcome isn't known yet, open the fork first and complete it after execution. run.end() flushes any fork that is still open with status "unknown", so evidence is not dropped even when your loop throws, provided run.end() still runs (for example, from a finally block).

typescript

const pending = run.openDecision({ chosen, alternatives, reasoning });
// ... execute the action ...
pending.complete({ outcome_status: "success", outcome_summary: "2 rows updated" });
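To make the open/complete/flush lifecycle concrete, here is a minimal in-memory model of it. MiniRun, its record fields, and the "dec_N" ids are hypothetical names chosen for illustration, not the SDK's implementation. The point is the control flow: a fork opened before a crash is still written, with status "unknown", as long as end() runs in a finally block.

```typescript
// Hypothetical in-memory model of openDecision / complete / end.
// Not the deliberate-sdk implementation; names and fields are illustrative.
type OutcomeStatus = "success" | "failure" | "blocked" | "unknown";

interface DecisionLine {
  decision_id: string;
  chosen: { action: string; tool?: string };
  outcome_status: OutcomeStatus;
  outcome_summary?: string;
}

class MiniRun {
  readonly lines: string[] = []; // stands in for the append-only JSONL file
  private pending = new Map<string, DecisionLine>();
  private nextId = 1;

  openDecision(chosen: DecisionLine["chosen"]) {
    const id = `dec_${this.nextId++}`;
    this.pending.set(id, { decision_id: id, chosen, outcome_status: "unknown" });
    return {
      complete: (outcome: { outcome_status: OutcomeStatus; outcome_summary?: string }) => {
        const record = this.pending.get(id);
        if (!record) return; // already completed or flushed
        this.pending.delete(id);
        this.lines.push(JSON.stringify({ ...record, ...outcome }));
      },
    };
  }

  // Flush every fork that never completed, keeping outcome_status "unknown".
  end() {
    for (const record of this.pending.values()) this.lines.push(JSON.stringify(record));
    this.pending.clear();
  }
}

// Usage: the second fork throws before complete(), but end() in finally still records it.
const run = new MiniRun();
try {
  const first = run.openDecision({ action: "read_ci_logs(job=4417)", tool: "read_ci_logs" });
  first.complete({ outcome_status: "success", outcome_summary: "logs fetched" });
  run.openDecision({ action: "execute_sql_update(prod.db)", tool: "execute_sql_update" });
  throw new Error("tool crashed");
} catch {
  // swallowed for the demo; a real loop would log or rethrow
} finally {
  run.end();
}
```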

OpenAI Agents adapter

The model doesn't expose rejected alternatives before a tool call, so capture is prompted structured output: the agent is instructed to call a record_decision tool before consequential actions. Hooks correlate each deliberation with the execution that follows and back-fill the outcome.

typescript

import { Agent, Runner } from "@openai/agents";
import { startRun } from "deliberate-sdk";
import {
  attachDeliberateHooks,
  createDeliberateSession,
  DELIBERATION_INSTRUCTIONS,
  recordDecisionTool,
} from "deliberate-sdk/openai-agents";

const run = startRun({ task: "unblock CI deploy on main" });
const session = createDeliberateSession(run);

const agent = new Agent({
  name: "deploy-agent",
  instructions: `${yourInstructions}\n\n${DELIBERATION_INSTRUCTIONS}`,
  tools: [recordDecisionTool(session), ...yourTools],
});

const runner = new Runner();
attachDeliberateHooks(runner, session);

const result = await runner.run(agent, taskPrompt);
session.end({ outcome_status: "success", outcome_summary: String(result.finalOutput) });

In the default best-effort mode, tool executions with no preceding deliberation are still recorded — flagged decision_type: "untracked_execution" so coverage gaps are visible, not silent. Switch to strict capture to hard-block them instead.

LangGraph adapter

The LangGraph adapter (deliberate-sdk/langgraph) mirrors the OpenAI Agents one, so the same governance story works whether you run LangGraph or OpenAI Agents in production. Capture is again prompted structured output (a record_decision tool), and tool execution is gated and recorded by wrapping your LangChain tools with guardTools before handing them to a ToolNode.

typescript

import { ToolNode } from "@langchain/langgraph/prebuilt";
import { startRun } from "deliberate-sdk";
import {
  createDeliberateSession,
  DELIBERATION_INSTRUCTIONS,
  guardTools,
  recordDecisionTool,
} from "deliberate-sdk/langgraph";

const run = startRun({ task: "ship schema migration" });
const session = createDeliberateSession(run);

const tools = [
  recordDecisionTool(session),
  ...guardTools(yourTools, { policy, session, assignee: "@oncall" }),
];
const toolNode = new ToolNode(tools);
// ... build your graph with toolNode, prompting the model with DELIBERATION_INSTRUCTIONS ...
session.end({ outcome_status: "success" });

A deny verdict blocks the tool and records a blocked fork. A require_approval verdict emits a pending record routed to the assignee, then either resolves through an in-process onApproval handler or holds the call (fail safe). For a durable, cross-process pause, combine LangGraph's native interrupt() and a checkpointer with session.recordPendingApproval(...). Guarded forks get the same safety signals as in the OpenAI Agents adapter (safety.scope_check, plus safety.credential_access when a literal credential is detected), stamped automatically. @langchain/core is an optional peer dependency.

MCP / IDE host

deliberate-sdk/mcp lets an MCP server — or an IDE agent host like Cursor / Windsurf — record decisions and enforce policy gates around tool calls using the same core primitives. It is duck-typed (no MCP SDK dependency) and returns results in the MCP CallToolResult shape, so denied / held calls become { isError: true }.

typescript

import { startRun, definePolicy } from "deliberate-sdk";
import { recommendedPolicies } from "deliberate-sdk/policies";
import { createMcpHost } from "deliberate-sdk/mcp";

const host = createMcpHost(startRun({ task: "fix failing deploy" }), {
  policy: definePolicy(recommendedPolicies({ allow: ["src/"] })),
  assignee: "@oncall",
  onApproval: async (call) => ({ approved: await askHuman(call) }),
});

// Register the record_decision tool, then wrap each handler:
server.registerTool(host.recordDecisionTool());
server.tool("run_shell", schema, host.wrapTool("run_shell", runShell));
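A denied or held call reaches the MCP client as an ordinary tool result with isError set, rather than as a thrown exception. The wrapper below is a hypothetical illustration of that pattern: Gate, wrapWithGate, and the message strings are assumptions, not createMcpHost's API. Only the result shape follows the MCP CallToolResult convention of a content array plus an optional isError flag.

```typescript
// Hypothetical policy-gate wrapper that returns MCP CallToolResult-shaped values.
// Not createMcpHost's implementation; Gate, wrapWithGate, and the messages are illustrative.
interface CallToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type GateVerdict =
  | { effect: "allow" }
  | { effect: "deny" | "require_approval"; reason: string };
type Gate = (toolName: string, args: Record<string, unknown>) => GateVerdict;
type ToolHandler = (args: Record<string, unknown>) => Promise<CallToolResult>;

function wrapWithGate(toolName: string, gate: Gate, handler: ToolHandler): ToolHandler {
  return async (args) => {
    const verdict = gate(toolName, args);
    if (verdict.effect === "allow") return handler(args); // the real tool runs only on allow
    // Denied or held: report it as a tool error so the client sees a normal result.
    const label = verdict.effect === "deny" ? "Denied" : "Held for approval";
    return { content: [{ type: "text", text: `${label}: ${verdict.reason}` }], isError: true };
  };
}

// Example gate: deny shell commands containing "rm -rf", allow everything else.
const gate: Gate = (tool, args) =>
  tool === "run_shell" && String(args["cmd"] ?? "").includes("rm -rf")
    ? { effect: "deny", reason: "destructive shell command" }
    : { effect: "allow" };

const guardedRunShell = wrapWithGate("run_shell", gate, async (args) => ({
  content: [{ type: "text", text: `ran: ${String(args["cmd"])}` }],
}));
```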

Proxy layer

deliberate-sdk/proxy wraps the agent loop without deep in-process integration. It offers two seams: wrap any tool/effect executor you control, or put the same proxy in front of a tool-execution HTTP endpoint.

typescript

import express from "express";
import { createProxy, createPolicyMiddleware } from "deliberate-sdk/proxy";

// run, policy, onApproval, and realTool come from your own setup.
const proxy = createProxy(run, { policy, assignee: "@oncall", onApproval });

// 1) Wrap any tool/effect executor you control:
const args = { database: "prod.db" };
const result = await proxy.call(
  { name: "execute_sql_update", arguments: args },
  () => realTool(args),
);

// 2) Or drop the same proxy in front of a tool-execution HTTP endpoint:
const app = express();
app.use("/tools", express.json(), createPolicyMiddleware(proxy)); // 403 deny / 409 held / forward

It intercepts explicit tool-call descriptors ({ name | tool, arguments }), not free-form LLM traffic. allow runs and records the outcome; deny records a blocked fork and throws PolicyDeniedError (HTTP 403); require_approval records a pending fork and either resolves via onApproval or throws ApprovalRequiredError (HTTP 409). The recorded JSONL is identical to the in-process adapters.
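The proxy's contract can be summarized as a mapping from verdict to effect. The sketch below is a hypothetical reimplementation for illustration only: normalizeDescriptor, httpStatusFor, and the Verdict type are assumed names, not exports of deliberate-sdk/proxy. It shows how a { name | tool, arguments } descriptor is normalized and how allow, deny, and require_approval translate into forward / 403 / 409. Reporting a human rejection as 403 is an extra assumption; the text only specifies 409 for a call that is held.

```typescript
// Hypothetical sketch of the proxy contract; not the deliberate-sdk/proxy source.
type Verdict = "allow" | "deny" | "require_approval";

interface ToolCallDescriptor {
  name?: string;
  tool?: string; // accepted as an alias for name
  arguments?: Record<string, unknown>;
}

// Accept either { name } or { tool }; reject anything that is not an explicit tool call.
function normalizeDescriptor(d: ToolCallDescriptor): { name: string; arguments: Record<string, unknown> } {
  const toolName = d.name ?? d.tool;
  if (!toolName) throw new Error("not a tool-call descriptor: missing name/tool");
  return { name: toolName, arguments: d.arguments ?? {} };
}

// `approved` is the onApproval result, or undefined when nothing resolved the hold.
function httpStatusFor(verdict: Verdict, approved?: boolean): number | "forward" {
  if (verdict === "allow") return "forward";
  if (verdict === "deny") return 403;
  // require_approval: forward once approved, 409 while held.
  if (approved === true) return "forward";
  return approved === false ? 403 : 409; // assumption: a rejection is reported like a deny
}
```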

Strict decision capture

By default capture is best-effort: a tool that runs without a preceding record_decision is still recorded (as untracked_execution) so the gap is visible — but it runs. Set captureMode: "strict" to make the lead promise — capture the options the agent considered before it acts — mechanically enforceable: a guarded, consequential action must have a matching deliberation recorded before it, or it is refused. The option is accepted by every adapter — the OpenAI Agents session, the LangGraph session, the MCP host, and the proxy.

typescript

import { startRun, createProxy, DeliberationRequiredError } from "deliberate-sdk";

const run = startRun({ task: "apply schema migration" });
const proxy = createProxy(run, { captureMode: "strict" });
// adapters take the same option:
//   createDeliberateSession(run, { captureMode: "strict" })
//   createMcpHost(run, { captureMode: "strict" })

// 1) No matching deliberation yet -> the real tool never runs.
try {
  await proxy.call(
    { name: "execute_sql_update", arguments: { database: "staging.db" } },
    () => realTool(),
  );
} catch (error) {
  if (error instanceof DeliberationRequiredError) {
    // a blocked "deliberation_required" fork was recorded; ask the agent to deliberate, then retry
  } else throw error;
}

// 2) Record the decision for THIS action (chosen.tool must match), then retry -> it proceeds.
proxy.recordDeliberation({
  task: "apply schema migration",
  reasoning: "Direct SQL matches the migration pattern on line 412; staging verified first.",
  chosen: { action: "execute_sql_update(staging.db)", tool: "execute_sql_update" },
  alternatives: [
    { action: "run the migration tool", rejected_reason: "not wired up for this schema", confidence: 0.3 },
  ],
});
const result = await proxy.call(
  { name: "execute_sql_update", arguments: { database: "staging.db" } },
  () => realTool(),
); // stamped deliberated_before_execution: true

When a guarded call has no matching deliberation, three things happen. The real tool never runs. An auditable refusal is recorded (decision_type: "deliberation_required", outcome_status: "blocked", safety.scope_check: "fail") instead of a silent untracked_execution. And the seam reports the refusal in its native form: a deny-style message (OpenAI / LangGraph), a thrown DeliberationRequiredError (proxy), or { isError: true } (MCP). The agent therefore gets a retry round in which to record the decision before trying again.

Binding is name-matched and buffer-ordered: a deliberation only satisfies an execution of the same chosen.tool, so one fork's reasoning can never be bound to a different action. Each bound execution is stamped with a mechanical deliberated_before_execution: true — computed from buffer ordering, never a model-provided timestamp; refusals are stamped false.
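Name-matched, buffer-ordered binding is simple enough to model directly. The class below is a hypothetical sketch written for illustration: DeliberationBuffer, its methods, and the ExecutionStamp shape are assumptions, not the SDK's API. It shows how strict mode refuses an unmatched call, how best-effort mode lets the same call run as untracked_execution, and why deliberated_before_execution can be derived from arrival order alone, without any model-supplied timestamp. Consuming each deliberation on the execution it binds to is a further assumption.

```typescript
// Hypothetical model of strict-mode binding; not the deliberate-sdk implementation.
type CaptureMode = "best_effort" | "strict";

interface Deliberation {
  chosenTool: string; // chosen.tool from the recorded decision
  reasoning: string;
}

interface ExecutionStamp {
  allowed: boolean; // false means the real tool must not run
  decision_type?: "untracked_execution" | "deliberation_required";
  deliberated_before_execution?: boolean;
  bound?: Deliberation;
}

class DeliberationBuffer {
  private queue: Deliberation[] = [];
  private readonly mode: CaptureMode;

  constructor(mode: CaptureMode) {
    this.mode = mode;
  }

  record(d: Deliberation): void {
    this.queue.push(d); // arrival order is the only clock used for binding
  }

  // Called at the guarded seam before the real tool would run.
  beforeExecute(toolName: string): ExecutionStamp {
    // Name-matched: only a deliberation for this exact tool can satisfy the call.
    const index = this.queue.findIndex((d) => d.chosenTool === toolName);
    if (index >= 0) {
      // Assumption: a deliberation is consumed by the one execution it binds to.
      const [bound] = this.queue.splice(index, 1);
      return { allowed: true, deliberated_before_execution: true, bound };
    }
    if (this.mode === "strict") {
      return { allowed: false, decision_type: "deliberation_required", deliberated_before_execution: false };
    }
    return { allowed: true, decision_type: "untracked_execution" };
  }
}

// Usage: a deliberation for a different tool does not unlock execute_sql_update.
const strictBuffer = new DeliberationBuffer("strict");
strictBuffer.record({ chosenTool: "read_ci_logs", reasoning: "need the failing job output" });
const refused = strictBuffer.beforeExecute("execute_sql_update"); // allowed: false
strictBuffer.record({ chosenTool: "execute_sql_update", reasoning: "matches the migration pattern" });
const accepted = strictBuffer.beforeExecute("execute_sql_update"); // allowed: true
```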

Honest framing. Strict mode enforces that a decision was declared before the action. It does not verify that the declared alternatives are truthful or complete, or that confidence is calibrated. The contents of alternatives / reasoning remain the model's self-report (confidence.kind: "self_report"); what is now mechanical is that capture happened first. A deliberation that honestly considered only one option still proceeds: empty alternatives is surfaced as a capture-health warning, not a block.

Capture-health validator

checkRecord — and therefore the validation block in every export manifest — deterministically flags weak capture without running anything: empty alternatives, a chosen.action whose call target disagrees with chosen.tool, untracked_execution / deliberation_required records, and deliberated_before_execution === false. Because the manifest is hashed and signable, capture health becomes a first-class, tamper-evident number — not a vibe.
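Each flag in that list can be computed from a single record without running anything. The function below is a hypothetical approximation for illustration: captureHealthWarnings and its warning strings are assumptions, and the SDK's checkRecord may differ in names, rules, and output. The call-target comparison assumes actions are written as tool(args), like execute_sql_update(prod.db).

```typescript
// Hypothetical capture-health check; approximates the rules listed above, not checkRecord itself.
interface RecordLike {
  decision_type?: string;
  chosen: { action: string; tool?: string };
  alternatives: unknown[];
  deliberated_before_execution?: boolean;
}

function captureHealthWarnings(r: RecordLike): string[] {
  const warnings: string[] = [];
  if (r.alternatives.length === 0) warnings.push("empty_alternatives");

  // Compare the call target in chosen.action (text before "(") with chosen.tool.
  const match = /^\s*([\w.-]+)\s*\(/.exec(r.chosen.action);
  if (match && r.chosen.tool && match[1] !== r.chosen.tool) warnings.push("action_tool_mismatch");

  if (r.decision_type === "untracked_execution") warnings.push("untracked_execution");
  if (r.decision_type === "deliberation_required") warnings.push("deliberation_required");
  if (r.deliberated_before_execution === false) warnings.push("not_deliberated_before_execution");
  return warnings;
}

// Usage
const healthy = captureHealthWarnings({
  chosen: { action: "execute_sql_update(staging.db)", tool: "execute_sql_update" },
  alternatives: [{ action: "run the migration tool" }],
  deliberated_before_execution: true,
});
const weak = captureHealthWarnings({
  chosen: { action: "drop_table(users)", tool: "execute_sql_update" },
  alternatives: [],
});
```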

Policy gates

Block a tool call before it runs based on a rule. Wrap your tools with guardTools — a denied call never reaches the real tool; the model receives a blocked message, and the attempt is recorded as a blocked fork with the rule in safety.policy_violations.

typescript

import { Agent } from "@openai/agents";
import {
  definePolicy,
  guardTools,
  DELIBERATION_INSTRUCTIONS,
  recordDecisionTool,
} from "deliberate-sdk/openai-agents";

const policy = definePolicy([
  {
    id: "deny-prod-writes",
    when: (ctx) => ctx.toolName === "execute_sql_update" && ctx.arguments.includes("prod"),
    effect: "deny", // "allow" | "require_approval" | "deny"
    reason: "Direct writes to prod are blocked for this agent",
  },
]);

// session comes from createDeliberateSession(run), as in the adapter example.
const agent = new Agent({
  name: "deploy-agent",
  instructions: `${yourInstructions}\n\n${DELIBERATION_INSTRUCTIONS}`,
  tools: [recordDecisionTool(session), ...guardTools(yourTools, { policy, session })],
});

Rules are evaluated in order; the first whose when matches decides the verdict. Guarded forks are also auto-stamped with safety.scope_check (pass/fail of the gate) and safety.credential_access when a literal credential is detected. The policy engine (definePolicy, evaluatePolicy) is framework-agnostic and exported from the package root.
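First-match evaluation is easy to reason about in isolation. The sketch below is a hypothetical, framework-free reimplementation for illustration: evaluateFirstMatch and the Rule shape mirror the rule objects above but are not the SDK's evaluatePolicy. The fallback to allow when no rule matches is an assumption, not documented behavior. Note how the broad approval rule never sees prod writes because the narrower deny rule comes first.

```typescript
// Hypothetical first-match policy evaluator; not the deliberate-sdk policy engine.
type Effect = "allow" | "require_approval" | "deny";

interface PolicyContext {
  toolName: string;
  arguments: string; // raw argument string, as in the rule above
}

interface Rule {
  id: string;
  when: (ctx: PolicyContext) => boolean;
  effect: Effect;
  reason: string;
}

interface PolicyDecision {
  effect: Effect;
  ruleId?: string;
  reason?: string;
}

function evaluateFirstMatch(rules: Rule[], ctx: PolicyContext): PolicyDecision {
  for (const rule of rules) {
    // Order matters: the first rule whose predicate matches decides the verdict.
    if (rule.when(ctx)) return { effect: rule.effect, ruleId: rule.id, reason: rule.reason };
  }
  return { effect: "allow" }; // assumption: no matching rule means the call is allowed
}

const rules: Rule[] = [
  {
    id: "deny-prod-writes",
    when: (ctx) => ctx.toolName === "execute_sql_update" && ctx.arguments.includes("prod"),
    effect: "deny",
    reason: "Direct writes to prod are blocked",
  },
  {
    id: "sql-needs-approval",
    when: (ctx) => ctx.toolName === "execute_sql_update",
    effect: "require_approval",
    reason: "All SQL writes are reviewed",
  },
];

const prodWrite = evaluateFirstMatch(rules, { toolName: "execute_sql_update", arguments: '{"database":"prod.db"}' });
const stagingWrite = evaluateFirstMatch(rules, { toolName: "execute_sql_update", arguments: '{"database":"staging.db"}' });
```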

Preset policy packs

Instead of hand-writing predicates for the incident classes that keep recurring, compose the reusable packs from deliberate-sdk/policies. Each factory returns a single PolicyRule, so they drop straight into definePolicy — covering credential reads, path allowlists, production writes, and destructive infra (terraform destroy, aws delete, Railway volumeDelete, rm -rf, DROP DATABASE).

typescript

import { definePolicy } from "deliberate-sdk";
import {
  credentialReadGuard,    // hold reads of .env / credentials / *.pem / ~/.aws/…
  pathAllowlist,          // hold file access outside the task scope
  productionWriteGate,    // hold writes that touch prod (execute_sql_update(prod.db))
  destructiveInfraGuard,  // deny terraform destroy, aws delete, Railway volumeDelete, rm -rf, DROP DATABASE…
  recommendedPolicies,    // a sensible default stack of all four, most-severe first
} from "deliberate-sdk/policies";

const policy = definePolicy([
  destructiveInfraGuard(),                      // effect: "deny" by default
  credentialReadGuard(),                        // effect: "require_approval"
  pathAllowlist({ allow: ["src/", "docs/"] }),  // effect: "require_approval"
  productionWriteGate(),                        // effect: "require_approval"
]);

// Or the whole stack in one call (use one policy or the other, not both):
const recommended = definePolicy(recommendedPolicies({ allow: ["src/"] }));

The predicates inspect the tool name and the raw argument string (and parsed args), so they work for typed tools and generic shell tools alike. Every effect is configurable — e.g. destructiveInfraGuard({ effect: "require_approval" }) to hold rather than deny.
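Matching on the raw argument string is what lets one rule cover a typed SQL tool and a generic shell tool alike. Here is a hypothetical factory in the same spirit as destructiveInfraGuard. The function name destructiveCommandRule, its option shape, and its pattern list are illustrative assumptions, not the SDK's actual patterns.

```typescript
// Hypothetical preset-style rule factory; the patterns are illustrative, not the SDK's list.
type RuleEffect = "allow" | "require_approval" | "deny";

interface RuleCtx {
  toolName: string;
  arguments: string; // raw argument string, whatever the tool's schema
}

const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\bterraform\s+destroy\b/i,
  /\baws\s+\S+\s+delete-/i, // e.g. "aws s3api delete-bucket"
  /\brm\s+-rf\b/,
  /\bdrop\s+database\b/i,
];

function destructiveCommandRule(opts: { effect?: RuleEffect } = {}) {
  return {
    id: "destructive-command",
    effect: opts.effect ?? "deny", // configurable, deny by default
    reason: "Destructive infrastructure command",
    // Test tool name + raw arguments, so typed tools and shell tools are both covered.
    when: (ctx: RuleCtx) => DESTRUCTIVE_PATTERNS.some((p) => p.test(`${ctx.toolName} ${ctx.arguments}`)),
  };
}

const denyRule = destructiveCommandRule();
const holdRule = destructiveCommandRule({ effect: "require_approval" });
```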

Approval workflows

A require_approval verdict pauses the run before the tool executes and asks a human. Drive the pause/resume loop with runWithApproval.

typescript

import { Runner } from "@openai/agents";
import { runWithApproval } from "deliberate-sdk/openai-agents";

const result = await runWithApproval(new Runner(), agent, taskPrompt, {
  session,
  policy,                 // lets the pending record carry the rule's reason + id
  assignee: "@oncall",    // who the pending decision is routed to
  onApproval: async ({ toolName, arguments: args }) => {
    const ok = await askHuman(`Allow ${toolName}(${args})?`);
    return ok
      ? { approved: true, approver: "alice@team.dev" }
      : { approved: false, reason: "Inside the change freeze" };
  },
});

Approved calls execute and are recorded with human_approval: { state: "approved" }; rejected calls never run and are recorded as blocked forks. Leave an interruption unhandled and the run simply pauses — fail safe.

Automatic pending records

The moment a require_approval gate fires, before any human decides, a pending record is stamped into the JSONL automatically: decision_type: "approval_pending", human_approval: { state: "pending", assignee, reason, blocker }, outcome_status: "blocked". That is exactly the “@oncall pending” state the replay UI shows, and because JSONL is append-only it survives even if the process dies while a human deliberates. Pending emission is consistent across runWithApproval, persistForApproval, the proxy, and every adapter; session.recordPendingApproval(...) is exposed if you drive approvals yourself.
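For a concrete picture of such a line, the helper below builds one as a single JSONL string. It is a hypothetical illustration: pendingApprovalLine, the id format, and the meaning given to blocker are assumptions. The field names follow the schema tiers and the fields listed above. Because the result is one self-contained line, appending it is a single write, which is what lets the pending state survive a crash.

```typescript
// Hypothetical builder for an approval_pending JSONL line; field values are illustrative.
interface PendingInput {
  runId: string;
  task: string;
  tool: string;
  action: string;
  assignee: string;
  reason: string;  // the matching rule's reason
  blocker: string; // assumption: what is blocking the call, e.g. the rule id
}

function pendingApprovalLine(p: PendingInput, now: Date = new Date()): string {
  const record = {
    schema_version: "0.1",
    decision_id: `${p.runId}_pending_${now.getTime()}`, // hypothetical id format
    run_id: p.runId,
    timestamp: now.toISOString(),
    task: p.task,
    decision_type: "approval_pending",
    reasoning: `Held by policy: ${p.reason}`,
    chosen: { action: p.action, tool: p.tool },
    alternatives: [],
    human_approval: { state: "pending", assignee: p.assignee, reason: p.reason, blocker: p.blocker },
    outcome_status: "blocked",
  };
  return JSON.stringify(record) + "\n"; // one line, ready to append to run_<id>.jsonl
}

const pendingLine = pendingApprovalLine(
  {
    runId: "run_8842",
    task: "unblock CI deploy on main",
    tool: "execute_sql_update",
    action: "execute_sql_update(prod.db)",
    assignee: "@oncall",
    reason: "Writes to prod require human approval",
    blocker: "prod-write-gate",
  },
  new Date("2026-04-02T10:00:00Z"),
);
```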

Durable approval (survives the process)

When approval can't happen inline, persist the paused run and resume it later — in another process, after a human decides out of band.

typescript

import { Runner } from "@openai/agents";
import { FileApprovalStore } from "deliberate-sdk";
import { persistForApproval, resumeFromApproval } from "deliberate-sdk/openai-agents";

const store = new FileApprovalStore(); // .deliberate/pending/<runId>.json

// Process A: pause and persist, then exit.
async function runUntilApproval(): Promise<void> {
  const result = await new Runner().run(agent, taskPrompt);
  if (result.interruptions.length) {
    // Stamps the pending record(s) into the JSONL, then persists the paused state.
    await persistForApproval(result, { store, run, agentName: agent.name, session, assignee: "@oncall", policy });
    return; // the process can exit now; the paused state lives in the store
  }
  // No interruptions: the run completed without needing approval.
}

// Process B: later, after a human decides. run.runId identifies the run paused in process A.
async function resumeAfterDecision(): Promise<void> {
  await resumeFromApproval(agent, run.runId, {
    store,
    runner: new Runner(),
    session,
    onApproval: async (req) => ({ approved: await waitForHumanDecision(req) }),
  });
}
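The durability comes from writing the paused state to disk keyed by run id, so a different process can load it later. The class below is a hypothetical file store written for illustration: JsonFileStore, its methods, and the PausedState shape are assumptions, and it is not FileApprovalStore's API. It uses only Node's fs and path modules and the .deliberate/pending/<runId>.json layout mentioned above.

```typescript
import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical paused-run store; illustrates the idea, not FileApprovalStore's API.
interface PausedState {
  runId: string;
  agentName: string;
  serializedState: string; // whatever the agent framework needs to resume
  assignee: string;
}

class JsonFileStore {
  private readonly dir: string;

  constructor(dir: string = join(".deliberate", "pending")) {
    this.dir = dir;
  }

  save(state: PausedState): string {
    mkdirSync(this.dir, { recursive: true });
    const file = join(this.dir, `${state.runId}.json`);
    writeFileSync(file, JSON.stringify(state, null, 2));
    return file;
  }

  load(runId: string): PausedState | undefined {
    const file = join(this.dir, `${runId}.json`);
    return existsSync(file) ? (JSON.parse(readFileSync(file, "utf8")) as PausedState) : undefined;
  }

  // Remove the entry once the run has resumed, so it cannot be resumed twice.
  remove(runId: string): void {
    rmSync(join(this.dir, `${runId}.json`), { force: true });
  }
}

// Usage: "process A" saves, "process B" (simulated here) loads by run id.
const store = new JsonFileStore(join(tmpdir(), `deliberate-demo-${process.pid}`));
store.save({ runId: "run_8842", agentName: "deploy-agent", serializedState: "{}", assignee: "@oncall" });
const resumed = store.load("run_8842");
```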

CLI replay

Runs are written as JSONL under .deliberate/runs/. Use the CLI to list, validate, and replay them in the terminal or as HTML.

bash

npx deliberate runs                            # list runs in .deliberate/runs
npx deliberate replay run_8842.jsonl           # fork-by-fork overview (terminal)
npx deliberate replay run_8842.jsonl --fork 3  # chosen vs rejected, reasoning, gates
npx deliberate replay run_8842.jsonl --html    # write run_8842.html — open in any browser
npx deliberate validate run_8842.jsonl         # schema check + tier warnings
npx deliberate doctor                          # capture-health checklist for recent runs
npx deliberate serve                           # local replay server (default :3847)

text

run_8842 · unblock CI deploy on main
commit a4f91c2

  #1 SUCCESS   conf 0.90  read_ci_logs(job=4417)
  #2 SUCCESS   conf 0.62  inspect_schema(prod.db, table=users)
  #3 BLOCKED   conf 0.41  execute_sql_update(prod.db)
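The terminal overview is a direct projection of the JSONL lines: one row per fork, in file order. The function below is a hypothetical re-creation of that layout for illustration. formatOverview, and the assumption that rows are built from outcome_status, confidence.value, and chosen.action, describe the CLI's output, not its source.

```typescript
// Hypothetical re-creation of the fork-by-fork overview; not the deliberate CLI source.
interface ReplayRecord {
  chosen: { action: string };
  confidence?: { value?: number };
  outcome_status?: string;
}

function formatOverview(jsonl: string): string[] {
  return jsonl
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l, i) => {
      const r = JSON.parse(l) as ReplayRecord;
      const outcome = (r.outcome_status ?? "unknown").toUpperCase().padEnd(10);
      const value = r.confidence?.value;
      const conf = typeof value === "number" ? value.toFixed(2) : " n/a";
      return `  #${i + 1} ${outcome}conf ${conf}  ${r.chosen.action}`;
    });
}

const overview = formatOverview(
  [
    '{"chosen":{"action":"read_ci_logs(job=4417)"},"confidence":{"value":0.9},"outcome_status":"success"}',
    '{"chosen":{"action":"execute_sql_update(prod.db)"},"confidence":{"value":0.41},"outcome_status":"blocked"}',
  ].join("\n"),
);
```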

Visual replay

JSONL is the source of truth; HTML is for humans reviewing a run. Each page shows the chosen action with its structured chosen.arguments, the alternatives, reasoning, confidence, policy gates, and approval state: the same data as the terminal replay.

bash

# One run — static HTML beside the JSONL
npx deliberate replay .deliberate/runs/run_8842.jsonl --html
open .deliberate/runs/run_8842.html

# Many runs — browse in the browser
npx deliberate serve
# -> http://127.0.0.1:3847/  (index, /replay/<file>, /raw/<file>)

# Share — export pack includes replay/index.html
npx deliberate export .deliberate/runs --out audit-2026-q2
open audit-2026-q2/replay/index.html

Auditor export pack

Bundle your raw runs into a single, self-contained, tamper-evident pack you can hand to an auditor — no SDK or CLI required to read it.

bash

npx deliberate export                          # pack every run in .deliberate/runs
npx deliberate export .deliberate/runs --out audit-2026-q2 \
  --status blocked --since 2026-04-01T00:00:00Z   # scope the selection
npx deliberate verify audit-2026-q2            # re-check every SHA-256 (exit 1 on mismatch)

The pack is a plain directory:

text

audit-2026-q2/
  manifest.json   # validated summary, per-run integrity hashes, validation report
  CHECKSUMS.txt   # SHA-256 of every file — works with sha256sum -c / shasum -a 256 -c
  signature.json  # (when signed) detached ed25519 signature over CHECKSUMS.txt
  README.txt      # plain-language explanation + verification steps
  runs/*.jsonl    # canonical evidence, copied byte-for-byte
  replay/index.html + replay/<run>.html   # human-readable views, open in any browser

manifest.summary rolls up the governance signals an auditor scans for — blocked count, policy violations, approvals, credential-access count, commits/branches, and the time range — and each run discloses its validation health. Build packs programmatically too:

typescript

import { buildExportPack, writeExportPack, verifyExportPack } from "deliberate-sdk";

const pack = buildExportPack({ runs }); // runs: { file, raw, records, validation }[]
writeExportPack(pack, "audit-2026-q2");
const { ok, mismatched } = verifyExportPack("audit-2026-q2");

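Integrity here is tamper-evident: a checksum mismatch proves a file changed since export. Checksums alone do not prove who produced the pack, since anyone who edits a file can also regenerate CHECKSUMS.txt; add a signature (below) for cryptographic provenance.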
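CHECKSUMS.txt uses the "<hex digest>  <path>" line format that sha256sum -c reads, so the integrity check takes only a few lines with Node's built-in node:crypto module. The sketch below is illustrative: buildChecksums, verifyChecksums, and the in-memory file map are hypothetical and are not verifyExportPack's implementation. Its last step also shows the limit of checksums on their own: whoever edits a file can regenerate the list and it verifies cleanly again, which is what a signature guards against.

```typescript
import { createHash } from "node:crypto";

// Illustrative checksum check over an in-memory pack; not verifyExportPack's source.
const sha256 = (data: string): string => createHash("sha256").update(data).digest("hex");

// Build "<digest>  <path>" lines in the sha256sum format.
function buildChecksums(files: Map<string, string>): string {
  return [...files.entries()].map(([path, body]) => `${sha256(body)}  ${path}`).join("\n") + "\n";
}

// Return the paths whose current content no longer matches the recorded digest.
function verifyChecksums(checksums: string, files: Map<string, string>): string[] {
  const mismatched: string[] = [];
  for (const line of checksums.split("\n")) {
    const m = /^([0-9a-f]{64}) [ *](.+)$/.exec(line);
    if (!m) continue; // skip blank or malformed lines
    const [, digest, path] = m;
    const body = files.get(path);
    if (body === undefined || sha256(body) !== digest) mismatched.push(path);
  }
  return mismatched;
}

const packFiles = new Map<string, string>([
  ["manifest.json", '{"runs":1}'],
  ["runs/run_8842.jsonl", '{"decision_id":"d1"}\n'],
]);
const checksumsTxt = buildChecksums(packFiles);
// Tamper with the evidence after export:
packFiles.set("runs/run_8842.jsonl", '{"decision_id":"d1","outcome_status":"success"}\n');
const changed = verifyChecksums(checksumsTxt, packFiles);
```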

Signed export packs

Packs can be cryptographically signed (ed25519) so an auditor can prove a pack was produced by a key you control — not just that it is internally consistent. The signature is detached (signature.json) and covers CHECKSUMS.txt, which hashes every other file including the manifest, so one signature attests to the whole pack.

bash

npx deliberate keygen --out signer            # ed25519 key pair -> signer.key + signer.pub
npx deliberate export --sign signer.key --key-id ci@deliberate.dev --out audit-2026-q2
npx deliberate verify audit-2026-q2 --pubkey signer.pub   # integrity + authenticity (exit 1 on failure)

verify --pubkey fails unless the pack is signed by exactly that key. Programmatically:

typescript

import { buildExportPack, generateSigningKeyPair, signExportPack, verifyExportPack, writeExportPack } from "deliberate-sdk";

const { privateKey, publicKey } = generateSigningKeyPair();
const pack = buildExportPack({ runs, sign: { privateKey, keyId: "ci@deliberate.dev" } });
// or sign an already-built pack: signExportPack(pack, { privateKey })
writeExportPack(pack, "audit-2026-q2");
const { ok, signature } = verifyExportPack("audit-2026-q2", { expectedPublicKey: publicKey });
// ok === true && signature.valid && signature.trusted

ed25519 signatures are deterministic, so a fixed key + clock + inputs produce byte-identical output. Tampering with any evidence file breaks the SHA-256 check; tampering with CHECKSUMS.txt breaks the signature.
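The detached-signature idea can be reproduced with Node's built-in ed25519 support. The sketch below is illustrative only: it does not reproduce the SDK's signature.json format, and the variable names are assumptions. It signs the bytes of a CHECKSUMS.txt string, confirms that signing twice yields identical bytes, and shows that editing the checksums invalidates the signature.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Illustrative ed25519 detached signature over CHECKSUMS.txt content (not the SDK's file format).
const { privateKey, publicKey } = generateKeyPairSync("ed25519");

// Example line: the SHA-256 of an empty file, followed by a path.
const checksums = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  manifest.json\n";
const data = Buffer.from(checksums, "utf8");

// Ed25519 has no separate digest algorithm, so the algorithm argument is null.
const signature = sign(null, data, privateKey);
const signatureAgain = sign(null, data, privateKey); // deterministic: identical bytes

const valid = verify(null, data, publicKey, signature);
const tampered = Buffer.from(checksums.replace("manifest.json", "manifest.jsn"), "utf8");
const validAfterTamper = verify(null, tampered, publicKey, signature);
```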

Decision record schema

One JSONL file per run, one line per fork. Exported as zod schemas (decisionRecordSchema) and TypeScript types. Every record carries a schema_version (currently "0.1") so downstream consumers can migrate across schema changes; readers tolerate its absence in older files. See a full record on the homepage.

Tier	Fields
Required — identity	decision_id · run_id · timestamp · task
Required — decision	reasoning · chosen · alternatives[]
Outcome	commit · outcome_summary · outcome_status
Recommended	confidence · safety · human_approval
Provenance	git · ci
Optional	decision_type · parent_decision_id · context_refs

confidence is typed and sourced (kind, value, signals) — stored as reported for triage, never presented as a calibrated probability.
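For consumers who don't want to depend on the SDK, the tiers map naturally onto a TypeScript type. The interface and reader below are hypothetical: they are not the exported decisionRecordSchema or its inferred type, and the lower tiers are typed loosely on purpose. The reader checks only the required tiers, which is one way to tolerate older files that lack schema_version.

```typescript
// Hypothetical consumer-side types mirroring the schema tiers; not the SDK's exported types.
interface Alternative {
  action: string;
  rejected_reason?: string;
  confidence?: number;
}

interface DecisionRecord {
  schema_version?: string; // "0.1" today; absent in older files
  // Required: identity
  decision_id: string;
  run_id: string;
  timestamp: string;
  task: string;
  // Required: decision
  reasoning: string;
  chosen: { action: string; tool?: string; arguments?: unknown };
  alternatives: Alternative[];
  // Outcome, recommended, provenance, and optional tiers (loosely typed here)
  commit?: string;
  outcome_summary?: string;
  outcome_status?: string;
  confidence?: { kind: string; value?: number; signals?: unknown };
  safety?: Record<string, unknown>;
  human_approval?: Record<string, unknown>;
  decision_type?: string;
  parent_decision_id?: string;
}

const REQUIRED = ["decision_id", "run_id", "timestamp", "task", "reasoning", "chosen", "alternatives"] as const;

// Parse one run file; report lines missing a required field instead of throwing.
function readRun(jsonl: string): { records: DecisionRecord[]; invalidLines: number[] } {
  const records: DecisionRecord[] = [];
  const invalidLines: number[] = [];
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    const obj = JSON.parse(line) as Record<string, unknown>;
    if (obj !== null && typeof obj === "object" && REQUIRED.every((k) => k in obj)) {
      records.push(obj as unknown as DecisionRecord);
    } else {
      invalidLines.push(i + 1); // 1-based line number
    }
  });
  return { records, invalidLines };
}

const { records, invalidLines } = readRun(
  [
    '{"decision_id":"d1","run_id":"run_8842","timestamp":"2026-04-02T10:00:00Z","task":"t","reasoning":"r","chosen":{"action":"a"},"alternatives":[]}',
    '{"decision_id":"d2","run_id":"run_8842"}',
  ].join("\n"),
);
```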

Want access?

Install from public npm today — npm install deliberate-sdk (v0.4.0): one-call createDeliberateAgent, passive capture, HTML replay (deliberate replay … --html), and deliberate doctor. Join the design partner pilot for hands-on support as we refine adapters and schema. Tell us your stack and we'll reach out as pilot slots open.
