# Keybrake

> A scoped API-key proxy for the non-LLM SaaS APIs your agent calls — Stripe, Twilio, Resend — with per-vendor spend caps, allowlists, audit log, and one-click revoke.

Keybrake is a reverse proxy that sits between your autonomous agent and the vendor APIs it calls. You issue a short-lived `vault_key_…` token to the agent, attach a policy (daily USD cap, endpoint allowlist, merchant scope, expiry), and point the agent at `proxy.keybrake.com`. Keybrake enforces the policy, forwards the request to the vendor, parses the cost from the response, and logs every call in a queryable audit table. Tagline: Put the brakes on your agent's keys.

## What it does

When you let an autonomous agent touch Stripe, Twilio, or Resend, you hand it a long-lived API key with full-account power. A retry loop, a bad prompt, or a runaway tool call can burn thousands of dollars before anyone notices. Keybrake sits between the agent and the vendor, enforces per-day USD spend caps, endpoint allowlists, and scope restrictions, logs every call with vendor-parsed cost data, and lets you revoke a key in one click without rotating the upstream secret.

## Positioning vs. Stripe Projects

Stripe Projects (April 2026, Stripe Sessions) is a token-issuing layer that lets agents call Stripe and 32 named partner vendors (Twilio, Cloudflare, Render, Vercel, Clerk, Supabase, Hugging Face, Sentry, AgentMail, and others) with a monthly spend cap (default $100/month per provider, raisable). It validates the agent-governance category; it is not a direct substitute for Keybrake.

Keybrake differs from Stripe Projects on four axes:

1. **Per-call enforcement**: Keybrake's cap fires before each call leaves the proxy (pre-flight). Stripe Projects enforces its cap at monthly billing aggregation, so a burst in the last hours of the month still goes through.
2. **Mid-run revoke**: Keybrake revokes a key in under a second (flip a row → the next call returns 401). Stripe Projects has no advertised mid-run revoke path.
3. **Per-call audit log**: Keybrake logs every request with parsed cost, policy verdict, endpoint, and `agent_run_id`. Stripe Projects has no advertised per-call audit.
4. **Vendor-agnostic scope**: Keybrake works in front of any HTTPS API. Stripe Projects covers only Stripe's 32 named partners.

Architecture: Stripe Projects is token-issuing (the agent calls the vendor directly; the cap is enforced at billing time). Keybrake is a proxy (the agent calls Keybrake; Keybrake enforces the cap pre-call, then forwards to the vendor). For teams that run agents only against Stripe and its listed partners and can tolerate month-granularity enforcement, Stripe Projects is a starting point. For per-call enforcement, sub-second revoke, per-call audit, or APIs outside the 32-partner list, Keybrake fills the gap. The two can coexist on the same stack.
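The pre-call check described above can be sketched in a few lines. This is a minimal illustration, not Keybrake's implementation: the `VaultKeyBudget` class and `check_and_reserve` function are hypothetical names, and amounts are held in integer cents to avoid floating-point drift. It replays a stuck loop of 50 charges at $50 each against a $250/day cap.

```python
from dataclasses import dataclass


@dataclass
class VaultKeyBudget:
    """Running daily spend for one vault key (hypothetical structure)."""
    daily_cap_cents: int
    used_cents: int = 0


def check_and_reserve(budget: VaultKeyBudget, amount_cents: int) -> int:
    """Return 200 and record the spend if the call fits under the cap.

    Otherwise return 429 and record nothing, which models rejecting the
    request before it is forwarded to the vendor.
    """
    if budget.used_cents + amount_cents > budget.daily_cap_cents:
        return 429
    budget.used_cents += amount_cents
    return 200


# A stuck loop attempting 50 charges of $50 against a $250/day cap.
budget = VaultKeyBudget(daily_cap_cents=250_00)
verdicts = [check_and_reserve(budget, 50_00) for _ in range(50)]
forwarded = verdicts.count(200)
rejected = verdicts.count(429)
print(forwarded, rejected, budget.used_cents / 100)  # 5 45 250.0
```

Because the check runs before forwarding, the five calls that fit under the cap are the only ones that reach the vendor; the remaining 45 never move money.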

## Who it's for

Senior engineers and CTOs at teams running autonomous or semi-autonomous agents against production SaaS APIs — people who have been burned by, or fear, a stuck agent running up real money on Stripe charges, Twilio SMS, or Resend sends. Not LLM-cost optimizers (that's LiteLLM / Helicone territory); Keybrake handles the non-LLM blast radius.

## How it works

1. Issue a vault key — create a `vault_key_…` token bound to your real Stripe, Twilio, or Resend secret.
2. Attach a policy — set a daily USD cap, endpoint allowlist, Stripe-merchant scope, and an `expires_at`.
3. Point your agent at it — your agent calls `proxy.keybrake.com/stripe/v1/charges`; Keybrake enforces the policy, forwards to the vendor, and logs the spend.
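The three steps above can be sketched as plain data plus two small helpers. The policy field names `daily_usd_cap`, `allowed_endpoints`, and `expires_at` follow the wording used in this document, but the exact JSON shape, the `"METHOD /path"` allowlist format, and the helper names are assumptions made for illustration; no real Keybrake API call is shown.

```python
import json
from datetime import datetime, timedelta, timezone

PROXY_HOST = "proxy.keybrake.com"  # host named in the text above


def build_policy(daily_usd_cap, allowed_endpoints, ttl_hours, now=None):
    """Assemble a policy document for a vault key (hypothetical shape)."""
    now = now or datetime.now(timezone.utc)
    return {
        "daily_usd_cap": daily_usd_cap,
        "allowed_endpoints": list(allowed_endpoints),
        "expires_at": (now + timedelta(hours=ttl_hours)).isoformat(),
    }


def proxied_url(vendor, vendor_path):
    """Map a vendor path onto the proxy, e.g. stripe + /v1/charges."""
    return f"https://{PROXY_HOST}/{vendor}/{vendor_path.lstrip('/')}"


def is_allowed(policy, method, path, now=None):
    """Exact-match allowlist check plus expiry check (simplified)."""
    now = now or datetime.now(timezone.utc)
    if now >= datetime.fromisoformat(policy["expires_at"]):
        return False
    return f"{method.upper()} {path}" in policy["allowed_endpoints"]


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
policy = build_policy(
    daily_usd_cap=200,
    allowed_endpoints=["GET /v1/charges", "POST /v1/refunds"],
    ttl_hours=1,
    now=issued,
)
print(json.dumps(policy, indent=2))
print(proxied_url("stripe", "/v1/charges"))
```

A real allowlist would likely need path patterns rather than exact strings; exact matching keeps the sketch short.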

## Vendors supported today

- Stripe — cost parsed from charge response
- Twilio — cost parsed from `price` field on each SMS/call
- Resend — fixed per-email rate

Roadmap: Shopify Admin, Postmark, Segment.
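As a rough sketch of the per-vendor cost sources listed above, the function below normalizes each vendor's response to a USD amount. Several details are assumptions: the response dictionaries are trimmed to one field, the Stripe amount is treated as USD cents, Twilio's `price` is treated as a signed decimal string that may be null before billing, and the Resend rate is an illustrative placeholder rather than a published price.

```python
from decimal import Decimal

# Illustrative fixed per-email rate; the real rate depends on the Resend plan.
RESEND_RATE_USD = Decimal("0.0004")


def parse_cost_usd(vendor: str, response: dict) -> Decimal:
    """Return the USD cost of one proxied call, parsed from the response."""
    if vendor == "stripe":
        # Stripe reports `amount` in the smallest currency unit (cents for USD).
        return Decimal(response["amount"]) / 100
    if vendor == "twilio":
        # `price` is assumed to be a signed string such as "-0.0079",
        # or None while the message has not been billed yet.
        price = response.get("price")
        return abs(Decimal(price)) if price is not None else Decimal("0")
    if vendor == "resend":
        # No per-call price in the response, so a fixed rate is applied.
        return RESEND_RATE_USD
    raise ValueError(f"unsupported vendor: {vendor}")


print(parse_cost_usd("stripe", {"amount": 1500}))
print(parse_cost_usd("twilio", {"price": "-0.0079"}))
print(parse_cost_usd("resend", {}))
```

`Decimal` is used instead of `float` so that many sub-cent costs can be summed into a daily total without rounding drift.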

## Pricing

- Free: $0/mo — 1,000 proxied requests, 1 vendor, 7-day audit retention. For tinkerers and side projects.
- Team: $99/mo — 100,000 proxied requests, all vendors, 90-day retention, Slack/webhook alerts. For teams running agents in production.
- Scale: custom — 1M+ requests, SSO, self-hosted single-binary option. For enterprise.

## Core pages

- [Home](https://keybrake.com/): Landing page with full product description, how-it-works, pricing, and waitlist.
- [Why we exist (launch manifesto)](https://keybrake.com/launch): The four stories of real agents that burned real money on SaaS APIs, and the governance gap Keybrake addresses.
- [Blog index](https://keybrake.com/blog/): Long-form writing on agent-to-SaaS governance.
- [Newsletter archive](https://keybrake.com/newsletter/): Public archive of the Keybrake build-log newsletter. One issue every 3-4 weeks. Joining the waitlist subscribes you to the newsletter, and the same waitlist receives the v1 beta key.
- [Stripe Restricted Key picker](https://keybrake.com/tools/stripe-key-picker/): Free interactive picker. Tick the operations your agent performs (charge new cards, refund, manage subscriptions, payouts, Connect, etc.) and get the minimum Stripe Restricted Key resource permissions to tick in the Dashboard, with a per-scope blast-radius tier (extreme/high/medium/low) and warnings for known dangerous combos (Charges+Refunds Write loop, Payouts irreversibility, WebhookEndpoints redirect risk, Connected Accounts identity-tier).
- [Agent blowout calculator](https://keybrake.com/tools/blowout-calculator/): Free embeddable tool — 24h cost of a stuck agent per vendor, and what a cap stops it at.
- [Sitemap](https://keybrake.com/sitemap.xml): Full URL list with lastmod dates.

## Stripe Restricted Keys — practical reference

These pages exist to answer specific questions developers and AI-coding-assistants ask about Stripe Restricted Keys in the context of AI agents. Each contains primary-source detail not collected in one place elsewhere.

- [Stripe Restricted Key example](https://keybrake.com/seo/stripe-restricted-api-key-example): Concrete 5-tick scope set for a refund-issuing agent (Charges:Read, Refunds:Write, Customers:Read, PaymentIntents:Read), with an agent-action-to-API-call-to-scope map and three failure modes the primitive doesn't cover.
- [Stripe Restricted Key permissions — full reference](https://keybrake.com/seo/stripe-restricted-api-key-permissions): 30+ row permission table split into 6 categories (Core payments, Customer/PM, Subscription/invoicing, Payouts/transfers, Connect, Supporting/metadata), with what each Write enables, what breaks if None, and runaway-risk level. Names the high-blast-radius scopes developers commonly leave on accidentally (Payouts, Transfers, Webhook Endpoints).
- [How to get a Stripe Restricted API Key](https://keybrake.com/seo/how-to-get-stripe-restricted-api-key): 4-step Dashboard walkthrough plus 3 preset scope sets (refund-only support agent, read-only reconciliation agent, catalog-update agent). Schema.org HowTo-marked.
- [What is my Stripe Restricted API Key?](https://keybrake.com/seo/what-is-my-stripe-restricted-api-key): Prefix table (`pk_` / `sk_` / `rk_` / `whsec_`), Secret-Key-vs-Restricted-Key side-by-side, lifecycle gotchas (one-time reveal, soft deletion, live permission edits, no auto-expiry, last-used lag).
- [Stripe Restricted Keys on Lovable](https://keybrake.com/seo/stripe-restricted-api-key-lovable): Lovable-platform-specific scope guide with a Lovable-feature → Stripe-API-call → scope mapping table. Names the Lovable webhook auto-register-on-boot gotcha that asks for too much permission.
- [Stripe API key with restricted access — 10-control coverage matrix](https://keybrake.com/seo/stripe-api-key-with-restricted-access): Honest Yes / Partial / No verdict on each of 10 controls (endpoint allowlist, resource-level scope, key rotation, per-day USD cap, per-hour request cap, customer allowlist, parameter-level allowlist, mid-run revoke <10s, per-call audit with parsed cost, multi-account Connect routing). Final Stripe-native count: 3 Yes, 2 Partial, 5 No.
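The prefix table mentioned in the list above lends itself to a small guard that refuses to hand an agent anything other than a restricted key. This is an illustrative sketch: the function name is hypothetical and the sample values are placeholders, not real credentials.

```python
# Stripe credential prefixes and what each one identifies.
KEY_PREFIXES = {
    "pk_": "publishable key",
    "sk_": "secret key",
    "rk_": "restricted key",
    "whsec_": "webhook signing secret",
}


def classify_stripe_credential(value: str) -> str:
    """Return the credential type implied by the prefix, or 'unknown'."""
    for prefix, kind in KEY_PREFIXES.items():
        if value.startswith(prefix):
            return kind
    return "unknown"


def safe_for_agent(value: str) -> bool:
    """Only a restricted key should ever be placed in an agent's context."""
    return classify_stripe_credential(value) == "restricted key"


print(classify_stripe_credential("rk_test_placeholder"))
print(safe_for_agent("sk_test_placeholder"))
```

A prefix check only tells you the key type; it says nothing about which scopes a restricted key actually carries.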

## Agent governance and kill-switches

- [AI agent kill-switch — patterns and stop-latency](https://keybrake.com/seo/ai-agent-kill-switch): 4 real kill-switch patterns with measured stop-latency numbers (network block: seconds, in-flight ignored; credential revoke: Stripe median 45s / p95 3m12s, Twilio 30s-2m, Resend near-instant, OpenAI 1-5m; circuit-breaker flag: next-request-fast, cooperative-only; human-in-the-loop: preventative, doesn't scale). Closes with a 2am-incident playbook (t=0 page, t=5s revoke flag, t=10s next call 401s, t=60s audit query).
- [AI agent audit trail — what belongs in one, with the minimum schema](https://keybrake.com/seo/ai-agent-audit-trail): Programmatic-page entry point for the audit-trail keyword. Frames the audit as the answer to four questions an HTTP access log can't (policy verdict, scope adherence, dollar cost, run-grouping). Gives a four-column MVP schema (`agent_run_id` / `policy_verdict` / `cost_usd_parsed` / `customer_scope_id`) and links to the long-form schema post for the sixteen-column reference. Documents the cost-data source per vendor (Stripe `amount`, Twilio `price` on status callback, Resend tier-table, OpenAI usage*rate, Shopify quota-bucket). Three implementation paths (SDK-wrapper / sidecar / hosted), three common mistakes (verbatim-headers / proxy-generated run-id / no retention path).
- [MCP server API key auth — 4 patterns](https://keybrake.com/seo/mcp-api-key-auth): How Model Context Protocol servers actually authenticate to downstream APIs. Catalogues the env-var-secret, client-supplied-header, OAuth-per-tool-call (2025-11 spec), and proxy-enforced vault-key patterns, put head-to-head on 5 governance dimensions (LLM-sees-key, spend cap, customer scope, revoke, audit).
- [Stripe Agent Toolkit over MCP — 14-tool blast-radius catalogue](https://keybrake.com/seo/stripe-agent-toolkit-mcp): Every default tool in the Stripe Agent Toolkit mapped to its underlying API call, required Restricted-Key scope, and blast-radius rating (None / Low / Medium / High / Critical — `create_charge` is the only Critical). Documents the `--tools=` flag as a discoverability filter, not a security boundary. Includes the `STRIPE_API_BASE` env-var swap for 5-minute proxy insertion.
- [AI agent payment gateway — 2026 category map](https://keybrake.com/seo/ai-agent-payment-gateway): Splits the phrase into three categories that searchers conflate. Category 1 (vendor-issued SDKs) — Stripe Agent Toolkit, Paddle Agent Payments, Twilio's Conversational AI billing path, Anthropic Computer Use billing — wrap existing payment APIs into LLM-callable tool definitions; they make payments happen but enforce no caps. Category 2 (new agent-native rails) — HTTP 402 / x402 (Coinbase reference impl, $0.001-$0.01 fee per call), Crossmint, Skyvern Pay, Pinnacle / Halliday / BVNK Agent stablecoin rails — mint per-call payments specifically for autonomous traffic, but only relevant when the counterparty is also new-rail-native. Category 3 (governance proxies) — Keybrake; or DIY in-house — sit in front of either and enforce daily USD cap, endpoint allowlist, customer-scope, parameter-level allowlist, sub-second mid-run revoke, and per-call audit with parsed cost. Includes a 10-row capability matrix (Yes/Partial/No across the three categories) and a three-traffic-shape decision rule. Names the worst case for each category — vendor-SDK alone is on the hook for $3.24M/day stuck-charge loops, new-rails risk is settle-side wallet drain ($216K/day on a $1/call scraping API), governance-proxy outage is loud not expensive. Direct answer to readers searching "AI agent payment gateway": you almost certainly need category 1 (or 2) AND category 3, joined on a shared `agent_run_id`.
- [AI agent cost management — three-axis decomposition](https://keybrake.com/seo/ai-agent-cost-management): Splits "agent cost" into three axes that share zero controls — Axis 1 LLM-token cost ($0.01-$0.10/call, bounded by context, controlled by LLM gateways: LiteLLM/Portkey/Helicone), Axis 2 SaaS-tool cost (unbounded; stuck Stripe refund loop = $15 × 216,000 calls/day = $3.24M/day worst case; controlled by SaaS-tool governance proxy = Keybrake), Axis 3 infra cost (cloud bills, controlled by existing FinOps). Cost-shape table with three plausible monthly bills (research agent dominated by Axis 1 at $800/mo, customer-support agent dominated by Axis 2 at $2,400/mo expected and $648,000/mo worst-case, self-hosted agent dominated by Axis 3 at $3,200/mo GPU). Names the antipattern of putting an Axis 1 control in front of an Axis 2 incident. Per-vendor unit costs (Stripe avg $15 charge, Twilio $0.0079/SMS, Resend ~$0.0004/email, OpenAI ~$0.01/GPT-4o call). Cost-attribution pattern (the `agent_run_id` join key joins all three axes' rows into one per-run cost row).
- [AI agent governance platform — why governance is not a single platform](https://keybrake.com/seo/ai-agent-governance-platform): Hard-honest rebuttal to the "agent governance platform" search intent — there is no single product on the market that does end-to-end agent governance. Decomposes "governance" into four decoupled jobs: Layer 1 (identity / credentials — handled by existing secret stores + SSO, no new vendor needed for most teams); Layer 2 (runtime policy enforcement — per-day USD cap, endpoint allowlist, customer scope, parameter allowlist, mid-run revoke — Keybrake on the SaaS-tool axis, LiteLLM/Portkey/Helicone on the LLM axis); Layer 3 (audit and cost — falls out of Layer 2 for free, joined on `agent_run_id`); Layer 4 (post-hoc evaluation — Lasso Security, Lakera Guard, CalypsoAI, Robust Intelligence, Credo AI, Holistic AI, Promptfoo, Garak, DeepEval). Includes a six-vendor capability matrix scoring Lasso / CalypsoAI / Robust Intelligence / Credo AI / Holistic AI / Lakera Guard against the four layers — every named "governance platform" covers Layer 4 + parts of Layer 1 and skips Layer 2, the layer where money moves. Names three antipatterns the platform framing creates (buying a Layer 4 product and calling governance done; routing the governance epic to data science instead of platform; waiting for a unified platform that public roadmaps put at 2027-2028). Honest 2026 shopping list: existing secret store (L1) + Keybrake (L2/L3 SaaS-axis) + LLM gateway (L2/L3 LLM-axis) + evaluation product (L4), joined on `agent_run_id`. Total under $500/month for proxies/gateways for a 50-person team, plus whatever compliance demands on L4.
- [AI agent webhook security: four controls for agents that send and receive webhooks](https://keybrake.com/seo/ai-agent-webhook-security): Covers the compound attack surface when AI agents act on Stripe, Twilio, or Resend webhooks — a forged inbound event can trigger real outbound API calls (charges, SMS, refunds). Signature verification is necessary but not sufficient. Four controls: (1) verify webhook signatures before the event reaches the agent (Stripe HMAC-SHA256 via Webhook.construct_event; Twilio HMAC-SHA1; Resend via Svix); replay-attack prevention via Stripe's 300s timestamp window; (2) event-type allowlist before calling any tool — prevents confused-deputy attacks where Stripe sends a legitimate-but-wrong event type; (3) vault key with endpoint allowlist for outbound calls — fulfillment agent doesn't need POST /v1/payment_intents; limits blast radius even if controls 1-2 fail; (4) log triggering webhook ID (`triggering_event_id`) alongside every outbound API call so post-incident `SELECT * FROM audit_log WHERE triggering_event_id = 'evt_...'` reconstructs the full chain. Three-row table mapping common webhook trigger + agent actions + worst-case forged event. FAQ covers per-event vs shared vault key tradeoff, unsigned webhook fallback, and Keybrake's scope (controls 3-4; controls 1-2 are the handler's responsibility).
- [AI agent API key rotation: when to rotate vs. when to revoke](https://keybrake.com/seo/ai-agent-api-key-rotation): Practical decision guide distinguishing production key rotation (new key, old deprecated, 30–90 second deployment propagation window, breaks every consumer) from vault key revocation (sub-second, affects only the specific agent, no production code change). Full rotation timeline: create new key → deploy to all consumers (1–5 min) → deprecate old key. Revocation timeline: one API call → effective on next proxied request. Decision table (6 scenarios): stuck agent loop → revoke vault key; key committed to public git → rotate production key; suspicious agent behavior → revoke first, investigate before deciding to rotate; scheduled compliance rotation → rotate production key using zero-downtime proxy-level swap; vault key seen in logs → revoke vault key only; vendor breach → rotate production key immediately + revoke all active vault keys. Zero-downtime rotation pattern: update the upstream key at the proxy layer; all vault keys keep working with no agent code changes. Ephemeral vault keys section: per-run expiry eliminates scheduled rotation — the key's lifetime IS the run's lifetime. FAQ covers rotation frequency guidelines, in-flight call behavior on revocation (one-request window; hard_revoke flag for active exfiltration), and rolling vault key pattern for mid-run policy changes.
- [AI agent governance tools — opinionated shortlist per layer for 2026](https://keybrake.com/seo/ai-agent-governance-tools): Sibling page to the governance-platform rebuttal; the tool-list answer to the "agent governance tools" search. Names ~30 tools across the four layers with one pick + one runner-up per layer, plus an OSS-only column. Layer 1 (identity): pick 1Password Secrets Automation, runner-up Doppler, OSS HashiCorp Vault / Infisical; with WorkOS Agents and Auth0 FGA noted as future / adjacent. Layer 2 LLM axis: pick LiteLLM Proxy, runner-up Portkey Gateway, plus Bifrost / LangGate / Helicone / Envoy AI Gateway / Kong AI Gateway / APISIX / OpenRouter. Layer 2 SaaS-tool axis: pick Keybrake, runner-up DIY SDK wrapper, OSS agentgateway.dev (alpha); explicit "feature flag" antipattern. Layer 3 (audit and cost): pick Langfuse + L2 proxy emits, runner-up Helicone, plus Phoenix (Arize) / OpenLIT / Laminar / Datadog LLM Observability / OpenTelemetry; documents the `agent_run_id` join key as the actual mechanism. Layer 4 (post-hoc evaluation): OSS pick Promptfoo, OSS adjacent Garak / DeepEval / Ragas; commercial pick Lakera Guard, runner-up Lasso Security, plus CalypsoAI / Robust Intelligence (Cisco) / Credo AI / Holistic AI. Cross-layer / orchestration tools: Inngest / Temporal (durable execution), OpenTelemetry, OpenLLMetry, Anthropic Computer Use sandboxing / E2B / Daytona. Includes a "what we run for Keybrake's own production stack" disclosure (1Password / LiteLLM / Keybrake / SQLite + cron / Promptfoo) and a "what to skip" section (generic API gateways without LLM plugins, compliance-management SaaS like Vanta/Drata, generic application firewalls, agent-platform-bundled proxies, alpha-stage tools). FAQ covers absolute minimum stack, Datadog/New Relic alternatives, MCP server placement, OSS-only stack, pre-revenue startup recommendation, three-question vendor evaluation test, and "AI governance" vs "agent governance" vocabulary disambiguation.
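The four-column MVP audit schema named in the list above (`agent_run_id`, `policy_verdict`, `cost_usd_parsed`, `customer_scope_id`) can be tried out with an in-memory SQLite table. The table name `audit_log`, the column types, the `'allow'`/`'deny'` verdict values, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_log (
        agent_run_id      TEXT NOT NULL,
        policy_verdict    TEXT NOT NULL,  -- assumed values: 'allow' / 'deny'
        cost_usd_parsed   REAL NOT NULL,
        customer_scope_id TEXT
    )
    """
)
rows = [
    ("run_a", "allow", 15.0, "cus_1"),
    ("run_a", "allow", 15.0, "cus_1"),
    ("run_a", "deny", 0.0, "cus_1"),
    ("run_b", "allow", 0.5, "cus_2"),
]
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?, ?)", rows)


def run_summary(conn, agent_run_id):
    """Return (total parsed cost in USD, number of denied calls) for one run."""
    return conn.execute(
        """
        SELECT COALESCE(SUM(cost_usd_parsed), 0),
               COALESCE(SUM(policy_verdict = 'deny'), 0)
        FROM audit_log
        WHERE agent_run_id = ?
        """,
        (agent_run_id,),
    ).fetchone()


print(run_summary(conn, "run_a"))  # (30.0, 1)
```

Grouping by `agent_run_id` is what lets a single query answer "what did this run cost, and how often did policy stop it", which an HTTP access log cannot.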

## Framework-specific guides (LangChain, CrewAI, AutoGen, OpenAI Agents SDK)

Practical guides for engineers wiring specific agent frameworks to money-moving APIs. Each page gives the one-line SDK change (vault key + base_url override) plus the policy configuration and the failure-mode analysis specific to that framework.

- [LangChain Stripe API key — wiring it up safely](https://keybrake.com/seo/langchain-stripe-api-key): Covers LangChain's StripeTool setup, the three gaps Stripe restricted keys leave open (no per-day dollar cap, no mid-run revoke without rotating the production key, no per-call audit with agent context), and the one-line base_url change that routes the agent through a vault-key proxy. Includes the vault key policy JSON (daily_usd_cap, allowed_endpoints, expires_in), three LangChain patterns where spend caps matter most (ReAct agents with retry reasoning, RAG agents that retrieve customer IDs, LangGraph billing nodes), and FAQ covering SDK compatibility, 429 behavior on cap breach, and functions-agent compatibility.
- [CrewAI API key management — per-agent vault keys in multi-agent pipelines](https://keybrake.com/seo/crewai-api-key-management): The five problems a CrewAI crew creates when agents share a single Stripe key (no per-agent spend attribution, rate-limit contention, one compromised agent revokes for all, no per-agent endpoint scope, no audit log joined on agent_id) and the vault-key pattern that collapses all five. Covers per-agent environment configuration, hierarchical mode sub-agents, dynamic per-run key issuance, and the metadata passthrough pattern that makes `SELECT * FROM calls WHERE metadata->>'agent_id' = 'billing_agent'` work without extra instrumentation. FAQ covers hierarchical mode, dynamic instantiation per customer run, and CrewAI retry logic interaction with 429 on cap breach.
- [AutoGen agent API key management — spend caps and per-agent credentials](https://keybrake.com/seo/autogen-agent-api-key): How AutoGen's GroupChat default of shared process environment means all agents share the same Stripe/Twilio key, creating attribution gaps, revoke coupling, and blast-radius risk. Per-agent tool initialization pattern using StripeClient base_url override, nested chat child vault key issuance via Keybrake API, multi-tenant vault key issuance per customer session. FAQ covers Docker-based code execution sandbox, max_consecutive_auto_reply interaction, and multi-tenant per-customer vault key patterns.
- [OpenAI Agents SDK + Stripe — wiring function tools safely](https://keybrake.com/seo/openai-agents-sdk-stripe): OpenAI Agents SDK @function_tool Stripe integration in 10 lines, the three gaps (no per-run budget, no mid-run revoke, no agent-context audit log), and the two-line fix (vault key as api_key, proxy as base_url). Includes audit log comparison table (what Stripe gives vs what the proxy audit adds: agent_run_id, agent_name, policy_check_result, calls that hit the cap), handling 429 in the SDK (catch RateLimitError, return user-readable message so model stops retrying), and FAQ covering hosted tool execution, streaming mode, and latency overhead (6-14ms co-located, 40-80ms cross-region).
- [Twilio AI agent rate limit — four controls to prevent a $1,200 SMS bill](https://keybrake.com/seo/twilio-ai-agent-rate-limit): Twilio's native rate limits are per-number (throughput), not per-agent or per-dollar. At 3 msg/sec, a stuck loop burns $100 in 55 min (US) or 10 min (UK). Four controls ordered from weakest to strongest: Messaging Services daily volume cap (count-based, per-Service, fires after the fact), sub-account balance limit (lifetime not daily, stops all new resource creation), proxy-layer pre-call enforcement (fires before call, zero-lag, per-agent vault key), circuit-breaker flag (next-request-fast, cooperative-only). Includes Twilio SDK base_url override pattern, 429 Retry-After behavior at midnight UTC, five-column comparison table (fires when / per-agent / dollar-accurate / survives restart), and practical three-layer stack (proxy primary + Messaging Service cap secondary + Twilio spend threshold tertiary).
- [AI agent API key best practices: the 7-control checklist for production](https://keybrake.com/seo/ai-agent-api-key-best-practices): Framework-agnostic checklist for any agent that touches money-moving APIs. Seven controls: restricted scope (endpoint allowlist; Stripe supports path-level, Twilio/Resend do not), per-agent keys (not shared; requires intentional setup in most frameworks), per-run expiry (vendors don't support natively; requires proxy with expires_in), pre-call spend cap (not a billing alert; billing alert = 1-24h lag; cap = zero lag), sub-second revoke (Stripe rotation = 1-5 min propagation; proxy vault key revoke = <1s), queryable audit log with agent context (agent_run_id/agent_name columns; the post-incident query is `SELECT * FROM calls WHERE metadata->>'agent_run_id' = ?`), no real key in agent context (vault key pattern ensures agent never holds the real secret). Seven-row capability matrix (Stripe native / Twilio native / Resend native / vault key proxy) against all seven controls. FAQ covers which controls matter for read-only agents, minimum viable subset for write agents, and where LiteLLM fits (LLM inference axis, complementary not competitive).
- [Pydantic AI + Stripe: giving your agent a payment tool without handing it an uncapped key](https://keybrake.com/seo/pydantic-ai-stripe-tool): Covers Pydantic AI's @agent.tool decorator and RunContext dependency injection for a Stripe ChargeCustomerTool. Explains the structural gap: Pydantic AI validates tool input schemas (type-correct calls) but not spend behavior (how many times the tool is called or the cumulative dollar value). The validation fires before the Stripe SDK call; the spend overrun happens after. One-line fix: set `stripe.api_key` to a vault key and point `stripe.api_base` at the proxy. RunContext is the natural injection point: with the vault key as a dependency, every tool in the agent reads the same per-run credential without global state. Three-column table comparing schema validation vs spend enforcement vs what each catches. FAQ covers structured result types capturing the vault key policy response, streaming mode compatibility, and the distinction between validation and enforcement.
- [Resend AI agent API key: rate-capping email sends before the loop runs wild](https://keybrake.com/seo/resend-ai-agent-api-key): How to give an AI agent a Resend API key without risking a stuck loop that blasts an entire email list. Covers Resend's Sending-access scope restriction and per-domain key scoping (useful but insufficient). Three agent-specific failure modes: idempotent re-trigger that sends 12 identical welcome emails; list-blast loop that iterates 2,000 × 2,000 = 4M sends; compromised sub-agent in a multi-agent system exhausting the account daily budget. The vault-key fix: pass a vault key to Resend SDK's constructor with a `baseURL` override (one-line change); policy includes `daily_send_cap`, `allowed_from_domains`, and `expires_in`. Resend is one of Keybrake's three v1 vendors — cost per email is a fixed rate parseable from the response, enabling dollar-denominated caps. FAQ covers SDK compatibility, 429 behavior on cap breach, and `allowed_to_domains` restriction per vault key.
- [Shopify AI agent API key: scoping Admin API access so a stuck agent can't touch every order](https://keybrake.com/seo/shopify-ai-agent-api-key): Shopify Admin API access scopes give agents read vs. write access per resource type. Three agent-specific failure modes: bulk-cancellation loop that cancels 847 orders in 40 min (within Shopify's per-shop rate limit), price-update drift from malformed scraper data, and refund loop from a state-machine bug that issues duplicate refunds. Proxy pattern: point `shopify.clients.Rest` at the proxy with a vault key; policy includes `allowed_endpoints` (GET vs POST allowlist) and `max_write_operations_per_run`. FAQ covers GraphQL vs REST compatibility, Custom App vs Public App tokens, and per-run operation count caps combined with time-window expiry.

- [n8n AI agent API key: scoping and capping vendor calls](https://keybrake.com/seo/n8n-ai-agent-api-key): How n8n's credential store gives an AI Agent node full-access vendor keys with no per-run spending boundary. Three gaps: no per-run dollar cap (a stuck billing agent charges indefinitely until the daily Stripe Dashboard email arrives hours later), no run-scoped revoke (disabling the credential breaks all other workflows), no per-call audit with agent context (n8n logs execution steps, not vendor response bodies or amounts). The proxy fix: replace the built-in Stripe action with an HTTP Request node pointing to `proxy.keybrake.com/stripe/v1`, using a vault key in the Authorization header. Vault key policy: `daily_usd_cap`, `allowed_endpoints`, `expires_in`, and `agent_run_label`. Three n8n patterns where spend capping matters most: billing agents that iterate a CRM, notification agents that loop on Twilio SMS, long-running background workflows with no operator watching. Per-execution vault key issuance pattern (HTTP Request node at workflow start issues a fresh key, passes it as a variable). FAQ covers community Stripe node compatibility, n8n execution log behavior on 429, and per-execution vault key pattern.
- [Vercel AI SDK + Stripe: tool calls with spend caps and audit logs](https://keybrake.com/seo/vercel-ai-sdk-stripe): Wiring a Vercel AI SDK `tool()` to Stripe and the three gaps the SDK doesn't close: no per-session spend cap (model decides how many times to call the tool), no mid-session revoke (streaming session can't stop a specific credential without restarting the server), no per-call audit with session context (Stripe logs by IP, not by SDK session). Two-line fix: swap `new Stripe(process.env.STRIPE_SECRET_KEY)` to `new Stripe(vault_key, { baseURL: "https://proxy.keybrake.com/stripe/v1" })`. Per-user-session vault key issuance from the Next.js backend (one Keybrake API call before `generateText`, vault key passed into the Stripe client scoped to that request). Why Vercel AI SDK users specifically benefit: non-deterministic call counts in multi-turn loops, Edge runtime has no persistent spend-tracking state, multi-tenant SaaS apps need per-user isolated budgets. FAQ covers App Router / streamText / RSC streamUI compatibility, handling 429 in the tool execute function, and provider-neutrality (works with any @ai-sdk/* provider).
- [Dify agent API key security — scoping and capping tool calls](https://keybrake.com/seo/dify-agent-api-key): How Dify's credential vault stores API keys securely but provides no per-conversation spending boundary for AI Agent workflows. The binary access model: the agent either has the full credential or not, with no "this agent run gets $200/day on Stripe" concept. Four failure scenarios unique to Dify's AI features (LLM reasoning loop double-charge, prompt-injection large-amount call, unattended background workflow loop, multi-user deployment sharing one credential). Integration pattern: replace Dify's built-in Stripe tool with a custom API tool definition pointing at the proxy (`servers[0].url: https://proxy.keybrake.com/stripe/v1`) using Bearer auth with a vault key. Per-conversation vault key issuance via Dify's Chatflow API (backend issues vault key, passes as `inputs.stripe_vault_key` into the conversation). FAQ covers self-hosted Dify deployment compatibility, Dify UI behavior on 429 (node error with JSON body visible in task history), and vault key sharing across multiple workflows.
- [Zapier AI agent API key — scoping Stripe and vendor calls](https://keybrake.com/seo/zapier-ai-agent-api-key): How Zapier's credential model (connect once, use across Zaps) becomes a spending-boundary gap when AI features make autonomous action decisions — Copilot-built Zaps, Interfaces with embedded AI assistants, and AI Steps that decide whether to fire downstream Stripe actions. Three gaps: no per-run dollar cap (Zapier has Zap run limits, not dollar-amount limits), no agent-run-scoped revoke (disconnecting the credential breaks all Zaps), no per-call audit with agent context (task history shows Zap ran but not Stripe response body or amount charged). Integration pattern: replace built-in Stripe action with a Webhooks by Zapier step calling `proxy.keybrake.com/stripe/v1` with vault key in Authorization header. Three Zapier patterns where spend capping matters most: AI Steps that trigger paid API calls downstream, Interfaces with payment logic embedded, high-volume recurring Zaps with CRM data pipelines. Note: real Stripe key never passes through Zapier's infrastructure when using the proxy. FAQ covers built-in Stripe integration vs Webhooks step distinction, vault key rotation without editing individual Zaps, and Zap task history behavior on 429.
- [AI agent Stripe spend cap — enforcing per-run dollar limits](https://keybrake.com/seo/ai-agent-stripe-spend-cap): Why Stripe's native tools (restricted keys, Radar rules, Dashboard alerts, Billing usage alerts) don't enforce per-agent spend caps — they're account-level or post-charge, not per-agent or pre-charge. The critical distinction: pre-charge enforcement rejects the API call before money moves; post-charge enforcement catches it after refund exposure has already materialized. For a stuck loop of 50 charges at $50 each, pre-charge at $250 cap stops at charge 5 with zero refunds due; post-charge Radar catches it at $2,500 with 50 non-refundable processing fees. Four-column table (restricted keys / Radar rules / Dashboard alerts / Billing usage alerts) against four dimensions: per-agent cap, pre-charge, dollar-accurate, zero-lag. Proxy-layer mechanics: proxy parses `amount` field from request body, checks running total for the vault key, returns 429 with cap/used/reset_at before forwarding if total would exceed `daily_usd_cap`. Per-agent vs per-account cap granularity: five agents sharing one account-level $1,000/day cap means Agent #3 going haywire consumes budget for all five; per-vault-key caps give each agent an independent $200/day budget. FAQ covers daily cap reset time (midnight UTC or rolling 24h), double-call behavior per agent turn, and Twilio/Resend cap support.
- [Temporal AI agent API key — scoping activity calls to Stripe and Twilio](https://keybrake.com/seo/temporal-ai-agent-api-key): Why Temporal's automatic activity retry semantics create unique spending risks — a payment activity that fails with a Stripe 500 and retries may produce a duplicate charge if idempotency keys aren't correctly threaded through the workflow context. Three gaps Temporal's native tooling doesn't fill: no per-workflow spend cap (workflow history shows what happened, it doesn't stop it), no per-workflow revoke (terminating the workflow doesn't cancel in-flight vendor API calls), no per-call cost audit with workflow context (Temporal event log doesn't parse dollar amounts from Stripe responses). Integration pattern: issue one vault key per workflow run in a setup activity, pass as a parameter to downstream payment activities, set `stripe.api_base` to `proxy.keybrake.com/stripe`. Long-running workflow risk: workflows that span days or weeks accumulate Stripe calls no one is actively watching — `daily_usd_cap` enforced per day the workflow runs. FAQ covers how vault keys interact with Temporal's idempotency story, what happens to vault keys on workflow termination, and single-key-per-workflow vs per-activity issuance patterns.
- [Make.com AI agent API key — scoping vendor calls with spend caps](https://keybrake.com/seo/make-ai-agent-api-key): How Make.com scenarios with AI modules (OpenAI Chat, Claude, AI Steps) create spending-boundary gaps when model output drives downstream HTTP module calls to Stripe or Twilio. Make's native connections encrypt keys and prevent leakage but give the scenario the full key with no per-run dollar cap and no per-execution revoke. Two patterns where AI drives vendor calls: AI module output driving HTTP module actions (router/text-extraction → HTTP call), and AI function calling mapped directly to HTTP module (agentic pattern). Three gaps: no per-scenario spend cap (Make operations quota ≠ vendor API dollar cap — a scenario within quota can generate $10,000 in Stripe charges), no per-run revoke (disabling the connection breaks all scenarios using it), no per-call audit with AI reasoning context (execution history shows module inputs/outputs, not parsed dollar amounts or run-total). Swap in Make: two fields change in the HTTP module — URL (`api.stripe.com` → `proxy.keybrake.com/stripe/v1`) and Authorization header (real key → vault key). Per-scenario vs per-execution vault key issuance options. Three Make patterns where spend capping matters most: AI-driven billing scenarios, high-volume Twilio/Resend notification scenarios, multi-tenant scenarios. FAQ covers native Stripe module vs HTTP module compatibility, execution history on 429, and single vault key across multiple HTTP modules in one scenario.
- [Flowise agent API key security — scoping and capping tool calls](https://keybrake.com/seo/flowise-ai-agent-api-key): How Flowise's visual LangChain builder stores API keys as credentials shared across all conversations and all users of a flow — the binary access model (full credential or nothing) with no per-conversation spending cap or per-session revoke. Four failure scenarios unique to Flowise deployments: LLM reasoning loop double-charge (agent calls payment tool repeatedly on ambiguous response), prompt-injection large-amount call (crafted user input causes unexpected large payment), unattended multi-user chatbot (no per-user spend tracking or per-user revoke), shared-flow team environment (test conversations fire real production API calls). Integration pattern: two-line change in Flowise Custom Tool node — swap env var from `STRIPE_SECRET_KEY` to `VAULT_KEY`, add `stripe.setApiBase('https://proxy.keybrake.com/stripe')`. Per-conversation vault key issuance via a `customFunction` node at chain start, labeled with `$session_id`. Per-flow vs per-conversation vault key tradeoff: simpler to set up per-flow but a runaway conversation can exhaust the day's cap for all others. Multi-user Flowise chatbot: per-conversation vault key gives per-user spend isolation without separate Flowise instances. FAQ covers built-in integration nodes vs Custom Tool nodes, agent behavior on 429, and single-flow vs per-execution vault key tradeoffs.
- [SendGrid AI agent API key — preventing runaway email agent costs](https://keybrake.com/seo/sendgrid-ai-agent-api-key): What SendGrid's native restricted API key scoping actually provides (permission scope: Mail Send only, Custom Scopes, Subuser accounts with independent quotas) and the three gaps it leaves for AI agent use cases: no per-run send volume cap (restricted key permits all mail sends without count limit — a stuck notification agent exhausts daily quota and damages deliverability reputation), no per-run revoke (deleting the key breaks all senders using it), no per-send audit log with agent run context (Activity Feed shows sends but not which agent conversation triggered each). Why email agents are a different risk profile than payment agents: deliverability damage (bounce rate spike triggers internal reputation monitoring, damages future sending for weeks), recipient damage (can't un-send a wrongly-sent cancellation email), volume cost is secondary to reputation cost. Recommended immediate controls for SendGrid-backed agents (in order of impact): restricted key scoped to Mail Send only, Subuser with independent monthly quota, SendGrid Email Activity webhook alerts for bounces/spam reports. Keybrake v1 supports Resend (modern SendGrid alternative with cleaner API); SendGrid proxy support on roadmap. FAQ covers Resend as a SendGrid drop-in for new agent stacks, whether Subusers actually limit damage, and real cost of 10,000 unexpected SendGrid emails (cost secondary to deliverability damage).
- [AI agent API key scope — the right way to issue credentials to autonomous systems](https://keybrake.com/seo/ai-agent-api-key-scope): Conceptual framework post. Traditional API key scoping = dimension 1 only (permission scope: which endpoints). AI agents need four dimensions: (1) permission scope — which endpoints (vendor native restricted keys cover this), (2) spending scope — daily USD cap per run (no vendor provides this natively at the key layer), (3) time scope — key TTL matching the agent run duration, not indefinite (vendor keys are long-lived by default), (4) revocability — per-run revoke in sub-second without blast radius to other agents (vendor key rotation takes 30-60 minutes and breaks all users). Four-column comparison table: dimension, vendor native keys (what they cover), vault key proxy (what it adds), what it prevents. Vault key policy encodes all four dimensions: `allowed_endpoints` (permission), `daily_usd_cap` (spending), `expires_in` (time), `agent_run_label` (revocability label). Four issuance patterns: per-conversation (chatbots), per-workflow-run (Temporal/Prefect/Airflow), per-agent-session (long-running agents), per-tool-call (highest isolation, highest overhead). FAQ covers whether vendor restricted keys already solve scoping (dimension 1 yes, 2-4 no), DIY implementation of the four dimensions without a proxy (feasible but requires maintaining four separate systems), and how to set a sensible `daily_usd_cap` (1.5-2x expected maximum per run).
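
The pre-charge check and four-dimension policy described in the spend-cap and key-scope entries above can be sketched roughly as follows. This is a minimal illustration, not Keybrake's implementation: the `VaultKeyPolicy` class, the `check_call` function, the 403 status for a disallowed endpoint, the error strings, and the midnight-UTC reset are all assumptions made for the example. Only the policy field names (`allowed_endpoints`, `daily_usd_cap`, `agent_run_label`), the 401 on revoke, and the 429 body fields (cap/used/reset_at) come from the descriptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VaultKeyPolicy:
    # Hypothetical in-memory policy; field names follow the entries above.
    allowed_endpoints: tuple      # permission scope
    daily_usd_cap: float          # spending scope
    expires_at: datetime          # time scope (derived from `expires_in` at issuance)
    agent_run_label: str          # label used for per-run revoke and audit
    revoked: bool = False
    used_cents: int = 0           # running total for the current day


def check_call(policy, endpoint, amount_cents, now=None):
    """Decide, before forwarding, whether a call may reach the vendor."""
    now = now or datetime.now(timezone.utc)
    if policy.revoked or now >= policy.expires_at:
        return 401, {"error": "key_revoked_or_expired"}
    if endpoint not in policy.allowed_endpoints:
        # Assumption: a disallowed endpoint is rejected with 403.
        return 403, {"error": "endpoint_not_allowed"}
    cap_cents = round(policy.daily_usd_cap * 100)
    if policy.used_cents + amount_cents > cap_cents:
        # Assumption: the cap resets at the next midnight UTC.
        reset_at = (now + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        return 429, {
            "cap": policy.daily_usd_cap,
            "used": policy.used_cents / 100,
            "reset_at": reset_at.isoformat(),
        }
    policy.used_cents += amount_cents  # reserve the spend, then forward
    return 200, {"forward": True}
```

Under these assumptions, a stuck loop of $50 charges against a $250 cap forwards five calls and rejects every later one with a 429, so no money moves past the cap. A real proxy would also need the check-and-increment step to be atomic across concurrent requests.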

- [Prefect AI agent API key — scoping vendor calls in flows and tasks](https://keybrake.com/seo/prefect-ai-agent-api-key): How Prefect's `@task(retries=3)` decorator creates spending risks when the task calls Stripe or Twilio — a network timeout causes a retry that may produce a duplicate charge if idempotency keys aren't wired correctly, and event-triggered Prefect automations can fire the same billing flow dozens of times during a traffic spike. Three gaps Prefect's native tooling doesn't fill: no per-run spend cap (flow run logs show what happened; they don't stop it), no per-run revoke (flow cancellation halts task scheduling but not in-flight vendor calls), no per-task audit with flow context (task logs show inputs/outputs, not parsed dollar amounts). Integration pattern: issue one vault key in a setup task at flow start using `flow_run.id` as the run label, pass vault key as parameter to downstream payment tasks, use `flow_run.id` as idempotency key prefix for genuine retry safety. Event automation risk: vault key per-flow-run caps isolate concurrent automation-triggered runs so a traffic spike triggering 50 flows doesn't bypass the per-run budget. FAQ covers Prefect task result caching compatibility, on_cancellation state hook for vault key revocation, and shared vault key across subflows.
- [Modal AI agent API key — spend caps for serverless agent functions](https://keybrake.com/seo/modal-ai-agent-api-key): How Modal's auto-scaling and `.map()` parallelization amplify the vendor API key risk — a `.map()` over a 5,000-item customer list creates 5,000 concurrent Stripe calls with no per-invocation cap, and a cron-scheduled Modal function that encounters a data error fires all invocations before anyone is watching. Three gaps: no per-invocation spend cap (Modal logs show invocation counts and durations, not vendor dollar amounts), no per-invocation revoke (`modal app stop` halts new invocations but can't revoke credentials from already-running ones), no per-call audit with invocation context. Integration pattern: issue a short-lived vault key (`expires_in: "5m"`) at the start of each function invocation, replace the `modal.Secret` holding `STRIPE_SECRET_KEY` with a `KEYBRAKE_API_KEY` secret, use the customer ID or run label as `agent_run_label` for per-invocation attribution. Modal containers (`@app.cls`) lifecycle: issue vault key in `@modal.enter()` method. FAQ covers per-invocation vault key latency overhead (negligible for second-plus functions, per-batch issuance option for sub-second functions), long-running Modal container classes, and setting sensible per-invocation caps.
- [LangGraph AI agent API key — scoping tool calls in stateful agent graphs](https://keybrake.com/seo/langgraph-ai-agent-api-key): How LangGraph's cyclic execution model creates unique spending risks — a graph that cycles through a tool node can call a payment tool ten or more times before the LLM reaches a terminal state, and multi-actor supervisor patterns let any worker agent exhaust the shared vendor credential on behalf of the entire graph. Three gaps: no per-graph-run spend cap (LangGraph's `recursion_limit` caps steps, not dollars), no per-run revoke (graph interruption halts future node execution but not in-flight vendor calls), no per-tool-call audit with graph context (state history and LangSmith traces don't parse dollar amounts or cross-reference charges with graph thread IDs). Integration pattern: add a `setup` node at graph start that issues a vault key using the thread ID as run identifier, store vault key in graph state, inject into tool calls via the agent node. Multi-actor isolation: issue per-member vault keys with sub-caps for supervisor patterns — Worker A gets $100 cap, Worker B gets $50 cap, total supervisor run cap is $200. 429 handling: conditional edge from tools node routes budget-exceeded errors to a `budget_exceeded` node rather than back to the agent loop. FAQ covers vault key compatibility with LangGraph checkpointing/persistence (issue fresh key on resume), concurrent graph runs and vault key isolation, and routing 429 exceptions in graph conditional edges.
- [Agno AI agent API key — scoping tool calls in multi-agent teams](https://keybrake.com/seo/agno-ai-agent-api-key): How Agno's (formerly Phidata) team architecture — fast parallel multi-agent execution with shared tools — creates credential sharing risks when any team member can call a payment tool and exhaust the shared vendor budget. Three gaps: no per-session spend cap (tool call history in session state shows what happened but doesn't prevent it), no per-session revoke (stopping the Python process kills the session but can't undo in-flight vendor calls), no per-call audit with agent identity (session messages show tool inputs/outputs, not parsed dollar amounts or per-session charge aggregation). Integration pattern: factory function `make_charge_tool(vault_key, session_id)` returns a closure binding the vault key into the tool — each session gets a fresh vault key before the agent is instantiated. Per-member vault keys for teams: `billing_agent` gets a vault key with a $150 cap and `notification_agent` gets one with a $50 cap; if both instead share the billing vault key for Stripe calls, they share a single $150 Stripe ceiling. Vault key in Agno persistent memory: don't persist it — issue a fresh vault key per session even when resuming from long-term memory context. FAQ covers structured output compatibility, long-running session key refresh pattern, and setting per-invocation caps (1.5-2x expected maximum per session call).
- [AI agent credential management — beyond secrets storage](https://keybrake.com/seo/ai-agent-credential-management): Framework post addressing the architectural gap between secrets storage (Vault, AWS SSM, Doppler) and credential enforcement for autonomous agents. Secrets managers were designed for the unauthorized-access threat model; AI agents create an authorized-excess threat model — the agent is permitted to call Stripe, the problem is it called it 150 times instead of 1. Four dimensions of agent credential management: Storage (Vault/SSM solve this), Access (IAM policies/Vault roles solve this), Enforcement (gap — no existing secrets manager enforces per-run dollar caps or per-run revocability for vendor API calls), Audit (partial — vendor dashboards show calls but don't cross-reference with agent run context). Why traditional patterns don't close the gap: application-layer rate limiting is fragile and race-condition-prone in concurrent deployments; vendor restricted keys scope permissions but not spend; scheduled rotation reduces unauthorized-access window but doesn't stop authorized excess in real time. Proxy pattern mechanics: vault key never exposes real credential to agent, enforcement happens at call time not delivery time, audit log cross-references calls with agent run context. How Vault/SSM and the proxy work together — Vault stores real vendor keys, proxy retrieves them at startup, agents only ever see vault keys. FAQ covers Vault dynamic secrets comparison (closes revocability gap but not dollar cap or call-time audit), do you need both a secrets manager and a proxy (yes — different layers), and minimum viable credential management setup for new agent projects.
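
The closure-factory pattern named in the Agno entry above might look roughly like the sketch below. It keeps the `make_charge_tool(vault_key, session_id)` shape from that entry but adds a third, hypothetical `send` parameter (an injected transport callable) so the example runs without a network or any agent framework. The `Bearer` scheme, the `/payment_intents` path under the proxy base, and the idempotency-key layout are assumptions for illustration.

```python
from typing import Callable

# Hypothetical transport signature: (url, headers, body) -> response dict.
Send = Callable[[str, dict, dict], dict]

PROXY_BASE = "https://proxy.keybrake.com/stripe/v1"


def make_charge_tool(vault_key: str, session_id: str, send: Send) -> Callable[[str, int], dict]:
    """Return a tool function with the vault key bound in its closure.

    The agent only ever receives `charge_customer`; it never sees the
    vault key as an argument, an env var, or part of its prompt.
    """

    def charge_customer(customer_id: str, amount_cents: int) -> dict:
        headers = {
            # Assumption: the vault key is sent as a Bearer token.
            "Authorization": f"Bearer {vault_key}",
            # Stable per session/customer/amount, so a retried call is
            # deduplicated by the vendor instead of charging twice.
            "Idempotency-Key": f"{session_id}:{customer_id}:{amount_cents}",
        }
        body = {"customer": customer_id, "amount": amount_cents, "currency": "usd"}
        return send(f"{PROXY_BASE}/payment_intents", headers, body)

    return charge_customer
```

Each session would call the factory once with its own freshly issued vault key, so two concurrent sessions never share a credential or a cap. In a real deployment `send` would wrap an HTTP client; here it is left abstract on purpose.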

- [Airflow AI agent API key — scoping vendor calls in DAGs and operators](https://keybrake.com/seo/airflow-ai-agent-api-key): How Apache Airflow's dynamic task mapping (`expand()`) and task-level `retries` create spending risks when operators call Stripe, Twilio, or Resend — `expand()` with 300 customer IDs spawns 300 simultaneous Stripe charge tasks, and `retries=3` on a network timeout may produce duplicate charges without a stable idempotency key. Three gaps Airflow's native tooling doesn't fill: no per-DAG-run spend cap (task run logs show what executed; they don't prevent it mid-run), no per-run revoke (clearing or marking tasks failed doesn't recall in-flight vendor API calls), no per-call audit with DAG run context (task instance logs don't parse dollar amounts or cross-reference charges by DAG run ID). Integration pattern: issue vault key in a setup task at DAG start using `context["run_id"]` as the run label, pass to all dynamic task instances via `.partial(vault_key=vault_key).expand(customer_id=customer_ids)`, use `run_id + customer_id` as the Stripe idempotency key prefix. FAQ covers vault key behavior on task clear-and-rerun (reuses same key within TTL; fresh key if DAG fully restarted), Airflow Connections integration for the Keybrake admin key, and per-batch vs per-task vault key issuance tradeoffs.
- [Ray AI agent API key — spend caps for distributed agent workloads](https://keybrake.com/seo/ray-ai-agent-api-key): How Ray's `@ray.remote` task execution and Actor model let agents scale to hundreds of concurrent workers — each making simultaneous vendor API calls with no per-job spend cap. At 64 CPU cores, a billing job that accidentally receives 10,000 customer IDs instead of 100 creates 10,000 parallel Stripe calls completing in roughly the same wall-clock time as a correct 100-ID run. Three gaps: no per-job spend cap (Ray Dashboard shows task throughput and resource usage, not dollar spend), no job-level revoke (`ray.cancel()` stops pending tasks but can't recall already-dispatched vendor API calls), no per-call audit with Ray job context (task events don't parse dollar amounts or map charges to Ray job IDs). Integration pattern: issue vault key in the Ray driver before dispatching tasks, pass as string parameter to `@ray.remote` functions (serialized into Ray's object store like any other task argument), real secret never enters Ray's runtime. For Ray Actors: vault key passed in `__init__`, `refresh_key()` method for actors longer-lived than the vault key TTL. FAQ covers per-actor vs shared vault key for Actor pools, vault key revocation behavior for in-flight vs pending tasks, and why the real API key is never serialized into Ray's distributed object store.
- [Celery AI agent API key — scoping vendor calls in distributed task queues](https://keybrake.com/seo/celery-ai-agent-api-key): How Celery's `chord` header fan-out and `autoretry_for` retry policy create spending risks when tasks call Stripe or Twilio — a chord with 3,000 customer IDs dispatches 3,000 simultaneous charge tasks, and `autoretry_for=(stripe.error.APIConnectionError,)` retries on network failures that may have already reached Stripe (duplicate charge risk without stable idempotency keys). Three gaps: no per-chord spend cap (Flower monitoring shows task states and throughput, not vendor dollar amounts), no worker-level revoke by vendor (Celery's revoke signal stops queued tasks but not vendor calls already dispatched by executing tasks), no per-call audit with task context (task result backend and Flower show no structured cost extraction). Integration pattern: issue vault key before `chord()` dispatch, pass through task signature via `.s(vault_key)`, use `customer_id + amount_cents` as Stripe idempotency key. Celery Beat periodic tasks: issue fresh vault key at each scheduled execution so each beat run gets its own cap. FAQ covers vault key in Celery broker serialization (same security model as any string task argument; ensure TLS on broker), `group()` vs `chain()` vs `chord()` vault key handling differences, and handling 429 in `autoretry_for` (exclude 429 to prevent cap-hit retries).
- [Haystack AI agent API key — scoping vendor tool calls in pipelines](https://keybrake.com/seo/haystack-ai-agent-api-key): How Haystack's Agent component loops tool calls until the LLM decides it's done — creating spending risks when tools call Stripe or Twilio, since `max_agent_steps=20` allows 20 tool calls per run with no native dollar cap on cumulative spend. Three gaps: no per-pipeline-run spend cap (`max_agent_steps` caps loop iterations not dollars; a single step can trigger multiple tool calls), no mid-run vendor revoke (stopping a pipeline run doesn't revoke vendor credentials used within it), no per-call audit with pipeline context (Haystack tracing shows component timings but doesn't parse dollar amounts from vendor responses). Integration pattern: issue vault key before the pipeline run and inject into the `StripeCharger` custom component via `__init__` constructor — vault key travels as instance state through the pipeline component graph, never as global env var. Haystack `ComponentTool` wraps the injected component; the Agent uses the tool without ever seeing the real vendor key. When agent loop receives 429: surface as component exception, LLM sees the error in tool result and stops calling the tool. FAQ covers Haystack pipeline serialization vs vault key (don't serialize short-lived vault keys; construct pipeline fresh per run), single vault key across multiple tools (yes for same vendor; separate vault keys for each vendor), and 429 handling in agent tool result for deterministic loop termination.
- [AI agent secrets management — why secrets stores aren't enough](https://keybrake.com/seo/ai-agent-secrets-management): Four-layer model for AI agent API key security: Storage (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager — solved), Access control (IAM roles, Vault policies, service accounts — partially solved), Enforcement (per-run spend caps, endpoint allowlists at call time — gap), Audit (per-call cost parsing, agent identity in vendor logs — gap). Why Vault dynamic secrets don't close the enforcement gap for vendor APIs: Stripe doesn't have a first-class Vault secrets engine; short TTL doesn't cap spend (a 1-hour key can still issue $50K in charges); Vault audit log records secret retrieval, not vendor API calls. Why Stripe restricted keys don't close the gap either: permission scope ≠ spend control (restricted key with PaymentIntents Write can create unlimited charges of unlimited size), no per-agent-run attribution, no customer allowlist, no sub-second per-run revoke. Runaway billing scenario: AWS Secrets Manager hands over Stripe key correctly per IAM role; Lambda executes 5,000 charges in 30 seconds because secrets management handed over the key and stepped back — enforcement layer absent. Proxy pattern: one long-lived Keybrake admin key in SSM/Vault; per-run vault key issued at job start with `daily_usd_cap`, `allowed_endpoints`, `expires_in`; real vendor key never in Lambda or agent. FAQ covers whether Vault is still needed with Keybrake (yes — different layers, Vault stores the Keybrake admin key), Vault dynamic secrets vs Keybrake vault keys (Vault has no Stripe dynamic secrets engine; Keybrake fills that gap), and adding Keybrake alongside existing secrets management without infrastructure changes.
- [Dramatiq AI agent API key — scoping vendor calls in actor-model task queues](https://keybrake.com/seo/dramatiq-ai-agent-api-key): How Dramatiq's `group()` primitive enqueues all messages simultaneously with no per-group spend cap — a billing group with 4,000 customer IDs dispatches 4,000 actor messages at once, and `max_retries=3` at the actor level multiplies each network failure into up to 4 vendor calls (including duplicate-charge risk on reconnect without stable idempotency keys). Three gaps: no per-group spend cap (StatsD metrics show throughput and failure rates, not vendor dollar amounts), no actor-level vendor revoke (no built-in mid-run cancel for enqueued messages; queue flush stops pending messages but not executing actors), no per-call audit with actor context (result backend records success/failure states, not parsed Stripe cost or message-ID-to-charge correlation). Integration pattern: issue vault key before `group()` dispatch, embed in every actor message as explicit argument (`charge_customer.message(cid, amount_cents, vault_key)`), use `customer_id + amount_cents` as Stripe idempotency key. The vault key string is serialized into broker messages like any other argument — ensure TLS on Redis/RabbitMQ broker. FAQ covers `pipeline()` composition vault key handling (pass vault key through each stage as explicit argument), broker message security model vs real API key, and configuring `max_retries` to exclude cap-exhaustion 429s (raise `BudgetExhausted` exception class and configure as non-retryable via actor `throws=` parameter).
- [Metaflow AI agent API key — scoping vendor calls in ML workflow steps](https://keybrake.com/seo/metaflow-ai-agent-api-key): How Metaflow's `@foreach` decorator creates parallel step branches that each call vendor APIs independently — a billing flow with `foreach="customer_ids"` on a 10,000-item list dispatches 10,000 parallel Batch/Kubernetes containers each calling Stripe, with `@retry(times=2)` multiplying each failed step into multiple charge attempts. Three gaps: no per-flow spend cap (Metaflow metadata service tracks step states, execution times, and artifact values — not vendor dollar amounts), no step-level vendor revoke (flow cancellation via CLI stops queued steps but not vendor calls already in progress within executing step containers), no per-call audit with step context (metadata service doesn't parse Stripe response data or correlate `Request-Id` with Metaflow run IDs). Integration pattern: issue vault key in `start` step, store as `self.vault_key` flow artifact — Metaflow's artifact system propagates it to all downstream steps including all `@foreach` branches automatically. All branches share the same cap; 429 from cap exhaustion surfaces as Metaflow step failure. FAQ covers vault key persistence in Metaflow S3 artifact store (apply same access controls as any sensitive artifact; vault key expires after TTL so compromise window is bounded), vault key availability on `run.resume()` (vault key already stored from original `start` step; issue keys with TTL longer than max expected flow runtime), and per-flow vs per-run vs shared vault key tradeoffs for multi-flow deployments.
- [Dagster AI agent API key — scoping vendor calls in data pipeline assets](https://keybrake.com/seo/dagster-ai-agent-api-key): How Dagster's dynamic partitions materialize assets concurrently with no per-run vendor spend cap — a backfill of 2,000 customer partitions dispatches 2,000 concurrent materializations each calling Stripe independently, and `RetryPolicy(max_retries=3)` multiplies each failed materialization into up to 4 charge attempts. Three gaps: no per-run spend cap (Dagster asset catalog tracks materialization status, lineage, and metadata — not vendor dollar amounts spent within materializations), no asset-level vendor revoke (run termination via UI/CLI sends graceful-interrupt signal; materializations mid-execution may complete their current operation including the vendor call before stopping), no per-call audit with asset context (output metadata and event log track materialization results, not vendor call cost correlation). Integration pattern: implement as Dagster resource (`@resource(config_schema={"budget_usd": float})`) initialized once per run — resource issues vault key at run start using `context.run_id` as label, assets declare `required_resource_keys={"keybrake"}` and access vault key via `context.resources.keybrake.vault_key`. FAQ covers per-partition vs per-run vault key scoping (per-partition = one Keybrake call per materialization, better per-partition attribution; per-run = one call at run start, more efficient for large partition sets), `RetryPolicy` interaction with cap-exhaustion 429 (catch 429 and raise non-retryable exception class; cap exhaustion is intentional stop not transient error), and vault key usage in op-based Dagster pipelines and IO managers (same `@resource` injection pattern works for `@op`, `@job`, and custom IO managers).
- [Kestra AI agent API key — scoping vendor calls in YAML workflow tasks](https://keybrake.com/seo/kestra-ai-agent-api-key): How Kestra's `EachParallel` task spawns one Docker container per item in the input list simultaneously with no per-execution spend cap — a billing workflow receiving 5,000 customer IDs spawns 5,000 containers each calling Stripe, with automatic task retries multiplying each container failure into multiple charge attempts. Three gaps: no per-execution spend cap (Kestra execution logs track task start/end times, output values, and error states — not vendor dollar amounts within containers), no task-level vendor revoke (execution kill via UI or API sends kill signal to containers; containers may complete vendor call before exiting), no per-call audit with execution context (Kestra outputs from task containers don't automatically cross-reference vendor `Request-Id` values with Kestra execution IDs). Integration pattern: issue vault key in YAML setup task before `EachParallel`, store in Kestra KV Store under execution-scoped key (`vault-key-{{ execution.id }}`), read in each parallel task container via `{{ kv('vault-key-' ~ execution.id) }}`. KV Store TTL set to match vault key TTL (e.g. `PT4H`). FAQ covers vault key security in Kestra KV Store vs secret manager (KV Store for operational state; use output passing from setup task instead of KV Store for higher security), automatic retry interaction with cap exhaustion (handle 429 in Python code; exit status 0 for cap-hit to prevent retry storm; exit non-zero only for transient errors), and vault key scoping per webhook-triggered execution (each execution gets unique `execution.id` → unique KV Store key → separate vault key and cap per webhook event).
- [AI agent rate limiting — enforcing vendor API spend limits before they're hit](https://keybrake.com/seo/ai-agent-rate-limiting): Two meanings of "rate limiting" for AI agents: (1) reactive — handling 429 responses from vendor APIs when request rate is too high (transient, retry with exponential backoff + `Retry-After` header); (2) proactive — enforcing your own dollar caps before the vendor's limits become relevant (terminal, stop and alert). Conflating both leads to retry storms: agent hits vendor spend limit → treats 429 as transient error → retries → hits spend limit again → retries → also hits request-rate limit → both flavors of 429 in logs → incident diagnosis harder. The retry-storm failure sequence traced end-to-end with 1,000-Stripe-charge example. Pre-call enforcement closes the gap reactive handling leaves open: proxy checks cumulative spend before forwarding request to vendor — if cap reached, returns 429 before charge executes, money never moves, 429 is distinguishable from vendor 429s via `X-Keybrake-Cap-Hit: true` response header. Four-layer rate limiting strategy: per-agent vault key with dollar cap (pre-call enforcement), endpoint allowlist (scope restriction), idempotency keys on all money-moving calls (retry safety), structured 429 handling per error type (cap-exhaustion vs vendor rate-limit vs transient network error). Five FAQ entries: rate limiting vs spend capping distinction, proxy cap interaction with Stripe account-level spending limits (proxy cap fires first; vendor limit is the backstop), shared vault key across parallel agent instances (shared cap enforced atomically — prevents 10 agents from collectively exceeding budget by each thinking they have individual cap), correct retry strategy after cap-exhaustion 429 (don't retry; record which items completed; surface to operator; let operator trigger new run), how to choose the right dollar cap value for a vault key (expected spend × 1.5 as safety margin; adjust down after first few runs).
- [Argo Workflows AI agent API key — scoping vendor calls in Kubernetes-native pipelines](https://keybrake.com/seo/argo-workflows-ai-agent-api-key): How Argo Workflows DAG templates run each task step in its own Kubernetes pod — and how a `withItems` expansion over a large list spawns hundreds of simultaneous Stripe-calling pods with no per-workflow dollar cap. `retryStrategy` at the step level multiplies failed vendor calls into N+1 pod executions, and rotating the Kubernetes Secret to stop a runaway workflow breaks every other workflow reading the same secret. Three gaps: no per-workflow spend cap (Argo tracks pod states and resource usage, not vendor dollars), no mid-run vendor revoke without Secret rotation (workflow termination stops new pod scheduling but not executing containers), no per-call audit with workflow context (workflow archive stores step inputs/outputs/status, not parsed Stripe cost or charge-to-workflow-name correlation). Integration pattern: dedicated `issue-vault-key` setup step before the fan-out, vault key passed as workflow parameter to all downstream templates via `{{tasks.issue-vault-key.outputs.parameters.vault_key}}`, Kubernetes Secret now contains only the Keybrake admin key (not the real Stripe secret). FAQ covers vault key security in Argo's parameter system vs artifact store (parameters are visible in `argo get` and the UI; use artifact mechanism for higher security), `withParam` dynamic fan-out compatibility, and retryPolicy configuration to avoid retry storms on cap-exhaustion 429s (`retryPolicy: "OnError"` prevents retries on application-level errors including intentional cap hits).
- [Luigi AI agent API key — scoping vendor calls in Spotify's pipeline orchestrator](https://keybrake.com/seo/luigi-ai-agent-api-key): How Luigi's dynamic task generation via `requires()` fans out vendor API calls at runtime — a `BillingPipeline` that generates one `ChargeCustomer` task per customer ID will execute all generated tasks regardless of cumulative vendor spend, and automatic re-runs of tasks whose `output()` file wasn't written can trigger duplicate charges. Three gaps: no per-pipeline spend cap (Luigi tracks task completion states — pending/running/done/failed — not dollar amounts of vendor calls within `run()`), no task-level vendor revoke (stopping vendor calls mid-pipeline requires killing worker processes or rotating the environment variable, breaking all other Luigi pipelines on the same workers), no per-call audit with Luigi task context (event hooks fire on task start/success/failure but don't parse dollar amounts or correlate Stripe `PaymentIntent.id` with Luigi `customer_id` parameters in a queryable table). Integration pattern: `VaultKeyTask` as a dependency of `ChargeCustomer` (declared in `requires()`), vault key written to `LocalTarget` output file, read by downstream tasks via `self.input().open("r")` — vault key travels through Luigi's file-based dependency system without environment variable pollution. Luigi's `output()` memoization means `VaultKeyTask` runs once; retries and re-runs of the pipeline reuse the same vault key file within TTL. FAQ covers `output()` marker interaction with vault key scoping, handling cap exhaustion in Luigi retry logic (raise non-retryable exception class; write sentinel output file to prevent re-run), and `batch_method` compatibility.
- [Windmill AI agent API key — scoping vendor calls in developer workflow scripts](https://keybrake.com/seo/windmill-ai-agent-api-key): How Windmill's for-each flow steps spawn one worker per input item simultaneously — and how a webhook-triggered flow that receives 500 customer IDs runs 500 parallel Stripe-calling workers with no per-flow dollar cap. Windmill's concurrency limit controls throughput (iterations per batch) not total spend, and rotating a Windmill resource variable to revoke credentials breaks all flows and scheduled scripts in the workspace using that resource. Three gaps: no per-flow spend cap (Windmill resource system stores and injects credentials but doesn't track cumulative vendor spend per execution), no mid-run revocation without workspace-wide resource update, no per-call audit with flow run context (audit log records who triggered what and when; no structured vendor cost tracking). Integration pattern: `issue_vault_key.ts` as step 1 in the flow, vault key returned as `{ vaultKey }` output, downstream for-each step receives vault key via Windmill's input binding connected to `results.step1.vaultKey`. `WM_FLOW_JOB_ID` environment variable available in all Windmill scripts — used as `agent_run_label` for per-flow-run audit attribution. FAQ covers Windmill flow job ID access in scripts (`Deno.env.get("WM_FLOW_JOB_ID")` in TypeScript, `os.environ.get("WM_FLOW_JOB_ID")` in Python), approval step TTL implications (issue vault key after approval step if approval latency is variable), and retry behavior configuration to avoid double charges on cap exhaustion (`NonRetriableError` equivalent; check `X-Keybrake-Cap-Hit: true` header to distinguish cap-hit from transient errors).
- [AI agent webhook authentication — securing inbound triggers and outbound vendor calls](https://keybrake.com/seo/ai-agent-webhook-authentication): Two-layer model that most webhook authentication guides omit: Layer 1 (inbound) — HMAC signature verification with timestamp validation to authenticate the trigger; Layer 2 (outbound) — scoped vault key issued per webhook event to bound what the agent can spend in response. Verifying the trigger is legitimate doesn't constrain the agent's behavior after the trigger passes — a valid HMAC signature on a `payment_intent.payment_failed` event doesn't prevent the agent from retrying the payment 150 times or calling unrelated endpoints. Two-column table: what each layer protects against and what it doesn't. Layer 1 implementation details: Stripe (`stripe.webhooks.constructEvent()`), Twilio (`twilio.validateRequest()`), and generic HMAC pattern — all with `timingSafeEqual` for constant-time comparison and timestamp validation (reject events older than 5-10 minutes to prevent replay). Layer 2 implementation: per-event-type endpoint allowlists (e.g. `payment_intent.payment_failed` → only `POST /v1/payment_intents/*/confirm` + `POST /v1/refunds`, cap $50), short TTL (30 minutes for webhook handlers), event ID in the `agent_run_label` (`webhook/{event.type}/{event.id}`). Webhook replay deduplication: check for active vault key with same label before issuing a new one — active key means event is currently being processed, skip re-processing and return 200. FAQ covers Python HMAC signature verification for Stripe (use raw bytes, not decoded string, to avoid HMAC mismatch), per-event vs reused vault key tradeoffs (always per-event — carry-over caps cause legitimate later events to fail), and handling cap-exhaustion mid-webhook-processing (return 200 to webhook sender, escalate to operator; do NOT return 429 or 5xx which triggers re-delivery).
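
The Layer 1 check in the webhook entry above (HMAC over the raw bytes, constant-time comparison, timestamp window) can be sketched with the Python standard library. `hmac.compare_digest` plays the role of `timingSafeEqual`. The `timestamp.body` signing layout, the function names, and the 300-second tolerance are assumptions for illustration; each vendor defines its own scheme, so production handlers should use the vendor's own verification helper.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 over b'<timestamp>.<raw body>' (assumed layout)."""
    signed_payload = str(timestamp).encode() + b"." + raw_body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    raw_body: bytes,
    signature: str,
    tolerance_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Reject stale events first, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False  # outside the replay window
    expected = sign(secret, timestamp, raw_body)
    # Compare as bytes so a non-ASCII signature cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Passing the body as raw bytes, not a decoded and re-encoded string, matches the FAQ note about avoiding HMAC mismatches.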
- [Inngest AI agent API key — scoping vendor calls in durable TypeScript functions](https://keybrake.com/seo/inngest-ai-agent-api-key): How Inngest's `Promise.all()` over `step.run()` calls creates parallel step execution — all charge steps fire simultaneously with no per-function-run dollar cap — and how `retries: 3` on the function re-runs the entire function (including all charge steps) on uncaught errors. Key Inngest-specific property: `step.run()` memoization means the vault key setup step runs once and its result is replayed from Inngest's durable state store on all retries — the vault key isn't re-issued on each retry, and the cap accumulates correctly across retried steps. Three gaps: no per-run spend cap (Inngest concurrency and throttle settings control invocation rate, not dollar value of vendor calls within a single invocation), no step-level vendor revoke (function run cancellation stops new step scheduling but not currently executing serverless worker; rotating environment key requires redeployment breaking all functions), no per-call audit with Inngest run context (run history captures step inputs/outputs/timing but doesn't parse Stripe response dollar amounts or correlate charges with `runId`). Integration pattern: `issue-vault-key` as the first `step.run()` in the function, vault key accessed as closure variable by all downstream step callbacks (`const { vaultKey } = await step.run("issue-vault-key", ...)`), `idempotency_key: runId + "-" + customerId` provides retry safety. FAQ covers vault key expiry vs Inngest step memoization (set TTL longer than max expected run duration; for long-lived runs, add a key-refresh step), fan-out to child functions (each child function issues its own vault key with cap from event payload `budget_usd`), and cap-exhaustion 429 handling in Inngest retry logic (`NonRetriableError` class stops retries on intentional cap hit; `X-Keybrake-Cap-Hit: true` header distinguishes from transient vendor 429s).
- [Apache Beam AI agent API key — scoping vendor calls in parallel data pipelines](https://keybrake.com/seo/apache-beam-ai-agent-api-key): How Apache Beam's `ParDo` transforms fan out vendor API calls across thousands of parallel `DoFn.process()` workers — a single `ParDo` over a million-row `PCollection` creates thousands of simultaneous Stripe calls with no per-pipeline dollar cap, and bundle retries multiply spend further when a `DoFn` fails mid-bundle. Three gaps: no per-pipeline spend cap (Dataflow resource model tracks CPU, memory, and data throughput — not vendor API dollars; `max_num_workers` controls parallelism not dollar spend), no mid-pipeline vendor revoke (Dataflow job cancellation stops new bundle assignment but not executing `DoFn.process()` calls; real Stripe key in `setup()` can only be rotated via redeployment which doesn't affect already-running workers), no per-call audit with pipeline context (Beam metrics capture custom counters but don't parse dollar amounts from Stripe responses or correlate `PaymentIntent.id` with Dataflow job ID and bundle ID). Integration pattern: vault key issued before pipeline submission and passed as `DoFn` constructor argument (serialized into pipeline graph, distributed to all workers at job startup), `DoFn.setup()` uses vault key to initialize Stripe client at Keybrake proxy, idempotency key uses `job_id + customer_id` (stable across bundle retries), `agent_run_label` includes input path for per-pipeline-run audit attribution. FAQ covers vault key security when serialized into pipeline graph (scoped key not real secret; alternative: read from Cloud Secret Manager in `setup()` rather than serialize), bundle retry interaction with spend cap (idempotency key deduplicates at Stripe; proxy still counts retry call against cap; set cap to account for 2-3× retry amplification), and vault key TTL for long-running pipelines (set to pipeline duration; per-micro-batch vault keys for unbounded streaming pipelines).
- [Hatchet AI agent API key — scoping vendor calls in durable workflow steps](https://keybrake.com/seo/hatchet-ai-agent-api-key): How Hatchet's `spawn_workflow()` fan-out creates many parallel child workflow runs each making vendor calls independently (no cap across children), and how step `max_retries` multiplies vendor calls on failures — a parent workflow that spawns 500 child workflows with `max_retries=3` per step can produce up to 2,000 Stripe calls. Three gaps: no per-workflow spend cap (Hatchet's concurrency slots control run-level parallelism not dollar spend; a single run processing 5,000 customers makes 5,000 Stripe calls regardless), no step-level revoke without worker restart (workflow cancellation marks run cancelled but doesn't interrupt executing steps; rotating `STRIPE_SECRET_KEY` requires restarting all workers, breaking every other in-flight workflow), no per-call audit with step context (Hatchet run history captures step inputs/outputs/status but doesn't parse Stripe dollar amounts or correlate `PaymentIntent.id` with Hatchet run IDs). Integration pattern: `issue_vault_key` as the first `@hatchet.step()` returning vault key via `step_output`, downstream steps declare `issue_vault_key` as parent, vault key retrieved via `ctx.step_output("issue_vault_key")["vault_key"]`, idempotency key uses `ctx.workflow_run_id() + customer_id`, `agent_run_label` includes Hatchet workflow run ID. For `spawn_workflow()` fan-out: vault key passed in child workflow input — all children share one vault key and one cap. FAQ covers vault key in `step_output` being stored in Hatchet's database (scoped key, not real secret; alternative: store in secrets manager keyed by run ID and fetch in downstream steps), retry handling for cap exhaustion (raise non-retriable exception class; cap exhaustion is intentional stop not transient error), and multi-tenant vault key issuance (one vault key per tenant run with tenant's plan-tier cap).
- [Trigger.dev AI agent API key — scoping vendor calls in background task runs](https://keybrake.com/seo/triggerdev-ai-agent-api-key): How Trigger.dev's `batch.triggerAndWait()` fans out to many parallel task runs each making vendor calls independently — 500 customer task runs triggered in a batch create 500 simultaneous Stripe calls with no per-run dollar cap — and how `maxAttempts` multiplies spend when tasks fail mid-execution after their Stripe call already succeeded. Three gaps: no per-run spend cap (queue concurrency and rate limits control task throughput but not the dollar value of vendor calls within each run; a batch of 500 tasks each charging $50 completes $25,000 in charges regardless of concurrency settings), no task-level revoke without environment changes (batch/run cancellation stops pending runs but doesn't interrupt executing tasks; rotating `STRIPE_SECRET_KEY` requires redeployment terminating all running tasks), no per-call audit with run context (Trigger.dev run history captures task inputs/outputs per run but doesn't parse Stripe dollar amounts or correlate `PaymentIntent.id` with run IDs at the vendor-call level). Integration pattern: vault key issued once in parent task before `batch.triggerAndWait()`, passed in child task payload — all child runs share vault key and cap (`const { vault_key: vaultKey } = await resp.json()`), `AbortTaskRunError` on cap exhaustion signals permanent failure without consuming retry attempts, idempotency key uses `context.run.id + customerId` (stable across task retries), `agent_run_label` includes parent run ID. FAQ covers vault key safety in task payload stored in Trigger.dev run history (scoped key not real secret; alternative: secrets manager keyed by run ID), parent task retry reissuing fresh vault key (correct behavior; check for existing key by label to make issuance idempotent), and `wait.for()` delay interaction with vault key TTL (set TTL to exceed total task duration including delays).
- [AI agent idempotency — preventing duplicate vendor charges on retries](https://keybrake.com/seo/ai-agent-idempotency): Why agent retries are structurally different from human retries — automatic and silent (workflow engines retry without human intervention), potentially unbounded (Temporal default is infinite retries with 10-year schedule-to-close), and cross-process (process crashes leave agents unable to tell if the previous Stripe call succeeded before the crash). The correct idempotency key formula: `workflow_run_id + "-" + item_id` — stable across all retries of the same run, unique across different runs for the same customer. Common mistakes: timestamp in the key (different on every retry), random UUID (different on every retry), customer ID only (collides across different billing runs for the same customer). Framework-specific run ID sources: Temporal `WorkflowExecution.ID + RunID`, Inngest `runId`, Trigger.dev `context.run.id`, Prefect `flow_run.id`, Hatchet `ctx.workflow_run_id()`, Dagster `context.run_id`, Airflow `context["run_id"]`, Beam (job ID from pipeline argument). Vendor idempotency guarantee comparison: Stripe 24-hour window (full dedup within window; different API key voids dedup); Twilio and Resend have no native idempotency key (application-level dedup required: write-before-call pattern with unique constraint on `run_id + recipient`). The 24-hour cliff: idempotency keys expire; a run retried 26 hours later re-charges customers 1-499 whose keys expired. Spend caps as backstop: vault key with cap set to expected single-run spend limits re-charges even when idempotency keys expire. 
FAQ covers race conditions between concurrent runs (requires application-level mutual exclusion beyond idempotency keys; vault caps are heuristic backstop not guarantee), vault key proxy and Stripe idempotency interaction (proxy forwards `Idempotency-Key` header unchanged; same underlying Stripe key means dedup works correctly), and Twilio write-before-call dedup pattern (write pending record with unique constraint before call; check status on retry).
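
The key formula from the idempotency entry above, next to the three common mistakes it lists, looks roughly like this in Python. The function names are illustrative only; the point is which inputs stay stable across retries of one run.

```python
import time
import uuid


def idempotency_key(workflow_run_id: str, item_id: str) -> str:
    """Stable across retries of one run, unique across different runs."""
    return f"{workflow_run_id}-{item_id}"


# Anti-patterns from the entry above, shown for contrast.

def timestamp_key(item_id: str) -> str:
    """Mistake: the timestamp changes on every retry, so nothing is deduplicated."""
    return f"{item_id}-{time.time_ns()}"


def random_key() -> str:
    """Mistake: a fresh UUID per attempt defeats deduplication."""
    return str(uuid.uuid4())


def customer_only_key(customer_id: str) -> str:
    """Mistake: collides across separate billing runs for the same customer."""
    return customer_id
```

A retry of run `run-42` for `cus_1` reuses the same key, while a later run produces a different key for the same customer, so it is not deduplicated against the earlier one.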
- [AI agent multi-tenant isolation — per-tenant API key scoping and spend caps](https://keybrake.com/seo/ai-agent-multi-tenant-isolation): The three isolation requirements for multi-tenant AI agent SaaS: spend isolation (one tenant hitting their cap can't affect other tenants' ability to spend), revoke isolation (stopping a runaway tenant's agent doesn't break any other tenant's running agents), audit isolation (tenant A's call history is queryable without tenant B's records appearing). The shared-key failure modes: rate limit bleed across tenants (Stripe limits per API key, not per customer; one tenant's 10,000-customer run hits limits, other tenants' legitimate charges 429), blast-radius revocation (rotating shared key to stop bad actor breaks all tenants simultaneously), audit contamination (cross-referencing Stripe logs to tenant requires matching Stripe customer IDs to your tenant table). Per-tenant vault key architecture: one vault key per tenant per agent run, cap reflects tenant's plan tier (`{"starter": 500, "growth": 5000, "enterprise": 50000}`), `agent_run_label` embeds `tenant:{tenant_id}` for partition-by-label audit queries. Revocation: single `DELETE /vault/keys/{key_id}` stops tenant A's agent in milliseconds without affecting B, C, or D. Audit query: `GET /audit?label_contains=tenant:{tenant_id}` returns only that tenant's call history — no cross-contamination, no application-level filtering. FAQ covers per-run vs per-tenant-total vault key granularity (per-run correct; per-tenant-total cap reflects cumulative across all runs), bring-your-own-credentials tenants (vault keys referencing tenant's own Stripe credential stored in Keybrake), and vault key endpoint allowlists for tenant action scoping (can restrict what API paths each tenant's agent is permitted to call).
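
A per-tenant, per-run key request built from the plan-tier caps in the multi-tenant entry above might look like the following. The field names `daily_usd_cap`, `agent_run_label`, and `expires_in` appear elsewhere in this list, but the request shape, the `/run:` suffix on the label, and the helper names are assumptions, not the documented Keybrake API.

```python
from typing import Dict, Union

# Plan-tier caps in USD, as quoted in the entry above.
PLAN_CAPS_USD: Dict[str, int] = {"starter": 500, "growth": 5000, "enterprise": 50000}


def vault_key_request(
    tenant_id: str, plan: str, run_id: str, ttl_seconds: int = 1800
) -> Dict[str, Union[str, int]]:
    """Build a hypothetical issuance payload for one tenant's agent run.

    An unknown plan raises KeyError, so a typo never yields an uncapped key.
    """
    return {
        "daily_usd_cap": PLAN_CAPS_USD[plan],
        "agent_run_label": f"tenant:{tenant_id}/run:{run_id}",
        "expires_in": ttl_seconds,
    }


def audit_query_path(tenant_id: str) -> str:
    """Path for the label-partitioned audit query described in the entry."""
    return f"/audit?label_contains=tenant:{tenant_id}"
```

Because the tenant ID is embedded in the label at issuance time, the audit query needs no application-level filtering.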
- [Google Cloud Workflows AI agent API key — scoping vendor calls in managed GCP workflows](https://keybrake.com/seo/google-cloud-workflows-ai-agent-api-key): How Cloud Workflows' `parallel` block fans out vendor API calls across N simultaneous HTTP steps with no per-execution dollar cap — a billing workflow receiving 3,000 customer IDs dispatches 3,000 simultaneous Stripe calls, and `try`/`retry` blocks retry failed vendor calls that may have already applied charges (duplicate charge risk without stable execution-ID-based idempotency keys). Three gaps: no per-execution spend cap (Cloud Billing budget alerts fire 24h after spend; workflow timeout is wall-clock not dollars), no mid-execution vendor revoke without Secret Manager rotation (rotating the secret prevents new executions from fetching the old key, but already-running executions that cached it in a workflow variable continue until completion), no per-call audit with workflow execution context (Cloud Logging `LOG_ALL_CALLS` records HTTP inputs/outputs but doesn't parse Stripe dollar amounts or correlate charges with the execution ID in a structured cost table). Integration pattern: `issue_vault_key` step using Workflows' built-in `http.post` call to Keybrake API before the `parallel` block, vault key stored as workflow variable, passed to all parallel branch HTTP steps — all branches share the same vault key and cap, which accumulates atomically across concurrent calls. Idempotency key uses `GOOGLE_CLOUD_WORKFLOW_EXECUTION_ID + customer_id` (stable across parallel branch retries). FAQ covers passing vault key to parallel branches (workflow variable is readable from inside `parallel` for-each iterations; no per-branch key issuance), handling cap exhaustion in `try`/`retry` blocks (check `X-Keybrake-Cap-Hit: true` header in error handler; raise non-retryable path on cap hit), and vault key TTL for workflows with human approval steps (issue key after the approval resumes, not at workflow start).
- [Restate AI agent API key — scoping vendor calls in durable service invocations](https://keybrake.com/seo/restate-ai-agent-api-key): How Restate's durable execution and automatic retry-on-failure amplify vendor API spend when handlers fan out via `ctx.serviceSendClient().send()` — each dispatched child invocation calls Stripe independently with no shared cap — and how vendor calls placed outside `ctx.run()` are replayed on every handler retry (duplicate charges without idempotency). Key Restate-specific property: `ctx.run("issue-vault-key", ...)` memoizes vault key issuance in the journal — the same vault key is returned on every handler retry without calling Keybrake again, and the cap accumulates correctly across retried handler executions. Three gaps: no per-invocation spend cap (Restate's durability guarantees are about execution reliability, not dollar spend; N child invocations share no cap coordination), no mid-invocation vendor revoke without service restart (invocation cancellation sends a signal but can't interrupt a `ctx.run()` closure mid-execution; rotating `STRIPE_SECRET_KEY` requires redeployment), no per-call audit with invocation context (Restate journal records execution steps but doesn't parse Stripe dollar amounts or correlate charges with the invocation ID). Integration pattern: vault key issued inside `ctx.run("issue-vault-key", ...)` (memoized across retries), passed as explicit parameter to fan-out child service invocations — all children share one vault key and one cap; idempotency key uses `ctx.request().id + customerId` (stable across handler retries via Restate's invocation ID). 
FAQ covers vault key in Restate journal security model (scoped key not real secret; access controls on Restate storage same as any sensitive runtime output), vault key TTL for handlers that `ctx.sleep()` overnight (set TTL exceeding maximum suspension duration; or issue key after last sleep point), and cap-exhaustion handling in retry logic (`TerminalError` in TypeScript vs regular `Error` to prevent infinite retry on intentional cap hit).
- [Conductor AI agent API key — scoping vendor calls in Netflix Conductor workflows](https://keybrake.com/seo/conductor-ai-agent-api-key): How Netflix Conductor's `FORK_JOIN` system task dispatches N parallel worker tasks simultaneously, each making independent vendor API calls with no shared dollar cap — 3,000 parallel `charge_customer` tasks create 3,000 simultaneous Stripe calls — and how Conductor's task retry policy (`retryCount`, `retryDelaySeconds`) re-executes failed tasks that may have already applied charges (duplicate charge risk without stable `workflowInstanceId`-based idempotency keys). The worker polling model creates an additional risk: already-polled tasks (acknowledged with `PUT /tasks/{taskId}/ack`) continue executing on workers even after the workflow is terminated at the Conductor server level. Three gaps: no per-workflow spend cap (task rate limits and execution timeouts control throughput/wall-clock time, not dollar amounts per execution), no mid-execution vendor revoke without worker restart (workflow termination stops new task scheduling but not already-polled tasks; rotating the environment credential requires restarting all workers of that type, breaking every other in-flight task), no per-call audit with workflow execution context (execution history captures task inputs/outputs/status but doesn't parse Stripe dollar amounts or correlate charges with `workflowInstanceId`). Integration pattern: `ISSUE_VAULT_KEY` as an HTTP system task (no custom worker needed) before the `FORK_JOIN`, vault key referenced in all forked task `inputParameters` via `${issue_vault_key_task.output.response.body.vault_key}`, workers read vault key from task input and use it for all Stripe calls via the proxy — all parallel tasks share the same vault key and cap. Worker returns `FAILED_WITH_TERMINAL_ERROR` on cap exhaustion to prevent Conductor from retrying the task. 
FAQ covers vault key visibility in Conductor execution history (scoped key, not the real secret; apply Conductor audit log access controls to restrict who reads execution histories), dynamic fork on large customer lists (all forked tasks share one cap; the first tasks to complete exhaust the budget, which is the intended behavior), and Orkes Conductor compatibility (same task model and `FORK_JOIN` semantics; the vault-key pattern is identical).
- [AI agent spend reporting — per-run cost visibility for autonomous systems](https://keybrake.com/seo/ai-agent-spend-reporting): Why vendor dashboards (Stripe Dashboard, Twilio Monitor, Resend email log) are insufficient for AI agent spend reporting — they show account totals with no per-agent-run attribution, no cross-vendor aggregation, and 24-hour billing lag. Four reporting questions operations teams need answered: how much did run X cost, which agent spends the most, is Stripe spend accelerating, what's the projected monthly spend at current run rate. The minimum audit log schema for agent spend reporting: `agent_run_id`, `agent_name`, `vendor`, `endpoint`, `cost_usd` (parsed from vendor response), `vendor_txn_id`, `policy_verdict`, `called_at`, `vault_key_id`. Four SQL query patterns: per-run cost summary (GROUP BY vendor, endpoint WHERE agent_run_id = ?), per-agent fleet view (GROUP BY agent_name, vendor ORDER BY total_spend DESC), per-vendor daily trend (GROUP BY date(called_at), vendor ORDER BY day DESC), projected monthly spend (7-day rolling average × 30 per agent). Three alerting triggers that catch incidents before caps: cap hit rate spike (>10 cap hits in 1 hour → retry-loop or wrong cap), per-run cost exceeds threshold (single run > $200 → data error or logic bug), spend velocity anomaly (current hour > 3× 30-day average → deployment change or incident). How `cost_usd` is parsed at the proxy layer: Stripe `amount` field (cents ÷ 100), Twilio `price` field on status callback, Resend fixed per-email rate from account tier. 
FAQ covers building agent spend reporting without a proxy using Stripe metadata (partial — Stripe only, convention not enforcement, requires ETL), difference between agent spend reporting and LLM cost reporting (different axes — LLM tokens bounded by context; vendor API spend unbounded; joined on `agent_run_id`), and audit log retention recommendations (7 days for alerting, 90 days for trend analysis, 1 year for compliance/incident investigation).
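
The audit schema and the per-run cost summary from the spend reporting entry above can be tried against an in-memory SQLite database. The column names come from the entry; the column types, the sample rows, and the helper names are assumptions made for this sketch.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE audit_log (
    agent_run_id   TEXT,
    agent_name     TEXT,
    vendor         TEXT,
    endpoint       TEXT,
    cost_usd       REAL,
    vendor_txn_id  TEXT,
    policy_verdict TEXT,
    called_at      TEXT,
    vault_key_id   TEXT
)
"""

# Per-run cost summary: GROUP BY vendor, endpoint WHERE agent_run_id = ?
PER_RUN_SUMMARY = """
SELECT vendor, endpoint, COUNT(*) AS calls, ROUND(SUM(cost_usd), 2) AS total_usd
FROM audit_log
WHERE agent_run_id = ?
GROUP BY vendor, endpoint
ORDER BY total_usd DESC
"""


def stripe_cost_usd(amount_cents: int) -> float:
    """Stripe reports `amount` in cents; the audit row stores dollars."""
    return amount_cents / 100


def projected_monthly_spend(last_7_days_usd: List[float]) -> float:
    """7-day average daily spend multiplied by 30."""
    return sum(last_7_days_usd) / len(last_7_days_usd) * 30


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rows = [
    ("run-1", "billing", "stripe", "/v1/payment_intents", stripe_cost_usd(4999),
     "pi_1", "allow", "2026-01-01T10:00:00Z", "vk_1"),
    ("run-1", "billing", "stripe", "/v1/payment_intents", stripe_cost_usd(2500),
     "pi_2", "allow", "2026-01-01T10:00:05Z", "vk_1"),
    ("run-2", "billing", "stripe", "/v1/payment_intents", stripe_cost_usd(1000),
     "pi_3", "allow", "2026-01-02T10:00:00Z", "vk_2"),
]
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
summary = conn.execute(PER_RUN_SUMMARY, ("run-1",)).fetchall()
```

Here `summary` holds one row per vendor and endpoint for `run-1`; the other three query patterns follow the same shape with different `GROUP BY` columns.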
- [AI agent zero trust — applying zero-trust principles to autonomous system credentials](https://keybrake.com/seo/ai-agent-zero-trust): Why traditional zero-trust architecture (ZTNA, BeyondCorp, Zscaler) doesn't address AI agent vendor API spend risk — ZTNA controls which network resources an agent can reach (hostname level), not what it can do at those resources or how much it can spend. The distinction: ZTNA addresses unauthorized access; AI agent credential risk is authorized excess (the agent is permitted to call Stripe, the problem is it called it 2,000 times instead of once, or was prompt-injected into a larger transaction). Four zero-trust principles translated to AI agent credentials: (1) per-run credentials not long-lived keys (vault key `expires_in` matching run duration — credential trust expires with the run); (2) endpoint allowlists not just network segments (vault key `allowed_endpoints` enforces least privilege at the API path level within a single vendor — ZTNA can reach `api.stripe.com` but can't distinguish `POST /v1/payment_intents` from `POST /v1/transfers`); (3) pre-call spend caps not post-charge alerts (vault key `daily_usd_cap` fires before the request reaches the vendor; billing alerts fire 15 min–24h after charge applied); (4) per-call audit with agent context not just access logs (audit log with `agent_run_id`, parsed cost, policy verdict — access logs record "request was made", not "this agent run spent $47 calling Stripe's charge endpoint"). Three-row table: what network segmentation / Stripe Restricted Keys / vault key endpoint allowlists prevent and don't prevent. How prompt injection attacks interact with zero-trust credentials (zero trust limits blast radius: injected instruction to issue large transfer blocked by endpoint allowlist; amount above cap blocked pre-call; attempt logged with full context). 
Composition guidance: ZTNA controls which vendors the agent's service account can reach (network layer), and the vault key proxy controls what the agent can do and spend at those vendors (API layer); neither replaces the other.
- [AWS Step Functions AI agent API key — scoping vendor calls in state machine workflows](https://keybrake.com/seo/aws-step-functions-ai-agent-api-key): How AWS Step Functions' Map state with `MaxConcurrency: 0` dispatches all iterations as simultaneous Lambda invocations — a billing workflow processing 3,000 customers creates 3,000 concurrent Stripe calls with no per-execution dollar cap — and how Retry rules re-execute failed states that may have already charged Stripe without stable idempotency keys derived from `$$.Execution.Id` and `$$.Map.Item.Index`. Three gaps: no per-execution spend cap (AWS Cost Anomaly Detection fires hours after spend; execution timeout is wall-clock not dollars; `MaxConcurrency` limits count not cost), no mid-execution vendor revoke without Secrets Manager rotation (warm Lambda environments already loaded the key continue running until recycled; `StopExecution` cannot interrupt in-flight Lambda invocations), no per-call audit with execution context (CloudWatch Logs and X-Ray track invocation duration and status not dollar amounts; reconstructing run cost requires cross-referencing CloudWatch and Stripe dashboard). Integration pattern: `IssueVaultKey` Lambda task as the first state, vault key written to `$.vault` via `ResultPath`, passed to Map iterations via `ItemSelector` with `"vault_key.$": "$.vault.vault_key"` and `"item_index.$": "$$.Map.Item.Index"` — all concurrent Lambda invocations share one cap accumulating atomically; `CapExhausted` named exception caught by `Catch` block rather than `Retry` to prevent retry storms; idempotency key = `execution_id-item_index` (stable across state retries). FAQ covers ItemSelector mechanics for vault key passing to Map iterations, distinguishing cap exhaustion from transient Lambda errors in Retry rules, and vault key TTL for state machines with WaitForTaskToken approval states.
- [Azure Durable Functions AI agent API key — scoping vendor calls in durable orchestrations](https://keybrake.com/seo/azure-durable-functions-ai-agent-api-key): How Azure Durable Functions' `Task.all()` fan-out dispatches N concurrent activity function executions simultaneously — all calling Stripe independently with no per-orchestration dollar cap — and how `callActivityWithRetry` re-executes failed activities that may have already charged Stripe (duplicate charge risk without stable `instanceId`-based idempotency keys). Key Durable Functions property: because the vault key is issued inside a `callActivity` call, Durable Functions stores the result in execution history — on orchestrator replay, the framework returns the stored result without re-executing the activity, so the vault key is issued exactly once per orchestration and never re-issued on replay. Three gaps: no per-orchestration spend cap (Azure Monitor cost alerts fire after spend; `maxConcurrentActivityFunctions` limits count not cost), no mid-orchestration vendor revoke without Key Vault rotation (warm function instances already loaded the key continue running until recycled; `terminateAsync` marks orchestration failed but cannot interrupt in-flight activity executions), no per-call audit with orchestration context (Application Insights tracks invocation duration and errors not dollar amounts). Integration pattern: `IssueVaultKey` activity as first call in orchestrator, vault key and `instanceId` passed to all fan-out activities, `instanceId + itemIndex` as idempotency key — all concurrent activities share one cap; cap-exhausted error excluded from `RetryOptions` using named exception class; `waitForExternalEvent` pattern requires issuing vault key after the event resumes (not at orchestration start) to avoid expiry during multi-day waits. 
FAQ covers Durable Functions orchestrator replay and the vault key exactly-once guarantee, preventing `RetryOptions` from retrying cap exhaustion, and vault key TTL for orchestrations with `waitForExternalEvent`.
- [AWS Lambda AI agent API key — scoping vendor calls in event-driven agent functions](https://keybrake.com/seo/aws-lambda-ai-agent-api-key): How SQS-triggered Lambda auto-scaling dispatches hundreds of concurrent invocations simultaneously — all sharing the same `STRIPE_SECRET_KEY` from environment variables with no per-invocation dollar cap — and how SQS at-least-once delivery and EventBridge retries create duplicate vendor API calls without stable idempotency keys. Three gaps: no per-invocation spend cap (reserved concurrency limits count not cost; AWS Cost Anomaly Detection fires hours after spend; Lambda timeout limits wall-clock not vendor dollars), no mid-invocation vendor revoke without redeployment (env vars baked into function config; warm execution environments continue using old key until recycled; Secrets Manager rotation doesn't interrupt running invocations), no per-invocation audit with event context (CloudWatch Logs and X-Ray track invocation duration not vendor dollar amounts). Integration pattern: `KEYBRAKE_API_KEY` stored in SSM Parameter Store (fetched once at cold start, cached at process level), vault key issued per SQS `messageId` with 15-minute TTL matching Lambda max timeout — `messageId` used as both `agent_run_label` and Stripe `Idempotency-Key` (stable across SQS redeliveries); partial batch response (`batchItemFailures`) routes cap-exhausted messages to DLQ without blocking the batch retry; real Stripe key never in Lambda environment variables. FAQ covers per-invocation vs per-batch vault key granularity, SQS partial batch response for cap-exhausted messages, and EventBridge Scheduler pattern with `event.time` as stable idempotency key.
- [AI agent observability — what tracing tools miss about vendor API spend](https://keybrake.com/seo/ai-agent-observability): Why standard observability tools (OpenTelemetry, Datadog APM, AWS X-Ray, Jaeger) have a structural blind spot for AI agent systems — they record span duration, status codes, and error rates for Stripe API calls but don't parse the `amount` field from the response body to extract dollars charged, attribute spend to the agent run that triggered the call, or fire alerts when a single orchestration execution exceeds a dollar threshold. Four observability questions agent operations teams need: (1) what did this run cost in vendor calls; (2) which agent or run is the biggest spender today; (3) what is the per-vendor daily spend trend over 30 days; (4) is there a spend anomaly right now (>3× hourly baseline). Why vendor dashboards don't fill the gap: Stripe Dashboard shows account totals but has no per-agent attribution without manual `metadata` instrumentation on every SDK call; Twilio has no per-agent breakdown; Resend has no per-call cost breakdown; none provide cross-vendor views. Minimum audit log schema: `vault_key_id`, `agent_run_label`, `vendor`, `endpoint`, `cost_usd`, `vendor_txn_id`, `policy_verdict`, `called_at`. Four SQL query patterns: per-run cost summary (SUM grouped by run and vendor), fleet-wide spend ranking (agent name parsed from label, sorted by total spend), per-vendor daily trend (GROUP BY day and vendor, 30-day window), spend anomaly detection (current-hour vs 30-day average, flag when multiplier >3×). How cost is parsed at the proxy layer: Stripe `amount` cents ÷ 100, Twilio `price` field on status callback, Resend fixed per-email rate. 
FAQ covers OAI observability (complementary — APM for latency/errors, Keybrake for cost), `cost_usd` calculation for vendors without explicit per-call pricing, and the distinction between AI agent observability and LLM observability (LangSmith/Langfuse track token cost at the LLM layer; Keybrake tracks vendor API spend at the SaaS action layer).
- [AI agent policy enforcement — runtime controls for autonomous system actions](https://keybrake.com/seo/ai-agent-policy-enforcement): Why standard policy tooling (IAM, OPA, Cedar, Stripe Restricted Keys) handles static access control but cannot enforce the dynamic runtime layer for AI agents — static policies evaluate each request independently against a fixed rule set and have no concept of "how much has this caller spent so far in this run." The three enforcement layers: static access control (IAM, OAuth scopes, Stripe Restricted Keys — configured once, no state required), dynamic runtime policy (vault key per run with run-specific `daily_usd_cap`, `allowed_endpoints`, `expires_in`, and `agent_run_label` — issued at run start, enforced per call against a live spend counter), and policy audit log (every call logged with policy verdict, cost, and agent attribution — no SDK instrumentation required). Why OPA and Cedar don't enforce spend caps: both evaluate each request independently against a static policy; a "deny if cumulative_spend > $500" rule requires a stateful counter that neither tool maintains natively; implementing it in OPA requires an external data source that OPA queries at decision time, which is equivalent to building a spend-counter system and using OPA as a thin query layer over it. Policy-as-code pattern: vault key issuance as the policy enforcement point, policy defined in code at run instantiation time with run-specific `approvedBudgetUsd` and `allowedStripeEndpoints` — different run types (billing agent vs refund processor) receive different policies even though they call the same vendor endpoints. Mid-run revocation without redeployment: `DELETE /vault/keys/{key_id}` takes effect on the next proxied request; no function redeploy, no secret rotation, no infrastructure change. 
FAQ covers Keybrake vs OPA/Kyverno for AI agents (different layers: OPA for identity access control, Keybrake for spend-based enforcement at vendor API call time), declarative YAML policy templates as input to the vault key API, and handling policy violations via webhook alerts and partial batch response.
- [Google Cloud Run Jobs AI agent API key — scoping vendor calls in batch task containers](https://keybrake.com/seo/google-cloud-run-jobs-ai-agent-api-key): How Cloud Run Jobs' `parallelism` setting fans out multiple task container instances simultaneously — all mounting the same Stripe API key from Secret Manager and calling the vendor independently with no per-job dollar cap — and how `max-retries` re-runs failed containers that may have already charged Stripe without stable idempotency keys derived from the job execution ID and task index. Three gaps: no per-job spend cap (Cloud Billing budget alerts fire 8-48h after spend; parallelism limits count not cost), no mid-job vendor revoke without Secret Manager rotation (containers that loaded the key at startup continue using it until exit; `gcloud run jobs cancel` sends SIGTERM with 10-second graceful shutdown window, in-flight Stripe calls complete before shutdown), no per-call audit with job execution context (Cloud Logging captures stdout/stderr not parsed Stripe dollar amounts; cross-referencing Cloud Logging and Stripe Dashboard requires manual timestamp alignment). Integration pattern: GCS-based vault key coordination (task index 0 issues the vault key and writes it to a GCS object; all other tasks read from GCS with strongly consistent reads); `CLOUD_RUN_EXECUTION` as vault key label and idempotency key prefix; `CLOUD_RUN_TASK_INDEX` as customer-slice selector; cap-exhausted tasks exit with code 0 to prevent `max-retries` from re-scheduling them. FAQ covers sharing one vault key across parallel task containers via GCS, handling cap exhaustion with zero exit code, vault key TTL sizing vs `--task-timeout`, and Cloud Scheduler integration for scheduled billing jobs.
- [FastAPI AI agent API key — scoping vendor calls in async agent endpoints](https://keybrake.com/seo/fastapi-ai-agent-api-key): How FastAPI's asyncio concurrency model handles multiple agent requests simultaneously — all sharing the same module-level `stripe.api_key` and calling Stripe concurrently with no per-request dollar cap — and how `BackgroundTasks` continues vendor calls after the HTTP response is returned, bypassing any middleware rate limiting applied to the original request. Three gaps: no per-request spend cap (FastAPI rate limiting middleware caps request rate not vendor dollars; no built-in hook to inspect or stop vendor API calls inside async route handlers), no mid-request vendor revoke without process restart (rotating the module-level `stripe.api_key` requires uvicorn restart; all running requests complete with the old key; `task.cancel()` propagation through Stripe SDK's HTTP calls is not guaranteed), no per-call audit with request context (FastAPI middleware can add request IDs but propagating them into Stripe call metadata requires manual instrumentation on every SDK call). Integration pattern: per-request vault key via FastAPI `Depends()` dependency — `get_vault_key` dependency issues vault key scoped to `request.budget_usd` before route handler runs; `make_charge_tool(vault_key)` closure captures per-request key for LLM tool functions; `cap_exhausted` return value instructs model not to retry. FAQ covers preventing LLM retry loops on cap exhaustion, using vault key label as FastAPI request correlation ID, BackgroundTasks vault key passing pattern, and OpenAI function tools hosted in FastAPI endpoints.
- [Kubernetes CronJob AI agent API key — scoping vendor calls in scheduled batch Pods](https://keybrake.com/seo/kubernetes-cronjob-ai-agent-api-key): How Kubernetes Jobs with `parallelism > 1` fan out multiple Pods simultaneously — all mounting the same Stripe key from a Kubernetes Secret and calling the vendor independently with no per-job dollar cap — and how `backoffLimit` retries failed Pods that may have already charged Stripe without stable idempotency keys derived from the job UID and completion index. Three gaps: no per-job spend cap (cloud billing budget alerts fire 8-24h after spend; `activeDeadlineSeconds` limits wall-clock not cost), no mid-job vendor revoke without Secret rotation (env-sourced secrets don't propagate to running Pods; volume-mounted secrets propagate up to 1 minute; Pod deletion allows in-flight calls to complete during graceful shutdown), no per-call audit with job context (Pod logs and Kubernetes events track Pod lifecycle not vendor dollar amounts; correlating a specific nightly CronJob run to Stripe charges requires manual log/dashboard cross-referencing). Integration pattern: init container on completion index 0 issues vault key and writes to emptyDir volume shared with main container; all other completion indices wait for the file to appear; `metadata.labels['controller-uid']` (Job UID via Downward API) as vault key label and idempotency key prefix with `customer_id`; cap-exhausted containers exit with code 0 to prevent `backoffLimit` retry. FAQ covers emptyDir vs GCS vault key coordination for multi-Pod fan-out, Job UID access via Downward API, and CronJob `concurrencyPolicy: Allow` interaction with per-execution vault keys.
- [AI agent access control — why static API scopes aren't enough for autonomous agents](https://keybrake.com/seo/ai-agent-access-control): The three-layer access control model for AI agents: authentication (which agent is calling?), authorization (which endpoints can it call?), and enforcement (how much can it spend, and can it be stopped?). Static RBAC and API scopes (Stripe Restricted Keys, OAuth scopes, AWS IAM policies) fully address authentication and authorization but leave the enforcement layer unaddressed — scope controls which endpoints are reachable but has no mechanism for per-run dollar caps, per-run revoke without account-wide impact, or per-call audit correlated to a specific agent run. Four properties that enforcement requires and static scopes can't provide: per-run identity (distinguishing run A from run B, not just "billing-agent service"), pre-call dollar cap (fires before the charge is created, not in a billing alarm 8h later), sub-second revoke (one DELETE call stops one run without rotating credentials account-wide), per-call audit with run context (every charge tagged with the agent run ID). Concrete failure mode with LangChain + Stripe Restricted Key: key scoped correctly to `payment_intents:write` but reasoning loop calls charge tool 200 times; rotating the Restricted Key kills all agents in all environments; Stripe dashboard shows 200 events with no run-ID filtering. Comparison table: Stripe Restricted Key vs per-run vault key across endpoint authorization, per-run dollar cap, per-run revoke, per-call audit with run context, expiry scoped to run, vendor coverage. FAQ covers how per-run vault keys implement zero-trust principles, multi-agent vault key architecture (orchestrator + per-sub-agent keys), and how Keybrake and OPA/Cedar address complementary layers (OPA for static authorization, Keybrake for stateful spend enforcement).
- [AI agent budget enforcement — pre-call spending limits that actually stop runaway agents](https://keybrake.com/seo/ai-agent-budget-enforcement): The distinction between budget monitoring (observing spend after it occurs) and budget enforcement (blocking spend before the next charge is created). Four monitoring patterns all fire after the vendor charge: cloud billing alarms (8-48h lag after ingestion), vendor threshold emails (15-60 min post-threshold, account-level not per-agent), agent-side counters (immediate but resets on restart; race-condition vulnerable in concurrent deployments; multi-process blind), LLM observability tools (track LLM token cost not vendor API dollar spend; no enforcement hook). What enforcement requires that these patterns can't provide: request interception (in the call path before Stripe), stateful spend accumulation (external counter, not in-process, atomic across concurrent callers), cost awareness before forwarding (parse `amount` from request body; check against remaining cap; block or forward in sub-ms), per-run granularity (distinct credential per run, not shared API key). Three cap granularity levels: per-run (`daily_usd_cap` on vault key), per-fleet (team-level cap across all keys with shared label prefix), per-vendor (aggregate cap across all vault keys for one vendor). Agent-side counter failure modes detailed with race condition code example: read/write race allows two concurrent coroutines to each compute "under cap" and both proceed; process restart resets in-memory counter; multi-process deployments each maintain independent counters that don't aggregate. 
FAQ covers Stripe rate limits vs budget enforcement (orthogonal: rate limits cap request velocity not dollar spend), idempotency keys vs spend caps (complementary: idempotency prevents duplicates within a run, caps limit total per-run spend), audit log behavior for blocked-cap requests (recorded with 429 status and cumulative spend at block time), and cap sizing methodology (theoretical max = operations × cost-per-op × 1.2 safety margin).
- [AI agent error handling — retries, spend caps, and safe failure modes for vendor API calls](https://keybrake.com/seo/ai-agent-error-handling): Why agent retry logic is structurally different from human-initiated web app retries — agents apply retry conditions mechanically and without human judgment; uncapped retries on money-spending APIs turn a transient Stripe 500 into thousands of dollars of charges. Four patterns for safe error handling: (1) Idempotency keys generated once per logical operation before the retry loop and reused across all retries — never regenerated inside the retry loop; (2) Bounded retries (max 3 attempts with exponential backoff and jitter) with explicit non-retryable classification: 402 cap_exhausted (hard stop, never retry), 402 card_declined (non-retryable), 429 rate_limit (retry after Retry-After header), 500 (retry up to 3), timeout (retry once with same idempotency key); (3) Proxy-enforced spend cap as outer safety net — holds even when agent retry logic malfunctions or the agent itself (not the tool) retries the tool call; (4) Structured non-retryable error responses to LLM — returning `"STOP: spend cap exhausted, do not retry"` as a tool result string prevents tool-calling frameworks from treating cap events as transient errors eligible for automatic retry. Retry decision table: HTTP status × vendor error × retryable verdict × recommended agent action across 6 error types. Spend cap interaction with idempotency: proxy counts idempotent retries against the cap (sees two calls even if Stripe deduplicates to one charge), so cap should be set to 2× expected workflow spend to accommodate one idempotent retry per step.
- [Advanced idempotency patterns for AI agents calling payment APIs](https://keybrake.com/seo/ai-agent-idempotency-advanced): Beyond basic idempotency keys: advanced patterns for multi-step billing workflows, agent crash recovery, parallel branches, and the interaction between idempotency and spend caps. Four patterns: (1) Key composition for multi-step workflows — derive step-specific keys as `SHA256(run_id + "::" + step_name)[:48]`; each step gets its own key so step 2 can be retried independently of step 1; keys are deterministic across restarts from the same stable root identifiers. (2) Workflow state persistence for crash recovery — write step completion to a database (run_id + step → status + result) before each vendor call; on restart, check `is_completed(run_id, step)` before re-executing; skip completed steps and re-read their results from the database rather than re-calling the vendor. (3) Parallel branch isolation — include branch discriminator in idempotency key for parallel agents independently deciding to charge the same customer; implement ownership coordination at orchestration layer (primary branch owns the charge; backup branch checks primary completion before attempting). (4) Idempotency and spend cap interaction — proxy counts proxy-visible calls against cap, including idempotent retries that Stripe deduplicates to zero actual charges; set cap to 2× expected workflow spend to accommodate retry overhead; use audit log `Idempotency-Key` field to measure retry rate and calibrate cap sizing. Stripe idempotency key 24-hour TTL: include billing period in key to prevent cross-cycle collision (`charge-{customer}-{invoice}-{2026-06}`). Keybrake proxy forwards `Idempotency-Key` header unchanged to Stripe — proxy deduplication and Stripe deduplication are independent layers.
- [AI agent compliance — audit trails, least-privilege, and access control for autonomous agents](https://keybrake.com/seo/ai-agent-compliance): Four compliance gaps that shared API keys leave for AI agents calling financial APIs: (1) Attribution gap — shared keys can't prove which agent run made which Stripe charge (no per-run identity in the vendor audit log); SOC 2 CC6.1, SOX 404, PCI DSS Req 10.2 all require attributing access to the initiating entity. (2) Least-privilege gap — shared production keys carry admin-level permissions; GDPR Art. 25 data minimization and PCI DSS Req 7.2 require restricting access to what the agent's specific task requires. (3) Revocation gap — revoking a shared key kills all agents simultaneously; SOC 2 CC6.2 and NIST 800-63B require removing access for one entity without affecting others. (4) Retention gap — Stripe retains logs for 90 days, Resend for 30 days; SOX requires 7 years, PCI DSS Req 10.7 requires 12 months. Compliance gap matrix: maps four frameworks (SOC 2, SOX, GDPR, PCI DSS) against four controls (attribution, least-privilege, revocation, retention) with shared key vs vault key verdict for each cell. Implementation pattern: issue vault keys with `label = run ID` (attribution), `allowed_endpoints` policy (least-privilege), per-run revocable credential (isolated revocation), Keybrake audit log with export API (retention independent of vendor). FAQ covers SOC 2 audit evidence, GDPR Article 25 application, dynamic endpoint allowlists vs wildcards, and Keybrake's own SOC 2 status.
- [AI agent API key lifecycle — issuance, enforcement, expiration, and revocation for autonomous agents](https://keybrake.com/seo/ai-agent-key-lifecycle): Why shared API keys have no meaningful lifecycle (created once, distributed everywhere, rarely expire) and why the absence of a lifecycle is a governance failure for AI agents. Four phases of a vault key lifecycle: (1) Issuance — `vendor`, `allowed_endpoints`, `daily_usd_cap`, `expires_in` (TTL) set as policy at run start; (2) Active enforcement — every proxied call checks key validity, endpoint allowlist, and cumulative spend counter atomically before forwarding; (3) Expiration — key auto-expires at TTL even if not fully spent; (4) Revocation — emergency `DELETE /keys/{id}` takes effect on next proxied request, logs reason and timestamp. Lifecycle failure modes for shared keys at each phase: issuance has no policy (any run can call any endpoint for unlimited spend), enforcement doesn't exist (vendor accepts any request the key authorizes), expiration never occurs (keys last years until an incident), revocation takes down all agents simultaneously. TTL design table: nightly billing batch (6h), per-user LLM turn (5min), multi-step workflow (SLA + 20%), event-driven webhook handler (10min). Orchestration framework alignment: Temporal (workflow.start / compensation activity), Prefect (flow body / on_failure hook), LangGraph StateGraph (entry node / END node cleanup), Celery (task setup / on_failure callback). FAQ covers in-flight requests at expiration, TTL extension via PATCH, explicit revocation vs TTL for short-lived keys, and multi-day Temporal workflows (per-activity vault keys rather than per-workflow).
- [Deno Deploy AI agent API key management — vault keys for serverless agent functions](https://keybrake.com/seo/deno-deploy-ai-agent-api-key): How Deno Deploy's V8-isolate serverless runtime stores API keys in `Deno.env` (encrypted secrets, available at runtime via `Deno.env.get("KEY")`) but shares them across all concurrent invocations — no per-invocation spend cap, endpoint scope, or independent revocation. Vault key pattern for Deno Deploy: store Keybrake API token in Deno.env (not the Stripe secret), issue vault key at invocation start via fetch to `api.keybrake.com`, use vault key as `Authorization: Bearer` for proxied calls to `proxy.keybrake.com/stripe/...`, revoke in finally block. Credential table: Stripe secret lives in Keybrake secrets vault (never in Deno.env), Keybrake token in Deno.env (issues vault keys), vault key token in function scope (expires in minutes). Multi-tenant pattern: look up tenant Keybrake token from Deno KV at invocation time, issue tenant-scoped vault key — keeps tenant Stripe accounts isolated at vault key level. Fresh framework integration (same pattern, Fresh API route handler). Deno KV caching for high-throughput paths (reuse vault keys with 15-min TTL across requests from same user session). FAQ covers Deno Deploy fetch() permission (no config needed for production; `--allow-net` for local CLI), Fresh framework, per-invocation overhead (20-50ms issuance, 5-15% of total for Stripe-heavy paths), and Deno Queues queue handler pattern.
- [Flask AI agent API key management — scoped vault keys for agent tool functions](https://keybrake.com/seo/flask-ai-agent-api-key): Flask AI agent backends use `os.environ['STRIPE_KEY']` set at process startup and shared across all concurrent WSGI workers — same credential for every user's agent session with no per-request spend cap or endpoint scope. Vault key pattern for Flask: store Keybrake token in Flask config, add `before_request` hook that issues vault key (label = user ID + agent run ID, `allowed_endpoints`, `daily_usd_cap`, short TTL) and stores in Flask's `g` context object, tool functions read `g.vault_key` for proxied calls, `teardown_request` hook revokes key. LangChain custom tool backend integration: `@tool` function reads `g.vault_key` transparently — the tool doesn't know or care whether the credential is a vault key or raw Stripe secret. Flask + Celery async pattern: vault key issued at Celery task start (not in Flask before_request, since `g` doesn't cross the queue boundary); longer TTL for async tasks. Concurrency table: issuance latency (30-80ms), issuance failure handling, revocation failure fallback to TTL, KEYBRAKE_TOKEN shared across workers (expected — it's the master credential, vault keys are the per-request credentials). FAQ covers Flask-RESTful/Flask-RESTX compatibility (`@ns.before_request`), multiple tool calls in one request (one key accumulates spend across all calls, issue per-vendor if calling multiple APIs), and unit test approach (populate `g.vault_key` directly or gate with `USE_VAULT_KEYS` config flag).
- [Django AI agent API key management — scoped vault keys for agent views and Celery tasks](https://keybrake.com/seo/django-ai-agent-api-key): Django AI agent tool backends use `settings.STRIPE_API_KEY` loaded once at startup and shared across every gunicorn worker — all concurrent agent sessions share the same Stripe credential with no per-request spend cap or endpoint scope. Vault key pattern for Django: store Keybrake token in `settings.py` instead of the raw vendor secret, write a Django middleware class that issues a vault key in `process_request`, attaches it to `request.vault_key`, and revokes it in `process_response`. DRF serializer integration: vault key travels on the request object via `serializer_context["request"]` — serializer's `create()` reads `request.vault_key` for proxied calls. Django + Celery async pattern: vault key issued at the start of the Celery task body rather than in Django middleware (the `request` object, and the `request.vault_key` attached to it, doesn't cross the Celery queue boundary); longer TTL for async tasks with per-task revocation in `finally` block including on Celery retry. Performance table: middleware latency (30-80ms synchronous HTTPS), async Django ASGI alternative (httpx.AsyncClient), gunicorn worker isolation (each worker issues independent vault keys using shared KEYBRAKE_TOKEN), Stripe SDK module-level api_key override (use raw HTTP calls to proxy instead of stripe.PaymentIntent.create()). FAQ covers async Django views (use httpx.AsyncClient, sync_to_async for ORM calls), multi-tenancy patterns (include tenant schema name in vault key label for per-tenant spend visibility on top of per-tenant schema isolation), and management commands (issue vault key at `handle()` start, per-worker keys for parallel management commands).
- [Express.js AI agent API key management — scoped vault keys for Node agent backends](https://keybrake.com/seo/express-ai-agent-api-key): Express.js AI agent tool backends use `process.env.STRIPE_KEY` as a module-level global shared across every concurrent request — all users' agents share the same Stripe credential. Vault key pattern for Express: store KEYBRAKE_TOKEN in environment, write async middleware that issues vault key and attaches as `req.vaultKey`, listens on `res.on("finish", ...)` to revoke after response body is flushed. Stripe Node SDK v8+ custom base URL integration: `new Stripe(vaultKey, { host: "proxy.keybrake.com", protocol: "https", path: "/stripe" })` — preserves existing SDK typed methods while routing through Keybrake proxy. Bull/BullMQ async queue pattern: vault key issued inside Bull worker process callback (not Express middleware) with longer TTL; revoked in `finally` block, including before a job retry. Concurrency table: middleware latency (30-80ms non-blocking due to Node async model), issuance failure handling (1 retry with 100ms exponential backoff then 503), response streaming and SSE (`res.on("finish")` fires correctly for streaming responses), PM2 cluster mode (each worker issues independent vault keys). FAQ covers Express Router sub-routers with per-vendor vault key scopes (middleware factory pattern), async error handling and `res.on("finish")` behavior for Express error middleware, and vault key TTL calibration from access log P95 latency on agent tool routes.
- [NestJS AI agent API key management — vault keys via interceptors and dependency injection](https://keybrake.com/seo/nestjs-ai-agent-api-key): NestJS AI agent tool backends use `ConfigService.get('STRIPE_KEY')` returning the same value to every concurrent request — singleton services initialized once at application startup share the same credential. Vault key pattern for NestJS: implement `NestInterceptor` with `intercept()` that issues vault key before `next.handle()` and attaches to `request.vaultKey`, pipes observable through `finalize()` operator that revokes vault key after response stream completes. REQUEST-scoped provider integration: `@Injectable({ scope: Scope.REQUEST })` with `@Inject(REQUEST)` gives service access to `request.vaultKey` without controller changes; REQUEST scope cascades to injecting providers. Interceptor registration: global via `APP_INTERCEPTOR` or per-controller via `@UseInterceptors(VaultKeyInterceptor)`. Performance table: interceptor async latency (non-blocking Node event loop), REQUEST scope performance cost (new provider instance per request, dominated by vault key issuance round-trip not DI instantiation), SSE/WebSocket `finalize()` behavior (fires on connection close, use longer TTL for streaming endpoints), singleton vs REQUEST scope cascade rules. FAQ covers NestJS GraphQL resolvers (same HTTP request object, context provides `req.vaultKey` to resolvers, DataLoader fan-out shares one vault key correctly), NestJS microservices with TCP/RabbitMQ/Kafka (use `context.switchToRpc()`, issue vault key at message handler level not HTTP interceptor level), and recommended NestJS module structure for vault key management (VaultModule exporting interceptor and REQUEST-scoped service, imported into agent tool modules not AppModule).
- [Cloudflare Workers AI agent API key management — vault keys for edge AI workflows](https://keybrake.com/seo/cloudflare-workers-ai-agent-api-key): Cloudflare Workers is the runtime for edge AI agent logic — Workers AI for inference, Cloudflare Workflows for multi-step orchestration, AI Gateway for LLM routing, Durable Objects for agent state. Workers Secrets stores API keys at the edge (encrypted, bound to Worker, available via `env.SECRET_NAME`) but shares them across all invocations — no per-execution spend cap or revocation. Pattern: store Keybrake API token in Workers Secrets (not Stripe secret), issue vault key per invocation using `env.KEYBRAKE_TOKEN`, proxy Stripe calls via `proxy.keybrake.com` with vault key as Bearer token, revoke in finally block. Cloudflare Workflows integration: issue vault keys per workflow step (not per workflow) — steps are independently retried and can span minutes to hours; step labels include workflow run ID. Cloudflare AI Gateway vs Keybrake comparison table: AI Gateway proxies LLM providers (OpenAI, Anthropic, Workers AI) and caps LLM token cost; Keybrake proxies vendor SaaS APIs (Stripe, Twilio, Resend) and caps dollar spend on vendor charges — complementary, not competing, can both be active simultaneously. Durable Objects for vault key session caching (cache 15-min vault key in DO for high-throughput Workers that process many short requests). KV for vault key caching across Workers invocations (KV TTL set to vault key TTL minus 2-minute buffer). FAQ covers edge latency to proxy.keybrake.com (10-50ms from Cloudflare PoP to proxy, no special allowlist needed), KV caching pattern, multi-vendor vault keys in one invocation (`Promise.all` parallel issuance), and the security property of storing Keybrake token vs Stripe secret in Workers Secrets (Keybrake token can only issue capped, scoped vault keys; Stripe secret would allow unlimited direct calls).
- [Go AI agent API key management — vault keys via context.Context propagation](https://keybrake.com/seo/go-ai-agent-api-key): Go AI agent tool backends initialize vendor API clients as package-level singletons — `stripe.Key = os.Getenv("STRIPE_SECRET_KEY")` at startup, shared across all goroutines. Vault key pattern for Go: define an unexported struct context key type (prevents key collisions from other packages), implement `vault.Issue(ctx, label, vendor, cap, endpoints, ttl)` that issues a vault key and returns a new context with the key stored via `context.WithValue`, and `vault.FromContext(ctx)` to retrieve it. Middleware wraps the HTTP mux with `VaultMiddleware` that issues the key in `preHandle`, enriches `r.Context()`, and defers `vault.Revoke(context.Background(), keyID)`. Tool handler functions read the vault key from context and call `proxy.keybrake.com` via raw `net/http` instead of the stripe-go SDK. Temporal activity integration: one vault key per activity (not per workflow) — matches Temporal's retry model where activities can be independently retried; issue at activity start with `activity.GetInfo(ctx)` for run-scoped labels. Four-row scope table: HTTP handler / HTTP multi-step / Temporal activity / goroutine worker pool with recommended vault key granularity and TTL per pattern. FAQ covers stripe-go SDK integration via custom `http.RoundTripper`, vault key issuance failure handling with `context.WithTimeout` and fail-open default, and unexported context key type rationale from Go's `context` package godoc.
- [Ruby on Rails AI agent API key management — vault keys via before_action and CurrentAttributes](https://keybrake.com/seo/ruby-rails-ai-agent-api-key): Rails AI agent tool backends configure vendor clients in initializers — `Stripe.api_key = ENV['STRIPE_SECRET_KEY']` in `config/initializers/stripe.rb` — shared across all Puma threads. Vault key pattern for Rails: create `VaultKeyService` with `issue` (Net::HTTP POST to api.keybrake.com with 2s open timeout / 5s read timeout) and `revoke` (Net::HTTP DELETE) class methods; use `ActiveSupport::CurrentAttributes` subclass (`Current`) to store `vault_key_token` and `vault_key_id` as thread-local request state automatically reset between requests; add `before_action :issue_vault_key` and `after_action :revoke_vault_key` to `AgentToolsController` (not `ApplicationController`). `ProxiedStripeService` reads `Current.vault_key_token` and calls `proxy.keybrake.com` via Net::HTTP instead of the stripe gem. Sidekiq background job pattern: `Current` attributes don't carry across job boundaries — issue vault keys at the top of `perform` and revoke in `ensure` block. Three-row context table: controller action (Puma thread / `Current` / `before_action` / `after_action`), Sidekiq job (`Current` / top of `perform` / `ensure`), Rack middleware (request env / middleware `call` / after `app.call` returns). FAQ covers stripe-ruby gem compatibility (raw Net::HTTP to proxy is cleaner than overriding `Stripe.api_base` — fragile with Puma thread pool), `CurrentAttributes` thread-safety guarantee (Rails calls `Current.reset` between requests via `CurrentAttributes::ExecutionContextCallback` middleware), and multi-call requests (one vault key per request, `allowed_endpoints` and `daily_usd_cap` apply across all calls in that request).
- [AI agent circuit breaker — spend-aware circuit breaker for vendor API calls](https://keybrake.com/seo/ai-agent-circuit-breaker): Standard circuit breakers (closed/open/half-open on failure rate) don't protect AI agents — a stuck loop calling Stripe 200 times can have 0% failure rate while burning $10,000. Spend-aware circuit breaker adds a cost accumulator: `record_success(cost_usd=...)` increments cumulative spend and trips the circuit when spend exceeds `spend_cap_usd`; `record_failure()` counts errors and trips on `failure_threshold`; `allow_request()` returns False when the circuit is open and recovery timeout hasn't elapsed. Python dataclass implementation with `CircuitState` enum, monotonic clock spend window, and `SpendCapExceeded` exception. Spend parsing helpers per vendor: Stripe `amount / 100`, Twilio `abs(float(price))`, Resend flat rate per call. Failure mode comparison table: four agent failure types × failure rate × spend impact × whether standard CB protects (retry loop on 402 → yes; duplicate success loops → no; stuck LLM tool call → no; pagination enumeration → no). Three-layer protection comparison: in-process atomic counter (0ms, single-process only), Redis atomic counter (1ms, distributed), Keybrake vault key cap (30-80ms, server-side authoritative). Temporal integration: mark `SpendCapExceededException` as `non_retryable_error_types` to prevent Temporal from retrying a cap-hit as a transient error. FAQ covers right cap sizing formula (`expected_max_spend × 1.5–2×`), whether to halt entire agent or just the vendor call (depends on whether vendor is on critical path), and Keybrake 429 structured error body for distinguishing cap events from rate limits.
- [AI agent vendor isolation — scoping credentials per agent, user, and vendor](https://keybrake.com/seo/ai-agent-vendor-isolation): Vendor isolation is the principle that each agent run gets its own scoped credential per vendor — not a shared production key. Blast radius formula: `accessible_resources × accessible_actions × spend_ceiling` — with a shared key, all three terms are unbounded; with a vault key policy, all three are bounded. Four isolation dimensions: (1) credential separation (one key per agent run enables attribution, independent revocation, and per-run audit); (2) endpoint allowlisting (only the specific API paths the agent's task requires, not the full API surface — table of four agent capabilities × required Stripe endpoints × NOT required); (3) spend caps (per-run hard ceiling converting worst-case from "unlimited" to "bounded" — cap calculation: `expected_max_spend × 1.5–2×`); (4) time bounds / TTL (credential expires after run's expected duration — table of four agent patterns with TTL recommendation and rationale). Multi-vendor per-run credential setup: async `setup_isolated_credentials(run_id, task)` issues one vault key per vendor (Stripe, Twilio, Resend) with task-specific caps and endpoints. Vendor isolation vs authentication distinction table: question answered / who controls it / granularity / what it stops. Partial isolation without a proxy: Stripe Restricted API Keys (endpoint restrictions, no TTL or spend caps), Twilio subaccounts (billing isolation, operational overhead), Resend (no per-key restrictions as of 2026). FAQ covers read-only agent isolation (credential separation and endpoint allowlisting still matter for PCI/GDPR data exfiltration risk), multi-tenant SaaS isolation (per-tenant `merchant_allowlist` policy field), and minimum viable isolation without a proxy (restricted keys + per-user keys + Stripe webhook alerts for high-value events).
- [Laravel AI agent API key management — vault keys via middleware and service container](https://keybrake.com/seo/laravel-ai-agent-api-key): Laravel applications configure vendor API keys in service providers using the IoC container — `app()->singleton(StripeClient::class, ...)` — sharing a single Stripe client across every request in the PHP-FPM worker or Laravel Octane server. For AI agent tool controllers, all concurrent agent runs use the same Stripe key with no per-run spend cap or audit attribution. Vault key pattern for Laravel: register `VaultKeyService` as a singleton (stateless HTTP caller), with request attributes implicitly filling the `VaultKeyAccessor` role; write `VaultKeyMiddleware` that issues a vault key, stores it via `$request->attributes->set('vault_key', $vk)`, and revokes it in cleanup after `$next($request)` returns; register the middleware on agent tool routes only via `Route::middleware(['auth:sanctum', VaultKeyMiddleware::class])->prefix('agent/tools')`. `VaultKeyService` uses Laravel's `Http::withToken()->timeout(2)->post()` (fail open if status != 201). Agent controller reads vault key via `$request->attributes->get('vault_key')` and calls `proxy.keybrake.com` with vault token. Laravel Horizon queue job pattern: no middleware lifecycle — issue at top of `handle()`, revoke in `finally` block. Context table: HTTP (PHP-FPM), HTTP (Octane — verify singleton flush config), Horizon queue job, Artisan command. FAQ covers Laravel Cashier compatibility (Cashier uses its own key for billing flows; vault keys are for autonomous agent calls only), Octane singleton lifecycle (VaultKeyService is stateless so safe; store vault key in request attributes not service properties), and fail-open vs fail-closed rollout (feature flag `services.keybrake.fail_closed`).
- [Rust AI agent API key management — vault keys via Axum middleware and task-local storage](https://keybrake.com/seo/rust-ai-agent-api-key): Rust AI agent backends store vendor API keys in `Arc<AppState { stripe_key: String }>` shared via Axum's `State` extractor — every async Tokio task reads the same Stripe key from shared state. Vault key pattern for Rust: implement `VaultKey` struct with `id`, `token`, and `Drop` trait that spawns an async revocation task via `tokio::spawn`; `VaultKey::issue()` POSTs to api.keybrake.com with 2s timeout using reqwest; Axum middleware function (`async fn vault_key_middleware`) issues vault key and inserts `Arc<VaultKey>` into request extensions; handlers extract with `Extension(vault_key): Extension<Arc<VaultKey>>`; Arc drop at handler return triggers Drop → spawned async revoke. Pattern table: HTTP handler (Extension extractor / Arc drop at handler return), Tokio task (pass Arc as task parameter / Arc drop at task completion), tonic gRPC (tower::Layer intercept / drop at end of handler), Temporal Rust SDK activity (issue in activity function / explicit revoke + drop). FAQ covers async-stripe crate compatibility (raw reqwest to proxy.keybrake.com is cleaner than overriding StripeClient backend URL), `Axum State` vs `task_local!` choice (use Extension — visible in function signatures, composes with extractors, Arc ref count drives cleanup), and optional extension handling for fail-open (`Option<Extension<Arc<VaultKey>>>`).
- [AI agent sandbox execution — credential safety in isolated code execution environments](https://keybrake.com/seo/ai-agent-sandbox-execution): AI agents executing code in sandboxes (E2B, Modal, Fly Machines, Docker, Firecracker VMs) face a credential risk that OS-level isolation doesn't solve: API keys injected as env vars are visible to all subprocess code the agent generates. gVisor and Firecracker prevent OS escape; they don't redact `STRIPE_SECRET_KEY` from `os.environ`. Threat comparison table: six threat types × whether protected by sandbox isolation × whether protected by vault key (sandbox protects host escape and host network; vault key protects env var exfiltration, runaway spend loops, disallowed endpoint calls, and persistent credentials after sandbox destruction). Pattern: issue vault key before spawning sandbox with TTL matching max sandbox lifetime; inject vault token (not real key) as `STRIPE_SECRET_KEY` env var alongside `STRIPE_API_BASE=https://proxy.keybrake.com/stripe`; teardown hook calls explicit revoke after sandbox kill (TTL is safety net). Platform injection table: E2B (env_vars dict), Modal (modal.Secret.from_dict), Fly Machines (config.env), Docker (-e flag). Defense-in-depth stack: (1) network egress policy blocking api.stripe.com directly, (2) vault key endpoint allowlist, (3) vault key spend cap, (4) vault key TTL matching sandbox lifetime, (5) OS sandbox isolation. FAQ covers agent-generated code revoking its own vault key (possible if key ID is in env — design decision; deliberately omitted by default), preventing direct api.stripe.com calls (network policy + vault token isn't valid for Stripe directly), and right TTL sizing (max sandbox lifetime × 1.2).
- [.NET C# AI agent API key management — vault keys via ASP.NET Core middleware and scoped DI](https://keybrake.com/seo/dotnet-ai-agent-api-key): ASP.NET Core registers vendor API clients as singletons — `builder.Services.AddSingleton<StripeClient>(...)` — sharing a single instance across all HTTP requests. For AI agent tool controllers, all concurrent agent invocations share the same Stripe key. Vault key pattern for .NET: register `VaultKeyService` as singleton (stateless) and `VaultKeyAccessor` as scoped (per-request, holds token); write `VaultKeyMiddleware` that calls `service.IssueAsync()`, sets `accessor.KeyId`/`accessor.Token`, calls `await _next(context)` in try/finally that revokes; configure with `app.UseWhen(ctx => ctx.Request.Path.StartsWithSegments("/api/agent/tools"), branch => branch.UseMiddleware<VaultKeyMiddleware>())`; inject `VaultKeyAccessor` into agent controllers and use `accessor.Token` for proxy calls. Microsoft Semantic Kernel integration: `StripePlugin` takes `VaultKeyAccessor` constructor injection, uses accessor.Token in `[KernelFunction]` attributed method. DI lifetime table: ASP.NET Core controller (Scoped, middleware finally), Minimal API (Scoped, same), Hangfire (N/A — instantiate manually, finally block), Worker Service (IServiceScopeFactory.CreateScope per iteration, dispose scope). FAQ covers Hangfire compatibility (`IServiceScopeFactory` or AspNetCoreJobActivator for per-job scope), Polly retry policies (configure ShouldHandle to distinguish cap_exhausted 429 from transient 429 — non-retryable exception on cap), `AddSingleton` vs `AddScoped` choice (VaultKeyService singleton, VaultKeyAccessor scoped — VaultKeyService is stateless and holds the IHttpClientFactory client to avoid socket exhaustion).
- [AI agent cost allocation — attributing vendor API spend to agents, users, and features](https://keybrake.com/seo/ai-agent-cost-allocation): Shared API keys make vendor API spend completely unattributable: you see the monthly Stripe bill but can't tell whether $2,400 in payment intent fees came from the billing agent, onboarding flow, or a stuck loop. Vault key labels are structured cost attribution tags: design as `{tenantId}:{agentType}:{feature}:{runId}` (colon-separated, parseable by GROUP BY LIKE). Blind spot comparison table: four cost questions × shared-key answer (unknown) × vault-key-label answer (specific query). Label design pattern: `build_vault_key_label(tenant_id, agent_type, feature, run_id)` truncated to 120 chars. Cost allocation queries: `get_spend_by_tenant(days=7)` aggregates audit log entries by label prefix; `get_spend_by_agent_type(days=7)` groups by second label segment. Spend cap as budget reservation: cap hierarchy (platform > tenant-level > per-run); `calculate_vault_key_cap(tenant_plan, agent_type, feature)` returns per-run cap from plan base × agent multiplier (billing-agent 1.0×, notify-agent 0.1×). Showback vs chargeback vs budget-alerts comparison table. Chargeback integration: nightly query of audit log per tenant → Stripe Meters API (`billing.meter_event.create`) or Lago/OpenMeter for usage-based billing. FAQ covers Stripe refunds in cost allocation (audit log entries have negative cost_usd for refunds; sum net spend per billing period), per-feature attribution within a single run (issue separate vault keys per feature — higher overhead, finer attribution; hybrid: per-run key for low-cost features, per-feature key for high-cost ones), and multi-vendor runs (one vault key per vendor per run with same label — audit log shows Stripe vs Twilio vs Resend breakdown by vendor column).
- [Next.js AI agent API key management — vault keys via Route Handlers and middleware](https://keybrake.com/seo/nextjs-ai-agent-api-key): Next.js is the most common full-stack framework for AI agent tool backends — Route Handlers under `app/api/` serve OpenAI function calls, LangChain.js tool invocations, or Vercel AI SDK streamText requests that call downstream vendor APIs. The standard pattern reads `process.env.STRIPE_SECRET_KEY` in each Route Handler — a single long-lived credential shared by every concurrent request from every user's agent. Vault key pattern for Next.js: issue a vault key at the top of each Route Handler (`POST https://api.keybrake.com/v1/keys`), use the returned token against `proxy.keybrake.com/stripe` instead of the Stripe SDK, revoke in a `finally` block after the handler returns. For Vercel AI SDK streaming with `streamText`, issue vault key before the stream setup, pass via closure to tool `execute` functions, revoke in `onFinish` callback. Deployment context table: Vercel Serverless Functions (one invocation = natural isolation, cold starts add ~100ms to first issuance), Vercel Edge Runtime (fetch-only, `cache: 'no-store'` required), Next.js on Node.js self-hosted (one vault key per async handler, module-level client instances are shared but vault keys are not), Docker/Railway/Fly.io (same as self-hosted, no cross-replica coordination needed). Shared `withVaultKey` helper extracts issuance/revocation from individual Route Handlers. FAQ covers Server Actions (issue inside action function, not middleware), Next.js middleware limitations (can't pass vault key to Route Handler via request mutation in App Router), and App Router fetch caching (`cache: 'no-store'` on all Keybrake API calls, `export const dynamic = 'force-dynamic'` for transactional Route Handlers).
- [AI agent API gateway — routing, policy enforcement, and spend control for multi-vendor agent calls](https://keybrake.com/seo/ai-agent-api-gateway): As autonomous agents call Stripe, Twilio, Resend, Shopify, and other vendor APIs, teams need a control plane between the agent and the vendor. An AI agent API gateway differs from LLM gateways (LiteLLM, Kong AI Gateway) in addressing vendor-side risks: unbounded spend in retry loops (no dollar-total stop in traditional rate limiters), credential scope creep (full API key grants all endpoints), spend attribution collapse (all charges from all agents appear under one key), and emergency revocation speed (rotating a vendor key disrupts all agents; revoking one vault key takes <100ms). Architecture: the gateway authenticates inbound requests using short-lived vault keys (not long-lived vendor API keys), enforces per-agent policies (spend caps, endpoint allowlists, TTLs), translates vault key requests to real vendor API calls by swapping vault key → real vendor key, logs every request and its cost parsed from vendor response bodies (Stripe: `amount` field; Twilio: `price` field; Resend: fixed $0.001/email). Self-hosted minimal implementation covers vault key store (SQLite: `vault_keys` + `audit_log` tables), policy enforcement (token lookup → revoked check → expiry check → cap check → endpoint allowlist match), vendor proxy handler (URL pattern `/{vendor}{path}`, auth swap, pipe to vendor), and per-vendor cost parsers. Build-vs-buy matrix: self-hosted appropriate for teams with compliance requirements preventing third-party proxies or 10+ non-standard vendor APIs; managed (Keybrake) appropriate for teams calling standard vendors (Stripe, Twilio, Resend, Shopify, Postmark, Segment). Vault key API design: `POST /v1/keys` (issue with label, vendor, allowed_endpoints, daily_usd_cap, expires_in), proxy call via `proxy.keybrake.com/{vendor}{path}`, `DELETE /v1/keys/:id` (revoke). 
FAQ covers the LLM gateway distinction (LiteLLM handles LLM token budgets; agent API gateway handles vendor spend — complementary, not competitive), latency impact (self-hosted: 1–5ms; managed: 10–30ms; negligible vs LLM inference 500ms–5s), webhook verification (outbound only; inbound Stripe webhooks use separate signing secret), and vault key granularity recommendation (per-agent-run is the right default — aligns key lifetime with task lifetime, keeps cap meaningful).
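
The policy-enforcement order named in the API gateway entry above (token lookup → revoked check → expiry check → cap check → endpoint allowlist match) can be sketched roughly as follows. This is a minimal illustration, not Keybrake's implementation: the `VaultKey` dataclass, the `check_policy` function, the verdict strings, and the glob-style endpoint matching are all assumptions, and the 401 for an expired key is assumed by analogy with the 401 on revoke.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


@dataclass
class VaultKey:
    """Hypothetical in-memory stand-in for one row of a `vault_keys` table."""
    token: str
    vendor: str
    allowed_endpoints: list[str]  # assumed format: "METHOD /path", globs allowed
    daily_usd_cap: float
    expires_at: datetime
    revoked: bool = False
    spent_today_usd: float = 0.0


def check_policy(keys, token, method, path, now=None):
    """Return (http_status, verdict), running the checks in the listed order."""
    now = now or datetime.now(timezone.utc)
    key = keys.get(token)                      # 1. token lookup
    if key is None:
        return 401, "unknown_token"
    if key.revoked:                            # 2. revoked check
        return 401, "revoked"
    if now >= key.expires_at:                  # 3. expiry check (401 is assumed)
        return 401, "expired"
    if key.spent_today_usd >= key.daily_usd_cap:   # 4. cap check, before forwarding
        return 429, "cap_exceeded"
    request = f"{method} {path}"               # 5. endpoint allowlist match
    if not any(fnmatchcase(request, pattern) for pattern in key.allowed_endpoints):
        return 403, "endpoint_not_allowed"
    return 200, "allowed"
```

A caller would forward the request to the vendor only on `(200, "allowed")`, then add the vendor-parsed cost to `spent_today_usd` so the next pre-flight check sees it.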

## LLM-proxy landscape (for disambiguation)

Keybrake is often mis-categorised as an LLM gateway. It is not. These pages exist to route readers correctly.

- [LiteLLM alternative for Stripe](https://keybrake.com/seo/litellm-alternative-for-stripe): Hard-honest opener — "LiteLLM is an LLM proxy; Stripe is not an LLM." Explains why pointing an LLM gateway at a SaaS-tool API fails on three technical fronts (path schemas, response parsing, auth envelope) and why the right 2026 stack is dual-proxy: LLM gateway (LiteLLM/Portkey/Helicone) for model traffic + SaaS-tool governance proxy (Keybrake) for money-moving API traffic, joined on `agent_run_id`.
- [LiteLLM alternatives — honest open-source review](https://keybrake.com/seo/litellm-alternatives-open-source): Five-option review of Portkey Gateway, Helicone, LangGate, OpenRouter proxy, and Bifrost, each with "where it shines / where it falls short of LiteLLM" analysis. Ends with the pivot — if the incident was an agent burning Stripe/Twilio/Resend (not OpenAI), the LLM-gateway category is the wrong answer.
- [LiteLLM Proxy alternatives — six gateways for the proxy-server shape](https://keybrake.com/seo/litellm-proxy-alternatives): Narrow-intent companion focused specifically on the LiteLLM Proxy server (vs the LiteLLM SDK), for searchers who want gateway-shape alternatives — three LLM-specific (Portkey Gateway, Bifrost, LangGate) and three generic API gateways with LLM plugins (Envoy AI Gateway, Kong AI Gateway, Apache APISIX ai-proxy). Opens with the LiteLLM-Proxy-vs-LiteLLM-SDK disambiguation that drives different shortlists. Five proxy-shape concerns frame the comparison: deployment topology (sidecar / standalone / k8s operator / filter-on-existing-gateway), hot-reload of virtual keys and policies, multi-tenant isolation, route-level fallback configuration expressiveness, streaming and SSE behaviour. Per-vendor "where it wins / where it loses" on those concerns. Seven-row capability matrix scores LiteLLM Proxy (baseline) plus the six alternatives across the five concerns. Three-branch decision rule: Branch A discrete LLM gateway (LiteLLM Proxy / Portkey / Bifrost / LangGate), Branch B add LLM plugins to existing API gateway (Envoy / Kong / APISIX), Branch C the answer is observability (Helicone), not a different proxy. Closes with the dual-proxy framing — LLM gateway from this list + SaaS-tool governance proxy (Keybrake) joined on `agent_run_id` — and the worst-case math that motivates it ($360/hour for a stuck gpt-4o loop vs $5.4M/hour for a stuck Stripe refund loop at the same QPS).
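
The `agent_run_id` join that ties the two proxies together in the entries above can be sketched roughly as follows. The row shapes (`agent_run_id`, `cost_usd`) and the function name are illustrative assumptions, not an export format of LiteLLM, Keybrake, or any other gateway named here.

```python
from collections import defaultdict


def join_on_agent_run_id(llm_rows, vendor_rows):
    """Combine LLM-gateway rows and SaaS-proxy audit rows per agent run.

    Both inputs are lists of dicts with hypothetical keys
    `agent_run_id` and `cost_usd`.
    """
    runs = defaultdict(
        lambda: {"llm_cost_usd": 0.0, "vendor_cost_usd": 0.0, "vendor_calls": 0}
    )
    for row in llm_rows:
        runs[row["agent_run_id"]]["llm_cost_usd"] += row["cost_usd"]
    for row in vendor_rows:
        run = runs[row["agent_run_id"]]
        run["vendor_cost_usd"] += row["cost_usd"]
        run["vendor_calls"] += 1
    return dict(runs)
```

With both cost streams keyed by run, one record shows what a run spent on model tokens and what it spent at the vendor.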

## Competitor comparisons

Buyer-intent pages for readers shopping the adjacent LLM-gateway category. Each page explains honestly that Keybrake is complementary, not a direct alternative, to the named competitor — and identifies the narrow case where it is.

- [LiteLLM alternative](https://keybrake.com/compare/litellm-alternative): When Keybrake actually replaces LiteLLM (agents that hit no LLMs directly) vs when it sits beside it (most production stacks). Includes the dual-proxy diagram and the `x-agent-run-id` audit-join pattern.
- [LiteLLM vs Keybrake](https://keybrake.com/compare/litellm-vs-keybrake): Side-by-side on 10 dimensions — vendor coverage, spend-cap unit, cost source, revoke latency, audit shape, pricing, best-for persona.
- [Portkey alternative](https://keybrake.com/compare/portkey-alternative): Portkey governs LLM traffic; Keybrake governs SaaS-tool traffic. The compare matrix maps eight governance concerns (including LLM spend cap, Stripe spend cap, customer allowlist, endpoint allowlist, mid-run revoke, and audit) to Yes/No for each tool.
- [Portkey vs Keybrake](https://keybrake.com/compare/portkey-vs-keybrake): Direct head-to-head table plus detailed differences on cost accounting (token-table vs vendor-parse), scope (model allowlist vs customer allowlist), and why Keybrake has no cache (mutating traffic).
- [Helicone alternative](https://keybrake.com/compare/helicone-alternative): Observability-first (Helicone) vs governance-first (Keybrake) stance. Helicone surfaces patterns in past LLM traffic; Keybrake prevents certain SaaS calls from happening.
- [Helicone vs Keybrake](https://keybrake.com/compare/helicone-vs-keybrake): Head-to-head on 11 dimensions including centre-of-gravity (dashboard-first vs policy-editor-first), cost-accounting shape, and audit-trail consumer (AI-ops engineer vs ops-risk / compliance reviewer).
- [Stripe Projects alternative](https://keybrake.com/compare/stripe-projects-alternative): Honest assessment of what Stripe Projects covers (billing-aggregation monthly cap across 32 named partner vendors: Twilio, Cloudflare, Vercel, Supabase, Hugging Face, AgentMail, etc.) and what it doesn't. The four gaps Stripe Projects leaves open: (1) No pre-call enforcement — monthly aggregation fires after the billing period closes; every charge in a stuck loop has already executed before the cap becomes relevant; (2) No per-call audit log — billing summary only, no per-call record of endpoint, customer ID, params, parsed cost, or policy verdict; (3) No mid-run revoke — no advertised sub-second revocation path for a running agent; (4) Fixed 32-partner vendor list — any API not in Stripe's partner ecosystem is outside Stripe Projects' visibility. Decision table (7 scenarios) for when Stripe Projects is enough vs when to add a pre-call enforcement proxy. FAQ covers whether Keybrake works alongside Stripe Projects (yes — different enforcement layers, no conflict), whether Keybrake replaces Stripe Projects (no — Keybrake is not a billing platform), and parameter-level scope enforcement (yes, below HTTP path to parameter values).
- [Stripe Projects vs Keybrake](https://keybrake.com/compare/stripe-projects-vs-keybrake): Direct side-by-side on 8 governance dimensions. Stripe Projects: token-issuance layer, post-billing enforcement (monthly aggregation, charge already happened), 32 named partner vendors, no per-call audit log, no advertised mid-run revoke, no infrastructure to operate. Keybrake: reverse proxy, pre-call enforcement (request is blocked before it reaches the vendor), any HTTP API, per-call audit log (endpoint + params + vendor-parsed cost + policy verdict + agent_run_id), sub-second vault-key revoke, two env var changes to integrate. The enforcement-layer distinction: Stripe Projects is like a monthly credit-card limit (cap fires at billing aggregation); Keybrake is like a per-call pre-authorization check (blocks before the charge is created). Key use cases for each: Stripe Projects for billing-platform integration and 32-partner monthly spend aggregation with no infrastructure ops; Keybrake for pre-call blocking, per-call audit, sub-second revoke, or APIs outside the 32-partner list. Composition pattern: Keybrake enforces at request time, charge flows through to Stripe's billing graph, Stripe Projects aggregates at billing time — two enforcement layers on the same stack.

## Free tools

- [Stripe Restricted Key picker](https://keybrake.com/tools/stripe-key-picker/): Interactive picker. Tick the operations your AI agent performs against Stripe — charge new cards, charge saved customer cards off-session, refund a charge, customer CRUD, manage subscriptions, send invoices, generate Checkout Sessions, generate Payment Links, manage products & prices, read transactions for reporting, trigger payouts, manage Connect accounts, create or update webhook endpoints, respond to disputes (14 use cases). Outputs a deduplicated list of Stripe Restricted Key resource permissions to tick in the Dashboard, sorted by blast-radius tier (extreme / high / medium / low) and grouped by resource:permission. Surfaces warnings for known dangerous combinations: Charges Write + Refunds Write enables stuck refund-loop drain; Payouts Write allows irreversible cash exfiltration; WebhookEndpoints Write lets a leaked key redirect future event delivery (Stripe retries 3 days even after revoke); ConnectedAccounts Write is identity-tier compromise across the platform. The mapping is hand-curated against the resource list in Stripe's `Create restricted key` Dashboard form (not auto-generated). 14 Stripe resources covered. Runs entirely client-side, no sign-up. Frames the picker as the scope-restriction layer alongside Keybrake's USD cap + sub-second revoke + audit log layer.
- [Agent blowout calculator](https://keybrake.com/tools/blowout-calculator/): Embeddable client-side calculator. Pick a vendor (Stripe / Twilio / Resend / OpenAI) and a calls-per-minute rate, see the 24-hour no-cap blowout vs. the capped figure under Keybrake's suggested per-vendor daily cap. Two-line embed, ~8 KB JS, no dependencies, paste-anywhere license with attribution. Intended use: engineers can drop it on their own blog or docs to visualize per-vendor agent cost risk. Same math Keybrake's proxy runs for real enforcement.
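
The arithmetic behind a blowout figure like the calculator's can be sketched as below. This is an assumption-laden illustration, not the calculator's or the proxy's code: the function name, the flat per-call cost, and the example numbers are hypothetical, and the sketch ignores the small overshoot a real cap can allow when the last permitted call crosses the limit.

```python
def blowout_usd(calls_per_minute, cost_per_call_usd, daily_cap_usd, hours=24):
    """Return (uncapped, capped) spend in USD over `hours` at a steady call rate.

    Assumes every call costs the same flat amount, which real vendors
    rarely guarantee.
    """
    uncapped = calls_per_minute * 60 * hours * cost_per_call_usd
    capped = min(uncapped, daily_cap_usd)
    return uncapped, capped


# Hypothetical inputs: 10 calls/minute at $0.50 per call, $100 daily cap.
uncapped, capped = blowout_usd(10, 0.5, 100.0)
print(f"24h without a cap: ${uncapped:,.2f}; with the cap: ${capped:,.2f}")
```

The useful output is the gap between the two numbers, which is the spend a pre-call cap prevents at that call rate.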

## Long-form writing

- [The control plane problem: why your AI agent fleet needs a vendor API gateway](https://keybrake.com/blog/ai-agent-vendor-api-gateway): Architecture essay for senior engineers building multi-agent systems. Opens with the scale inflection point: SaaS API keys were designed for deterministic server-to-server callers, not goal-directed agents that decide dynamically how many vendor calls to make. Explains the two gaps in the existing gateway landscape: (1) LLM gateways (LiteLLM, Portkey, OpenRouter) are blind to vendor API traffic — they speak OpenAI-compatible endpoints, count tokens, and enforce token-denominated budgets; when the agent's charge_customer tool fires, that call goes directly to api.stripe.com, not through LiteLLM; (2) Traditional API gateways (Kong, AWS API Gateway, Nginx) enforce policy on requests by IP and JWT, not on per-session dollar spend — they can rate-limit by requests per second but not by total USD spent in a run, and they have no mechanism to revoke one agent run's credential without affecting the 49 other concurrent runs sharing the same static API key.
Four required properties for a vendor API gateway for agents, each in a property card: (1) Short-lived per-run credentials — the agent holds a vault key that expires when the run ends; revocation is one row update, no propagation delay; (2) Per-session spend tracking with vendor-specific cost parsing — Stripe: parse `amount` from PaymentIntent response (only on succeeded/requires_capture status); Twilio: parse `price` field or apply per-SMS rate table; Resend: fixed per-delivery rate against plan; cost parsers must be maintained as vendor APIs evolve; (3) Endpoint allowlisting at policy layer — per-vault-key `allowed_endpoints` enforced on every forwarded request; an agent trying to call DELETE /v1/payment_methods when its policy allows only POST /v1/invoices gets 403 before Stripe sees the request; prompt injection cannot escalate past the policy; (4) Sub-second revocation with no collateral damage — DELETE vault key call, one row update, enforced on next request from that run; real vendor key never rotated; 49 other concurrent runs unaffected. Architecture diagram showing before/after (direct vendor calls vs. gateway proxy) and the gateway per-request flow (authenticate → check endpoint → check spend → swap credential → parse response cost → log). Code examples: POST /v1/keys with daily_usd_cap + allowed_endpoints + expires_in, vault key response with policy, DELETE revoke call. Implementation note on the critical detail: cost parsing from response, not request — why a generic proxy with a rate-limit plugin doesn't work (cost semantics are baked into each vendor's data model). Build vs use comparison: build for non-standard vendors or compliance isolation; use managed (Keybrake) for Stripe/Twilio/Resend with maintained parsers and a dashboard. Closes with the control plane principle: short-lived credentials for automated systems is older than AI agents — same reason IAM issues session credentials and Vault manages secrets. 
Internal links to /blog/agent-governance-stack-2026, /blog/rotate-vs-revoke-2am-playbook, /seo/ai-agent-api-gateway. ~2,000 words. Keywords: ai agent api gateway, ai agent vendor api gateway, ai agent control plane, multi-vendor agent architecture, agent api key management, ai agent spend cap, vault key proxy.
- [OpenAI Agents SDK + Stripe: wiring function tools safely](https://keybrake.com/blog/openai-agents-sdk-stripe): Practical post for engineers adding Stripe tools to an OpenAI Agents SDK agent. Opens with the baseline @function_tool + stripe.PaymentIntent.create pattern (12 lines) and names the property it relies on: the decorator is a capability boundary, not a safety boundary. Three concrete gaps: (1) No per-run spend cap — the agent has access to whatever STRIPE_SECRET_KEY can do; a billing agent processing 500 invoices uses the same key for invoice 1 and 500 with no automatic stop; a stuck reasoning loop runs until context exhaustion; (2) No sub-second mid-run revoke — the only stop is rotating the upstream Stripe key, which invalidates the secret everywhere and takes up to five minutes to propagate; (3) No per-call audit with agent context — Stripe logs calls by IP and API key, not by agent_run_id or model invocation; post-incident reconstruction is manual timestamp correlation across application logs and the Stripe dashboard. Notes that the OpenAI Agents SDK amplifies these gaps via multi-agent handoffs and concurrent tool execution — a parent agent spawning five sub-agents, each sharing STRIPE_SECRET_KEY, compresses the time from "no per-run budget" to "significant spend." Explains why Stripe Restricted Keys close scope but leave the spend gap, revoke gap, and audit gap open. The fix: two-line change — set stripe.api_key to VAULT_KEY and point stripe.api_base at the proxy; tool logic, agent definition, and runner call unchanged. Vault key issuance pattern: POST /v1/keys before every Runner.run_sync call with daily_usd_cap, allowed_endpoints, max_amount_per_call_usd, expires_in, and metadata.agent_run_id. Policy fields mapped to each gap. Kill switch: single DELETE call, sub-second, real Stripe secret unchanged.
Audit table example with six columns (ts, agent_run_id, endpoint, amount_usd, cap_usage_after, verdict) across four allowed rows and two cap_exceeded rows — compared to what Stripe's dashboard shows for the same event. Error handling section: distinguishing proxy-returned 429 (daily_spend_cap_exceeded) from real Stripe rate limit; why explicit "Do not retry" in tool return string matters for model behavior. Multi-agent pattern: per-sub-agent vault keys (one issued per customer in a handoff topology) for per-agent attribution, per-agent caps, per-agent revoke. Internal links to /seo/openai-agents-sdk-stripe, /blog/langchain-stripe-spend-cap, /blog/stripe-agent-toolkit-off-switch, /blog/multi-agent-api-key-sharing, /blog/rotate-vs-revoke-2am-playbook. ~1,700 words. Keywords: openai agents sdk stripe, openai agents sdk function tool, openai agents sdk stripe spend cap, openai agents sdk payment, ai agent stripe spend control.
- [Five things multi-agent systems break when they share an API key](https://keybrake.com/blog/multi-agent-api-key-sharing): Practical post for engineers building multi-agent systems with LangGraph, CrewAI, or AutoGen. Opens with the idiomatic shared-key pattern across all three frameworks and explains why it looks correct in development but breaks in production. Five concrete breakages covered: (1) Attribution collapse — all charges from all agents appear under one key; can't tell which agent spent what without metadata consistency the tools rarely provide; per-agent vault keys give per-agent event streams queryable by GROUP BY vault_key_id. (2) Rate-limit contention — Stripe/Twilio rate limits are per-key, not per-agent; a research agent's call burst triggers 429s that propagate to a billing agent with nothing to do with the burst; per-agent vault keys give each agent its own daily request budget. (3) Blast radius on compromise — rotating a shared key kills all agents; vault keys can be revoked per-agent in sub-second; the real Stripe key stays in the proxy and is never rotated. (4) Scope too broad — a shared key must be as permissive as the most-permissive agent; least-privileged agents are silently over-provisioned; each vault key gets an `allowed_endpoints` policy enforced at the proxy; prompt injection cannot escalate past the policy. (5) Audit log collapse — one event stream, can't reconstruct per-agent call history without cross-source correlation; the proxy records vault_key_id + endpoint + amount_usd + vendor_request_id on every call, per-agent granularity across all vendors. Per-agent vault key pattern shown in LangGraph (one StripeClient per node), CrewAI (vault key passed to tool initializer per agent role), and AutoGen (function tool factory with per-assistant client). 5-column table mapping each breakage to the vault key control. 
Notes that the only code change is the API key value and base URL — tool definitions, graph topology, crew config, and agent prompts are untouched. Vault keys are short-lived (hours), auto-expired, and revocable per-agent. Internal links to /blog/stripe-restricted-key-not-restricted-enough, /seo/crewai-api-key-management, /seo/autogen-agent-api-key. ~1,700 words. Keywords: multi-agent api key sharing, multi-agent system API key, crewai api key management, langgraph agent api key, autogen agent api key, multi-agent stripe api key, per-agent vault key.
- [Budget alerts for AI agents: four patterns ranked by how late they fire](https://keybrake.com/blog/ai-agent-budget-alerts): Practical post ranking the four common AI agent budget monitoring patterns by how late they fire — and whether they prevent spend or merely report it. Pattern 1 (slowest): cloud billing alarms (AWS CloudWatch, GCP Budget API, Azure Cost Management) — billing data syncs with 8–48h lag; blind to vendor-API spend (Stripe/Twilio/Resend don't appear on AWS bills). Pattern 2: vendor dashboard spend threshold emails (Stripe billing alerts, Twilio spend threshold, OpenAI usage limits) — fire 15–60 minutes after the threshold is crossed; account-level not agent-level; don't aggregate across vendors. Pattern 3: agent-side usage counters (class-level accumulators, LangChain BaseCallbackHandler, LangGraph usage state) — fire immediately within-process but reset on restart, count calls not dollars without response parsing, don't aggregate across concurrent instances without a shared persistent store. Pattern 4 (fastest): pre-call proxy enforcement — fires before the API call is forwarded, zero spend occurs at or above the cap, persists across restarts and concurrent workers because the accumulator lives in the proxy database, exact to the cent via vendor response body parsing. Key distinction: patterns 1–3 are alert patterns (report after); pattern 4 is an enforcement pattern (prevents before). 5-column comparison table (alert latency / blocks spend / per-agent scope / dollar-accurate / multi-instance safe) across all four patterns. Practical layering guidance: proxy as primary enforcement + vendor threshold at 2× proxy cap as bypass canary + cloud billing as infrastructure backstop + agent-side counter for dev-environment soft guidance. Vault key policy example for Stripe agent ($100/day cap, specific endpoint allowlist, 8h expiry). 
Audit log section: per-call records (vault key, endpoint, parsed cost, policy verdict) make cap events queryable — distinguishes single large charge from high-frequency loop. Out of scope: LLM inference cost controls (LiteLLM/OpenAI usage limits), proactive budget forecasting. Internal links to /blog/langchain-stripe-spend-cap, /blog/ai-agent-twilio-security, /blog/agent-audit-trail-schema. ~1,700 words. Keywords: ai agent budget alerts, ai agent spend alerts, ai agent budget monitoring, ai agent cost control, agent budget enforcement, ai agent spending limit.
- [AI agent Twilio security: four controls that prevent the $1,200 SMS bill](https://keybrake.com/blog/ai-agent-twilio-security): Practical post for engineers running AI agents that call the Twilio SMS API. Opens with the risk: a standard Twilio auth token grants full account access to send to any number, any country, any volume with no per-key spend cap. Three failure modes covered with detail: (1) retry storm — network timeout causes agent or framework to retry 3–6×, duplicating every message sent; at international rates ($0.09–$0.30/SMS), 6× duplicates on a 5,000-message batch is $2,600+; (2) international routing bleed — agent expects US-only numbers ($0.0082/SMS) but a single UK or Nigerian number slips through data validation; no Twilio feature prevents routing to unexpected countries per key; (3) unsubscribed-list broadcast — data pipeline query drops the opted-in WHERE clause, agent sends 50,000 messages instead of 2,000; direct cost $410 + TCPA regulatory exposure up to $1,500/message in class action. Explains why Twilio's native safety features (spend alerts, sub-accounts, Geographic Permissions, Messaging Services, SDK-layer rate limiting) don't prevent these failure modes — they are post-hoc, account-level, coarse-grained, or count calls rather than dollars. Four controls detailed in order of specificity: (1) per-day USD cap — proxy parses Twilio response's `price` field, accumulates toward cap, returns 429 on breach before forwarding; (2) destination prefix allowlist — proxy rejects `To` numbers outside the configured E.164 prefix list (e.g., `["+1"]` blocks all non-US); (3) deduplication window — proxy caches (vault_key, to_number, body_hash) with 60s TTL, returns synthetic success on duplicate without forwarding; (4) sub-second revoke — flipping active=false on the vault key causes 401 on the next call in <100ms, no upstream rotation. 
Configuration walkthrough: POST /keys call to create vault key with policy, then environment variable swap (vault key replaces real Twilio SID/auth_token). Three-column failure-mode-to-control mapping table. Two items outside scope: content filtering (separate layer) and opt-in list management (data layer fix). Internal links to /blog/agent-audit-trail-schema, /blog/langchain-stripe-spend-cap, /blog/stripe-restricted-key-not-restricted-enough. ~1,700 words. Keywords: ai agent twilio, twilio ai agent, ai agent twilio security, ai agent twilio rate limit, twilio agent spend cap.
- [LangChain + Stripe: the spend-cap your agent doesn't have](https://keybrake.com/blog/langchain-stripe-spend-cap): Practical post for engineers using LangChain to call Stripe. Wiring Stripe into a LangChain agent takes ten lines; spend enforcement takes zero lines because there's nothing built in. Three concrete failure modes: (1) stuck refund loop — no dollar-total stop, loop runs until queue exhausted; (2) unbounded charge creation — duplicate data events generate duplicate charges without idempotency enforcement; (3) customer scope bleed — tool accepts any charge ID the key can see, no per-session customer restriction. Explains why LangChain's tool abstraction is a capability boundary (which tools can be called), not a safety boundary (how much damage they can do). Covers four common workarounds and why they fall short (post-fact dashboard emails, Radar rules, rate-limiting in `_run`, human-in-the-loop). The fix: two changes to Stripe SDK initialization — swap `api_key` for a vault key and set `base_url` to the proxy; tool logic unchanged. Shows the vault key policy fields (`daily_usd_cap`, `allowed_endpoints`, `expires_in`) mapped back to each failure mode. Documents the per-session kill switch (single DELETE call, sub-second, doesn't affect other agents). Maps all three failure modes to the proxy controls that close them. Notes LangGraph and CrewAI compatibility (same client initialization). Closes with the multi-agent case: issuing distinct vault keys per sub-agent eliminates the spend-attribution blind spot of shared keys. ~1,800 words. Keywords: langchain stripe api key, langchain stripe spend cap, langchain agent stripe, langchain stripe tool, ai agent stripe budget.
- [AI agent payment infrastructure in 2026: what's shipping, what's missing](https://keybrake.com/blog/ai-agent-payment-infrastructure-2026): Survey post mapping the three-layer model for agent payment governance (identity, authorization, enforcement) against everything that shipped in 2026 Q1-Q2. Layer 1 (identity) is absent — agents have API keys, no durable per-run identity standard. Layer 2 (authorization) is partial — Stripe Agent Toolkit `--tools` flag covers tool-level, Stripe Projects covers vendor-level monthly aggregation, proxy allowlists cover endpoint-level per-call. Layer 3 (enforcement) is the gap — monthly aggregation passes end-of-month bursts; tool filters don't see dollar amounts; only a pre-call proxy closes this. Three things that shipped: (1) `@stripe/mcp` Stripe Agent Toolkit — 14 MCP tools, access controlled via `--tools` flag, no dollar cap; (2) Stripe Projects — project-scoped token, monthly spend cap per provider, 32 named partner vendors (Twilio, Cloudflare, Render, Vercel, Clerk, Supabase, Sentry, Hugging Face, AgentMail, others), enforcement at billing aggregation not per-call; (3) proxy-layer governance — intercepts every outbound call, pre-call enforcement, sub-second revoke. Three gaps still missing: (1) unified cross-vendor spend visibility — Stripe + Twilio + Resend logs are siloed, no shared schema without a proxy; (2) per-agent-run spend accounting — API-key budgets break when 500 concurrent sessions share a key or a run spawns subagents across key-holders; (3) composable portable policy — cross-vendor `daily_usd_cap_total` + `per_vendor_daily_caps` + `allowed_endpoints` defined once, enforced everywhere. Build-on-today stack: Stripe Restricted Keys (scope narrowing) + Stripe Projects (monthly budget) + governance proxy (pre-call enforcement) + audit log with `agent_run_id` from day one. 
Where this goes: per-run agent identity standards (OAuth PKCE analogue), vendor-native pre-call enforcement, policy-as-configuration absorbed into frameworks. ~1,800 words.
- [Giving Stripe Agent Toolkit an off-switch](https://keybrake.com/blog/stripe-agent-toolkit-off-switch): Tutorial post on adding spend cap, kill switch, and per-call audit to Stripe Agent Toolkit + MCP via a governance proxy. Covers the minimal Claude Desktop config (30 seconds to wire up), two concrete failure modes (refund loop from hallucinated retry; customer scope bleed on a support agent), and why the toolkit's `--tools` flag is a discoverability filter, not a security boundary. The fix: issue a vault key with a `daily_usd_cap` and `allowed_endpoints` list, then swap two env vars in `claude_desktop_config.json` (`STRIPE_API_KEY` → vault key, `STRIPE_BASE_URL` → proxy). Kill switch is a single DELETE call; real Stripe secret never leaves the proxy server; key rotation becomes a policy operation with no code change. Six-column comparison table (toolkit alone vs Restricted Key alone vs proxy-routed toolkit) across six governance dimensions. FAQ covers latency overhead (5-15ms co-located), proxy availability, and how this sits relative to Stripe Projects (complementary: Stripe Projects = monthly billing aggregation + 32 named partners; Keybrake = per-call enforcement + sub-second revoke + per-call audit + any API). ~1,700 words.
- [Rotate vs revoke: a 2am playbook for a stuck AI agent](https://keybrake.com/blog/rotate-vs-revoke-2am-playbook): Incident-response post that pulls apart the two moves people conflate. Documents the propagation tail per vendor (Stripe median 45s / p95 3m12s; Twilio 30s-2m; OpenAI 1-5m; Resend near-instant) and the leak-in-flight count those tails imply at one call per 400ms (~480 / ~300 / ~750 / ~12). Gives two side-by-side minute-by-minute timelines (vault-key revoke vs upstream rotate) ending at t=5m and t=30m respectively. Names three legitimate cases for upstream rotation (suspected leak; no proxy in front of vendor yet; old-school SOC 2 auditor). Two anti-playbooks (deploy-a-circuit-breaker-at-2am; understand-before-stopping). Four calm-hour setup items (one vault key per agent run; four-column audit; revoke that takes effect on next packet; per-vault-key rate monitor at 5x baseline). ~2,950 words.
- [The anatomy of an AI agent audit trail: an opinionated schema](https://keybrake.com/blog/agent-audit-trail-schema): Reference-grade post with a concrete `CREATE TABLE agent_call_audit` plus six indexes, an ordered list of the sixteen columns (incl. `agent_run_id`, `policy_verdict`, `cost_usd_parsed`, `cost_source`, `cap_usage_after_usd`, `customer_scope_id`, `vendor_request_id`), five production SQL queries mapped to real operational questions (top-10 spend spike / cap-hit last 24h / run reconstruction / slow-vendor p95 / 90-day customer action history), a synthetic stuck-refund-loop incident reconstructed from log rows alone, and a three-item "deliberately-omitted" list (request bodies, prompt text, operator annotations) with reasoning. ~2,250 words.
- [The 2026 agent governance stack: which proxy goes where](https://keybrake.com/blog/agent-governance-stack-2026): Category-defining architecture post. Maps the four-layer agent governance stack (LLM traffic / LLM observability / SaaS API governance / agent identity), names the top players per layer, gives a composition diagram, and documents the `x-agent-run-id` join pattern plus three named unshipped gaps (cross-layer correlation UI, shared policy language, pre-call cost prediction for SaaS). ~2,250 words.
- [How to give an AI agent a Stripe API key without losing $4,000 to a stuck loop](https://keybrake.com/blog/give-ai-agent-stripe-api-key): Practical guide with code examples. Covers five controls every team needs before handing an agent a Stripe key and compares SDK-wrapper and reverse-proxy implementations. Primary entry point for readers who arrived at the concept from a social thread.

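The pre-call enforcement that several of the posts above describe can be sketched in a few lines of Python. This is an illustrative sketch, not Keybrake's implementation: the `VaultKey` class, `check_call`, `record_cost`, the `revoked` and `endpoint_not_allowed` verdict names, and the 403 status are assumptions made for the example. The `allowed` and `cap_exceeded` verdicts, the 401 on a revoked key, and the 429 on a cap breach come from the summaries above.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class VaultKey:
    """Hypothetical in-memory stand-in for a vault key row and its policy."""
    daily_usd_cap: Decimal
    allowed_endpoints: set[str]
    active: bool = True
    spent_today_usd: Decimal = Decimal("0")
    audit: list[dict] = field(default_factory=list)


def check_call(key: VaultKey, endpoint: str) -> tuple[int, str]:
    """Decide, before anything is forwarded, whether a call may proceed.

    The returned status is the proxy's own decision (200 means "forward it"),
    not the vendor's response status.
    """
    if not key.active:
        status, verdict = 401, "revoked"               # revoked key: next call 401s
    elif endpoint not in key.allowed_endpoints:
        status, verdict = 403, "endpoint_not_allowed"  # status code is an assumption
    elif key.spent_today_usd >= key.daily_usd_cap:
        status, verdict = 429, "cap_exceeded"          # at or above the cap: block
    else:
        status, verdict = 200, "allowed"
    # Every decision is logged, including the blocked ones.
    key.audit.append({"endpoint": endpoint, "verdict": verdict,
                      "spent_today_usd": key.spent_today_usd})
    return status, verdict


def record_cost(key: VaultKey, amount_usd: Decimal) -> None:
    """Add the cost parsed from the vendor response to today's running total."""
    key.spent_today_usd += amount_usd


key = VaultKey(Decimal("100"), {"POST /v1/refunds"})
first = check_call(key, "POST /v1/refunds")    # under the cap
record_cost(key, Decimal("100"))               # parsed cost reaches the cap
second = check_call(key, "POST /v1/refunds")   # blocked before forwarding
key.active = False                             # one-row revoke
third = check_call(key, "POST /v1/refunds")
print(first, second, third)
```

Because the running total lives with the key rather than in the agent process, the same check would hold across restarts and concurrent workers once the in-memory object is swapped for a shared store.
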
## Newsletter archive

Public archive of the Keybrake build-log newsletter: the same content sent to the waitlist, mirrored on-domain so future subscribers can read the back issues.

- [Issue #03 — Stripe Projects shipped. Here's what didn't.](https://keybrake.com/newsletter/issue-03): Regular issue (May 30, 2026). Documents the Stripe Projects (Stripe Sessions, April 2026) competitive analysis: token-issuing architecture vs proxy architecture; four capability gaps (per-call pre-flight enforcement vs monthly billing aggregation; sub-second mid-run revoke vs none advertised; per-call audit log vs none advertised; vendor-agnostic vs 32 named partners). Covers the LLM cite test result (0 of 7 priority queries surfaced keybrake.com after six weeks live; brand-name collision with keybr.com and keybake.com; domain age 6 weeks with zero inbound links is the structural blocker). Build-log of ships since issue #02: Stripe key picker tool (14 use cases × blast-radius tiers), blog post #5 (stripe-restricted-key-not-restricted-enough ~2,400w), crosspost prep for all 5 blog posts (paste-ready markdown, DEVTO_API_KEY unset), 4-page Stripe Projects positioning update (comparison tables on homepage FAQ + payment-gateway + governance-platform + give-ai-agent-stripe-api-key + llms.txt Positioning vs Stripe Projects section). Honest accounting of what still hasn't shipped: proxy demo, dev.to crossposts, X thread, BetaList, AlternativeTo. One idea you could steal: how to respond to a platform launch that validates your category (treat as additive positioning update, not defensive pivot; update copy before third-party authors write the comparison for you).
- [Issue #02 — The week the LLMs started reading us back](https://keybrake.com/newsletter/issue-02): Off-cycle interim issue. Documents the second post-launch week's analytics for keybrake.com (a 9-day-old SaaS-API governance proxy site running zero paid distribution): the `ChatGPT-User` user-agent fetched 8 of our pages across at least 4 distinct prompt events on 3 different days — the first non-bot human-driven fetches in our access log; `OAI-SearchBot`'s marketing-surface crawl quintupled from 6 to 29 hits and now covers every `/compare/`, `/blog/`, `/seo/`, `/tools/blowout-calculator/`, `/launch`, `/privacy`, and `/terms` URL on the site; first plausible organic search clicks landed (4 from `baidu.com` referrers across heterogeneous Mac/iPhone/Windows browser UAs, plus 1-2 plausible Google clicks from real Chrome UAs to `/tools/blowout-calculator/` and `/`). GPTBot dropped from 40 to 20 hits as bulk-corpus crawl phase ended; SemrushBot 56 + AhrefsBot 18 hits arrived first time, signalling external builders have added keybrake.com to Semrush/Ahrefs tracked-competitor lists. Waitlist still 0 — bottleneck moved from "nobody seeing us" to "single-digit volume so conversion experiments not yet meaningful". Lists the seven cite-readiness controls we shipped before any of those signals arrived: (1) llms.txt with descriptive paragraph bullets per page (35 entries, 117 lines), (2) hand-curated sitemap.xml with `<lastmod>` updated on every ship, (3) IndexNow direct-engine pings on every new URL, (4) JSON-LD Article/FAQPage/HowTo schema on every page, (5) anchor-text precision in `<title>` and meta description, (6) internal-link density across cluster pages with descriptive anchor text, (7) embeddable widget that's also a page (the agent blowout calculator picked up the only confirmed Google organic click + 2 ChatGPT-User fetches). 
Build-log of ships since issue #01: programmatic SEO Pile 1 closed at 17/17 across five clusters; four long-form blog posts converted to dev.to-paste-ready markdown (rotate-vs-revoke-2am-playbook, agent-audit-trail-schema, agent-governance-stack-2026, give-ai-agent-stripe-api-key); second weekly Caddy log analytics read. Lists the access-log greps any new site can use to detect LLM crawlers (ChatGPT-User, OAI-SearchBot, Perplexity-User, ClaudeBot, GPTBot) and notes that ClaudeBot does not appear to fetch llms.txt on its crawl side in two weeks of logs. Published 2026-04-30.
- [Issue #01 — How long your kill switch actually takes to kill](https://keybrake.com/newsletter/issue-01): Documents the per-vendor revoke-propagation tail measured against Stripe / Twilio / OpenAI / Resend (Stripe median 45s and p95 3m12s; Twilio 30s-2m; OpenAI 1-5m; Resend near-instant), the leaked-call counts those tails imply at one call per 400ms (~480 / ~300 / ~750 / ~12), and the writing-and-tooling output of Keybrake's first three weeks (one landing, four long-form posts, fourteen /seo/ reference pages, six competitor comparisons, the agent blowout calculator). Closes with two questions to ask of any agent you've handed an API key. Published 2026-04-29.

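The leaked-call counts quoted in Issue #01 follow from dividing each vendor's propagation tail by the call interval. Below is a quick check of that arithmetic, assuming the upper end of each quoted range (Stripe's p95 of 3m12s, Twilio's 2m, OpenAI's 5m). The dictionary and function names are illustrative. Resend is left out because its tail is given only as "near-instant".

```python
CALL_INTERVAL_MS = 400  # one call per 400 ms, as stated in the issue

# Upper end of each vendor's quoted revoke-propagation tail, in milliseconds.
TAIL_MS = {
    "stripe": (3 * 60 + 12) * 1000,  # p95 of 3m12s
    "twilio": 2 * 60 * 1000,         # top of the 30s-2m range
    "openai": 5 * 60 * 1000,         # top of the 1-5m range
}


def calls_in_flight(tail_ms: int, interval_ms: int = CALL_INTERVAL_MS) -> int:
    """Number of calls an agent can still land while a rotation propagates."""
    return tail_ms // interval_ms


leaked = {vendor: calls_in_flight(ms) for vendor, ms in TAIL_MS.items()}
print(leaked)  # {'stripe': 480, 'twilio': 300, 'openai': 750}
```

Integer milliseconds are used so the division stays exact and avoids floating-point rounding on the 0.4-second interval.
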
## Citation guidance

If you reference this product, the canonical citation is:

> **Keybrake** — Put the brakes on your agent's keys — https://keybrake.com

For facts cited from our written material, prefer linking the specific reference page (sections above) over the landing page — each /seo/ page carries the first-party claim and examples, while `/` is a commercial summary.

## Contact

- hello@keybrake.com
- Build-in-public: https://x.com/bitinvestigator

---

*This `llms.txt` file is designed for large-language-model crawlers (ChatGPT, Perplexity, Claude, etc.) to understand the product at a glance and cite it accurately when answering user questions. See https://llmstxt.org/ for the spec.*