Windmill Stripe Integration: Restricted API Keys, Spend Caps, and Agent Governance
Windmill's per-script retry and flow step retry primitives are exactly what you want for resilient AI agent orchestration. They are also the primitives most likely to silently create duplicate Stripe charges when a billing script fails at the wrong moment.
Windmill is an open-source workflow engine and developer platform built for running TypeScript and Python scripts as flows, apps, and background jobs. It is popular in AI agent pipelines because it pairs a powerful visual flow editor with full script-level control: each step in a flow is an independently versioned, independently executable script, and each script can be configured with its own retry count, timeout, and resource bindings. For agents that process payments — usage-based billing, subscription renewals, batch charge runs — Windmill is an appealing orchestration layer. The retry semantics that make it reliable for API calls and database writes become a liability when the API call is a Stripe charge.
This post covers three failure modes specific to Windmill's architecture — per-script retry, flow step retry, and For loop parallel execution — with TypeScript and Python SDK code for each, and the two-layer governance pattern that closes all three without restricting Windmill's orchestration model.
Failure mode 1: Script-level retry re-fires a completed Stripe charge
Windmill allows you to configure retry behavior on any script, either via the UI or in the script's YAML configuration. When a script is used as a flow step, the retry setting applies to that step within the flow. If the script raises an exception after calling stripe.charges.create() — a network timeout writing to the database, a downstream API call failure, a validation error on the response — Windmill re-runs the entire script from the top. Without an idempotency key, the second run calls stripe.charges.create() again with the same parameters, and Stripe creates a second charge.
// charge_customer.ts — UNSAFE: no idempotency key, script-level retry enabled
import Stripe from "npm:stripe";
type Args = {
customer_id: string;
amount_cents: number;
billing_period: string;
stripe_key: string; // Resource passed as a Windmill variable — still full sk_live_
};
export async function main({ customer_id, amount_cents, billing_period, stripe_key }: Args) {
const stripe = new Stripe(stripe_key);
// Stripe call succeeds — charge object created
const charge = await stripe.charges.create({
amount: amount_cents,
currency: "usd",
customer: customer_id,
description: `Subscription ${billing_period}`,
// No idempotency_key — each script retry = new Stripe charge
});
// If this DB write throws, Windmill retries the script from line 1
// stripe.charges.create() runs again → ch_B, then ch_C on retry 2
await writeChargeRecord(customer_id, charge.id, billing_period);
return { charge_id: charge.id, status: charge.status };
}
The failure sequence: stripe.charges.create() returns charge ch_A. writeChargeRecord() throws a DatabaseTimeoutError. Windmill marks the step as failed and schedules retry 1 (after the configured delay). Retry 1 runs the script from the top. stripe.charges.create() fires again. Stripe creates ch_B. If writeChargeRecord() fails again, Windmill schedules retry 2: ch_C. With three retries configured, the customer has been charged four times before Windmill gives up and marks the flow as failed.
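The arithmetic in that sequence can be checked with a small simulation. This is a minimal sketch, not Windmill or Stripe code: `FakeStripe`, `runWithRetries`, and `simulate` are hypothetical names, and the fake only models the one behavior that matters here (a repeated idempotency key returns the original charge).

```typescript
// Hypothetical in-memory stand-in for Stripe; it is not the real SDK.
class FakeStripe {
  private byKey = new Map<string, string>();
  public charges: string[] = [];

  createCharge(idempotencyKey?: string): string {
    if (idempotencyKey !== undefined) {
      const seen = this.byKey.get(idempotencyKey);
      if (seen !== undefined) return seen; // replayed key: return the original charge
    }
    const id = `ch_${this.charges.length + 1}`;
    this.charges.push(id);
    if (idempotencyKey !== undefined) this.byKey.set(idempotencyKey, id);
    return id;
  }
}

// Runs the body once, then up to `retries` more times, like a step retry policy.
function runWithRetries(retries: number, body: () => void): number {
  let attempts = 0;
  for (let i = 0; i <= retries; i++) {
    attempts++;
    try {
      body();
      return attempts;
    } catch {
      // swallow the error and retry, as the orchestrator would
    }
  }
  return attempts;
}

// Counts charges created when the DB write after the charge always fails.
function simulate(retries: number, idempotencyKey?: string): number {
  const stripe = new FakeStripe();
  runWithRetries(retries, () => {
    stripe.createCharge(idempotencyKey);
    throw new Error("DatabaseTimeoutError");
  });
  return stripe.charges.length;
}

const unsafeCharges = simulate(3); // no key: one charge per attempt
const safeCharges = simulate(3, "cus_abc:2000:2026-06"); // stable key: one charge total
console.log({ unsafeCharges, safeCharges });
```

With three retries the unsafe run makes four attempts and creates four charges, while the keyed run creates one.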
Windmill also runs Python scripts, and the same failure applies:
# charge_customer.py: UNSAFE, bare key, no idempotency key
import stripe

def main(customer_id: str, amount_cents: int, billing_period: str, stripe_key: str):
    stripe.api_key = stripe_key  # Module-level: shared if scripts run in same worker
    # The legacy module-level API exposes charges as stripe.Charge
    charge = stripe.Charge.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        description=f"Subscription {billing_period}",
    )
    write_charge_record(customer_id, charge["id"], billing_period)  # Throws -> retry -> new charge
    return {"charge_id": charge["id"]}
The Python version has a second problem: stripe.api_key is module-level state. If Windmill runs more than one script in the same Python worker process, that value is shared across every invocation in the process, so a billing script for one account can have its key overwritten by a concurrent script that uses a different account's key.
The fix: content-hash idempotency key + per-script vault key via proxy
The correct fix has two layers. First, a content-hash idempotency key derived from the stable billing parameters ensures Stripe deduplicates regardless of how many times the script retries. Second, a per-agent vault key issued through the Keybrake proxy restricts each script to only the Stripe endpoints it needs, prevents key sharing across concurrent workers, and enforces a daily USD spend cap.
// charge_customer.ts — SAFE: content-hash idempotency key + vault key via proxy
import Stripe from "npm:stripe";
import { createHash } from "node:crypto";
type Args = {
customer_id: string;
amount_cents: number;
billing_period: string;
vault_key: string; // Windmill Resource: scoped vault key from proxy.keybrake.com
};
const PROXY_BASE = "https://proxy.keybrake.com/stripe";
function billingIdempotencyKey(
customerId: string,
amountCents: number,
billingPeriod: string
): string {
const raw = `${customerId}:${amountCents}:${billingPeriod}:windmill-billing`;
return createHash("sha256").update(raw).digest("hex").slice(0, 32);
}
export async function main({ customer_id, amount_cents, billing_period, vault_key }: Args) {
// vault_key is scoped: POST /v1/charges only, $200/day cap, expires in 24h
const stripe = new Stripe(vault_key, {
host: "proxy.keybrake.com",
protocol: "https",
port: 443,
basePath: "/stripe",
});
const idempKey = billingIdempotencyKey(customer_id, amount_cents, billing_period);
// Safe to retry — Stripe deduplicates on idempotency_key within 24h
const charge = await stripe.charges.create(
{
amount: amount_cents,
currency: "usd",
customer: customer_id,
description: `Subscription ${billing_period}`,
},
{ idempotencyKey: idempKey }
);
// DB write has its own independent retry budget — configure separately
await writeChargeRecord(customer_id, charge.id, billing_period);
return { charge_id: charge.id, status: charge.status };
}
The idempotency key is the first 32 hex characters of a SHA-256 hash of (customer_id, amount_cents, billing_period, 'windmill-billing'). This string is identical on every retry of the same billing operation. Stripe's deduplication returns the original charge object for any request with the same key within 24 hours, even if the first call completed. The vault key is a Windmill Resource: it is bound at flow definition time, not hardcoded in the script, and each agent type can be assigned a different vault key with different endpoint scope and spend cap.
In Python, use stripe.StripeClient to avoid the module-level key contamination:
# charge_customer.py: SAFE, per-invocation StripeClient, content-hash idem key
import hashlib
import stripe

PROXY_BASE = "https://proxy.keybrake.com/stripe"

def billing_idempotency_key(customer_id: str, amount_cents: int, billing_period: str) -> str:
    raw = f"{customer_id}:{amount_cents}:{billing_period}:windmill-billing"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

def main(customer_id: str, amount_cents: int, billing_period: str, vault_key: str):
    # Per-invocation client: no module-level state, safe across concurrent workers
    client = stripe.StripeClient(
        api_key=vault_key,
        # StripeClient overrides the API host through base_addresses
        base_addresses={"api": PROXY_BASE},
    )
    idem_key = billing_idempotency_key(customer_id, amount_cents, billing_period)
    charge = client.charges.create(
        params={
            "amount": amount_cents,
            "currency": "usd",
            "customer": customer_id,
            "description": f"Subscription {billing_period}",
        },
        options={"idempotency_key": idem_key},
    )
    write_charge_record(customer_id, charge.id, billing_period)
    return {"charge_id": charge.id, "status": charge.status}
Failure mode 2: Flow step retry restarts the billing step independently
In a Windmill flow, each step can have its own retry configuration, set independently of other steps. This is more granular than a single flow-level retry setting (as in Prefect's @flow(retries=N)), but it creates a different problem: a billing step that has already completed can run again when the flow is re-run after a downstream step fails. The key gap is that Windmill flows have no built-in idempotency guard between steps: there is no equivalent of Temporal's activity result caching or Prefect's cache_key_fn to prevent a completed step from re-running when the flow is re-submitted.
// Windmill flow: step 1 = charge_customer (retries: 2), step 2 = update_crm (retries: 3)
// If the operator manually re-runs the flow from the Windmill UI after a partial failure,
// step 1 (charge_customer) re-runs. Without an idempotency key, a second charge is created.
// UNSAFE flow re-run scenario:
// Run 1: charge_customer → ch_A (success) → update_crm → fails at CRM timeout
// Operator sees failed flow → clicks "Re-run from step 1" in Windmill UI
// Run 1 re-run: charge_customer → ch_B (new charge, no idempotency key)
// SAFE: the idempotency key derived from (customer_id, amount_cents, billing_period)
// is stable across re-runs of the same billing cycle.
// Run 1 re-run: charge_customer → Stripe returns ch_A (dedup) → update_crm → success
Windmill's "Re-run from step N" in the UI is particularly dangerous here. When a flow fails mid-way, operators commonly re-run it from step 1 to get clean state. Without idempotency keys, every such re-run creates new Stripe charges for all billing steps that already completed. With idempotency keys, re-running from step 1 is safe — Stripe returns the already-created charge.
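The missing guard between steps can also be approximated in the script itself. The sketch below records a step's result under a stable key and returns the recorded result on a re-run. `runOnce`, `completedSteps`, and `fakeCharge` are hypothetical names, and the in-memory Map stands in for a durable store such as a database table.

```typescript
// Hypothetical step-result store. A Map only survives one process; a real flow
// would persist results in a database table keyed the same way.
const completedSteps = new Map<string, string>();

// Runs `step` only if no result is recorded for `stepKey`;
// otherwise returns the recorded result without calling `step`.
function runOnce(stepKey: string, step: () => string): string {
  const recorded = completedSteps.get(stepKey);
  if (recorded !== undefined) return recorded;
  const result = step();
  completedSteps.set(stepKey, result);
  return result;
}

let chargeCalls = 0;
const fakeCharge = (): string => `ch_${++chargeCalls}`; // stand-in for the Stripe call

const firstRun = runOnce("charge:cus_abc:2026-06", fakeCharge);
const reRun = runOnce("charge:cus_abc:2026-06", fakeCharge); // operator re-runs from step 1
const nextCycle = runOnce("charge:cus_abc:2026-07", fakeCharge); // new billing period
console.log({ firstRun, reRun, nextCycle, chargeCalls });
```

A guard like this does not replace the Stripe idempotency key: if the worker dies after the charge succeeds but before the result is recorded, only the key prevents a second charge.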
For flows with multiple billing steps — for example, charging a base fee and then a usage fee as separate Stripe calls — each call needs its own distinct idempotency key:
function baseChargeKey(customerId: string, billingPeriod: string): string {
return createHash("sha256")
.update(`${customerId}:base:${billingPeriod}:windmill-billing`)
.digest("hex").slice(0, 32);
}
function usageChargeKey(customerId: string, usageUnits: number, billingPeriod: string): string {
return createHash("sha256")
.update(`${customerId}:usage:${usageUnits}:${billingPeriod}:windmill-billing`)
.digest("hex").slice(0, 32);
}
The charge-type tag in the hashed string (:base: vs :usage:) gives the two charges different keys, so Stripe treats them as separate transactions instead of matching the second call against the first. The billing period is included so the same customer's subscription renewal next month uses a fresh idempotency key.
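If a flow grows more than two charge types, one builder can replace the per-type functions. This is a sketch under assumptions: `chargeKey` and `ChargeKind` are hypothetical names, and because it hashes a JSON array instead of a colon-joined string it produces different keys from the functions above, so the two schemes should not be mixed within one billing period.

```typescript
import { createHash } from "node:crypto";

type ChargeKind = "base" | "usage";

// Hypothetical helper: builds a stable key from the charge type, customer,
// billing period, and any extra distinguishing values such as usage units.
function chargeKey(
  kind: ChargeKind,
  customerId: string,
  billingPeriod: string,
  ...extra: Array<string | number>
): string {
  // JSON.stringify keeps field boundaries unambiguous even if a value contains ":".
  const raw = JSON.stringify([customerId, kind, ...extra, billingPeriod, "windmill-billing"]);
  return createHash("sha256").update(raw).digest("hex").slice(0, 32);
}

const baseKey = chargeKey("base", "cus_abc", "2026-06");
const usageKey = chargeKey("usage", "cus_abc", "2026-06", 1500);
const usageKeyNextMonth = chargeKey("usage", "cus_abc", "2026-07", 1500);
console.log({ baseKey, usageKey, usageKeyNextMonth });
```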
Failure mode 3: For loop iterations run in parallel and charge simultaneously
Windmill flows support a For loop step type that iterates over a list and runs a script for each item. The parallelism setting on a For loop controls how many iterations execute concurrently. With parallelism: 10 and a list of 100 customers, Windmill runs 10 billing scripts simultaneously. If the billing script has no idempotency key, and the loop fails mid-way and is re-run, all already-completed customers in the batch are charged again.
// Windmill For loop — UNSAFE: no idempotency key, parallelism: 10
// Each iteration calls charge_customer_unsafe(customer) in parallel
// If the loop fails at customer 57 (network error), and the operator re-runs from loop start:
// Customers 1–56 receive a second charge. No dedup. No warning.
// SAFE: idempotency key per customer per billing_period
// For loop re-run after failure at customer 57:
// Customers 1–56: Stripe returns original charges (dedup by idem key)
// Customers 57–100: fresh charges (first attempt for this billing cycle)
// The idempotency key must include the billing_period to avoid deduplicating
// across different months if the loop runs again for the next cycle.
There is a subtler concurrent-execution problem: a Windmill flow can be triggered manually while a scheduled run is already in progress. If both runs reach the For loop at similar times, they may attempt to charge the same customer simultaneously. With an idempotency key, Stripe creates at most one charge: the second request either receives the original result or, if the first is still in flight, is rejected with a conflict error and can be retried safely. Without one, Stripe processes both and creates two charges.
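When two runs race on the same key, the later request may be rejected while the first is still in flight instead of receiving the original charge. The sketch below retries only in that case. It assumes the SDK error exposes a `code` of `idempotency_key_in_use` or a `statusCode` of 409; verify both against your SDK version. `createWithInUseRetry` and `isKeyInUse` are hypothetical helper names.

```typescript
// Minimal error shape. The field names are assumptions about what the SDK
// error exposes, not a full type definition.
type StripeLikeError = { code?: string; statusCode?: number };

// True when the error says the same idempotency key is still being processed.
function isKeyInUse(err: unknown): boolean {
  const e = err as StripeLikeError | null;
  return !!e && (e.code === "idempotency_key_in_use" || e.statusCode === 409);
}

// Calls `createCharge`, retrying with a growing delay only on key-in-use conflicts.
// Any other error, or a conflict on the last attempt, is rethrown to the caller.
async function createWithInUseRetry<T>(
  createCharge: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 200
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await createCharge();
    } catch (err) {
      if (!isKeyInUse(err) || attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

In a billing script, the function passed in would be the stripe.charges.create call with its idempotency key, so every retry reuses the same key.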
// Safe For loop billing script — TypeScript
// vault_key is passed as a loop input variable, scoped per agent run
export async function main({
customer_id,
amount_cents,
billing_period,
vault_key,
}: {
customer_id: string;
amount_cents: number;
billing_period: string;
vault_key: string;
}) {
const stripe = new Stripe(vault_key, {
host: "proxy.keybrake.com",
protocol: "https",
port: 443,
basePath: "/stripe",
});
const idempKey = createHash("sha256")
.update(`${customer_id}:${amount_cents}:${billing_period}:windmill-billing`)
.digest("hex").slice(0, 32);
// safe across concurrent loop iterations and re-runs
const charge = await stripe.charges.create(
{
amount: amount_cents,
currency: "usd",
customer: customer_id,
description: `Subscription ${billing_period}`,
},
{ idempotencyKey: idempKey }
);
return { customer_id, charge_id: charge.id, status: charge.status };
}
With this pattern, parallelism: 50 is safe. All 50 concurrent iterations have unique idempotency keys (they each operate on a different customer_id), so Stripe processes them independently. If two iterations hit the same customer with the same amount and period, which is possible if the input list has duplicates, they share a key and Stripe creates only one charge: the later request gets the original charge back, or a retryable conflict error if the first is still in flight.
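Duplicates in the input list can also be removed before the loop starts, so they never reach Stripe at all. A minimal sketch of such a pre-loop step follows; `BillingItem` and `dedupeBillingItems` are hypothetical names, and the sketch assumes one charge per customer per billing period.

```typescript
type BillingItem = { customer_id: string; amount_cents: number; billing_period: string };

// Keeps the first item per (customer_id, billing_period) and reports the rest,
// so dropped duplicates can be logged instead of silently discarded.
function dedupeBillingItems(items: BillingItem[]): { unique: BillingItem[]; dropped: BillingItem[] } {
  const seen = new Set<string>();
  const unique: BillingItem[] = [];
  const dropped: BillingItem[] = [];
  for (const item of items) {
    const key = `${item.customer_id}|${item.billing_period}`;
    if (seen.has(key)) {
      dropped.push(item);
    } else {
      seen.add(key);
      unique.push(item);
    }
  }
  return { unique, dropped };
}

const batch: BillingItem[] = [
  { customer_id: "cus_a", amount_cents: 2000, billing_period: "2026-06" },
  { customer_id: "cus_b", amount_cents: 3000, billing_period: "2026-06" },
  { customer_id: "cus_a", amount_cents: 2000, billing_period: "2026-06" }, // duplicate
];
const { unique, dropped } = dedupeBillingItems(batch);
console.log(unique.length, dropped.length);
```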
Pre-flight guard: check before charging
Idempotency keys protect against duplicates within Stripe's 24-hour dedup window. For longer dedup windows — monthly billing where the same period should never be charged twice — add a pre-flight check using a read-only vault key that can only hit GET /v1/charges:
async function checkExistingCharge(
customerId: string,
billingPeriod: string,
auditVaultKey: string
): Promise<string | null> {
// audit_vault_key: GET /v1/charges only, no POST capability
const stripe = new Stripe(auditVaultKey, {
host: "proxy.keybrake.com",
protocol: "https",
port: 443,
basePath: "/stripe",
});
const charges = await stripe.charges.list({
customer: customerId,
limit: 10,
});
const existing = charges.data.find(
(ch) => ch.metadata?.billing_period === billingPeriod && ch.status === "succeeded"
);
return existing?.id ?? null;
}
export async function main(args: Args) {
// Pre-flight: return existing charge if found, skip Stripe entirely
const existingId = await checkExistingCharge(
args.customer_id,
args.billing_period,
args.audit_vault_key
);
if (existingId) {
return { charge_id: existingId, status: "already_charged", skipped: true };
}
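// Otherwise proceed with the billing vault key
// ...charge creation with idempotency key as above, plus metadata: { billing_period },
// because checkExistingCharge matches on ch.metadata.billing_period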
}
The audit vault key is issued from Keybrake with a policy of allowed_endpoints: ["GET /v1/charges"] and daily_usd_cap: 0, so it can read but cannot create charges. The billing vault key has allowed_endpoints: ["POST /v1/charges"] and a cap matching the maximum expected daily charge volume. If the audit key is accidentally used in the billing step, the proxy returns a 403 before the request reaches Stripe. If the billing key is used for a read, the proxy rejects it the same way, because GET /v1/charges is not in its allowed endpoints. Neither key has access to refunds, subscriptions, or payment methods.
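The two policies can be reasoned about with a small model. The sketch below is hypothetical and is not Keybrake's implementation: `VaultPolicy`, `evaluate`, and the reason strings are illustrative names, and only the `allowed_endpoints` and `daily_usd_cap` fields come from the policies described above.

```typescript
type VaultPolicy = { allowed_endpoints: string[]; daily_usd_cap: number };
type Decision = { allowed: boolean; reason: "ok" | "endpoint_not_allowed" | "daily_cap_exceeded" };

// Hypothetical proxy-side check: endpoint scope first, then the daily spend cap.
function evaluate(
  policy: VaultPolicy,
  method: string,
  path: string,
  amountUsd = 0,
  spentTodayUsd = 0
): Decision {
  const endpoint = `${method.toUpperCase()} ${path}`;
  if (!policy.allowed_endpoints.includes(endpoint)) {
    return { allowed: false, reason: "endpoint_not_allowed" };
  }
  if (spentTodayUsd + amountUsd > policy.daily_usd_cap) {
    return { allowed: false, reason: "daily_cap_exceeded" };
  }
  return { allowed: true, reason: "ok" };
}

const auditPolicy: VaultPolicy = { allowed_endpoints: ["GET /v1/charges"], daily_usd_cap: 0 };
const billingPolicy: VaultPolicy = { allowed_endpoints: ["POST /v1/charges"], daily_usd_cap: 200 };

console.log(evaluate(auditPolicy, "GET", "/v1/charges")); // read with the audit key
console.log(evaluate(auditPolicy, "POST", "/v1/charges", 20)); // audit key in the billing step
console.log(evaluate(billingPolicy, "POST", "/v1/charges", 20, 190)); // would exceed the cap
```

Checking scope before the cap means a misused key is rejected regardless of how much budget remains.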
Windmill Resources: wiring vault keys without hardcoding
Windmill has a first-class Resources system for storing credentials. Instead of passing API keys as script arguments, you define a resource type and bind a resource instance to the script or flow at the Windmill level. This is the right pattern for vault keys:
// Define a Windmill resource type for Keybrake vault keys (resource_type schema)
{
"schema": {
"type": "object",
"properties": {
"vault_key": { "type": "string", "description": "Keybrake vault key for billing" },
"audit_vault_key": { "type": "string", "description": "Keybrake audit key (read-only)" }
},
"required": ["vault_key", "audit_vault_key"]
}
}
// In the flow step, bind the resource:
// args.keybrake = $res:u/billing_team/keybrake_prod
// The script receives args.keybrake.vault_key and args.keybrake.audit_vault_key
With this setup, the actual vault key values are stored in Windmill's encrypted variable store, not in the script source. Rotating a vault key requires updating one Resource, not searching through script definitions. Different environments (staging, production) use different Resources bound to different vault keys with different spend caps.
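Inside the script, the bound Resource arrives as a plain object, so a cheap guard can catch a misconfigured binding before any request is sent. This is a sketch under assumptions: `KeybrakeResource` mirrors the schema above, `assertVaultKeys` is a hypothetical helper, and the prefix check relies on raw Stripe keys starting with sk_ or rk_ followed by live or test.

```typescript
// Shape of the bound Resource, mirroring the resource type schema.
type KeybrakeResource = { vault_key: string; audit_vault_key: string };

// Hypothetical guard: rejects empty values and anything that looks like a raw
// Stripe secret or restricted key instead of a proxy-issued vault key.
function assertVaultKeys(res: KeybrakeResource): KeybrakeResource {
  for (const [name, value] of Object.entries(res)) {
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`${name} is missing from the bound Resource`);
    }
    if (/^(sk|rk)_(live|test)_/.test(value)) {
      throw new Error(`${name} looks like a raw Stripe key, not a vault key`);
    }
  }
  return res;
}

const okResource = assertVaultKeys({ vault_key: "vk_test_abc", audit_vault_key: "vk_test_audit" });

let rejectedRawKey = false;
try {
  assertVaultKeys({ vault_key: "sk_live_example", audit_vault_key: "vk_test_audit" });
} catch {
  rejectedRawKey = true; // a raw key bound by mistake fails fast
}
console.log({ bound: okResource.vault_key, rejectedRawKey });
```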
Comparison: bare key, restricted key, vault key
| Approach | Key type | Retry safe? | Loop parallel safe? | Per-agent scope? | Spend cap? | Audit log? |
|---|---|---|---|---|---|---|
| Bare sk_live_ key in script | Full access | No | No | No (shared) | No | Stripe dashboard only |
| Stripe restricted key (scoped) | Endpoint-scoped | No (no idem key) | No | No (shared) | No | Stripe dashboard only |
| Stripe restricted key + idem key | Endpoint-scoped | Yes | Yes | No (shared) | No | Stripe dashboard only |
| Vault key via proxy (no idem key) | Policy-enforced | No | Partially | Yes | Yes | Proxy audit log |
| Vault key + idem key (recommended) | Policy-enforced | Yes | Yes | Yes | Yes | Proxy audit log |
Stripe restricted keys solve the scope problem — they limit which endpoints a key can call — but they do not solve the idempotency problem or the per-agent isolation problem. Multiple Windmill scripts running in parallel can all use the same restricted key. A billing script with retries=3 and a restricted key still creates four charges on three consecutive failures. The vault key via proxy adds the missing layers: per-flow key issuance, spend cap enforcement, and a queryable audit log that shows exactly which Windmill script made which Stripe call and at what time.
Enforcement test suite
// charge_customer.test.ts — Vitest enforcement suite
import { describe, it, expect, vi, beforeEach } from "vitest";
import { main } from "./charge_customer";
import Stripe from "npm:stripe";
vi.mock("npm:stripe");
describe("windmill billing governance", () => {
let mockCreate: ReturnType<typeof vi.fn>;
beforeEach(() => {
mockCreate = vi.fn().mockResolvedValue({ id: "ch_test_123", status: "succeeded" });
(Stripe as any).mockImplementation(() => ({
charges: { create: mockCreate },
}));
});
it("sends idempotency key on every call", async () => {
await main({
customer_id: "cus_abc",
amount_cents: 2000,
billing_period: "2026-06",
vault_key: "vk_test",
});
const call = mockCreate.mock.calls[0];
expect(call[1]?.idempotencyKey).toBeTruthy();
expect(call[1].idempotencyKey).toHaveLength(32);
});
it("same inputs produce same idempotency key across retries", async () => {
const args = { customer_id: "cus_abc", amount_cents: 2000, billing_period: "2026-06", vault_key: "vk_test" };
await main(args);
await main(args); // simulate retry
const key1 = mockCreate.mock.calls[0][1].idempotencyKey;
const key2 = mockCreate.mock.calls[1][1].idempotencyKey;
expect(key1).toBe(key2);
});
it("different billing periods produce different idempotency keys", async () => {
await main({ customer_id: "cus_abc", amount_cents: 2000, billing_period: "2026-06", vault_key: "vk_test" });
await main({ customer_id: "cus_abc", amount_cents: 2000, billing_period: "2026-07", vault_key: "vk_test" });
const key1 = mockCreate.mock.calls[0][1].idempotencyKey;
const key2 = mockCreate.mock.calls[1][1].idempotencyKey;
expect(key1).not.toBe(key2);
});
it("routes to proxy host, not api.stripe.com", async () => {
await main({ customer_id: "cus_abc", amount_cents: 2000, billing_period: "2026-06", vault_key: "vk_test" });
const constructorCall = (Stripe as any).mock.calls[0];
expect(constructorCall[1]?.host).toBe("proxy.keybrake.com");
});
it("uses vault_key, not sk_live_ key", async () => {
await main({ customer_id: "cus_abc", amount_cents: 2000, billing_period: "2026-06", vault_key: "vk_test_abc" });
const constructorCall = (Stripe as any).mock.calls[0];
expect(constructorCall[0]).toBe("vk_test_abc");
expect(constructorCall[0]).not.toMatch(/^sk_live_/);
});
});
Gap analysis
Windmill triggers and webhook double-delivery
Windmill flows can be triggered via webhook (HTTP endpoint). If the upstream caller retries the webhook after a timeout — before the first run completes — Windmill starts a second flow run. Both runs are independent; there is no built-in webhook deduplication in Windmill's trigger layer. Idempotency keys at the Stripe level prevent double charges even when two flows run simultaneously, but downstream state (CRM updates, confirmation emails) may still be duplicated. For high-stakes billing webhooks, add deduplication at the trigger point: check a Redis key or a database record before starting the flow, using the webhook's event ID as the dedup key.
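The trigger-level check can be sketched as a claim on the event ID: the first delivery claims it and starts the flow, and later deliveries within the window are ignored. The class below is an in-memory stand-in for an atomic store operation such as Redis SET with NX and an expiry. `SeenEvents` and `claim` are hypothetical names, and a real deployment needs a shared store, because a Map is local to one process.

```typescript
// In-memory stand-in for an atomic "set if absent with TTL" operation.
class SeenEvents {
  private expiresAt = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true only for the first caller to claim this event ID within the TTL.
  claim(eventId: string, nowMs: number = Date.now()): boolean {
    const expiry = this.expiresAt.get(eventId);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.expiresAt.set(eventId, nowMs + this.ttlMs);
    return true;
  }
}

const seen = new SeenEvents(60_000); // remember event IDs for one minute

const firstDelivery = seen.claim("evt_123", 0); // start the flow
const retriedDelivery = seen.claim("evt_123", 5_000); // upstream retry: skip
const otherEvent = seen.claim("evt_456", 5_000); // different event: start
const afterExpiry = seen.claim("evt_123", 61_000); // TTL elapsed: treated as new
console.log({ firstDelivery, retriedDelivery, otherEvent, afterExpiry });
```

The TTL should be at least as long as the upstream caller's retry window, otherwise a late retry is treated as a new event.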
Background runnables and "fire and forget" billing
Windmill's background runnable feature allows a flow to dispatch a script asynchronously and continue without waiting for the result. If a billing script is dispatched as a background runnable and the parent flow fails and retries, the background billing script may already be running — or may have already completed — when the retry fires another background runnable invocation. The idempotency key protects the Stripe call, but the parent flow has no way to know whether the background billing runnable succeeded, failed, or is still running. Do not use background runnables for billing scripts that must be confirmed before proceeding.
Windmill Cloud vs self-hosted worker concurrency
On Windmill Cloud, worker concurrency limits apply per workspace tier. On self-hosted deployments, worker concurrency is configurable per worker group. A billing For loop with parallelism: 100 on a self-hosted deployment with 10 workers will queue iterations across workers. Idempotency keys ensure Stripe-level safety, but the audit log at the proxy layer shows which worker processed which customer — useful for diagnosing partial batch failures.
Script versioning and deploy-time vault key rotation
Windmill versions scripts. When a billing script is redeployed with a new version, in-flight runs use the old version while new runs use the new version. If a vault key is baked into the script (hardcoded), rotating it requires a new script version and a deployment. With Windmill Resources, the vault key is stored independently and can be rotated without a script redeployment. A run that is already executing keeps the key value it received as input, while a retry that starts after the Resource is updated picks up the new key.
FAQ
Can I use Windmill's built-in secret variable store instead of Keybrake vault keys?
Windmill's secret variables (the $var: syntax) are encrypted at rest and are appropriate for storing the vault key itself. The vault key you store in a Windmill secret variable should be a Keybrake vault key — not the raw Stripe secret key. The Windmill secret store protects the vault key from being read by unauthorized scripts; the Keybrake proxy enforces the policy (endpoint scope, spend cap, audit log) on every request that uses it.
What's the right idempotency key structure for a batch billing run?
For a batch where the same customer should be charged once per billing period, use sha256(customer_id + ":" + amount_cents + ":" + billing_period + ":windmill-billing"). This key is stable across all retries and re-runs for the same billing cycle. The inclusion of billing_period means that the June invoice and July invoice for the same customer have different keys and are charged independently. Do not use a random UUID — it changes on every retry and provides no deduplication.
How do I handle a batch where amount_cents varies per customer?
Include amount_cents in the idempotency key hash. This is important: if a customer's invoice amount changes between two runs (e.g., a usage recalculation), the new amount produces a different idempotency key, and Stripe processes it as a new charge. If the key stayed the same while the amount changed, Stripe would reject the request as a reused key with different parameters instead of charging the new amount. A new key gives no protection if the old amount was already charged, so pair recalculations with the pre-flight check.
Does the Keybrake proxy add meaningful latency to Windmill billing flows?
The proxy adds one network hop between the Windmill worker and Stripe's API. In practice, measured overhead is under 15ms at the 95th percentile. Stripe's own p99 latency for POST /v1/charges is typically 200–600ms. The proxy overhead is below measurement noise for all but the most latency-sensitive workloads. For batch runs, the bottleneck is the Stripe API rate limit (100 write requests per second), not the proxy hop.
Can I use the audit vault key to pre-flight check before every charge?
Yes, but only as an additional safety layer, not as the primary deduplication mechanism. The GET /v1/charges list query searches by customer and is eventually consistent — a charge created 50ms ago may not yet appear in a list query. Use idempotency keys as the primary dedup mechanism and the pre-flight check as a belt-and-suspenders guard for charges that should never be repeated within a billing period (as opposed to being safe to retry within 24 hours).
What happens if my Windmill worker crashes mid-script?
If the worker process crashes after stripe.charges.create() returns but before the script's return value is persisted, Windmill will mark the script as failed (timeout or worker disconnect) and schedule a retry depending on the step's retry configuration. The retry runs the full script from the top. The idempotency key ensures the Stripe call is deduplicated — Stripe returns the original charge rather than creating a new one. The audit log at the proxy records both the original call and the retry as separate entries, making it easy to identify crash-recovery retries in the billing history.
Stop trusting your agent with bare Stripe keys
Keybrake issues scoped vault keys your Windmill scripts use instead of sk_live_ — one key per agent type, with daily USD caps and a proxy audit log of every charge your flows make. Free for 1,000 requests/month.