perishable vs Cloudflare API Gateway vs HashiCorp Vault: Short-Lived Credentials for LLMs

The problem

Long-lived API keys are a security liability. A leak exposes the account indefinitely. For LLM APIs, the problem is acute: the keys often grant access to expensive models, large contexts, and a billing account.

The standard fix in 2026 is ephemeral credentials: short-lived, scoped tokens issued by a proxy in front of the LLM API. The proxy issues a token valid for minutes or hours, with a constrained scope (specific model, max tokens, allowed endpoints). When the token expires, the credential is unusable.

perishable is Skelf’s purpose-built implementation of this pattern for LLM APIs. This post is the comparison we wish we had had when we started building perishable.

What perishable is

perishable is a TypeScript proxy in front of any LLM API. The application requests a token from perishable (usually via a sidecar in the same pod/container); perishable authenticates the request, scopes the token, and returns it. The token expires automatically.

The key design choices:

LLM-aware scopes. perishable understands LLM concepts: model (gpt-4o, claude-3-5-sonnet, etc.), max tokens, allowed endpoints (chat, embeddings, etc.). General-purpose credential systems don’t.
Short TTLs. perishable tokens are typically valid for 5-60 minutes, not days. A leaked token has a tiny blast radius.
Audit trail. perishable logs every issued token, every use, and every revocation. The audit trail is exportable to mpl for compliance mapping.
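To make "LLM-aware scopes" and short TTLs concrete, here is a minimal sketch of the check a proxy could run on each request. The type names, field names, and the authorize function are assumptions made for illustration; this is not perishable's actual token format or API.

```typescript
// Hypothetical types for illustration; perishable's real token format may differ.
type Endpoint = "chat" | "embeddings";

interface TokenScope {
  allowedModels: string[];
  maxTokensPerRequest: number;
  allowedEndpoints: Endpoint[];
  expiresAt: number; // Unix epoch milliseconds
}

interface LlmRequest {
  endpoint: Endpoint;
  model: string;
  maxTokens: number;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Check one request against the token's expiry and scope, in that order.
function authorize(scope: TokenScope, req: LlmRequest, now: number): Decision {
  if (now >= scope.expiresAt) {
    return { allowed: false, reason: "token expired" };
  }
  if (!scope.allowedEndpoints.includes(req.endpoint)) {
    return { allowed: false, reason: "endpoint not allowed" };
  }
  if (!scope.allowedModels.includes(req.model)) {
    return { allowed: false, reason: "model not allowed" };
  }
  if (req.maxTokens > scope.maxTokensPerRequest) {
    return { allowed: false, reason: "max tokens exceeded" };
  }
  return { allowed: true };
}
```

A general-purpose credential system would stop at the expiry check; the model, endpoint, and token-ceiling checks are the LLM-specific part.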

What each option is

perishable is the LLM-specific ephemeral credential proxy. TypeScript, runs as a sidecar, understands LLM scopes, and emits a compliance-grade audit trail.

Cloudflare API Gateway is a general-purpose API gateway with rate limiting, auth, and observability. It can issue ephemeral tokens via service tokens or Workers, but it doesn’t understand LLM-specific scopes out of the box.

HashiCorp Vault is the de-facto secret manager. It issues dynamic credentials for many backends (AWS, GCP, databases, etc.) and can be extended to LLM APIs via a custom plugin. It is the most feature-rich option, but it also carries the most operational complexity.

AWS IAM + STS is the AWS-native way to issue short-lived credentials. It doesn’t apply cleanly to third-party LLM providers (OpenAI, Anthropic), but it is the default for AWS Bedrock.

The nine dimensions

Dimension	perishable	Cloudflare API Gateway	Vault	AWS STS
Primary use case	LLM API credentials	General API gateway	Multi-backend secret management	AWS resources
LLM scope aware	Yes (model, max tokens)	No (build it yourself)	No (build it yourself)	No
Token TTL	5-60 min (configurable)	Configurable (typically longer)	Configurable (typically longer)	15 min to 12 h for AssumeRole (1 h default)
Operational complexity	Low (single sidecar)	Low (managed)	High (Vault cluster)	Low (AWS-native)
Audit trail	First-class (LLM-aware)	Yes (general)	Yes (general)	Yes (AWS-native)
Compliance mapping	EU AI Act, SOX, GDPR (built-in)	DIY	DIY	DIY
Audit → mpl integration	Native	DIY	DIY	DIY
License	MIT	Proprietary (cloud)	Business Source License	Proprietary (AWS)
Cost	Free (self-host)	Pay-per-request	Free (self-host) + paid cloud	No additional charge for STS calls

When to use which

Use perishable when:

You are issuing credentials for LLM APIs specifically.
You need LLM-aware scopes (model, max tokens, allowed endpoints).
You want a compliance-grade audit trail that maps to EU AI Act / SOX / GDPR.
You are running agents (perishable pairs naturally with mpl).

Use Cloudflare API Gateway when:

You have a general-purpose API gateway need and the LLM use case is one of many.
You are already on Cloudflare.
The LLM scope (model, max tokens) can be encoded as custom claims in generic signed tokens such as JWTs.

Use Vault when:

You have a multi-backend secret management need (LLM, AWS, GCP, databases, etc.) and want one tool for all of them.
You have the operational team to run a Vault cluster.
The LLM use case is a small fraction of your overall secret-management story.

Use AWS STS when:

You are using AWS Bedrock (the AWS-native LLM service).
The credentials are for AWS resources, not third-party APIs.

A concrete example: agent LLM access

Imagine you have a fleet of 50 LLM agents, each making 100 requests per day. You want:

Each agent has its own credential, scoped to the models it’s allowed to use.
The credential expires after 1 hour (auto-rotated by the agent runtime).
A leaked credential can only be used for the models the agent was authorised for, with the budget it was given.
The audit trail is exportable to your compliance team.

With perishable:

// In the agent runtime
const token = await perishable.issue({
  agent_id: "agent_42",
  allowed_models: ["gpt-4o-mini", "claude-3-5-haiku"],
  max_tokens_per_request: 4096,
  ttl: "1h",
});
// token is opaque; agent uses it for the next hour, then
// requests a new one
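The "requests a new one" step can be sketched as a small cache in the agent runtime that renews the token shortly before it expires. Everything here is an assumption made for illustration: the IssuedToken shape, the TokenCache class, and the injected issuer function, which stands in for a call such as perishable.issue(...) whose real return type this post does not specify.

```typescript
// Hypothetical shapes; the issuer stands in for a call like perishable.issue(...).
interface IssuedToken {
  value: string;
  expiresAt: number; // Unix epoch milliseconds
}

type Issuer = () => Promise<IssuedToken>;

class TokenCache {
  private current: IssuedToken | null = null;
  private readonly issue: Issuer;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    issue: Issuer,
    refreshMarginMs: number = 60_000, // renew one minute before expiry
    now: () => number = () => Date.now(),
  ) {
    this.issue = issue;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  // Return the cached token, renewing it first if it is missing or
  // within refreshMarginMs of its expiry.
  async get(): Promise<string> {
    let token = this.current;
    if (token === null || this.now() >= token.expiresAt - this.refreshMarginMs) {
      token = await this.issue();
      this.current = token;
    }
    return token.value;
  }
}
```

Renewing ahead of expiry avoids sending a request with a token that lapses in flight. Concurrent callers may each trigger a renewal in this sketch; a production version would share one in-flight promise.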

With Vault:

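# Vault policy. The "llm/" mount is hypothetical: it assumes a custom
# secrets plugin for LLM credentials, which Vault does not ship.
path "llm/issue/agent-42" {
  capabilities = ["create", "update"]
}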

Then build the per-model and per-token-budget scope in your application code. Possible, but more work.

With Cloudflare: Set up a Worker that issues signed JWTs with the model and max-tokens in the claims. Doable, but you build the LLM-aware pieces.
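As a rough idea of the "LLM-aware pieces" you would build, here is a minimal sketch of issuing and verifying an HS256 JWT that carries the model and max-tokens as claims. It uses Node's crypto module so it can be run locally; inside a Worker the same steps would typically go through the Web Crypto API instead. The claim names (model, max_tokens) and function names are assumptions, not a Cloudflare convention.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim names; use whatever your verifier expects.
interface LlmClaims {
  sub: string;        // agent id
  model: string;      // model the token may call
  max_tokens: number; // per-request ceiling
  exp: number;        // expiry, seconds since the Unix epoch
}

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

function hmac(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Build a compact HS256 JWT: header.payload.signature
function signJwt(claims: LlmClaims, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify(claims));
  const sig = hmac(`${header}.${payload}`, secret).toString("base64url");
  return `${header}.${payload}.${sig}`;
}

// Return the claims if the signature matches and the token has not expired,
// otherwise null. A fuller verifier would also check the header's alg field.
function verifyJwt(token: string, secret: string, nowSec: number): LlmClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, sig] = parts;
  const expected = hmac(`${header}.${payload}`, secret);
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as LlmClaims;
  return nowSec < claims.exp ? claims : null;
}
```

The gateway still has to compare the verified claims against each incoming request body (model, max tokens); the JWT only carries the scope, it does not enforce it.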

The point: perishable is the path of least resistance for LLM-specific ephemeral credentials, Vault for general secret management, and Cloudflare for a general-purpose API gateway.

Compliance mapping

perishable is built with compliance in mind. The audit trail exports to mpl for tamper-evident storage. The built-in mappings:

EU AI Act — per-call audit log of model, prompt hash, completion hash, timestamp, and operator.
GDPR — data-handling proof via the policy engine.
SOX — tamper-evident records (BLAKE3 hash chain) for every API call.
HIPAA — quality thresholds and access control on the credential issuance.
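The idea behind a tamper-evident hash chain can be sketched in a few lines. This is an illustration under stated assumptions, not how perishable or mpl store records: the record shape is hypothetical, and SHA-256 from Node's crypto module stands in for BLAKE3, which Node does not provide built in.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape, loosely following the fields listed above.
interface AuditRecord {
  model: string;
  promptHash: string;
  completionHash: string;
  timestamp: string;
  operator: string;
}

interface ChainEntry {
  record: AuditRecord;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;     // hash over prevHash plus the serialized record
}

const GENESIS = "0".repeat(64);

// SHA-256 is a stand-in for BLAKE3 here. JSON.stringify is used for brevity;
// a real system would need a canonical serialization.
function entryHash(prevHash: string, record: AuditRecord): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify(record)).digest("hex");
}

// Append a record, linking it to the hash of the current last entry.
function append(chain: ChainEntry[], record: AuditRecord): ChainEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { record, prevHash, hash: entryHash(prevHash, record) }];
}

// Recompute every link; an edited or reordered entry fails the check.
function verify(chain: ChainEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prev || entry.hash !== entryHash(prev, entry.record)) {
      return false;
    }
    prev = entry.hash;
  }
  return true;
}
```

Because each hash covers the previous one, changing any earlier record invalidates every later link. Truncating the tail is not detectable from the chain alone, so the latest hash needs to be anchored somewhere outside the log.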

If compliance is the primary concern, perishable + mpl is the most direct path.

A 5-minute perishable eval

# 1. install
npm install -g perishable

# 2. start perishable pointing at your LLM provider
#    (--auth-source also accepts filesystem or kubernetes)
perishable serve \
    --upstream https://api.openai.com/v1 \
    --auth-source vault \
    --default-ttl 30m \
    --audit-log audit.jsonl

# 3. issue a token (assumes the CLI prints the token to stdout)
TOKEN=$(perishable issue --agent agent_42 --model gpt-4o-mini --ttl 30m)

# 4. the agent uses the token
curl -X POST http://localhost:8080/v1/chat/completions \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o-mini", "messages": [...]}'

The audit.jsonl is the compliance trail. Pipe it to mpl for tamper-evident storage.