GATEWAY & CACHING

SAVE UP TO 40% WITH
SMART CACHING

OpenAI-compatible proxy with semantic response caching. Multi-provider routing with automatic failover. Low latency overhead.

Key Capabilities

Semantic Caching

Cache responses based on meaning, not just exact matches. Similar prompts hit the cache even with different wording.
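
A minimal sketch of the idea, using a client configured as in the Drop-in Replacement section below (model name and prompts are illustrative, and whether the second call actually hits the cache depends on the gateway's similarity threshold):

typescript
import OpenAI from 'openai';

// Gateway client, configured as in the Drop-in Replacement section below.
const client = new OpenAI({
  baseURL: 'https://api.promptguard.co/api/v1',
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: {
    'X-API-Key': process.env.PROMPTGUARD_API_KEY
  }
});

// Differently worded, same meaning: the first call is answered by the
// provider and cached; the second can be served from the semantic cache.
await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'How do I reverse a string in Python?' }]
});

await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: "What's the best way to reverse a Python string?" }]
});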

Up to 40% Cost Reduction

Dramatically reduce LLM costs by serving cached responses for similar requests - every cache hit is a request you never pay a provider for.

Multi-Provider Routing

Route requests to OpenAI, Anthropic, Google, Mistral, Azure, and more from a single endpoint.
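
With the client from the sketch above, switching providers is just a matter of the model you request (model names are illustrative, and mapping model names to providers is an assumption about typical gateway behavior):

typescript
// One endpoint, many providers: the gateway picks the provider
// that serves the requested model.
const fromOpenAI = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'Summarize our Q3 report.' }]
});

const fromAnthropic = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [{ role: 'user', content: 'Summarize our Q3 report.' }]
});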

Automatic Failover

If one provider is down or slow, requests are automatically routed to alternatives. Zero downtime.
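
Because failover happens inside the gateway, client code stays a single call. If the gateway exposes per-request routing hints, it might look something like this - the X-Fallback-Models header is purely hypothetical, not a documented API:

typescript
// Hypothetical per-request fallback hint; the header name is illustrative.
const response = await client.chat.completions.create(
  {
    model: 'gpt-5-nano',
    messages: [{ role: 'user', content: 'Hello!' }]
  },
  // The OpenAI SDK forwards extra headers on a per-request basis.
  { headers: { 'X-Fallback-Models': 'claude-sonnet-4-5,gemini-2.5-flash' } }
);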

Low Latency Overhead

Security and caching add minimal latency. Full streaming support maintained.
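
Streaming through the gateway looks exactly like streaming against a provider directly; a short sketch with the client from above:

typescript
// Tokens stream back as they are generated, caching and security included.
const stream = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'Write a haiku about caching.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}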

Per-Key Rate Limits

Set different rate limits for different API keys. Control costs and prevent abuse.
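
Assuming the gateway signals an exhausted key with a standard HTTP 429 response (the usual convention), the OpenAI SDK surfaces it as a RateLimitError you can catch:

typescript
try {
  await client.chat.completions.create({
    model: 'gpt-5-nano',
    messages: [{ role: 'user', content: 'Hello!' }]
  });
} catch (err) {
  if (err instanceof OpenAI.RateLimitError) {
    // This key hit its limit: back off, or retry with a key
    // that has more headroom.
    console.warn('Rate limited:', err.message);
  } else {
    throw err;
  }
}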

How the Gateway Works

1. Route

Your request comes in. We check the cache and select the optimal provider based on your configuration.

2. Secure

All security checks run in parallel. Threats are blocked, PII is redacted, policies are enforced.

3. Respond

The response is returned (from cache or provider) and stored for future similar requests.
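
You can observe this flow from the client side by reading the raw response. The x-cache-status header below is hypothetical - actual header names depend on the gateway - but the SDK's withResponse() makes this kind of inspection straightforward:

typescript
// Inspect whether the answer came from cache (header name hypothetical).
const { data, response } = await client.chat.completions
  .create({
    model: 'gpt-5-nano',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
  .withResponse();

console.log('cache status:', response.headers.get('x-cache-status'));
console.log(data.choices[0].message.content);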

Drop-in Replacement

typescript
import OpenAI from 'openai';

// Change the base URL and add your PromptGuard key - works with any provider
const client = new OpenAI({
  baseURL: 'https://api.promptguard.co/api/v1',
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: {
    'X-API-Key': process.env.PROMPTGUARD_API_KEY
  }
});

// All your existing code works unchanged
const response = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Automatic caching, security, and routing!

Why PromptGuard Gateway?

✓ PROMPTGUARD

  • Semantic caching (meaning-based)
  • Up to 40% cost reduction
  • Multi-provider with failover
  • Low latency overhead
  • Full streaming support

✗ OTHER SOLUTIONS

  • Exact-match caching only
  • Limited cost savings
  • Single provider lock-in
  • High latency overhead
  • Streaming often broken

Start Saving on LLM Costs

Drop-in replacement for OpenAI. Same code, significant cost savings, enterprise security included.