OpenAI-compatible proxy with semantic response caching. Multi-provider routing with automatic failover. Low latency overhead.
Cache responses based on meaning, not just exact matches. Similar prompts hit the cache even with different wording.
Dramatically reduce LLM costs by serving cached responses for similar requests.
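As a rough illustration of what "caching on meaning" involves, the sketch below embeds prompts and returns a stored response whenever a new prompt's embedding is close enough to a previous one. This is a conceptual sketch only; the entry shape, threshold, and function names are hypothetical and are not PromptGuard's actual internals.

// Conceptual sketch of similarity-based caching; structures and threshold are hypothetical.
type CacheEntry = { embedding: number[]; response: string };

const entries: CacheEntry[] = [];
const SIMILARITY_THRESHOLD = 0.95; // assumed cutoff for "similar enough"

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a cached response if any stored prompt is semantically close enough.
function lookup(promptEmbedding: number[]): string | undefined {
  const hit = entries.find(
    (e) => cosineSimilarity(promptEmbedding, e.embedding) >= SIMILARITY_THRESHOLD
  );
  return hit?.response;
}

// Store a new response keyed by its prompt embedding for future lookups.
function store(promptEmbedding: number[], response: string): void {
  entries.push({ embedding: promptEmbedding, response });
}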
Route requests to OpenAI, Anthropic, Google, Mistral, Azure, and more from a single endpoint.
If one provider is down or slow, requests are automatically routed to alternatives. Zero downtime.
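For intuition, a failover loop can be sketched as trying providers in priority order and falling through to the next on failure. The provider list and the callProvider helper below are placeholders for illustration, not part of PromptGuard's API.

// Illustrative failover loop; callProvider is a hypothetical stand-in for a real provider call.
async function callProvider(provider: string, prompt: string): Promise<string> {
  // Placeholder: a real gateway would call the provider's chat completions API here.
  return `response from ${provider} for: ${prompt}`;
}

async function routeWithFailover(prompt: string): Promise<string> {
  const providers = ['openai', 'anthropic', 'google', 'mistral', 'azure'];
  for (const provider of providers) {
    try {
      // First healthy provider wins.
      return await callProvider(provider, prompt);
    } catch {
      // Provider unreachable or erroring: fall through to the next one in the list.
    }
  }
  throw new Error('All providers failed');
}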
Security and caching add minimal latency. Full streaming support maintained.
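Streaming works the same way it does against OpenAI directly: point the client at the proxy and set stream: true. This is a minimal sketch; the model name and prompt are placeholders, and the base URL and X-API-Key header follow the integration example below.

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.promptguard.co/api/v1',
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: { 'X-API-Key': process.env.PROMPTGUARD_API_KEY },
});

// Streamed tokens pass through the proxy as they arrive.
const stream = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}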
Set different rate limits for different API keys. Control costs and prevent abuse.
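How limits are configured is not shown here; purely as an illustration, a per-key policy could look like the object below. The key names and fields are hypothetical.

// Hypothetical per-key rate-limit policy; keys and field names are illustrative only.
const rateLimits: Record<string, { requestsPerMinute: number; tokensPerDay: number }> = {
  'pg_key_frontend': { requestsPerMinute: 60, tokensPerDay: 200_000 },  // end-user traffic
  'pg_key_batch': { requestsPerMinute: 600, tokensPerDay: 5_000_000 },  // offline jobs
};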
Your request comes in. We check the cache and select the optimal provider based on your configuration.
All security checks run in parallel. Threats are blocked, PII is redacted, policies are enforced.
Response is returned (from cache or provider) and stored for future similar requests.
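To tie the three steps together, here is a conceptual sketch of that lifecycle. Every helper in it is a hypothetical stub standing in for the proxy's internals, not its real API.

// Conceptual lifecycle sketch; all helpers below are hypothetical stubs.
const cachedResponses = new Map<string, string>();

const semanticCacheLookup = async (prompt: string) => cachedResponses.get(prompt); // stub: real lookup is similarity-based
const semanticCacheStore = async (prompt: string, res: string) => { cachedResponses.set(prompt, res); };
const selectProvider = (_prompt: string) => 'openai';                              // stub: real selection follows your config
const blockThreats = async (_prompt: string) => {};                                // stub security checks
const redactPII = async (_prompt: string) => {};
const enforcePolicies = async (_prompt: string) => {};
const callProvider = async (provider: string, _prompt: string) => `response from ${provider}`;

async function handleRequest(prompt: string): Promise<string> {
  // 1. Check the cache and pick a provider based on configuration.
  const hit = await semanticCacheLookup(prompt);
  const provider = selectProvider(prompt);

  // 2. Run all security checks in parallel.
  await Promise.all([blockThreats(prompt), redactPII(prompt), enforcePolicies(prompt)]);

  // 3. Return the response (from cache or provider) and store it for future similar requests.
  const response = hit ?? (await callProvider(provider, prompt));
  if (!hit) await semanticCacheStore(prompt, response);
  return response;
}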
import OpenAI from 'openai';
// Just change the base URL - works with any provider
const client = new OpenAI({
  baseURL: 'https://api.promptguard.co/api/v1',
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: {
    'X-API-Key': process.env.PROMPTGUARD_API_KEY
  }
});

// All your existing code works unchanged
const response = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'Hello!' }]
});
// Automatic caching, security, and routing!

Drop-in replacement for OpenAI. Same code, significant cost savings, enterprise security included.