
Multi-Provider Failover: How to Keep Your AI App Running When OpenAI Goes Down
On March 12, 2025, OpenAI had a 45-minute outage. GPT-4 returned 500 errors. Chat completions failed. Every application that hardcoded api.openai.com as its LLM endpoint went down.
Including several of our early customers.
The customers who were routing through PromptGuard? Their applications kept running. The SmartRouter detected OpenAI's failures, switched to Anthropic within seconds, and traffic continued flowing. Users didn't notice anything except slightly different response phrasing.
After that incident, we made multi-provider failover a first-class feature.
The Problem: Single Provider Dependency
Most AI applications are architecturally coupled to a single LLM provider:
```python
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```

This creates a single point of failure with three risk vectors:
1. Provider outages. OpenAI, Anthropic, and Google all have outages. They're rare (99.9% uptime means ~8.7 hours of downtime per year), but when they happen, they're total. No graceful degradation—just 500 errors.
2. Rate limiting. At high traffic volumes, you'll hit rate limits. OpenAI's rate limits are per-organization, and if you have multiple services sharing the same API key, they compete for capacity.
3. Regional availability. Some providers have better latency in certain regions. If your users are globally distributed, a single provider means some users always get suboptimal latency.
How the SmartRouter Works
PromptGuard's SmartRouter sits between your application and multiple LLM providers. It maintains a health model of each provider and makes routing decisions in real-time.
Supported Providers
| Provider | Models | Endpoints |
|---|---|---|
| OpenAI | GPT-4, GPT-4o, GPT-3.5-turbo | api.openai.com |
| Anthropic | Claude 3 Haiku/Sonnet/Opus | api.anthropic.com |
| Google Gemini | Gemini Pro, Gemini Ultra | generativelanguage.googleapis.com |
| Mistral | Mistral Large, Medium, Small | api.mistral.ai |
| Groq | Llama, Mixtral (fast inference) | api.groq.com |
| Azure OpenAI | GPT-4, GPT-3.5 (Azure-hosted) | your-deployment.openai.azure.com |
The Routing Algorithm
For each request, the SmartRouter evaluates available providers on three dimensions:
1. Health status (circuit breaker). We track the success/failure rate of recent requests to each provider. If a provider's error rate exceeds a threshold, the circuit breaker "opens" and we stop routing traffic to it. The circuit breaker "half-opens" periodically to test if the provider has recovered.
Provider states:

```
CLOSED (healthy)    → route traffic normally
OPEN (unhealthy)    → skip this provider
HALF_OPEN (testing) → send one probe request, evaluate result
```

2. Model availability.
Not every provider supports every model. If your application requests gpt-4o, we can route to OpenAI or Azure OpenAI—but not to Anthropic (which would need claude-3-5-sonnet). The SmartRouter maintains a mapping of equivalent models across providers for automatic substitution when the primary provider is unavailable.
3. Latency history. We track p50 and p95 latency for each provider. When multiple providers are healthy and offer equivalent models, we route to the one with the best recent latency.
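The circuit breaker described above can be sketched as a small state machine. This is an illustrative sketch, not PromptGuard's actual implementation; the `CircuitBreaker` class, the failure threshold, and the cooldown values are assumptions:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"        # healthy: route traffic normally
    OPEN = "open"            # unhealthy: skip this provider
    HALF_OPEN = "half_open"  # testing: allow one probe request

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold  # failures before opening
        self.cooldown_seconds = cooldown_seconds    # wait before half-opening
        self.failures = 0
        self.state = State.CLOSED
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == State.OPEN:
            # After the cooldown, half-open to let one probe through.
            if time.monotonic() - self.opened_at >= self.cooldown_seconds:
                self.state = State.HALF_OPEN
                return True
            return False
        return True  # CLOSED or HALF_OPEN

    def record_success(self):
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self.failures += 1
        if self.state == State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = time.monotonic()
```

A failed probe in `HALF_OPEN` re-opens the breaker immediately, so an unrecovered provider only sees one request per cooldown window.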
The Failover Cascade
When a request fails (timeout, 500 error, rate limit), the SmartRouter automatically retries with alternative providers:
```
Attempt 1: OpenAI (primary)
  → 500 Internal Server Error
  → Circuit breaker records failure
Attempt 2: Anthropic (first fallback)
  → Success
  → Response returned to user
  → OpenAI circuit breaker incremented
Attempt 3 (if needed): Gemini (second fallback)
  → Only reached if Anthropic also fails
```

Maximum retry depth: 3 providers. This bounds the total latency to 3x the single-provider latency in the worst case—still faster than returning an error to the user and having them retry manually.
What Your Application Sees
From your application's perspective, nothing changes. You send an OpenAI-format request to PromptGuard, and you get an OpenAI-format response back. The fact that the response came from Anthropic instead of OpenAI is transparent.
```python
client = OpenAI(
    base_url="https://api.promptguard.co/api/v1/proxy",
    default_headers={"X-API-Key": os.environ["PROMPTGUARD_API_KEY"]},
)

# This might be served by OpenAI, Anthropic, or Gemini.
# Your code doesn't know or care.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

The response format is always OpenAI-compatible, regardless of which provider actually served it. We handle the protocol translation (Anthropic's Messages API, Google's Gemini API) internally.
Configuration
Provider configuration is managed through environment variables or the dashboard:
```bash
# Primary provider (always tried first)
OPENAI_API_KEY=sk-your-openai-key

# Fallback providers (tried in order if primary fails)
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
MISTRAL_API_KEY=your-mistral-key
GROQ_API_KEY=your-groq-key

# Azure OpenAI (if you have a deployment)
AZURE_OPENAI_ENDPOINT=https://your-deployment.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-key
```

You only need to configure the providers you want to use. If you only have OpenAI and Anthropic keys, the SmartRouter will fail over between those two.
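One way to see which fallback chain results from a given environment is to derive it from whichever keys are set. A sketch under the assumption that the env var names above map to providers in documented primary-then-fallback order (the `active_provider_chain` helper is illustrative, not part of PromptGuard):

```python
import os

# Priority order: primary first, then fallbacks, matching the config above.
PROVIDER_ENV_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("azure", "AZURE_OPENAI_API_KEY"),
]

def active_provider_chain(env=os.environ):
    """Return the providers with a configured key, in failover order."""
    return [name for name, key in PROVIDER_ENV_KEYS if env.get(key)]
```

With only `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` set, the chain is `["openai", "anthropic"]`.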
Security Consistency Across Providers
A critical concern with multi-provider routing: does the security scanning work the same regardless of which provider serves the request?
Yes. PromptGuard's security pipeline runs before the request reaches any provider. The same 7-detector pipeline, the same ML ensemble, the same policy engine evaluates every request. The provider is just the backend that generates the response.
The output scanning also runs the same way—after the response comes back from whichever provider served it, it passes through PII detection, API key detection, and toxicity filtering before reaching the user.
```
User → PromptGuard Security Pipeline → SmartRouter → Provider A/B/C
                                                            ↓
User ← PromptGuard Output Scanning ←——————————— Response ←——
```

Security is at the proxy layer, not the provider layer. Switching providers doesn't bypass security.
Monitoring and Observability
The dashboard shows real-time provider health:
- Request distribution: What percentage of traffic is going to each provider
- Error rates: Per-provider error rates over time
- Latency: p50 and p95 latency per provider
- Failover events: When failovers occurred, from which provider to which
This visibility helps you understand your provider dependency and make informed decisions about which providers to keep as fallbacks.
Cost Implications
Different providers have different pricing:
| Provider | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Mistral Large | $2.00 | $6.00 |
| Groq (Llama 3) | $0.05 | $0.10 |
The SmartRouter doesn't currently optimize for cost—it optimizes for reliability and latency. But knowing your provider distribution from the dashboard helps you understand your blended cost per request.
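Blended cost per request is just a traffic-weighted average over the table above. For example, with a hypothetical distribution of 90% GPT-4o and 10% Gemini 1.5 Pro, at 1,000 input and 500 output tokens per request:

```python
# Prices in $ per 1M tokens, taken from the table above: (input, output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gemini-1.5-pro": (1.25, 5.00),
}

def blended_cost(distribution, input_tokens, output_tokens):
    """Traffic-weighted dollar cost per request."""
    total = 0.0
    for model, share in distribution.items():
        inp, out = PRICES[model]
        total += share * (input_tokens * inp + output_tokens * out) / 1_000_000
    return total

cost = blended_cost({"gpt-4o": 0.9, "gemini-1.5-pro": 0.1}, 1000, 500)
# ≈ $0.0071 per request: 0.9 × $0.0075 (GPT-4o) + 0.1 × $0.00375 (Gemini)
```

Failovers to a cheaper fallback can therefore lower your blended cost slightly, though that's a side effect, not something the router optimizes for.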
When Failover Isn't Enough
Multi-provider failover handles transient failures—outages, rate limits, network issues. It doesn't handle:
Model quality differences. If your application is fine-tuned for GPT-4o's specific response patterns, Anthropic's Claude may produce subtly different outputs. For applications where output consistency matters (content generation, structured data extraction), test your application against all fallback providers before enabling failover.
Feature parity. Some providers support features that others don't (function calling, vision, specific context window sizes). The SmartRouter routes to providers that support the requested features, but if your application relies on a provider-specific feature, failover may not be possible.
Data residency. If you have contractual obligations about where your data is processed, you may need to restrict failover to providers in specific regions. Configure only providers that meet your data residency requirements.
Conclusion
Your AI application's reliability shouldn't be limited by the reliability of a single API provider. When OpenAI has a bad hour, your users shouldn't have to wait for it to recover.
Multi-provider failover is table stakes for production AI applications. It's the difference between a 99.9% SLA (one provider's guarantee) and a 99.99%+ SLA (the combined availability of three providers that rarely fail simultaneously).
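The availability math is straightforward if you (optimistically) assume provider outages are independent: all of n providers must be down at once for a request to fail outright.

```python
def combined_availability(single, n):
    """Probability at least one of n independent providers is up."""
    return 1 - (1 - single) ** n

# Three providers at 99.9% each: downtime overlap is 0.001³ = 1e-9,
# so combined availability is ~0.999999999 — comfortably above 99.99%.
three_way = combined_availability(0.999, 3)
```

Real outages aren't fully independent (shared cloud regions, correlated traffic spikes), so treat this as an upper bound rather than a guarantee.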
Configure your fallback providers. Test your failover paths. And the next time $PROVIDER goes down, enjoy the silence of an application that kept running.