
Multi-Provider Failover: How to Keep Your AI App Running When OpenAI Goes Down
On March 12, 2025, OpenAI had a 45-minute outage. GPT-4 returned 500 errors. Chat completions failed. Every application that hardcoded api.openai.com as its LLM endpoint went down.
Including several of our early customers.
The customers who were routing through PromptGuard? Their applications kept running. The SmartRouter detected OpenAI's failures, switched to Anthropic within seconds, and traffic continued flowing. Users didn't notice anything except slightly different response phrasing.
After that incident, we made multi-provider failover a first-class feature.
The Problem: Single Provider Dependency
Most AI applications are architecturally coupled to a single LLM provider:
```python
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```

This creates a single point of failure with three risk vectors:
1. Provider outages. OpenAI, Anthropic, and Google all have outages. They're rare (99.9% uptime means ~8.7 hours of downtime per year), but when they happen, they're total. No graceful degradation—just 500 errors.
2. Rate limiting. At high traffic volumes, you'll hit rate limits. OpenAI's rate limits are per-organization, and if you have multiple services sharing the same API key, they compete for capacity.
3. Regional availability. Some providers have better latency in certain regions. If your users are globally distributed, a single provider means some users always get suboptimal latency.
How the SmartRouter Works
PromptGuard's SmartRouter sits between your application and multiple LLM providers. It maintains a health model of each provider and makes routing decisions in real-time.
Supported Providers
| Provider | Models | Endpoints |
|---|---|---|
| OpenAI | GPT-4, GPT-4o, GPT-3.5-turbo | api.openai.com |
| Anthropic | Claude 3 Haiku/Sonnet/Opus | api.anthropic.com |
| Google Gemini | Gemini Pro, Gemini Ultra | generativelanguage.googleapis.com |
| Mistral | Mistral Large, Medium, Small | api.mistral.ai |
| Groq | Llama, Mixtral (fast inference) | api.groq.com |
| Azure OpenAI | GPT-4, GPT-3.5 (Azure-hosted) | your-deployment.openai.azure.com |
The Routing Algorithm
For each request, the SmartRouter evaluates available providers on three dimensions:
1. Health status (circuit breaker). We track the success/failure rate of recent requests to each provider. If a provider's error rate exceeds a threshold, the circuit breaker "opens" and we stop routing traffic to it. The circuit breaker "half-opens" periodically to test if the provider has recovered.
Provider states:

```
CLOSED (healthy)    → route traffic normally
OPEN (unhealthy)    → skip this provider
HALF_OPEN (testing) → send one probe request, evaluate result
```

2. Model availability.
Not every provider supports every model. If your application requests gpt-4o, we can route to OpenAI or Azure OpenAI—but not to Anthropic (which would need claude-3-5-sonnet). The SmartRouter maintains a mapping of equivalent models across providers for automatic substitution when the primary provider is unavailable.
3. Latency history. We track p50 and p95 latency for each provider. When multiple providers are healthy and offer equivalent models, we route to the one with the best recent latency.
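The circuit breaker described above can be sketched as a small state machine. This is an illustrative sketch, not PromptGuard's actual implementation; the `CircuitBreaker` class, the failure threshold, and the cooldown values are assumptions:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"        # healthy: route traffic normally
    OPEN = "open"            # unhealthy: skip this provider
    HALF_OPEN = "half_open"  # testing: allow one probe request

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold  # failures before opening
        self.cooldown_seconds = cooldown_seconds    # wait before half-opening
        self.failures = 0
        self.state = State.CLOSED
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == State.OPEN:
            # After the cooldown, half-open to let one probe through.
            if time.monotonic() - self.opened_at >= self.cooldown_seconds:
                self.state = State.HALF_OPEN
                return True
            return False
        return True  # CLOSED or HALF_OPEN

    def record_success(self):
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self.failures += 1
        if self.state == State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = time.monotonic()
```

A failed probe in `HALF_OPEN` re-opens the breaker immediately, so an unrecovered provider only sees one request per cooldown window.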
The Failover Cascade
When a request fails (timeout, 500 error, rate limit), the SmartRouter automatically retries with alternative providers:
```
Attempt 1: OpenAI (primary)
  → 500 Internal Server Error
  → Circuit breaker records failure
Attempt 2: Anthropic (first fallback)
  → Success
  → Response returned to user
  → OpenAI circuit breaker incremented
Attempt 3 (if needed): Gemini (second fallback)
  → Only reached if Anthropic also fails
```

Maximum retry depth: 3 providers. This bounds the total latency to 3x the single-provider latency in the worst case—still faster than returning an error to the user and having them retry manually.
What Your Application Sees
From your application's perspective, nothing changes. You send an OpenAI-format request to PromptGuard, and you get an OpenAI-format response back. The fact that the response came from Anthropic instead of OpenAI is transparent.
```python
client = OpenAI(
    base_url="https://api.promptguard.co/api/v1/proxy",
    default_headers={"X-API-Key": os.environ["PROMPTGUARD_API_KEY"]},
)

# This might be served by OpenAI, Anthropic, or Gemini.
# Your code doesn't know or care.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

The response format is always OpenAI-compatible, regardless of which provider actually served it. We handle the protocol translation (Anthropic's Messages API, Google's Gemini API) internally.
Configuration
Provider configuration is managed through environment variables or the dashboard:
```bash
# Primary provider (always tried first)
OPENAI_API_KEY=sk-your-openai-key

# Fallback providers (tried in order if primary fails)
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
MISTRAL_API_KEY=your-mistral-key
GROQ_API_KEY=your-groq-key

# Azure OpenAI (if you have a deployment)
AZURE_OPENAI_ENDPOINT=https://your-deployment.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-key
```

You only need to configure the providers you want to use. If you only have OpenAI and Anthropic keys, the SmartRouter will fail over between those two.
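One way to see which fallback chain results from a given environment is to derive it from whichever keys are set. A sketch under the assumption that the env var names above map to providers in documented primary-then-fallback order (the `active_provider_chain` helper is illustrative, not part of PromptGuard):

```python
import os

# Priority order: primary first, then fallbacks, matching the config above.
PROVIDER_ENV_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("azure", "AZURE_OPENAI_API_KEY"),
]

def active_provider_chain(env=os.environ):
    """Return the providers with a configured key, in failover order."""
    return [name for name, key in PROVIDER_ENV_KEYS if env.get(key)]
```

With only `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` set, the chain is `["openai", "anthropic"]`.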
Security Consistency Across Providers
A critical concern with multi-provider routing: does the security scanning work the same regardless of which provider serves the request?
Yes. PromptGuard's security pipeline runs before the request reaches any provider. The same 7-detector pipeline, the same ML ensemble, the same policy engine evaluates every request. The provider is just the backend that generates the response.
The output scanning also runs the same way—after the response comes back from whichever provider served it, it passes through PII detection, API key detection, and toxicity filtering before reaching the user.
```
User → PromptGuard Security Pipeline → SmartRouter → Provider A/B/C
                                                            ↓
User ← PromptGuard Output Scanning ←——————————— Response ←——
```

Security is at the proxy layer, not the provider layer. Switching providers doesn't bypass security.
Monitoring and Observability
The dashboard shows real-time provider health:
- Request distribution: What percentage of traffic is going to each provider
- Error rates: Per-provider error rates over time
- Latency: p50 and p95 latency per provider
- Failover events: When failovers occurred, from which provider to which
This visibility helps you understand your provider dependency and make informed decisions about which providers to keep as fallbacks.
Cost Implications
Different providers have different pricing:
| Provider | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Mistral Large | $2.00 | $6.00 |
| Groq (Llama 3) | $0.05 | $0.10 |
The SmartRouter doesn't currently optimize for cost—it optimizes for reliability and latency. But knowing your provider distribution from the dashboard helps you understand your blended cost per request.
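Blended cost per request is just a traffic-weighted average over the table above. For example, with a hypothetical distribution of 90% GPT-4o and 10% Gemini 1.5 Pro, at 1,000 input and 500 output tokens per request:

```python
# Prices in $ per 1M tokens, taken from the table above: (input, output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gemini-1.5-pro": (1.25, 5.00),
}

def blended_cost(distribution, input_tokens, output_tokens):
    """Traffic-weighted dollar cost per request."""
    total = 0.0
    for model, share in distribution.items():
        inp, out = PRICES[model]
        total += share * (input_tokens * inp + output_tokens * out) / 1_000_000
    return total

cost = blended_cost({"gpt-4o": 0.9, "gemini-1.5-pro": 0.1}, 1000, 500)
# ≈ $0.0071 per request: 0.9 × $0.0075 (GPT-4o) + 0.1 × $0.00375 (Gemini)
```

Failovers to a cheaper fallback can therefore lower your blended cost slightly, though that's a side effect, not something the router optimizes for.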
When Failover Isn't Enough
Multi-provider failover handles transient failures—outages, rate limits, network issues. It doesn't handle:
Model quality differences. If your application is fine-tuned for GPT-4o's specific response patterns, Anthropic's Claude may produce subtly different outputs. For applications where output consistency matters (content generation, structured data extraction), test your application against all fallback providers before enabling failover.
Feature parity. Some providers support features that others don't (function calling, vision, specific context window sizes). The SmartRouter routes to providers that support the requested features, but if your application relies on a provider-specific feature, failover may not be possible.
Data residency. If you have contractual obligations about where your data is processed, you may need to restrict failover to providers in specific regions. Configure only providers that meet your data residency requirements.
Conclusion
Your AI application's reliability shouldn't be limited by the reliability of a single API provider. When OpenAI has a bad hour, your users shouldn't have to wait for it to recover.
Multi-provider failover is table stakes for production AI applications. It's the difference between a 99.9% SLA (one provider's guarantee) and a 99.99%+ SLA (the combined availability of three providers that rarely fail simultaneously).
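The availability math is straightforward if you (optimistically) assume provider outages are independent: all of n providers must be down at once for a request to fail outright.

```python
def combined_availability(single, n):
    """Probability at least one of n independent providers is up."""
    return 1 - (1 - single) ** n

# Three providers at 99.9% each: downtime overlap is 0.001³ = 1e-9,
# so combined availability is ~0.999999999 — comfortably above 99.99%.
three_way = combined_availability(0.999, 3)
```

Real outages aren't fully independent (shared cloud regions, correlated traffic spikes), so treat this as an upper bound rather than a guarantee.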
Configure your fallback providers. Test your failover paths. And the next time $PROVIDER goes down, enjoy the silence of an application that kept running.