
Why Your AI Security Should Run in Your VPC
There is a strange irony in the AI security market.
A company says: "We're worried about sending user data to OpenAI."
So they buy an AI security tool—and send all their user data to that security vendor.
They've traded one third-party risk for another. The prompts still leave their network. They still can't verify how the vendor handles their data. They still have a compliance exposure.
The only way to fully control your data is to run the security layer inside your own infrastructure. This is why we built PromptGuard to be self-hostable first—not as an afterthought, but as the primary deployment model.
The Data Sovereignty Problem
If you're a bank, a hospital, or a government contractor, data residency isn't a suggestion. It's law.
- GDPR Article 44-49: Personal data transfers outside the EU/EEA require specific legal mechanisms.
- HIPAA: Protected Health Information must be handled by covered entities or business associates with signed BAAs.
- FedRAMP / ITAR: Government data may not leave authorized infrastructure.
- PCI-DSS: Cardholder data must be processed within the assessed cardholder data environment.
You cannot pipe your user prompts through a startup's cloud in us-east-1 just to check for prompt injection. Even if the vendor is well-intentioned, even if they have a SOC 2 report, the data still left your perimeter. That's the compliance problem.
When you self-host PromptGuard, the data never leaves your network. We (PromptGuard Inc.) never see your prompts. We don't know what your users are asking. We don't know how many requests you're processing. We don't even know you're running our software.
The Latency Problem
There's also a physics problem that most vendors don't talk about.
If your application runs in aws-us-west-2 and your security vendor's API is in gcp-us-central1, every request adds:
- DNS resolution: ~5ms
- TLS handshake: ~20ms (for a new connection)
- Cross-cloud network latency: ~30ms
- Serialization overhead: ~5ms
That's ~60ms of overhead before the security check even starts. Round trip, you're adding 100ms+ just for the network transit.
If you run PromptGuard as a sidecar container in the same Kubernetes pod—or even in the same Docker Compose network—the network latency drops to sub-millisecond. The security check's overhead is almost entirely compute time, not network time.
For applications where every millisecond matters (voice agents, real-time copilots, gaming), this difference is the difference between a usable product and a laggy one.
The Architecture of Self-Hosted PromptGuard
The production deployment is five containers, orchestrated by Docker Compose:
services:
api: # Python/FastAPI — the core proxy and security engine
dashboard: # Next.js — configuration, analytics, log viewer
postgres: # PostgreSQL 17 — users, projects, events, feedback
redis: # Redis 7 — caching, session state, rate limiting
nginx: # Reverse proxy with TLS terminationThe API Server
The API container is the heart of the system. It handles:
- OpenAI-compatible proxy endpoints (
/chat/completions,/completions,/models) - Anthropic-compatible proxy endpoints (
/messages) - The full 7-detector security pipeline
- Bot detection with behavioral analysis
- Policy evaluation with custom rules and presets
- Webhook and email alerting
- Usage tracking and subscription enforcement
It's a stateless container. You can run one instance for a small deployment or scale horizontally behind a load balancer. State lives in PostgreSQL (persistent data) and Redis (ephemeral data like caches and rate limit counters).
The Dashboard
The dashboard provides a web UI for:
- Creating and managing projects and API keys
- Configuring security presets and custom rules
- Viewing security event logs and analytics
- Running red team assessments
- Managing team members and billing
It's a standard Next.js application. If you don't need a web UI (e.g., you manage everything via API), you can skip this container entirely.
PostgreSQL
Stores all persistent data:
- Users, projects, and API keys
- Security event logs (with configurable retention)
- Feedback entries for model recalibration
- Custom rules and policies
- Subscription and billing data
We use PostgreSQL 17 with Alpine for minimal image size. The schema is managed by SQL migrations in the supabase/migrations/ directory.
Redis
Handles ephemeral, high-frequency data:
- Detection cache (exact-match, 1-hour TTL)
- OAuth state tokens (10-minute TTL)
- Magic link codes (10-minute TTL)
- Bot detection state (blocked fingerprints, rate counters)
- Session data
Redis is optional for single-instance deployments—the system falls back to in-memory storage. But for multi-instance deployments, Redis is required to share state across API containers.
Nginx
Terminates TLS and routes traffic to the API and dashboard. In environments where you already have a load balancer or ingress controller handling TLS, you can skip this container and route directly to the API.
Deployment Guide
Step 1: Clone and Configure
git clone https://github.com/acebot712/promptguard
cd promptguard/deploy
cp .env.example .envEdit .env with your configuration:
# Required
DATABASE_URL=postgresql://promptguard:your-password@postgres:5432/promptguard
REDIS_URL=redis://redis:6379/0
JWT_SECRET=your-random-secret-minimum-32-chars
# ML Detection (requires HuggingFace API key for full ensemble)
HUGGINGFACE_API_KEY=hf_your_key_here
ENABLE_ML_DETECTION=true
# Optional: LLM Provider Keys (for proxying)
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=sk-ant-your-key
# Optional: Email Alerts
SMTP_HOST=smtp.example.com
SMTP_FROM_EMAIL=alerts@yourdomain.com
# Optional: Stripe (for billing, if needed)
STRIPE_API_KEY=sk_live_your-keyStep 2: Start the Stack
docker-compose up -dThat's it. Five containers start. PostgreSQL initializes the schema. Redis starts accepting connections. The API server runs health checks and begins accepting traffic.
Step 3: Verify
# Check health
curl http://localhost:8080/health
# Send a test request
curl http://localhost:8080/api/v1/proxy/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "X-API-Key: your-promptguard-api-key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'Step 4: Point Your Application
Change your application's LLM base URL to point to your self-hosted instance:
client = OpenAI(
base_url="http://promptguard.internal:8080/api/v1/proxy",
default_headers={"X-API-Key": "your-promptguard-api-key"}
)Updating
Updates are a one-command operation:
cd deploy
docker-compose pull
docker-compose up -dThe containers pull the latest images, the database migrations run automatically, and traffic resumes. No downtime for most updates.
Scaling
For high-traffic deployments:
Horizontal scaling: Run multiple API containers behind a load balancer. Redis provides shared state for caching, rate limiting, and bot detection. PostgreSQL handles persistent storage with connection pooling.
GPU inference (optional): For teams that want to run the ML ensemble locally instead of using the HuggingFace API, you can deploy a HuggingFace-compatible inference server (like TGI or vLLM) and point the ML provider configuration at your local endpoint. This eliminates the ~100-140ms of API latency for ML inference.
Read replicas: For analytics-heavy workloads, add PostgreSQL read replicas to handle dashboard queries without impacting the API's write path.
The BYOC Future
We believe the future of security infrastructure is Bring Your Own Cloud (BYOC).
SaaS is great for non-critical tools—project management, documentation, design. But for infrastructure that sits in the critical path of your user data, you should own the deployment.
When you self-host PromptGuard:
- Your data never leaves your network
- Your latency is determined by your hardware, not someone else's cloud
- Your compliance posture is under your control
- Your security tool can't go down because a vendor has an outage
- Your costs scale with your infrastructure, not with per-request pricing
The code is open source. The deployment is containerized. The migration from hosted to self-hosted (or vice versa) is a URL change.
Own your security. Own your data. Own the metal.
READ MORE

Beyond Redaction: Why We Replace PII With Synthetic Data
Redacting PII with [SSN_REDACTED] breaks the LLM's ability to reason about data. Replacing it with realistic-looking fake data preserves the reasoning while eliminating the privacy risk. Here's how synthetic data replacement works and when to use it.

Multi-Provider Failover: How to Keep Your AI App Running When OpenAI Goes Down
When OpenAI has a 30-minute outage, your AI application doesn't have to go down with it. Here's how PromptGuard's SmartRouter automatically fails over across providers—OpenAI, Anthropic, Gemini, Mistral, Groq, and Azure—without your users noticing.

Securing LangChain Applications: The Complete Guide
LangChain makes it easy to build powerful agents. It also makes it easy to build security vulnerabilities. Here's how to add production-grade security to your chains, agents, and RAG pipelines without rewriting your application.