Self-Hosting · Infrastructure · Privacy · Tutorial

Why Your AI Security Should Run in Your VPC (And How to Set It Up)

Sending your user prompts to a security vendor defeats the purpose of security. Here's why we built PromptGuard to be self-hostable first, and a complete guide to deploying it in your own infrastructure.


There is a strange irony in the AI security market.

A company says: "We're worried about sending user data to OpenAI."

So they buy an AI security tool—and send all their user data to that security vendor.

They've traded one third-party risk for another. The prompts still leave their network. They still can't verify how the vendor handles their data. They still have a compliance exposure.

The only way to fully control your data is to run the security layer inside your own infrastructure. This is why we built PromptGuard to be self-hostable first—not as an afterthought, but as the primary deployment model.

The Data Sovereignty Problem

If you're a bank, a hospital, or a government contractor, data residency isn't a suggestion. It's law.

  • GDPR Articles 44-49: Personal data transfers outside the EU/EEA require specific legal mechanisms.
  • HIPAA: Protected Health Information must be handled by covered entities or business associates with signed BAAs.
  • FedRAMP / ITAR: Government data may not leave authorized infrastructure.
  • PCI-DSS: Cardholder data must be processed within the assessed cardholder data environment.

You cannot pipe your user prompts through a startup's cloud in us-east-1 just to check for prompt injection. Even if the vendor is well-intentioned, even if they have a SOC 2 report, the data still left your perimeter. That's the compliance problem.

When you self-host PromptGuard, the data never leaves your network. We (PromptGuard Inc.) never see your prompts. We don't know what your users are asking. We don't know how many requests you're processing. We don't even know you're running our software.

The Latency Problem

There's also a physics problem that most vendors don't talk about.

If your application runs in aws-us-west-2 and your security vendor's API is in gcp-us-central1, every request adds:

  1. DNS resolution: ~5ms
  2. TLS handshake: ~20ms (for a new connection)
  3. Cross-cloud network latency: ~30ms
  4. Serialization overhead: ~5ms

That's ~60ms of overhead before the security check even starts. Round trip, you're adding 100ms+ just for network transit.

If you run PromptGuard as a sidecar container in the same Kubernetes pod—or even in the same Docker Compose network—the network latency drops to sub-millisecond. The security check's overhead is almost entirely compute time, not network time.

For applications where every millisecond matters (voice agents, real-time copilots, gaming), that gap is the difference between a usable product and a laggy one.
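The comparison can be sketched in a few lines. The cross-cloud figures come from the list above; the sidecar figure is an assumed sub-millisecond hop, so treat all of these as estimates rather than measurements:

```python
# Rough per-request network overhead for the two deployment shapes.
cross_cloud_ms = {
    "dns_resolution": 5,
    "tls_handshake": 20,       # new connection; near zero with keep-alive
    "cross_cloud_transit": 30,
    "serialization": 5,
}
sidecar_ms = {
    "localhost_transit": 0.1,  # same pod / same Compose network (assumed)
    "serialization": 5,
}

cross_cloud = sum(cross_cloud_ms.values())
sidecar = sum(sidecar_ms.values())
print(f"cross-cloud: ~{cross_cloud} ms, sidecar: ~{sidecar:.1f} ms")
```

The point of the arithmetic is that in the sidecar case almost all remaining overhead is serialization and compute, which you control, rather than network transit, which you don't.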

The Architecture of Self-Hosted PromptGuard

The production deployment is five containers, orchestrated by Docker Compose:

services:
  api:        # Python/FastAPI — the core proxy and security engine
  dashboard:  # Next.js — configuration, analytics, log viewer
  postgres:   # PostgreSQL 17 — users, projects, events, feedback
  redis:      # Redis 7 — caching, session state, rate limiting
  nginx:      # Reverse proxy with TLS termination

The API Server

The API container is the heart of the system. It handles:

  • OpenAI-compatible proxy endpoints (/chat/completions, /completions, /models)
  • Anthropic-compatible proxy endpoints (/messages)
  • The full 7-detector security pipeline
  • Bot detection with behavioral analysis
  • Policy evaluation with custom rules and presets
  • Webhook and email alerting
  • Usage tracking and subscription enforcement

It's a stateless container. You can run one instance for a small deployment or scale horizontally behind a load balancer. State lives in PostgreSQL (persistent data) and Redis (ephemeral data like caches and rate limit counters).

The Dashboard

The dashboard provides a web UI for:

  • Creating and managing projects and API keys
  • Configuring security presets and custom rules
  • Viewing security event logs and analytics
  • Running red team assessments
  • Managing team members and billing

It's a standard Next.js application. If you don't need a web UI (e.g., you manage everything via API), you can skip this container entirely.

PostgreSQL

Stores all persistent data:

  • Users, projects, and API keys
  • Security event logs (with configurable retention)
  • Feedback entries for model recalibration
  • Custom rules and policies
  • Subscription and billing data

We use PostgreSQL 17 on the Alpine-based image to keep the footprint small. The schema is managed by SQL migrations in the supabase/migrations/ directory.

Redis

Handles ephemeral, high-frequency data:

  • Detection cache (exact-match, 1-hour TTL)
  • OAuth state tokens (10-minute TTL)
  • Magic link codes (10-minute TTL)
  • Bot detection state (blocked fingerprints, rate counters)
  • Session data

Redis is optional for single-instance deployments—the system falls back to in-memory storage. But for multi-instance deployments, Redis is required to share state across API containers.
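The caching behavior described here can be modeled roughly as follows. This is an illustrative sketch, not PromptGuard's actual code; the `DetectionCache` name and its methods are hypothetical, and only the exact-match lookup, 1-hour TTL, and in-memory fallback come from the description above:

```python
import hashlib
import time

class DetectionCache:
    """Exact-match detection cache with a 1-hour TTL (a sketch).

    Pass a redis.Redis client for multi-instance deployments; with
    client=None it falls back to an in-process dict, which is the
    single-instance mode described above.
    """
    TTL_SECONDS = 3600

    def __init__(self, client=None):
        self.client = client
        self._local = {}  # key -> (verdict, expires_at)

    def _key(self, prompt: str) -> str:
        # Exact-match: hash the full prompt text.
        return "detect:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        if self.client is not None:
            value = self.client.get(key)
            return value.decode("utf-8") if value is not None else None
        entry = self._local.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]
        self._local.pop(key, None)  # drop expired entries lazily
        return None

    def set(self, prompt: str, verdict: str) -> None:
        key = self._key(prompt)
        if self.client is not None:
            self.client.setex(key, self.TTL_SECONDS, verdict)
        else:
            self._local[key] = (verdict, time.time() + self.TTL_SECONDS)
```

The same shape applies to the other ephemeral data: Redis when it's reachable and shared state matters, process memory when a single instance is enough.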

Nginx

Terminates TLS and routes traffic to the API and dashboard. In environments where you already have a load balancer or ingress controller handling TLS, you can skip this container and route directly to the API.

Deployment Guide

Step 1: Clone and Configure

git clone https://github.com/acebot712/promptguard
cd promptguard/deploy
cp .env.example .env

Edit .env with your configuration:

# Required
DATABASE_URL=postgresql://promptguard:your-password@postgres:5432/promptguard
REDIS_URL=redis://redis:6379/0
JWT_SECRET=your-random-secret-minimum-32-chars

# ML Detection (requires HuggingFace API key for full ensemble)
HUGGINGFACE_API_KEY=hf_your_key_here
ENABLE_ML_DETECTION=true

# Optional: LLM Provider Keys (for proxying)
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=sk-ant-your-key

# Optional: Email Alerts
SMTP_HOST=smtp.example.com
SMTP_FROM_EMAIL=alerts@yourdomain.com

# Optional: Stripe (for billing, if needed)
STRIPE_API_KEY=sk_live_your-key
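As a convenience before starting the stack, a few lines of Python can sanity-check the required values from the .env above. This is a standalone helper, not part of PromptGuard; the variable names come from the example configuration:

```python
import os

# Fail fast if the required .env settings are missing or too weak.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "JWT_SECRET")

def preflight(env=None):
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))
    if len(env["JWT_SECRET"]) < 32:
        raise SystemExit("JWT_SECRET must be at least 32 characters")

if __name__ == "__main__":
    preflight()
    print("ok")
```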

Step 2: Start the Stack

docker-compose up -d

That's it. Five containers start. PostgreSQL initializes the schema. Redis starts accepting connections. The API server runs health checks and begins accepting traffic.

Step 3: Verify

# Check health
curl http://localhost:8080/health

# Send a test request
curl http://localhost:8080/api/v1/proxy/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-API-Key: your-promptguard-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

Step 4: Point Your Application

Change your application's LLM base URL to point to your self-hosted instance:

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-openai-key",  # your provider key; PromptGuard forwards it upstream
    base_url="http://promptguard.internal:8080/api/v1/proxy",
    default_headers={"X-API-Key": "your-promptguard-api-key"}
)

Updating

Updates are a one-command operation:

cd deploy
docker-compose pull
docker-compose up -d

The containers pull the latest images, the database migrations run automatically, and traffic resumes. No downtime for most updates.

Scaling

For high-traffic deployments:

Horizontal scaling: Run multiple API containers behind a load balancer. Redis provides shared state for caching, rate limiting, and bot detection. PostgreSQL handles persistent storage with connection pooling.

GPU inference (optional): For teams that want to run the ML ensemble locally instead of using the HuggingFace API, you can deploy a HuggingFace-compatible inference server (like TGI or vLLM) and point the ML provider configuration at your local endpoint. This eliminates the ~100-140ms of API latency for ML inference.
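In .env terms, that switch might look something like this. The URL variable name here is hypothetical; check your version's .env.example for the actual setting:

```
# Hypothetical variable name — consult .env.example for the real one
ML_INFERENCE_URL=http://tgi.internal:8080   # your local TGI/vLLM endpoint
ENABLE_ML_DETECTION=true
```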

Read replicas: For analytics-heavy workloads, add PostgreSQL read replicas to handle dashboard queries without impacting the API's write path.

The BYOC Future

We believe the future of security infrastructure is Bring Your Own Cloud (BYOC).

SaaS is great for non-critical tools—project management, documentation, design. But for infrastructure that sits in the critical path of your user data, you should own the deployment.

When you self-host PromptGuard:

  • Your data never leaves your network
  • Your latency is determined by your hardware, not someone else's cloud
  • Your compliance posture is under your control
  • Your security tool can't go down because a vendor has an outage
  • Your costs scale with your infrastructure, not with per-request pricing

The code is open source. The deployment is containerized. The migration from hosted to self-hosted (or vice versa) is a URL change.

Own your security. Own your data. Own the metal.