Transparency · DX · Philosophy

Radical Transparency: Why Every Security Decision Needs a Receipt

Most security tools return '403 Forbidden' and leave you guessing. We return the confidence score, the threat type, the event ID, and the source code. Here's why transparency isn't a nice-to-have—it's the only way to build trust.


I hate WAFs.

I hate them because when they work, they're invisible. When they break, they're opaque. You get a 403 Forbidden. Why? Who knows. Maybe you looked at the server funny. Maybe your request contained a string that matched a regex written by someone who quit three years ago. Maybe the vendor's ML model had a bad day.

The typical debugging workflow is:

  1. Check your logs. Nothing.
  2. Check the vendor dashboard. "Threat Detected."
  3. Open a support ticket. Wait 48 hours.
  4. Get a response: "Our model flagged this as suspicious. We can't share the details for security reasons."
  5. Disable the WAF. Ship the feature. Hope for the best.

When we built PromptGuard, we made a rule: no magic. Every decision the system makes must be explainable, traceable, and auditable. Not in theory. In practice.

What Transparency Actually Looks Like

Response Headers on Every Request

Every response from PromptGuard—whether it's ALLOW, BLOCK, or REDACT—includes metadata headers:

X-PromptGuard-Event-ID: evt_7f3a2b1c
X-PromptGuard-Decision: block
X-PromptGuard-Confidence: 0.94
X-PromptGuard-Threat-Type: prompt_injection

Four headers. Four pieces of information that transform a mysterious "blocked" into an actionable data point.

Event ID is your receipt. It links the request to a full audit trail in the database. If a user complains that they were blocked, you give support the event ID and they can see exactly what happened—which detector triggered, what pattern matched, what the confidence score was, and a truncated preview of the prompt.

Decision tells you what happened: the request was allowed, blocked, or had sensitive data redacted.

Confidence tells you how sure the system was. This is not a raw model score—it's a calibrated probability computed via Platt scaling. When we say 0.94, we mean "94% of prompts with this score are actual attacks." This matters enormously for tuning. If you're getting blocked at 0.52 confidence, you probably want to adjust your threshold. If you're getting blocked at 0.98, the system is almost certainly correct.

Threat Type tells you the category: prompt_injection, pii_leak, data_exfiltration, toxicity, api_key_leak, fraud_abuse, or malware.
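The four headers are designed to be consumed by machines as well as humans. Here's a minimal sketch of parsing them into a structured record; the header names are from above, but the helper function itself is illustrative:

```python
# Sketch: turn PromptGuard's response headers into an actionable record.
# The header names come from the docs above; this parser is an illustration.

def parse_promptguard_headers(headers: dict) -> dict:
    """Extract PromptGuard decision metadata from HTTP response headers."""
    return {
        "event_id": headers.get("X-PromptGuard-Event-ID"),
        "decision": headers.get("X-PromptGuard-Decision"),
        "confidence": float(headers.get("X-PromptGuard-Confidence", "0")),
        "threat_type": headers.get("X-PromptGuard-Threat-Type"),
    }

meta = parse_promptguard_headers({
    "X-PromptGuard-Event-ID": "evt_7f3a2b1c",
    "X-PromptGuard-Decision": "block",
    "X-PromptGuard-Confidence": "0.94",
    "X-PromptGuard-Threat-Type": "prompt_injection",
})
if meta["decision"] == "block":
    # Log the event ID so support can pull the full audit trail later.
    print(f"blocked: {meta['threat_type']} ({meta['confidence']:.2f}), receipt {meta['event_id']}")
```

The event ID is the field you never want to drop: log it alongside your own request ID and debugging becomes a join, not an archaeology dig.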

The Blocked Response

When PromptGuard blocks a request, it doesn't just return a generic error. It returns structured information the developer can act on:

{
  "error": {
    "message": "Request blocked by PromptGuard security policy",
    "type": "security_violation",
    "code": "content_policy_violation",
    "event_id": "evt_7f3a2b1c",
    "confidence": 0.94,
    "threat_type": "prompt_injection"
  }
}

Compare this to what most security tools return:

{
  "error": {
    "message": "Content policy violation"
  }
}

That gap is the difference between actionable intelligence and noise.
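Because the error body is structured, client code can branch on it instead of string-matching a message. A minimal handler sketch—the triage policy here is my own illustration, not part of PromptGuard:

```python
import json

# Sketch of client-side handling for the structured block response shown above.
# The confidence thresholds and return strings are illustrative assumptions.

def handle_block(response_body: str) -> str:
    err = json.loads(response_body)["error"]
    if err.get("type") != "security_violation":
        return "unknown error"
    if err["confidence"] >= 0.90:
        # High-confidence blocks are almost certainly real attacks.
        return f"confirmed {err['threat_type']} (event {err['event_id']})"
    # Lower-confidence blocks are worth flagging for human review.
    return f"review needed: {err['threat_type']} at {err['confidence']} (event {err['event_id']})"

result = handle_block(
    '{"error": {"type": "security_violation", "event_id": "evt_7f3a2b1c",'
    ' "confidence": 0.94, "threat_type": "prompt_injection"}}'
)
```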

Audit Logs With Full Context

Every security event is logged to the database with:

  • The decision and confidence
  • The threat type and detector that triggered
  • The specific pattern or model that caused the detection
  • A truncated content preview (max 500 characters—long enough to understand context, short enough to limit data exposure)
  • Timestamps, request metadata, and the project/API key that was used

For projects with zero retention mode enabled, the content preview is omitted entirely. You get the decision metadata without any of the prompt content. This is important for regulated industries where you need audit trails but can't store user data.
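The shape of that record might look something like the sketch below. The field names content_preview and the 500-character cap come from this article; the rest of the structure is an assumption for illustration:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative audit-record shape. "content_preview" and the 500-char
# truncation are described in the article; other fields are assumptions.

@dataclass
class SecurityEvent:
    event_id: str
    decision: str
    confidence: float
    threat_type: str
    content_preview: Optional[str]  # None when zero retention is enabled

def make_event(event_id: str, decision: str, confidence: float,
               threat_type: str, prompt: str, zero_retention: bool) -> SecurityEvent:
    # Truncate to 500 characters; omit entirely under zero retention.
    preview = None if zero_retention else prompt[:500]
    return SecurityEvent(event_id, decision, confidence, threat_type, preview)
```

The key property: zero retention is enforced at write time, not filtered at read time, so the content never lands in the database at all.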

Open Source Detection Logic

We publish our detection rules. You can see the regex patterns on GitHub. You can read the policy evaluation logic. You can trace a blocked request from the HTTP handler through the security engine to the exact line of code that made the decision.

This isn't "open core" where the important parts are proprietary. The entire security pipeline—all seven detectors, the ML ensemble configuration, the policy engine, the preset definitions—is open source.

Why does this matter?

  1. You can audit what we catch. If you're not sure whether PromptGuard will detect a specific attack, you can read the regex patterns and model configurations and know for certain.

  2. You can audit what we don't catch. Open source means security researchers can find gaps. When they do, they open issues, and we fix them. This is faster and more reliable than hoping a vendor's internal team catches every edge case.

  3. You can verify data handling. Regulated industries don't trust "we don't store your data" on a marketing page. They trust reading the code and seeing that zero_retention skips the content_preview field in the SecurityEvent record.

The Feedback Loop

Transparency isn't just about explaining decisions—it's about enabling users to correct them.

Every blocked request is an opportunity to get smarter. When our system makes a mistake (either a false positive or a false negative), users can submit feedback:

  • False positive report: "This prompt was legitimate. You shouldn't have blocked it." The original decision, confidence, and model scores are recorded alongside the user's correction.
  • False negative report: "This prompt was an attack. You should have blocked it." Same data capture in reverse.

This feedback feeds directly into our weekly model recalibration. We use Platt scaling to adjust the calibration parameters (a and b) for each model based on the error distribution. Models that are producing too many false positives get their thresholds nudged up. Models that are missing attacks get their thresholds nudged down.
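Platt scaling itself is a small amount of math: a raw score s becomes a calibrated probability p = 1 / (1 + exp(a·s + b)), and the parameters a and b are fit against labeled outcomes. The sketch below fits them with plain gradient descent on feedback labels; PromptGuard's actual recalibration pipeline isn't shown in this article, so treat this as a toy illustration of the technique:

```python
import math

# Platt scaling: p = 1 / (1 + exp(a*s + b)).
# Fitting via minimal gradient descent on feedback labels
# (1 = confirmed attack, 0 = confirmed benign). Illustrative only.

def calibrated(s: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(a * s + b))

def fit_platt(scores, labels, a=-1.0, b=0.0, lr=0.1, epochs=500):
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = calibrated(s, a, b)
            # Gradient of the negative log-likelihood w.r.t. a and b.
            ga += (y - p) * s
            gb += (y - p)
        a -= lr * ga / len(scores)
        b -= lr * gb / len(scores)
    return a, b

# Toy feedback set: positive raw scores were attacks, negative were benign.
a, b = fit_platt([2.0, 1.5, 1.8, -1.0, -1.5, -2.0], [1, 1, 1, 0, 0, 0])
```

New false-positive and false-negative reports simply extend the labeled set, and the weekly refit shifts a and b to match the observed error distribution.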

The result: a system that gets measurably better over time, driven by real production data, with full transparency into how and why the calibration changed.

Why Most Vendors Don't Do This

There are two reasons most security vendors operate as black boxes:

1. Competitive moat. If your detection logic is proprietary, competitors can't replicate it. This is a valid business concern but a terrible security posture. Security through obscurity has been debunked for decades. If your only defense is "I hope they don't guess my regex," you've already lost.

2. Liability avoidance. If you explain why you blocked something, and your explanation is wrong, you're exposed. If you just say "threat detected," you can never be proven wrong—only unhelpful.

We chose a different path. We'd rather be proven wrong and fix it than be opaque and trusted blindly.

What Transparency Enables

Better Security Posture

When you can see your blocked requests with confidence scores, you can make informed decisions about your security configuration:

  • High-confidence blocks (0.90+) are almost certainly correct. Leave them alone.
  • Medium-confidence blocks (0.60-0.90) deserve review. Some may be false positives.
  • Low-confidence blocks (0.50-0.60) suggest your threshold might be too aggressive for your use case.
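That triage can be automated over your block log. The bucket boundaries below are the ones above; the helper itself is my own sketch:

```python
from collections import Counter

# Triage sketch using the confidence bands described above.
# The bucket names and the example data are illustrative.

def triage(confidence: float) -> str:
    if confidence >= 0.90:
        return "keep"            # almost certainly a real attack
    if confidence >= 0.60:
        return "review"          # possible false positive
    return "tune_threshold"      # threshold may be too aggressive

blocked_confidences = [0.98, 0.87, 0.55, 0.93, 0.61]
summary = Counter(triage(c) for c in blocked_confidences)
```

A weekly glance at this summary tells you whether your threshold fits your traffic: a pile of "tune_threshold" entries means the preset is stricter than your use case needs.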

Faster Incident Response

When a legitimate user is blocked, the support workflow takes minutes, not days:

  1. User reports the block.
  2. Support looks up the event ID.
  3. They see: "InjectionDetector, regex pattern 'ignore previous instructions', confidence 0.87."
  4. They determine it was a false positive (the user is a teacher writing about AI safety).
  5. They submit feedback to improve the model.
  6. They adjust the preset from strict to moderate for this project.

Regulatory Compliance

GDPR, HIPAA, SOC 2, and PCI-DSS all require audit trails. They don't require a specific tool—they require that you can demonstrate what happened, when, and why.

PromptGuard's event logs satisfy this requirement out of the box. Every security decision is recorded with timestamps, decision metadata, and (optionally) content previews. Auditors can query the database directly to verify that security policies are being enforced consistently.
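An auditor's query over such a log is ordinary SQL. The table and column names below are assumptions (PromptGuard's real schema isn't published in this article); the point is that the question "how many blocks, by threat type?" is answerable directly from the data:

```python
import sqlite3

# Illustrative audit query; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE security_events (
    event_id TEXT, decision TEXT, confidence REAL,
    threat_type TEXT, created_at TEXT)""")
conn.executemany(
    "INSERT INTO security_events VALUES (?, ?, ?, ?, ?)",
    [("evt_7f3a2b1c", "block", 0.94, "prompt_injection", "2024-05-01T12:00:00Z"),
     ("evt_9c1d4e2f", "allow", 0.08, "none", "2024-05-01T12:01:00Z")])

# Auditor's question: blocked requests in the period, grouped by threat type.
rows = conn.execute("""
    SELECT threat_type, COUNT(*)
    FROM security_events
    WHERE decision = 'block'
    GROUP BY threat_type""").fetchall()
```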

Trust Requires Truth

You are trusting PromptGuard to sit between your users and your AI. That trust requires two things: that we make good decisions, and that we show our work.

We can't guarantee we'll never make a mistake. No security system can. But we can guarantee that when we do make a mistake, you'll know exactly what happened, and you'll have the tools to fix it.

That's the deal. We block the threats, and we give you a receipt for every one.