The ngrok for PHI

One Import Change.
Automatic PHI Redaction.

Drop-in replacement for OpenAI, Anthropic, and Gemini SDKs. PHI is automatically redacted before reaching any LLM provider. All processing happens locally: no Redact API keys, no signup. Uses your existing LLM provider keys.

your_app.py
# BEFORE - PHI goes directly to OpenAI
from openai import OpenAI

# AFTER - PHI is automatically redacted
from redact_proxy import OpenAI

# Everything else stays exactly the same
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content":
        "Patient John Smith, DOB 01/15/1980, has diabetes."}]
)

# What OpenAI actually receives:
# "Patient [NAME_a1b2c3], DOB [DATE_d4e5f6], has diabetes."
97.3% Precision · 91.1% Recall · 94.1% F1 Score · ~8ms Latency · 18 PHI Types
Benchmarked on gold-labeled EMR notes from Meditech, Cerner, and eCW

Need PHI back in responses?

DE-ID removes PHI permanently. Our RE-ID SDK creates reversible tokens: the LLM never sees PHI, but you can restore it in responses. pip install redact-proxy[reid]

Learn About RE-ID →
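For intuition, reversible tokenization boils down to keeping a local placeholder-to-PHI map and applying it in reverse to the model's response. The sketch below is purely conceptual; the function names are illustrative and are not the RE-ID SDK's actual API (the real SDK does this automatically and keeps its token store encrypted).

# Conceptual sketch of reversible tokenization (illustrative only).
import secrets

token_map: dict[str, str] = {}  # placeholder -> original PHI, never leaves your machine

def tokenize(original: str, phi_type: str) -> str:
    """Swap a detected PHI value for a reversible placeholder."""
    placeholder = f"[{phi_type}_{secrets.token_hex(3)}]"
    token_map[placeholder] = original
    return placeholder

def restore(text: str) -> str:
    """Put original PHI back into text returned by the LLM."""
    for placeholder, original in token_map.items():
        text = text.replace(placeholder, original)
    return text

name = tokenize("John Smith", "NAME")
prompt = f"Summarize the visit for {name}."               # what the LLM sees
llm_response = f"Here is the visit summary for {name}."   # stand-in for an LLM reply
print(restore(llm_response))                              # "...for John Smith."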

Intercept, Redact, Forward

Redact-Proxy wraps your LLM SDK and intercepts all requests before they leave your machine.

📝 Your Code
Contains PHI
→
🛡️ Redact-Proxy
Local processing
→
🤖 LLM Provider
PHI-free request
What gets sent to OpenAI
# Your original message:
"Patient John Smith, SSN 123-45-6789, was seen on 01/15/2024
at Springfield Medical Center. Contact: (555) 123-4567"

# What OpenAI receives:
"Patient [NAME_a1b2c3], SSN [SSN_d4e5f6], was seen on [DATE_g7h8i9]
at [FACILITY_j0k1l2]. Contact: [PHONE_m3n4o5]"
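As a mental model (not the actual Redact-Proxy source), the intercept-and-forward step can be pictured as a thin wrapper around the vendor client: scrub the message content, then delegate. redact_text below is a placeholder for the real local detection engine.

# Minimal sketch of the intercept-and-forward idea (illustrative only;
# not the actual Redact-Proxy source).
from types import SimpleNamespace
from openai import OpenAI as _VendorOpenAI

def redact_text(text: str) -> str:
    return text  # a real engine would substitute [NAME_...], [SSN_...], etc.

class OpenAI:
    """Wraps the vendor client and scrubs chat messages before forwarding."""

    def __init__(self, **kwargs):
        self._client = _VendorOpenAI(**kwargs)
        # Mirror the client.chat.completions.create call shape so existing
        # code keeps working unchanged.
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, *, messages, **kwargs):
        clean = [
            {**m, "content": redact_text(m["content"])}
            if isinstance(m.get("content"), str) else m
            for m in messages
        ]
        return self._client.chat.completions.create(messages=clean, **kwargs)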

Why Redact-Proxy?

Zero infrastructure. Zero signup. Just change your import.

1

One-Line Migration

Change one import statement. All your existing code works exactly the same, but now PHI is protected.

0

Zero Infrastructure

No API keys, no cloud services, no Docker containers. Everything runs locally in your Python environment.

🔒

PHI Never Leaves

All redaction happens locally before any network request. Your PHI never touches third-party servers.

⚡

Fast Detection

Pattern-based detection adds ~8ms latency. Transformer mode available for higher accuracy.

🔄

Drop-In Compatible

Same API, same methods, same parameters. Works with OpenAI, Anthropic, and Google Gemini.

📖

Open Source

MIT licensed core. Inspect the code, contribute, or fork it. No vendor lock-in.

Where Your Data Goes

Understand exactly what happens to PHI at each step.

🏠

In-Process Redaction

All detection and redaction happens in your application's memory. No external service calls.

🚫

PHI Stays Local

Only redacted text goes to the LLM provider. Original PHI never leaves your machine.

🗑️

No Persistence

PHI↔placeholder mappings are stored in memory only and cleared after each request completes (see the sketch after this list).

📝

Logging Off by Default

Debug logging is disabled by default. If enabled, ensure your logs are stored securely.
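A rough picture of that per-request lifecycle, assuming nothing about the library's internals: the mapping exists only inside a request scope and is wiped on exit, so nothing is ever written to disk.

# Illustrative sketch of the no-persistence model.
import secrets
from contextlib import contextmanager

@contextmanager
def request_scope():
    mapping: dict[str, str] = {}   # placeholder -> original PHI
    try:
        yield mapping              # available while the request is in flight
    finally:
        mapping.clear()            # wiped on completion; never persisted

with request_scope() as phi_map:
    placeholder = f"[NAME_{secrets.token_hex(3)}]"
    phi_map[placeholder] = "John Smith"
    # ... the redacted request goes out to the LLM provider here ...

print(len(phi_map))  # 0 -- nothing survives once the request scope exits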

Common Pitfalls

PHI can still leak through other parts of your application.

⚡ Streaming Responses
PHI in streamed chunks may bypass redaction. Redact after the full response is assembled.
🔧 Tool/Function Calling
Function arguments may contain PHI. Redact tool inputs before passing to the LLM.
🔄 Retries & Error Handling
Stack traces can expose PHI in variables. Scrub exceptions before logging.
⏰ Background Jobs
Async workers may bypass the proxy. Use Redact-Proxy in your worker code too.
💾 Prompt Caching
Cached prompts aren't re-redacted on retrieval. Cache only already-redacted prompts.
📊 App Logs & Analytics
Your logging framework and APM tools may capture request bodies. Review your full data flow.
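One concrete mitigation that covers the tool-calling and logging rows above: route anything you control through the same redaction step before it reaches a logger or a tool definition. redact_text in the sketch below is a hypothetical stand-in, not part of the library's documented API.

# Defensive scrubbing sketch for the logging and tool-calling pitfalls.
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_app")

def redact_text(text: str) -> str:
    return text  # a real engine would replace PHI with [NAME_...]-style placeholders

def safe_log(message: str) -> None:
    """Scrub before the logging framework or an APM agent can capture PHI."""
    logger.info(redact_text(message))

def safe_tool_arguments(arguments: dict) -> str:
    """Redact every string value before attaching it to a tool/function call."""
    clean = {k: redact_text(v) if isinstance(v, str) else v for k, v in arguments.items()}
    return json.dumps(clean)

safe_log("Scheduling follow-up for John Smith, MRN 0012345")
print(safe_tool_arguments({"patient": "John Smith", "visit_id": 42}))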

18 PHI Types Detected

Covers all HIPAA Safe Harbor identifiers plus clinical extensions.

👤 Names
📅 Dates
🔢 SSN
🏥 MRN
📞 Phone
📧 Email
📍 Address
🏙️ City/State
📮 ZIP Code
🎂 Age (90+)
🏢 Facilities
👨‍⚕️ Providers
💳 Account #
🪪 License #
🚗 VIN/Plates
💊 DEA #
🌐 URLs/IPs
🔐 Biometrics
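To make the pattern-based side of detection concrete, a few of these types can be caught with ordinary regular expressions, as in the simplified sketch below. These are illustrative patterns, not the library's production rules, which are more thorough.

# Simplified examples of the kind of patterns behind a few PHI types.
import re

PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE":  re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

note = "Seen 01/15/2024. SSN 123-45-6789, phone (555) 123-4567."
for phi_type, pattern in PATTERNS.items():
    for match in pattern.finditer(note):
        print(f"{phi_type}: {match.group(0)}")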

Choose Your Speed/Accuracy Tradeoff

Three detection engines for different use cases.

Fast
~8ms per request

Pattern-based detection. Good for structured EMR data.

  • Regex-based pattern matching
  • 97.3% precision / 91.1% recall on EMR notes
  • Lowest latency

Balanced
~400ms per request

spaCy NER + patterns. Good for mixed content types.

  • Named entity recognition
  • Context-aware detection
  • Requires spaCy model
  • Better name detection

Transformer
~800ms per request

Clinical NER model. Best for free-text narratives without structured labels.

  • Clinical language model
  • Higher recall on edge cases
  • GPU recommended
  • Continuously fine-tuned

# Configure detection mode
from redact_proxy import OpenAI

client = OpenAI(
    redact_mode="fast"      # or "balanced" or "accurate"
)
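If you want to check the latency figures on your own hardware, a quick harness like the one below works. Note that it measures the full round trip (redaction plus the provider call), so compare modes against each other rather than reading the numbers as pure redaction overhead; it assumes only the redact_mode parameter shown above.

# Rough timing harness for comparing detection modes.
import time
from redact_proxy import OpenAI

MESSAGE = [{"role": "user", "content": "Patient John Smith, DOB 01/15/1980."}]

for mode in ("fast", "balanced", "accurate"):
    client = OpenAI(redact_mode=mode)
    start = time.perf_counter()
    client.chat.completions.create(model="gpt-4", messages=MESSAGE)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{mode}: {elapsed_ms:.0f} ms total (includes the provider round trip)")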

Works With Your LLM

Same API you're already using. Just change the import.

OpenAI
from redact_proxy import OpenAI

client = OpenAI()
response = client.chat.completions.create(...)
Anthropic Claude
from redact_proxy import Anthropic

client = Anthropic()
response = client.messages.create(...)
Google Gemini
from redact_proxy import Gemini

client = Gemini()
response = client.generate_content(...)

Built for Healthcare AI

🩺

Clinical Documentation AI

Build AI scribes and documentation assistants that process patient encounters without sending PHI to cloud LLMs. Perfect for ambient listening apps.

🔬

Research & Analytics

Analyze clinical notes with GPT-4 or Claude on de-identified data, which can simplify IRB review. Extract insights from medical records while maintaining patient privacy.

💬

Patient-Facing Chatbots

Build symptom checkers and health assistants. Patients can describe conditions freely knowing their information stays private.

📊

EHR Integration

Add AI features to your EHR or practice management system. Process clinical data with LLMs while keeping PHI out of third-party systems.

vs. Building It Yourself

Task              DIY Approach              Redact-Proxy
Setup time        Days to weeks             5 minutes
Code changes      Wrap every API call       Change 1 import
PHI patterns      Write your own regex      18 types included
Maintenance       Ongoing updates needed    pip install --upgrade
Testing           Build test suite          Validated on clinical data
Multi-provider    Implement per provider    OpenAI, Anthropic, Gemini

Get Running in 60 Seconds

1

Install the package

No additional dependencies beyond your existing LLM SDK.

pip install redact-proxy
2

Change your import

Just change where you import from. That's it.

# Change this:
from openai import OpenAI

# To this:
from redact_proxy import OpenAI
3

That's it - you're protected!

Your existing code now automatically redacts PHI before sending to any LLM.

# Your existing code works unchanged
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content":
         "Patient John Smith, SSN 123-45-6789, has diabetes."}
    ]
)

# OpenAI receives: "Patient [NAME_a1b2c3], SSN [SSN_d4e5f6], has diabetes."
# PHI never leaves your machine

DE-ID SDK is Free Forever

Run de-identification locally with no limits. Need to restore original PHI? Add RE-ID.

RE-ID SDK
$99/mo

DE-ID + Re-identification

  • Everything in DE-ID, plus:
  • Automatic re-identification
  • Unlimited RE-ID calls
  • Multi-turn support
  • Encrypted token storage
  • Priority email support

Need Cloud Workspace, HIPAA Chat, or BAA? See full platform pricing →

Frequently Asked Questions

Does this make me HIPAA compliant?
Redact-Proxy helps you avoid sending PHI to third-party LLMs, which is one aspect of HIPAA compliance. Full HIPAA compliance requires additional measures including BAAs, access controls, and security policies. Consult a compliance expert for your specific situation.
Is PHI ever sent to your servers?
No. All PHI detection and redaction happens locally in your Python environment. The open source version has zero network calls except to your chosen LLM provider. We never see your data.
What if it misses some PHI?
No de-identification system is 100% perfect. Our fast mode achieves 94.1% F1 on structured EMR data (97.3% precision, 91.1% recall). Transformer mode captures more edge cases in free-text narratives. For maximum safety, consider manual review for high-risk data.
Can I get the original PHI back?
The DE-ID-only version permanently redacts PHI; there's no way to restore it. Need re-identification? Our DE-ID/RE-ID SDK automatically restores PHI in LLM responses using encrypted token storage.
Does it work with streaming?
Yes. Redact-Proxy supports streaming responses from all providers. PHI is redacted before the request, and streaming works normally for the response.
Can I use my own PHI patterns?
Yes. Pro users can add custom regex patterns for organization-specific identifiers like custom MRN formats or internal ID schemes.
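The configuration surface for custom patterns isn't shown on this page, so treat the parameter name below as a hypothetical placeholder and consult the Pro documentation for the real API; the point is that organization-specific identifiers reduce to ordinary regular expressions.

# Hypothetical sketch of registering custom patterns (parameter name assumed).
from redact_proxy import OpenAI

client = OpenAI(
    redact_mode="fast",
    custom_patterns={                       # hypothetical parameter
        "INTERNAL_MRN": r"\bMRN-\d{7}\b",   # e.g. MRN-0012345
        "STUDY_ID":     r"\bSTUDY-[A-Z]{2}\d{4}\b",
    },
)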

Ready to protect your PHI?

Get started in 60 seconds. No signup required.