PII Detection
Semantic Router provides built-in Personally Identifiable Information (PII) detection to protect sensitive data in user queries. The system uses fine-tuned BERT models to identify and handle various types of PII according to configurable policies.
Overview
The PII detection system:
- Identifies common PII types in user queries
- Enforces model-specific PII policies
- Blocks or masks sensitive information based on configuration
- Filters model candidates based on PII compliance
- Logs policy violations for monitoring
Supported PII Types
The system can detect the following PII types:
| PII Type | Description | Examples | 
|---|---|---|
| PERSON | Person names | "John Smith", "Mary Johnson" | 
| EMAIL_ADDRESS | Email addresses | "user@example.com" | 
| PHONE_NUMBER | Phone numbers | "+1-555-123-4567", "(555) 123-4567" | 
| US_SSN | US Social Security Numbers | "123-45-6789" | 
| STREET_ADDRESS | Physical addresses | "123 Main St, New York, NY" | 
| GPE | Geopolitical entities | Countries, states, cities | 
| ORGANIZATION | Organization names | "Microsoft", "OpenAI" | 
| CREDIT_CARD | Credit card numbers | "4111-1111-1111-1111" | 
| US_DRIVER_LICENSE | US Driver's License | "D123456789" | 
| IBAN_CODE | International Bank Account Number | "GB82 WEST 1234 5698 7654 32" | 
| IP_ADDRESS | IP addresses | "192.168.1.1", "2001:db8::1" | 
| DOMAIN_NAME | Domain/website names | "example.com", "google.com" | 
| DATE_TIME | Date/time information | "2024-01-15", "January 15th" | 
| AGE | Age information | "25 years old", "born in 1990" | 
| NRP | Nationality/Religious/Political groups | "American", "Christian", "Democrat" | 
| ZIP_CODE | ZIP/postal codes | "10001", "SW1A 1AA" | 
Configuration
Basic PII Detection
Enable PII detection in your configuration:
# config/config.yaml
classifier:
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_model"
    threshold: 0.7                 # Minimum confidence for a detection (0.0-1.0)
    use_cpu: true                  # Run on CPU
    pii_mapping_path: "config/pii_type_mapping.json"  # Path to PII type mapping
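To make the role of `threshold` concrete, here is a minimal Python sketch. It is not the router's implementation: it assumes the classifier yields a PII type with a confidence score, and that a detection counts when its score is at or above the threshold. The `Detection` class and `apply_threshold` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical classifier output: a PII type plus a confidence score."""
    pii_type: str
    score: float


def apply_threshold(detections, threshold=0.7):
    """Keep only detections whose confidence meets the threshold (assumed >=)."""
    return [d for d in detections if d.score >= threshold]


# Illustrative scores, not real model output.
raw = [
    Detection("EMAIL_ADDRESS", 0.98),
    Detection("PERSON", 0.74),
    Detection("GPE", 0.55),
]

kept = apply_threshold(raw, threshold=0.7)
print([d.pii_type for d in kept])  # ['EMAIL_ADDRESS', 'PERSON']
```

With the threshold at 0.7 the low-confidence GPE detection is dropped; lowering the threshold to 0.5 would keep all three.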
Model-Specific PII Policies
Configure different PII policies for different models:
# vLLM endpoints configuration
vllm_endpoints:
  - name: secure-model
    address: "127.0.0.1"
    port: 8080
    models: ["secure-llm"]
  - name: general-model
    address: "127.0.0.1"
    port: 8081
    models: ["general-llm"]
# Model-specific configurations
model_config:
  secure-llm:
    pii_policy:
      allow_by_default: false      # Block all PII by default
      pii_types:                   # Only allow these specific types
        - "EMAIL_ADDRESS"
        - "GPE"
        - "ORGANIZATION"
  general-llm:
    pii_policy:
      allow_by_default: true       # Allow all PII by default
      pii_types: []                # Not used when allow_by_default is true
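The policy semantics described in the comments above can be sketched in Python. This is an illustration under assumptions, not the router's code: `denied_types` is a hypothetical helper, and treating a missing `allow_by_default` as `false` is a guess made for the example.

```python
def denied_types(policy, detected):
    """Return the detected PII types that the policy does not allow, sorted."""
    # Assumption: a missing allow_by_default is treated as False.
    if policy.get("allow_by_default", False):
        return []
    allowed = set(policy.get("pii_types", []))
    return sorted(set(detected) - allowed)


# Mirrors the model_config section of the YAML above.
model_config = {
    "secure-llm": {
        "pii_policy": {
            "allow_by_default": False,
            "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
        }
    },
    "general-llm": {
        "pii_policy": {"allow_by_default": True, "pii_types": []},
    },
}

detected = ["EMAIL_ADDRESS", "US_SSN"]
for model, cfg in model_config.items():
    print(model, denied_types(cfg["pii_policy"], detected))
# secure-llm ['US_SSN']
# general-llm []
```

A model is eligible for a request only when this list is empty, so a query containing a Social Security Number would be kept away from secure-llm in this configuration.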
How PII Detection Works
The PII detection system works as follows:
- Detection: The PII classifier model analyzes incoming text to identify PII types
- Policy Check: The system checks if the detected PII types are allowed for the target model
- Routing Decision: Models that don't allow the detected PII types are filtered out
- Logging: All PII detections and policy decisions are logged for monitoring
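The four steps above can be sketched end to end in Python. This is a toy model of the flow, not the actual implementation: two regular expressions stand in for the fine-tuned classifier, and `detect`, `allows`, and `eligible_models` are hypothetical names.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pii")

# Stand-in detector: two regexes instead of the fine-tuned classifier.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

POLICIES = {
    "secure-llm": {
        "allow_by_default": False,
        "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
    },
    "general-llm": {"allow_by_default": True, "pii_types": []},
}


def detect(text):
    """Step 1: return the set of PII types found in the text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}


def allows(policy, detected):
    """Step 2: a model passes if every detected type is permitted."""
    return policy["allow_by_default"] or detected <= set(policy["pii_types"])


def eligible_models(text, policies=POLICIES):
    """Steps 3 and 4: filter the candidates and log each decision."""
    detected = detect(text)
    keep = []
    for model, policy in policies.items():
        ok = allows(policy, detected)
        log.info("model=%s detected=%s allowed=%s", model, sorted(detected), ok)
        if ok:
            keep.append(model)
    return keep


print(eligible_models("My SSN is 123-45-6789"))           # ['general-llm']
print(eligible_models("Contact me at user@example.com"))  # ['secure-llm', 'general-llm']
```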
API Integration
PII detection is automatically integrated into the routing process. When a request is made to the router, the system:
- Analyzes the input text for PII using the configured classifier
- Checks PII policies for candidate models
- Filters out models that don't allow the detected PII types
- Routes to an appropriate model that can handle the PII
Classification Endpoint
You can also check PII detection directly using the classification API:
curl -X POST http://localhost:8080/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My email is john.doe@example.com and I live in New York"
  }'
The response includes PII information along with category classification results.
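For scripted checks, the same request can be issued from Python using only the standard library. This sketch reuses the URL and body from the curl example; the function names are hypothetical, and because the response schema is not specified here, the JSON is returned as-is instead of assuming field names.

```python
import json
import urllib.request


def build_classify_request(text, base_url="http://localhost:8080"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, base_url="http://localhost:8080", timeout=10):
    """Send the request and return the decoded JSON response unchanged."""
    req = build_classify_request(text, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With a router running locally, `classify("My email is john.doe@example.com and I live in New York")` returns the parsed response; print it to see which fields your version exposes.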
Monitoring and Metrics
The system exposes PII-related metrics:
# Prometheus metrics
pii_detections_total{type="EMAIL_ADDRESS"} 45
pii_detections_total{type="PERSON"} 23
pii_policy_violations_total{model="secure-model"} 12
pii_requests_blocked_total 8
pii_requests_masked_total 15
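As a quick way to inspect these counters outside Prometheus, the sample lines above can be parsed with a few lines of Python. This is a simplified sketch: the regex handles only `name{labels} value` lines without timestamps, and `totals_by_metric` is a hypothetical helper, not part of the router.

```python
import re
from collections import defaultdict

# The sample metrics shown above, copied verbatim.
SAMPLE = '''\
pii_detections_total{type="EMAIL_ADDRESS"} 45
pii_detections_total{type="PERSON"} 23
pii_policy_violations_total{model="secure-model"} 12
pii_requests_blocked_total 8
pii_requests_masked_total 15
'''

# Metric name, optional {labels}, then a single numeric value.
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][\w:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)


def totals_by_metric(text):
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            totals[match["name"]] += float(match["value"])
    return dict(totals)


totals = totals_by_metric(SAMPLE)
print(totals["pii_detections_total"])  # 68.0
```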
Best Practices
1. Threshold Tuning
- Start with threshold: 0.7 for balanced accuracy
- Increase to 0.8-0.9 for high-security environments
- Decrease to 0.5-0.6 for broader detection
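One way to choose a value is to score a small labelled sample and compare precision and recall at several thresholds. The sketch below is illustrative only: the scores are made up, not measurements of the real model, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scored, threshold):
    """scored: list of (confidence, is_real_pii) pairs from a labelled sample."""
    tp = sum(1 for s, real in scored if s >= threshold and real)
    fp = sum(1 for s, real in scored if s >= threshold and not real)
    fn = sum(1 for s, real in scored if s < threshold and real)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up scores purely for illustration.
toy = [
    (0.95, True), (0.85, True), (0.75, False),
    (0.65, True), (0.55, False), (0.45, True),
]

for t in (0.5, 0.7, 0.9):
    p, r = precision_recall(toy, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.5 precision=0.60 recall=0.75
# threshold=0.7 precision=0.67 recall=0.50
# threshold=0.9 precision=1.00 recall=0.25
```

On this toy data, raising the threshold removes false positives but misses more real PII, which is the trade-off to weigh when tuning.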
2. Policy Design
- Use allow_by_default: false for sensitive models
- Explicitly list allowed PII types for clarity
- Consider different policies for different use cases
3. Action Selection
- Use block for high-security scenarios
- Use mask when processing is still needed
- Use allow with logging for audit requirements
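The difference between the three actions can be sketched as follows. How actions are configured is not shown on this page, so everything here is an assumption made for illustration: the `apply_action` and `mask_spans` helpers, the `(start, end, pii_type)` span format, and the `[PII_TYPE]` placeholder style are all hypothetical.

```python
def mask_spans(text, spans):
    """Replace each non-overlapping (start, end, pii_type) span with [PII_TYPE]."""
    out = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"[{pii_type}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def apply_action(action, text, spans):
    """Return the text to forward, or None when the request is blocked."""
    if action == "block":
        return None if spans else text
    if action == "mask":
        return mask_spans(text, spans)
    if action == "allow":
        return text
    raise ValueError(f"unknown action: {action}")


text = "My email is john.doe@example.com"
email = "john.doe@example.com"
start = text.index(email)
spans = [(start, start + len(email), "EMAIL_ADDRESS")]

print(apply_action("mask", text, spans))   # My email is [EMAIL_ADDRESS]
print(apply_action("block", text, spans))  # None
```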
4. Model Filtering
- Configure PII policies to automatically filter model candidates
- Ensure at least one model can handle each PII scenario
- Test policy combinations thoroughly
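A simple pre-deployment check for the second point is to confirm that every supported PII type is accepted by at least one model. The sketch below assumes the `model_config` structure shown earlier, loaded into a Python dict; `uncovered_types` is a hypothetical helper. Per-type coverage is necessary but not sufficient, since a query may contain several types at once.

```python
PII_TYPES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN", "STREET_ADDRESS",
    "GPE", "ORGANIZATION", "CREDIT_CARD", "US_DRIVER_LICENSE", "IBAN_CODE",
    "IP_ADDRESS", "DOMAIN_NAME", "DATE_TIME", "AGE", "NRP", "ZIP_CODE",
]


def uncovered_types(model_config, pii_types=PII_TYPES):
    """Return the PII types that no configured model accepts."""
    uncovered = []
    for pii_type in pii_types:
        accepted = any(
            cfg["pii_policy"].get("allow_by_default", False)
            or pii_type in cfg["pii_policy"].get("pii_types", [])
            for cfg in model_config.values()
        )
        if not accepted:
            uncovered.append(pii_type)
    return uncovered


secure_only = {
    "secure-llm": {
        "pii_policy": {
            "allow_by_default": False,
            "pii_types": ["EMAIL_ADDRESS", "GPE", "ORGANIZATION"],
        }
    }
}
with_general = dict(secure_only)
with_general["general-llm"] = {
    "pii_policy": {"allow_by_default": True, "pii_types": []}
}

print(len(uncovered_types(secure_only)))  # 13
print(uncovered_types(with_general))      # []
```

An empty result means every single type has somewhere to go; any type in the list would leave matching requests without an eligible model.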
Troubleshooting
Common Issues
High False Positives
- Raise the detection threshold so that only higher-confidence detections count
- Review training data for edge cases
- Consider custom model fine-tuning
Missed PII Detection
- Increase detection sensitivity by lowering the threshold
- Check if PII type is supported
- Verify model is properly loaded
Policy Conflicts
- Ensure at least one model allows detected PII types
- Check allow_by_default settings
- Review the pii_types lists for each model
Debug Mode
Enable detailed PII logging:
logging:
  level: debug
  pii_detection: true
This will log all PII detection decisions and policy evaluations.