View Presidio Details

Duplo Presidio

The Duplo Presidio Service is a wrapper built on top of Microsoft Presidio that enhances its ability to detect and protect sensitive data by supporting custom recognizers and anonymizers.

With this setup, secrets, tokens, passwords, and credentials are identified and anonymized before data leaves your Duplo-managed Kubernetes cluster. Disabling irrelevant built-in recognizers and adding project-specific patterns reduces noise and improves accuracy.

📂 Structure of the ConfigMap

Presidio is configured through a YAML-based ConfigMap. The config sets the global regex flags, the supported languages, the built-in recognizers to disable, and the custom recognizers to add.

Example (simplified from our deployment):

config: |
  global_regex_flags: 26
  supported_languages:
    - en

  # Disable unnecessary built-in recognizers
  disabled_recognizers:
    - DateRecognizer
    - UrlRecognizer
    - DomainRecognizer
    - EmailRecognizer
    - PersonRecognizer
    - LocationRecognizer

  # Example custom recognizers
  recognizers:
    - name: GitHubTokenRecognizer
      type: custom
      supported_entity: GITHUB_TOKEN
      patterns:
        - name: github_token_pattern
          regex: '\b(ghp_[0-9a-zA-Z]{36})\b'
          score: 0.95

    - name: PrivateKeyRecognizer
      type: custom
      supported_entity: PRIVATE_KEY
      patterns:
        - name: rsa_private_key_pattern
          regex: '-----BEGIN\s+(?:RSA\s+)?PRIVATE\s+KEY-----[\s\S]*?-----END\s+(?:RSA\s+)?PRIVATE\s+KEY-----'
          score: 0.99
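
To sanity-check the example config, the sketch below uses only Python's standard `re` module. It decodes `global_regex_flags: 26` and runs the two custom patterns against made-up sample strings. It assumes the flags value is a Python `re` bitmask applied to every pattern, which matches upstream Presidio's default but should be confirmed for your deployment. The `find_entities` helper and the sample token and key are hypothetical and exist only for illustration.

```python
import re

# Value of global_regex_flags from the example ConfigMap.
GLOBAL_REGEX_FLAGS = 26

# 26 is a bitmask of Python re flags: IGNORECASE (2) + MULTILINE (8) + DOTALL (16).
assert GLOBAL_REGEX_FLAGS == int(re.IGNORECASE | re.MULTILINE | re.DOTALL)

# Patterns copied from the example config, keyed by supported_entity.
PATTERNS = {
    "GITHUB_TOKEN": r"\b(ghp_[0-9a-zA-Z]{36})\b",
    "PRIVATE_KEY": (
        r"-----BEGIN\s+(?:RSA\s+)?PRIVATE\s+KEY-----[\s\S]*?"
        r"-----END\s+(?:RSA\s+)?PRIVATE\s+KEY-----"
    ),
}


def find_entities(text):
    """Hypothetical helper: return (entity_type, start, end) for each match."""
    hits = []
    for entity, regex in PATTERNS.items():
        for match in re.finditer(regex, text, flags=GLOBAL_REGEX_FLAGS):
            hits.append((entity, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Fabricated sample values, not real credentials.
fake_token = "ghp_" + "a1B2c3D4e5F6" * 3  # prefix plus 36 characters
fake_key = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "FAKEKEYBODY\n"
    "-----END RSA PRIVATE KEY-----"
)
sample = f"export GH_TOKEN={fake_token}\n{fake_key}\n"

for entity, start, end in find_entities(sample):
    print(entity, start, end)
```

Because 26 includes `IGNORECASE`, the `ghp_` prefix and the `BEGIN ... PRIVATE KEY` markers also match when written in a different case. If that is too permissive for your data, the flags value is the place to adjust it.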

🔎 Built-in Recognizers

Presidio ships with a rich set of built-in recognizers for detecting common forms of sensitive data (PII). These recognizers are enabled by default and cover global as well as region-specific entities.

👉 See the full list of built-in recognizers here: Presidio Supported Entities

Examples of Built-in Recognizers

  • EmailRecognizer → email addresses

  • PhoneRecognizer → phone numbers

  • UrlRecognizer → web URLs

  • IpRecognizer → IPv4/IPv6 addresses

  • CreditCardRecognizer → credit card numbers

  • UsSsnRecognizer → U.S. Social Security Numbers

  • PersonRecognizer → human names

  • IbanRecognizer → IBAN bank account numbers
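
To relate this list to the `disabled_recognizers` key in the example ConfigMap, the sketch below computes which of the recognizers listed here would stay active. It assumes recognizers are disabled by exact name match. The `active_recognizers` helper is hypothetical and is not part of Presidio or the Duplo wrapper.

```python
# Built-in recognizers named in the list above (a subset of what Presidio ships).
BUILT_IN = [
    "EmailRecognizer",
    "PhoneRecognizer",
    "UrlRecognizer",
    "IpRecognizer",
    "CreditCardRecognizer",
    "UsSsnRecognizer",
    "PersonRecognizer",
    "IbanRecognizer",
]

# disabled_recognizers from the example ConfigMap.
DISABLED = {
    "DateRecognizer",
    "UrlRecognizer",
    "DomainRecognizer",
    "EmailRecognizer",
    "PersonRecognizer",
    "LocationRecognizer",
}


def active_recognizers(built_in, disabled):
    """Hypothetical helper: keep recognizers whose name is not disabled."""
    return [name for name in built_in if name not in disabled]


print(active_recognizers(BUILT_IN, DISABLED))
```

Under that assumption, the example config leaves the phone, IP, credit card, SSN, and IBAN recognizers from this list enabled alongside the custom ones.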

✂️ Built-in Anonymizers

While recognizers detect sensitive data, anonymizers transform or redact that data to protect it. Presidio comes with several built-in anonymizers that can be applied when sanitizing text.

Common Anonymizers

  • replace → Replaces detected entity with a placeholder (e.g., <ENTITY_TYPE>)

  • mask → Masks characters with a symbol (e.g., ****1234)

  • redact → Removes the detected entity entirely

  • hash → Hashes the entity value (SHA256 by default)

  • encrypt / decrypt → encrypt replaces the value with AES-encrypted text; decrypt is the matching de-anonymization operator that restores it with the same key

👉 Full list of built-in anonymizers: Presidio Anonymizers Documentation
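
To make the difference between the operators concrete, the sketch below re-implements the effect of replace, mask, redact, and hash on a single detected span using only the Python standard library. It illustrates the transformations and is not Presidio's implementation: the function names and the `keep_last` parameter are hypothetical, and the sample card number is made up. Encrypt is left out because AES needs a third-party library.

```python
import hashlib


def replace(text, start, end, entity_type):
    """Swap the detected span for a placeholder such as <CREDIT_CARD>."""
    return text[:start] + f"<{entity_type}>" + text[end:]


def mask(text, start, end, keep_last=4, char="*"):
    """Overwrite the span with a masking character, keeping the last characters."""
    value = text[start:end]
    hidden = max(len(value) - keep_last, 0)
    return text[:start] + char * hidden + value[hidden:] + text[end:]


def redact(text, start, end):
    """Remove the detected span entirely."""
    return text[:start] + text[end:]


def hash_value(text, start, end):
    """Replace the span with the SHA-256 hex digest of its value."""
    digest = hashlib.sha256(text[start:end].encode("utf-8")).hexdigest()
    return text[:start] + digest + text[end:]


# Fabricated input: pretend a recognizer reported this span.
text = "card 4111111111111234 on file"
value = "4111111111111234"
start = text.index(value)
end = start + len(value)

print(replace(text, start, end, "CREDIT_CARD"))  # card <CREDIT_CARD> on file
print(mask(text, start, end))                    # card ************1234 on file
print(redact(text, start, end))                  # card  on file
print(hash_value(text, start, end))
```

Replace keeps the entity type visible for debugging, mask keeps a recognizable suffix, redact leaves no trace of the value, and hash lets equal values be correlated without exposing them.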
