What is Philter?

Philter finds and redacts over 30 types of sensitive information, including PII, PHI, and NPPI. Philter can be used across many domains, such as, but not limited to, healthcare, legal, financial, and news.

Philter FAQ

Frequently asked questions about Philter. For any questions not answered here, please refer to our Support.

What is Philter?
Philter is an application that redacts protected health information (PHI), personally identifiable information (PII), non-public personal information (NPPI), and other types of sensitive information from text. Philter can process plain text and PDF documents, and it can process Microsoft Word and Microsoft Excel files through its Word and Excel add-ins.

Philter runs in your private cloud so your sensitive data never has to traverse the public internet. Use Philter’s API to process text from virtually any system or process. Philter was designed from the ground up to be a key component of an effective data loss prevention strategy and to be applicable in a wide range of industries and use cases.

If you want to redact PHI, PII, and other sensitive information from Apache Kafka data streams, check out Phirestream.

Does Philter use ChatGPT or other third-party APIs?
No. Philter never transmits your text or documents to any third-party service.

Philter can run in a firewalled (or even air-gapped) environment. For example, if you are using AWS, you can deploy Philter to a private subnet and use security groups and network ACLs to prevent any outbound traffic from the Philter instance and its subnet. In fact, we recommend doing so to increase your overall security posture.

Is Philter open source?
Philter is built upon Phileas, an open source project for finding and redacting PII and PHI in text and documents. Philter is a wrapper around Phileas that makes it more user-friendly, provides an HTTP (REST) interface, and provides NLP models. All other capabilities of Philter are powered by Phileas.

Everyone is welcome to check out the Phileas code to learn more about how Philter works, to submit an issue when one is found, and to contribute via pull requests. Phileas is licensed under the Apache License, version 2.

What types of PII, PHI and other sensitive information can Philter redact?
Philter can redact many types of PII, PHI, and other sensitive information. We are constantly adding new types of information and new variations of each type. For example, a person’s age may be written in many ways, and we add support for new forms as we discover them. If you wish to discuss these types of information in depth, please contact us.

Some of the types of PII, PHI, and sensitive information identified by Philter are listed below:

  • Ages
  • Bitcoin Addresses
  • US Cities
  • US Counties
  • Credit Card Numbers
  • Custom Dictionaries (define your own information)
  • Custom Identifiers (for example, custom medical record numbers or financial transaction numbers)
  • Dates
  • US Driver’s License Numbers
  • Email Addresses
  • Hospital Names
  • IBAN Codes
  • IP Addresses (IPv4 and IPv6)
  • MAC Addresses
  • Passport Numbers
  • Persons’ Names (supports fuzzy matching, first name, last name, and whole name)
  • Phone/Fax Numbers
  • Physician Names
  • SSNs and TINs
  • Shipping Tracking Numbers
  • US States
  • URLs
  • VINs
  • US Zip Codes

How does Philter know what types of PII and PHI to redact?
You create policies that tell Philter what types of PII and PHI to find and redact. A policy lists the types of sensitive information (phone numbers, names, etc.), when to remove them, and how to remove them. Policies are detailed in Philter’s User’s Guide. You can have as many policies as you need and you can select which policy to apply when redacting text.

How does Philter identify sensitive information?
Philter uses a variety of methods to identify sensitive information, including specially trained machine learning models.

A Philter lens is a specially trained machine learning model created to identify sensitive information. Using a lens that matches your type of text can provide increased accuracy and performance. Philter includes a “General Purpose” lens that is trained for many types of sensitive information in a variety of documents. If you are only redacting PHI and PII in healthcare documents, you may benefit from using one of Philter’s Healthcare or COVID-19 lenses. Please contact us for more information about Philter lenses and how to use them.

How is Philter deployed?
Philter can be deployed as a container or in your cloud in just a few minutes. See below for links to Philter on the cloud marketplaces. For container-based deployments, please contact us.

How do I send text to Philter for redaction?
There are a few ways. The first is to use the API directly. Philter’s HTTP-based API accepts text to process and returns the processed text, which allows Philter to be integrated into many types of systems and processes. See the API in Philter’s User Guide for more information. Here is an example that sends a text file to Philter for processing:

curl -k -X POST "https://localhost:8080/api/filter?c=context" -d @file.txt -H "Content-Type: text/plain"
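
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint, the c context parameter, and the text/plain content type are taken from the curl example above, while the function names build_filter_request and redact are hypothetical. Disabling certificate checks mirrors curl's -k flag and is only reasonable for a test instance with a self-signed certificate.

```python
import ssl
import urllib.parse
import urllib.request


def build_filter_request(text, context="context", base_url="https://localhost:8080"):
    """Build a POST request for the /api/filter endpoint shown in the curl example."""
    query = urllib.parse.urlencode({"c": context})
    return urllib.request.Request(
        url=f"{base_url}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(text, context="context", base_url="https://localhost:8080", verify_tls=True):
    """Send text to a running Philter instance and return the response body."""
    request = build_filter_request(text, context, base_url)
    tls = ssl.create_default_context()
    if not verify_tls:
        # Equivalent to curl's -k flag: skip hostname and certificate checks.
        tls.check_hostname = False
        tls.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=tls) as response:
        return response.read().decode("utf-8")
```

With a Philter instance listening on localhost, calling redact(open("file.txt").read(), verify_tls=False) would return the processed text.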

A second way is to use the Philter CLI, a small application that provides convenient access to Philter’s API.

A third way is to use our open source SDKs for Java, .NET, and Go.

How do I use Philter from my application?
You can use our open source SDK clients.

What are Philter’s accuracy, precision, and recall metrics?
The precision and recall metrics depend greatly on your data. Each user’s data is different, so comparing these metrics across users would be comparing apples and oranges. A claim like “Philter’s F1 score is 99%” is meaningless if your data does not exactly match our test data (and we know it doesn’t!). If another vendor tells you their accuracy without even seeing your documents, be very, very cautious!

Instead, we will gladly accept some representative text and spend a few days gathering those metrics for your data. We will provide you with the collected metrics along with the redacted text. This will give you an accurate overview of how Philter performed on your text.

Is Philter guaranteed to find 100% of all sensitive information in my text?
No. Philter uses state-of-the-art natural language processing (NLP) technology to identify sensitive information in text. These NLP methods use trained models created from a large corpus of text. The process of applying the model to text is non-deterministic. Many factors can affect the identification of sensitive information in your text, such as how similar your text is to the corpus that was used to train the model, how the text is formatted, and the length of the text. For these reasons, it is important that you assess Philter’s performance on your data before using it in a production system.

The confidence value in the filter strategy condition can be used to tune the NLP engine’s detection. Each identified entity has an associated confidence score between 0 and 100 indicating the model’s estimate that the text is actually an entity, with 0 being the lowest confidence and 100 the highest. The confidence condition lets you filter out entities based on this score. For example, the condition confidence > 75 means that entities with a confidence value of 75 or less will be ignored and entities with a confidence value greater than 75 will be filtered from the text.
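
The effect of a confidence condition can be illustrated with a short Python sketch. It is not Philter's implementation: the Entity class, the sample entities and scores, and the apply_confidence_condition function are all hypothetical and only model the confidence > 75 example above.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    confidence: float  # hypothetical score between 0 and 100


def apply_confidence_condition(entities, threshold=75):
    """Split entities into those filtered (confidence > threshold) and those ignored."""
    filtered = [e for e in entities if e.confidence > threshold]
    ignored = [e for e in entities if e.confidence <= threshold]
    return filtered, ignored


# Made-up entities and scores, for illustration only.
entities = [
    Entity("George Washington", 92.5),
    Entity("Paris", 75.0),
    Entity("May", 40.0),
]
filtered, ignored = apply_confidence_condition(entities)
print([e.text for e in filtered])  # ['George Washington']
print([e.text for e in ignored])   # ['Paris', 'May']
```

Note that an entity scoring exactly 75 does not satisfy confidence > 75 and is therefore ignored. Raising the threshold generally reduces false positives at the cost of leaving more low-confidence entities in the text.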

How does Philter compare with similar applications and services?
Comparing Philter with other applications and services such as Amazon Comprehend and Google Data Loss Prevention (DLP) API is difficult because Philter is designed differently. Philter goes beyond simple identification of values in text. Philter includes additional features such as support for disambiguation, ignore lists, and value replacement and anonymization. These are features that may be possible with the other applications and services but would require you to implement them yourself on top of the other products.

Philter is not a software-as-a-service offering whose API is managed by us and consumed by you. Instead, Philter is an application that you deploy into your environment, and you interact with its API through connections on your network. With this design your text never has to leave your network. We believe this to be more secure than using a third-party API product where your text may traverse many networks during processing.

Philter also differentiates itself from other services in its flexibility. With Philter you can use your own models if you choose to do so and you have full control over the filtering process to tailor it to your specific needs.

What platforms are supported by Philter?
Philter is available on several platforms, and the platform you use may be determined by your choice of cloud provider. For other platforms, please contact us.

  • AWS Marketplace
  • Azure Marketplace
  • Google Cloud Marketplace

See Philter’s home page for more details.

What is Philter’s license agreement?
You can view the Philter License Agreement.