Philter Case Studies

Philter finds and redacts over 30 types of sensitive information, including PII, PHI, and NPPI. Philter can be used across many domains, such as, but not limited to, healthcare, legal, financial, and news.

Philter has a wide range of use cases because sensitive information, and the need to find and remove it, is pervasive across many industries. On this page we describe case studies where Philter was applied. For each engagement we describe the problem at hand, how it was addressed, and the lessons learned.

Our Proven Approach

Common to all case studies is our engagement approach. We begin with an exploratory virtual meeting with you to determine whether Philter is a viable option. If we believe Philter can address your problem, we request sample data to demonstrate Philter’s effectiveness on your data. Lastly, we arrange to share the results to help you make an informed decision.

Everyone’s Text and Data Are Different

We approach Philter engagements in this manner because everyone’s text and data are different. Philter’s statistics for Company A may not translate to Company B because the data of the two companies is likely to differ. Two companies in the same domain commonly have widely different text, and text varies even more across domains. This is why we ask for a small set of sample data in our exploratory engagements. We want to provide insight into Philter’s performance specific to each potential customer so that the customer has the knowledge needed to make an informed decision.


Case Study – Filtering PHI in Patient Text for a Healthcare IT Solutions Provider

Background

We were contacted by a healthcare IT vendor whose software is used across many medical fields. The data being processed was plain text patient data that had previously been extracted from scanned PDF and image documents. The plain text contained narratives and other free-form (unstructured) information. The text was received from medical providers and sent to the IT vendor for processing by its healthcare applications. The IT vendor now needed a way to efficiently remove the sensitive information from the text so the text could be made available for secondary purposes.

Engagement

We engaged virtually with the IT vendor for an exploratory meeting to learn more about their problem. After discussing their current challenge and providing a brief demonstration of Philter on sample text, we and the IT vendor agreed that Philter was potentially a good fit to solve their problem. We requested a set of sample data from the IT vendor to process through Philter. The results of this processing would provide the IT vendor with detailed statistics and a clear view of how Philter would perform on their text.

The IT vendor’s requirements for Philter were:

  • Incorporating Philter into their pipeline should not introduce significant timing delays when processing text.
  • 95% of all sensitive information (patient names, ages, ID numbers, and dates) should be removed during the processing.

We executed a Business Associate Addendum with the IT vendor to facilitate the secure sharing of protected health information under HIPAA. We take all precautions to ensure that sensitive information shared with us is encrypted at all times, is used only for the stated purpose, is accessible only by the team members who need access to it, and is securely deleted when no longer needed.

After receiving the shared files, our team manually annotated the desired sensitive information in them. By creating a “gold standard” we are able to determine Philter’s success rate at identifying that information and redacting it. This process took less than a week. (The process may take more or less time depending

Evaluation and Results

Our team processed the sample text through Philter and captured metrics on the efficiency of Philter’s processing to determine if Philter can meet the IT vendor’s requirements listed above. We configured Philter to redact the desired sensitive information (patient names, ages, ID numbers, and dates). Those types of information, when found, would be replaced by asterisks.
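The asterisk replacement described above can be illustrated with a short sketch. This is not Philter's implementation; it assumes the sensitive spans have already been located as character offsets, and the function name mask_spans and the choice to emit one asterisk per original character are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def mask_spans(text: str, spans: List[Span], mask_char: str = "*") -> str:
    """Replace each span with mask characters of the same length (hypothetical helper)."""
    out = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            # Overlapping span: only mask the part not already masked.
            start = cursor
        if end <= start:
            continue
        out.append(text[cursor:start])          # untouched text before the span
        out.append(mask_char * (end - start))   # masked span
        cursor = end
    out.append(text[cursor:])                   # untouched tail
    return "".join(out)


if __name__ == "__main__":
    note = "Patient John Smith, age 54."
    # Offsets for the name and the age, as a detector might report them.
    print(mask_spans(note, [(8, 18), (24, 26)]))
```

Keeping the masked text the same length as the original preserves character offsets, which can make it easier to compare redacted output against annotations.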

After we processed the shared documents and analyzed the metrics, we found that Philter met both requirements: it required only milliseconds to process each document, and it identified and redacted 98% of the sensitive information in the documents. The IT vendor found these metrics acceptable, and we proceeded to integrating Philter with the IT vendor’s text processing pipeline.
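A success rate like the one above can be computed by comparing the items a tool finds against the manually annotated gold standard. The following is a minimal sketch of that calculation, not the evaluation code used in this engagement. It assumes annotations are (start, end, type) tuples and that an item counts as found only on an exact match; both are assumptions, as are the names recall and meets_requirement.

```python
from typing import Set, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, type), assumed format


def recall(gold: Set[Annotation], found: Set[Annotation]) -> float:
    """Fraction of gold-standard annotations that were also found (exact match)."""
    if not gold:
        return 1.0  # nothing to find, so nothing was missed
    return len(gold & found) / len(gold)


def meets_requirement(gold: Set[Annotation], found: Set[Annotation],
                      threshold: float = 0.95) -> bool:
    """Check the measured recall against a required minimum, such as 95%."""
    return recall(gold, found) >= threshold


if __name__ == "__main__":
    gold = {(8, 18, "name"), (24, 26, "age"), (40, 50, "date"), (60, 69, "id")}
    found = {(8, 18, "name"), (24, 26, "age"), (40, 50, "date"), (80, 85, "name")}
    print(f"recall = {recall(gold, found):.2f}")
    print("meets 95% requirement:", meets_requirement(gold, found))
```

Exact matching is the strictest choice. An evaluation could instead count partially overlapping spans as found, which would change the reported rate.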

Integration

The IT vendor was processing information in a streaming pipeline in which data was being consumed from Apache Kafka and processed by downstream applications. We created and shared design documents and diagrams showing how Philter can work with Apache Kafka. Per our documentation, the IT vendor inserted Philter in their text processing pipeline by utilizing Philter’s API to process the documents as they were consumed from Apache Kafka. The result is that the downstream applications could now process the redacted text instead of the original text containing the sensitive information. This allowed the IT vendor to utilize the documents for their secondary purposes.
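The shape of this integration can be sketched as a small function that sits between the consumer and the downstream applications. This is an illustrative sketch only, not the IT vendor's code. The endpoint URL, the plain-text POST request, and the names redact_via_http and redact_stream are assumptions; consult the API documentation for your Philter version for the actual request format.

```python
import urllib.request
from typing import Callable, Iterable, Iterator

# Assumed endpoint for illustration; the real host, port, and path depend on your deployment.
PHILTER_URL = "https://philter.example.internal:8080/api/filter"


def redact_via_http(text: str, url: str = PHILTER_URL) -> str:
    """POST plain text to a redaction endpoint and return the response body."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def redact_stream(documents: Iterable[bytes],
                  redact: Callable[[str], str]) -> Iterator[str]:
    """Redact each consumed document before it is handed to downstream applications."""
    for raw in documents:
        yield redact(raw.decode("utf-8"))


if __name__ == "__main__":
    # Stand-in for consumed messages and for the redaction call, so the sketch runs offline.
    messages = [b"Patient John Smith was seen today."]
    fake_redact = lambda text: text.replace("John Smith", "**********")
    for redacted in redact_stream(messages, fake_redact):
        print(redacted)
```

With the kafka-python client, the documents argument could be built from a consumer, for example (message.value for message in KafkaConsumer(...)), and redact_via_http passed as the redact argument. Passing the redaction call in as a parameter keeps the pipeline logic testable without a running Kafka cluster or Philter instance.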

The IT vendor’s applications were all hosted in Amazon Web Services (AWS). The IT vendor chose to deploy Philter via the AWS Marketplace for two reasons: it removes the need to share payment information, and because billing is per running hour, it puts the IT vendor in charge of its spending on Philter. This streamlined the procurement and deployment of Philter in the IT vendor’s environment.

Summary

We were contacted by an IT vendor of healthcare application software with a need to redact sensitive information from text. We engaged with the IT vendor through our standard engagement process, starting with an exploratory meeting. In the meeting we learned more about the IT vendor’s challenge and mutually concluded that Philter was potentially a good fit for addressing the problem.

The IT vendor provided us sample text to process so we could capture metrics about Philter’s performance. We processed the text and provided the metrics back to the IT vendor. The IT vendor chose to integrate Philter into their text processing pipeline. We assisted them with the process by creating design documentation to help guide their integration efforts. Text and documents processed through the IT vendor’s pipeline could now be used for secondary purposes.


Case Study – PII Filtering of Bankruptcy Documents for a Legal Firm

Background

We were contacted by a legal firm that handles federal bankruptcy filings. The legal firm generates documents that may be used in court filings. Because submitted documents are public, they must be free of personally identifiable information under Rule 9037, Privacy Protection for Filings Made with the Court, of the Federal Rules of Bankruptcy Procedure. The legal firm reached out to us to see if Philter could be a fit for their text redaction needs.

Engagement

We initiated our standard engagement process by starting with an exploratory virtual meeting to learn more about the legal firm’s needs. The legal firm’s representatives explained that they needed to remove personally identifiable information (PII) from Microsoft Word documents. The types of PII and how each was to be redacted:

  • SSNs and TINs – Redact to the last four digits.
  • Birthdates – Redact to just the four-digit year.
  • Persons’ names – Redact minors’ names to just initials.
  • Financial account numbers – Redact to the last four digits.

Since these are all capabilities of Philter and we believed Philter to be able to meet their redaction needs, we requested sample documents from the firm.
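The redaction rules listed above can be made concrete with a short sketch. These helpers are hypothetical illustrations of the rules themselves, not Philter's configuration or code, and the names last_four, year_only, and initials are invented for this example.

```python
import re


def last_four(value: str, mask_char: str = "*") -> str:
    """Mask every digit except the last four, keeping separators such as dashes."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Keep a digit only if it is among the final four digits.
            out.append(ch if seen > total - 4 else mask_char)
        else:
            out.append(ch)
    return "".join(out)


def year_only(date_text: str) -> str:
    """Reduce a date to its four-digit year; mask fully if no such year is present."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", date_text)
    return match.group(1) if match else "****"


def initials(name: str) -> str:
    """Reduce a name to initials, one per whitespace-separated part."""
    return "".join(part[0].upper() + "." for part in name.split())


if __name__ == "__main__":
    print(last_four("123-45-6789"))
    print(year_only("03/14/1987"))
    print(initials("Jane Q. Doe"))
```

Deciding which names belong to minors is a separate detection problem that this sketch does not address; it only shows the transformation applied once such a name has been identified.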

Evaluation and Results

We manually annotated the PII in the received documents, enabled Philter’s filtering of SSNs and TINs, configured the Identifier filter for account numbers, configured the NER filter to find persons’ names and replace them with initials, and filtered the documents with Philter. We captured metrics and statistics during the filtering and prepared them for presentation back to the legal firm. Philter identified 100% of the SSNs, TINs, financial account numbers, and birthdates in the sample documents. The names of juveniles were identified through both a dictionary filter and an NER filter.

The legal firm was satisfied with the results and chose to immediately begin the process of deploying Philter into their document workflow.

Integration

The legal firm had a small outsourced IT team that managed the firm’s systems. The firm was in the process of migrating to the cloud but was still working mostly with on-premises components. Since they wanted to move to the cloud, specifically AWS, we created the AWS resources to support the Philter deployment so it could be used by both on-premises and cloud software. Our deployment ensured that data was encrypted at all times. This also helped them kick-start their migration to the cloud. We also configured Philter based on our initial discussions with the legal firm.

Once Philter was deployed, we worked with the IT team to enable processing documents with Philter. The documents were stored in multiple folders on a Windows shared drive. The IT team wanted Philter to process a document automatically whenever an employee of the legal firm saved it to the shared drive. This integration was implemented using the Philter Toolbox, a Windows application that watches a Windows folder for new documents and automatically processes them. This kept the filtering process transparent to the legal firm’s employees and didn’t place any new burden on them.
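The general idea of watching folders for new documents can be sketched with simple polling. This is not how the Philter Toolbox is implemented; it is a minimal illustration under stated assumptions. The function name process_new_documents, the *.docx pattern, and the handle callback are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Set


def process_new_documents(folder: Path, seen: Set[Path],
                          handle: Callable[[Path], None],
                          pattern: str = "*.docx") -> int:
    """Call handle() once for each matching file not seen before; return the count."""
    count = 0
    # rglob also searches subfolders, so one root can cover multiple folders.
    for path in sorted(folder.rglob(pattern)):
        if path in seen or not path.is_file():
            continue
        handle(path)     # e.g. send the document for redaction
        seen.add(path)   # remember it so it is not processed twice
        count += 1
    return count


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "filing.docx").write_text("placeholder")
        seen: Set[Path] = set()
        handled = process_new_documents(root, seen, lambda p: print("processing", p.name))
        print("handled:", handled)
```

In a long-running service this function would be called in a loop with a short sleep between passes. A production watcher would also need to handle files that are still being written and documents that are modified after their first save, which this sketch ignores.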

Summary

We were contacted by a legal firm that needed a way to automatically filter certain types of PII from their documents to make the documents appropriate for filing in court. We engaged with the firm through our standard engagement process, starting with an exploratory meeting. In the meeting we learned more about the firm’s challenge.

The legal firm provided us sample text so we could evaluate Philter’s performance and provide the captured metrics and statistics to the legal firm. Based on the successful results, the legal firm chose to proceed with deploying Philter. We assisted the firm’s IT team with creating a secure AWS cloud environment and deploying Philter into the cloud. We configured Philter to meet the firm’s needs and then configured automatic document filtering for their Windows shared folders.