Philter Release Notes

Philter finds and redacts over 30 types of sensitive information, including PII, PHI, and NPPI. Philter can be used across many domains, such as, but not limited to, healthcare, legal, financial, and news.

This page contains the release notes for Philter showing what’s new, what’s changed, and any known outstanding issues.

The release notes on this page use the following notation:

  • “New” indicates a feature or capability that has been added to the version.
  • “Tweak” denotes a minor change to a feature or capability.
  • “Fix” describes a change to a feature or capability that corrects a difference between the expected and observed behaviors.
  • “Removed” describes a change where functionality was removed.

Version 2.5.0 – TBD

  • New: Added a condition to the date filter strategy to identify death dates. This works analogously to the birthdate condition. Additionally, a new condition comparator, “is not”, enables date filter conditions such as “token is not deathdate” or “token is not birthdate”.

Version 2.4.0 – October 19, 2023

  • New: Filter profiles are now called policies. The name change more accurately describes the role of a filter profile and better aligns with industry naming standards. The change has also been made in the open source projects around Philter, such as Phileas and the SDKs. The lowercase “policy” can be substituted for “filterProfile” everywhere in the API. Note that the /api/filter API endpoint takes a parameter p that denotes the filter profile to use during filtering. The name and function of the p parameter have not changed; it now stands for policy instead of profile.
  • New: Dates can be shifted by random intervals using the shiftRandom filter strategy.
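
As an illustration of the unchanged p parameter, the following Python sketch builds (but does not send) a filter request. Only the /api/filter path and the p parameter come from the notes above; the host, port, policy name "default", and the text/plain content type are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_filter_request(base_url: str, policy: str, text: str) -> Request:
    """Build a POST request to /api/filter that selects a policy via p."""
    query = urlencode({"p": policy})
    return Request(
        url=f"{base_url}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed content type
        method="POST",
    )


# Hypothetical host, port, and policy name.
request = build_filter_request(
    "https://localhost:8080", "default", "His SSN is 123-45-6789."
)
print(request.full_url)
```

The request can then be sent with urllib.request.urlopen(request) against a running instance.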

Version 2.3.0 – September 16, 2023

  • New: Added support for our 3rd generation NLP models.
  • New: Tracking numbers can now be individually enabled by carrier.
  • New: Added a new Persons filter (PersonsV2) that can use Apache OpenNLP NER models.
  • New: Identifier (regex) filters can now specify the regex group number that contains the sensitive information. With this change, you can identify any information that’s part of a regular expression instead of just the entire match.
  • New: Added support for apartment numbers and suite numbers to the street address filter.
  • Tweak: Improvements to PDF processing.
  • Tweak: Philter now uses Java 17.
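
The regex group number option for identifier filters can be pictured with a small Python sketch. This is not Philter's implementation: the pattern, the group number, and the replacement text are hypothetical and only show the difference between redacting the entire match (group 0) and redacting one capture group.

```python
import re

# Hypothetical pattern: the six digits are sensitive, the "Account:" label is not.
PATTERN = re.compile(r"Account:\s*(\d{6})")
SENSITIVE_GROUP = 1  # group 0 would be the entire match


def redact_group(text: str, pattern: re.Pattern, group: int, replacement: str) -> str:
    """Replace only the chosen group of each match.

    Matches are processed right to left so earlier offsets stay valid.
    Assumes the chosen group participates in every match.
    """
    result = text
    for match in reversed(list(pattern.finditer(text))):
        start, end = match.span(group)
        result = result[:start] + replacement + result[end:]
    return result


print(redact_group("Account: 123456 is overdue.", PATTERN, SENSITIVE_GROUP, "[ID]"))
```

With group 1 the label "Account:" is left in place; with group 0 the label would be replaced along with the digits.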

Version 2.2.1 – June 30, 2023

  • New: A redacted PDF can now be returned as a collection of images or as a single PDF document.
  • Tweak: Improvements to PDF processing.

Version 2.2.0 – June 15, 2023

  • Tweak: Improvements to the NLP filtering.

Version 2.1.0 – December 1, 2022

  • New: Added format-preserving encryption (FPE) as a filter strategy.
  • New: Encryption values can now be read from environment variables.
  • Tweak: Improved detection capabilities of unstructured sensitive information.

Version 2.0.0 – February 6, 2022

  • New: Improved natural language processing capabilities cut average redaction times in half while increasing accuracy.
  • Removed: Client-side document splitting is no longer necessary, so it was removed from filter profiles.
  • Removed: The philter-ner system service has been removed.

Version 1.12.1 – December 27, 2021

Version 1.12.0 – December 12, 2021

  • Important: Addressed the log4j vulnerability.
  • New: Filter profiles can now include bounding boxes for redacting sections of PDF documents with known rectangular dimensions (such as a form field where the text is PII and its location is constant).
  • New: Filter profiles can now be combined per API request to allow for creating “building blocks” of reusable filter profiles.
  • New: The zip code filter can now treat the delimiter in a 9-digit zip code as optional (e.g. 12345-6789 vs. 123456789).
  • New: Filter profiles can reference environment variable values.
  • New: Added support for more date formats.
  • New: Added support for more age formats.
  • New: Added support for more currency formats.
  • Tweak: Improved currency identification.
  • Removed: API and functionality for storing sensitive information in Elasticsearch.
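
The optional zip code delimiter can be sketched as a regular expression. The pattern below is an illustrative assumption, not the expression Philter uses, and the require_delimiter flag is a hypothetical name.

```python
import re


def zip_code_pattern(require_delimiter: bool) -> re.Pattern:
    """Match 5-digit zip codes and 9-digit zip codes.

    When require_delimiter is False, the dash between the first five and
    last four digits becomes optional.
    """
    delimiter = "-" if require_delimiter else "-?"
    return re.compile(rf"\b\d{{5}}(?:{delimiter}\d{{4}})?\b")


strict = zip_code_pattern(require_delimiter=True)
relaxed = zip_code_pattern(require_delimiter=False)

print(strict.findall("Ship to 12345-6789 or 123456789."))
print(relaxed.findall("Ship to 12345-6789 or 123456789."))
```

The strict pattern finds only 12345-6789, while the relaxed pattern finds both forms.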

Version 1.11.0 – May 31, 2021

  • New: Added support for redacting bank routing (ABA routing transit) numbers.
  • New: Added federal bankruptcy forms to automatic document identification. (Filters can be applied selectively to detected document types.)

Version 1.10.1 – March 28, 2021

  • New: Added support for two-way SSL for the Philter UI.
  • New: Prometheus metrics are enabled by default.

Version 1.10.0 – March 21, 2021

  • New: Added initial user interface dashboard for Philter that allows for testing filtering capabilities and managing filter profiles. The dashboard is available at http://<philter-IP>:9000.
  • New: Standardized public cloud deployment images on Ubuntu 20.04 LTS.
  • New: Confidence thresholds for named entities can be set on the NerFilterStrategy to control which entities are filtered based on their confidence score.
  • New: Post filters can be enabled and disabled per filter profile.
  • New: Added domain to filter profile to enable common industry base configurations.
  • Tweak: Phone number confidence values are dynamic based on the format of the phone number.
  • Tweak: Cache performance improvements.
  • Fix: Fixed issue identifying dates in the formats MMMM yyyy (e.g. March 2015) and MMMM yy (e.g. March 15).
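
The effect of a confidence threshold on named entities can be sketched as follows. The Entity type, the 0.0 to 1.0 score range, and the inclusive comparison are assumptions for illustration; they are not taken from Philter's NerFilterStrategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def apply_threshold(entities: list[Entity], threshold: float) -> list[Entity]:
    """Keep entities whose confidence meets the threshold (assumed inclusive)."""
    return [entity for entity in entities if entity.confidence >= threshold]


found = [Entity("George Washington", 0.95), Entity("Cherry", 0.40)]
print(apply_threshold(found, threshold=0.80))
```

Raising the threshold trades recall for precision: fewer low-confidence entities are filtered, at the risk of missing some real names.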

Version 1.9.0 – January 21, 2021

  • New: Added support for filtering PDF documents.
  • New: Added support for two-way SSL/TLS authentication and removed API token authentication.
  • New: Added support for identifying dates in format M-uuuu.
  • New: Added filter strategy to shift dates by days, months, and/or years.
  • New: Added filter strategy for keeping last 4 characters of credit cards, IBAN codes, identifiers, SSNs/TINs, tracking numbers, and VINs.
  • New: Added filter strategy to abbreviate person names.
  • New: Added filter for physician names.
  • New: Added filter for U.S. street addresses.
  • Tweak: Added support for different age formats.
  • Tweak: Added -1 (negative one) as a valid value for the split threshold. Setting it to -1 disables splitting.
  • Tweak: Modified text processing to improve the accuracy of person name detection.
  • Tweak: Increased default timeout for the NER filter to 600 seconds to more easily accommodate large documents.
  • Fix: Fixed an issue with custom dictionary filters where the terms may be read and loaded incorrectly.
  • Fix: Fixed an issue where the surname filter may not produce any results.

Version 1.8.0 – November 5, 2020

  • New: Added metric capture for the execution time of each filter. These timing metrics can help improve filter performance.
  • New: Added ability to read lists of ignored terms from files for each filter type.

Version 1.7.0 – October 4, 2020

  • New: Added metrics reporting for Prometheus.
  • New: Added an experimental feature to accommodate requests to filter long text. We are looking for feedback on this initial functionality. The new feature, when enabled, splits text into pieces based on new lines. The pieces are then processed and reassembled prior to being returned. Because of the splitting, the reassembled text will likely not be an exact match of the input text due to white space differences. If maintaining the format of the input text through the filtering process is important to you, the best course of action is to handle the splitting client-side so you have control over it.
  • New: Terms can now be ignored based on regular expression patterns. Previously Philter had the ability to ignore specified terms but the terms had to match exactly. Now you can specify terms to ignore via regular expression patterns. An example use of this new feature is to ignore non-sensitive information that can change such as timestamps in log messages.
  • New: Added ability to read ignored terms from files outside of the filter profile.
  • New: Custom dictionary terms can now be phrases or multi-term keywords.
  • New: Added “classification” condition to Identifier filter to allow for writing conditionals against the classification value.
  • New: Added configurable timeout values to allow for modifying timeouts of internal Philter communication. This can help when processing larger amounts of text. See the Settings for more information.
  • New: Added option to IBAN Code filter to allow spaces in the IBAN codes.
  • New: Ignore lists for individual filters are no longer case-sensitive. (An ignore list entry of “John” also ignores “JOHN”.)
  • Fix: Fixed IBAN Code validation, which sometimes accepted an invalid IBAN Code.
  • Fix: Changes to improve performance when handling long input text.
  • Tweak: Updated base AWS AMI.
  • Tweak: Updated base Docker container to UBI 8.2.
  • Tweak: Reduced AWS EBS image size to 8 GB.
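
For readers who prefer to handle splitting client-side, as suggested for the experimental long-text feature, the following Python sketch shows one way to split on new lines while preserving white space. The fake_filter function is a stand-in for a real call to Philter and is purely hypothetical.

```python
from typing import Callable


def filter_by_line(text: str, filter_piece: Callable[[str], str]) -> str:
    """Split on line boundaries, filter each piece, and reassemble.

    keepends=True keeps each line terminator attached to its piece, so the
    original white space survives as long as filter_piece leaves it alone.
    """
    pieces = text.splitlines(keepends=True)
    return "".join(filter_piece(piece) for piece in pieces)


def fake_filter(piece: str) -> str:
    """Stand-in for a filter request; it only masks one hard-coded name."""
    return piece.replace("John Smith", "{{{REDACTED}}}")


document = "Patient: John Smith\r\n\r\nSeen today.\n"
print(repr(filter_by_line(document, fake_filter)))
```

Because the line terminators travel with each piece, mixed line endings and blank lines are reproduced exactly in the reassembled text.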

Version 1.6.1.3 – August 31, 2020

This version (1.6.1.3) of Philter will only be available on the Google Cloud Marketplace.

  • Updated the base image.

Version 1.6.1.2 – August 24, 2020

This version (1.6.1.2) of Philter will only be available on the AWS Marketplace.

  • Updated the base image.

Version 1.6.1.1 – July 29, 2020

This version (1.6.1.1) of Philter will only be available on the Google Cloud Marketplace.

  • Updated the base image.

Version 1.6.1 – July 8, 2020

This version (1.6.1) of Philter will only be available on the Microsoft Azure Marketplace.

  • Updated the Microsoft Azure base OS from CentOS 7.7 to CentOS 8.2.

Version 1.6.0 – June 9, 2020

Version 1.6.0 brings many new features and enhancements. Some of these changes may impact your existing filter profiles. Please contact us for assistance if you encounter any difficulties adapting your filter profile for 1.6.0.

  • New: Added ability to generate alerts when a filter strategy condition is met. Generated alerts are available through Philter’s API. Use alerts to trigger when certain sensitive information, such as a name, is identified.
  • New: Added a “span disambiguation” feature that disambiguates identified sensitive information for identical text. For example, if the text “123456789” is identified both as a phone number and an SSN, the span disambiguation will determine whether “123456789” more closely resembles a phone number or an SSN based on previously filtered text. The feature is optional and is disabled by default. Learn more about it in the User’s Guide.
  • New: Philter configuration properties can be set through environment variables.
  • New: Added Bitcoin address filter.
  • New: Added IBAN code filter.
  • New: Added tracking number filter for FedEx, UPS, and USPS.
  • New: Added a new replacement strategy to replace sensitive information with its SHA-256 hash value.
  • New: Added support for connecting to a Redis cache with a self-signed SSL certificate.
  • New: Added filter condition for “classification.” A classification can be an entity type such as “PER” for person, a passport country such as “US”, or a state for a driver’s license such as “CA”. A classification gives a more granular description of a piece of sensitive information.
  • New: Added “fuzzy” property to custom dictionary filter. When set to “true”, the filter allows loose (approximate) matches. When set to “false”, terms must appear as listed in the dictionary. In most cases, setting “fuzzy” to “false” when loose matching is not needed gives a significant performance improvement.
  • New: Added “files” property to custom dictionary filter. Now you can list custom dictionary terms in a file outside of the filter profile. This property is a list meaning you can specify multiple files. You must provide the full path to the file on the local file system.
  • Tweak: Changed “type” to “classification” for filters in a filter profile.
  • Fix: Fixed issue where MAC address filter strategies may not be loaded correctly.
  • Fix: Fixed issue where custom dictionary terms that are set to be ignored may not be ignored correctly.
  • Fix: Fixed issue where valid credit card numbers may be determined to be invalid. (Only affects when the credit card filter has verification enabled.)
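
The SHA-256 replacement strategy can be illustrated with Python's standard hashlib module. This sketch assumes the value is UTF-8 encoded, hashed without a salt, and rendered as lowercase hexadecimal; the notes above do not state which of these choices Philter makes.

```python
import hashlib


def sha256_replacement(value: str) -> str:
    """Return the hex-encoded SHA-256 digest of the UTF-8 encoded value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


print(sha256_replacement("abc"))
```

Under these assumptions, identical input values always produce identical replacements, which keeps records linkable after redaction.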

Version 1.5.2 – May 20, 2020

  • There were no feature changes for this version and no need to upgrade from 1.5.0.
  • The changes were to allow Philter to be available as Docker containers.

Version 1.5.1 – May 8, 2020

  • There were no feature changes for this version and no need to upgrade from 1.5.0.
  • The changes were to allow Philter to run on RHEL8 on AWS.

Version 1.5.0 – May 1, 2020

  • New: Added new filter called “Section” to identify text between two markers.
  • New: Added ability to use custom NLP models.
  • New: Added ability to store filter profiles in an Amazon S3 bucket. This allows multiple instances of Philter to use the same filter profiles.
  • New: Added CloudFormation template and Terraform scripts to philter-infrastructure-as-code repository.
  • Tweak: Consolidated caches into a single cache.
  • Tweak: Model file can now be specified in the application properties.
  • Tweak: An error is generated at startup if API authentication is enabled but no API token is set.

Version 1.4.0 – April 10, 2020

  • New: Added optional basic authentication.
  • New: Added token condition to NerFilterStrategy. Conditions can now be written on the token itself.
  • New: Added confidence condition to each type of filter strategy.
  • Tweak: Ignored spans are now dropped prior to overlapping spans.
  • Tweak: Docker container now uses Java 11.
  • Fix: Fixed potential issue with filtering state abbreviations.

Version 1.3.1 – February 20, 2020

  • New: Added CRYPTO_REPLACE redaction option to encrypt sensitive values.
  • New: Added %v redaction variable to be substituted for the original value of the sensitive text. With %v you can now annotate sensitive information instead of masking or removing it.
  • New: Added filter condition based on the context. You can now make a filter condition be dependent on the value of the context.
  • New: Added filter for network MAC addresses.
  • New: Added support for TINs (Tax Identification Numbers) to the SSN filter.
  • New: Now requires Java 11.
  • New: Client can set document ID per filter request instead of document ID always being auto-generated per request. This allows for splitting documents between multiple requests to increase throughput.
  • New: Philter Enterprise Edition is now certified for Red Hat Enterprise Linux 8.
  • Tweak: GCP image is now built on CentOS 8.
  • Tweak: Credit card filter now supports credit card numbers containing dashes and spaces.
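
The %v redaction variable can be pictured as a simple substitution. The sketch below is an assumption about the general idea, not Philter's implementation, and the annotation format shown is hypothetical.

```python
def render_redaction(format_string: str, value: str) -> str:
    """Substitute the %v variable with the original sensitive value."""
    return format_string.replace("%v", value)


# Hypothetical annotation format that wraps the value instead of removing it.
print(render_redaction("<pii>%v</pii>", "John Smith"))
```

A format without %v masks or removes the value, while a format that includes %v keeps the value and adds markup around it.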

Version 1.3.0 – January 28, 2020

This release focuses mainly on improving performance and error handling. No new functionality was added.

  • New: Now supports identifying URLs that use an IP address instead of a domain name.
  • New: Added option to URL filtering to require a URL to begin with http, https, or www.
  • Tweak: Removed trailing spaces from filtered values when they exist.
  • Tweak: Improved performance on API requests.
  • Tweak: Improved performance for larger documents.
  • Tweak: Changed format of generated document ID to be more random.
  • Tweak: Improved error handling if an API request to filter is not successful.
  • Tweak: Improved handling of standalone month names.
  • Tweak: When no filter strategies are specified, the default action will be to redact.

Version 1.2.0 – January 16, 2020

  • New: Added ignore lists specific to each filter to list items that should never be removed. Each filter can have its own ignore list.
  • New: Added support for encrypted connections to Redis.
  • New: Added enabled property to individual filters in a filter profile. Filters having enabled=false will not be executed.
  • New: Added option to the credit card filter in a filter profile to also include invalid credit card numbers. (These are numbers that match the credit card pattern but are not valid per the card number’s validation algorithm.)
  • New: Added option to filter profile to require dates be valid dates. (The date February 30 is not a valid date and would be excluded when enabled.)
  • New: Added option to filter profile for NER to remove punctuation prior to processing.
  • Fix: Fixed issue where conditionals may not be applied to NER entities.
  • Tweak: Added Philter version to status API response.
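
The distinction between credit card numbers that match the pattern and those that are valid can be shown with a checksum. The sketch below assumes the validation algorithm is the Luhn checksum, which is widely used for payment card numbers; the notes above do not name the algorithm Philter applies.

```python
def passes_luhn(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum.

    Dashes and spaces are stripped first; anything else non-numeric fails.
    """
    cleaned = number.replace("-", "").replace(" ", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Double every second digit, counting from the right.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(passes_luhn("4111 1111 1111 1111"))  # a commonly used test number
print(passes_luhn("4111 1111 1111 1112"))  # same pattern, wrong check digit
```

Both numbers match a 16-digit card pattern, but only the first passes the checksum.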

Version 1.1.0 – December 15, 2019

  • New: Store changed from MongoDB to Elasticsearch for improved querying capabilities.
  • New: Added “auto” setting for distance to automatically calculate appropriate distance (fuzziness) of identified text.
  • New: Added ignore lists to filter profiles to support having a list of terms that are never filtered.
  • New: Added support for using custom dictionaries in filter profiles. (Can now specify your own list of terms to be filtered.)
  • New: Added an explanation endpoint that describes how the identified PII/PHI was detected and filtered.
  • New: Added metrics per individual filter type.
  • New: Added “prefix” property for metrics to allow for improved metric organization.
  • New: The filter sensitivity level is now applied to NER entities.
  • New: Added API for managing filter profiles.
  • Fix: Fixed filter profile issue where appropriate filtering strategy may not be applied.

Version 1.0.1 – October 19, 2019

  • Tweak: Changed API HTTP response message when Philter is initializing.
  • Tweak: API endpoint /api/replacements returns HTTP 503 Service Unavailable when the replacement store is not enabled.
  • Tweak: Updated how identified spans are located.

Version 1.0.0 – October 7, 2019

  • Initial public release.
  • Known issue: Philter’s API /api/filter endpoint will return HTTP 500 if Philter has not finished initializing. This will be made more user-friendly in a later version. As a workaround, use the /api/status endpoint to determine if Philter has finished initializing prior to calling /api/filter.
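
The workaround for the known issue can be sketched as a polling loop that checks /api/status before the first call to /api/filter. The sketch assumes that an HTTP 200 response from /api/status means Philter has finished initializing; the function names, attempt count, and delay are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def status_probe(base_url: str) -> bool:
    """Assumption: HTTP 200 from /api/status means initialization is done."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/status", timeout=5) as response:
            return response.status == 200
    except OSError:  # covers connection errors and HTTP error responses
        return False


def wait_until_ready(
    is_ready: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if is_ready():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A client could call wait_until_ready(lambda: status_probe("https://localhost:8080")) and send filter requests only once it returns True; the host and port here are placeholders.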