Cutting out the ‘false positive’ with Lexical Expression Qualifiers

By Guy Bunker

When it comes to traditional Data Loss Prevention (DLP) solutions, the ‘false positive’ is frequently the downfall. This is where an event is triggered by a policy in error. For example, a 16-digit number could be a credit card number, or it could be a reference number.  If one is mistaken for the other, then this gives rise to a false positive.

All DLP events need to be investigated, so the false positive has been a time-consuming and painful thorn in the side of IT departments for many years. But ideas and solutions have evolved, and today it is possible to mitigate false positives using Lexical Expression Qualifiers (LEQs), which reduce the burden on already overstretched IT departments.

Data detection challenges

DLP systems are used to detect sensitive data held within a network and prevent it from being shared outside the network without authorization. The types of sensitive data that need to be protected vary depending on the industry in which an organization operates. Some examples include credit card data, bank account numbers, patient IDs, passport numbers, employee addresses, and customer account and contact data. DLP technology therefore needs to be able to recognize numerical or alphabetical sequences depending on the type of data it is being asked to detect.

For example, when it comes to recognizing a credit card number, there is a well-known method called a Luhn check, which verifies that a 16-digit number carries the check digit a genuine card number would have, rather than being an arbitrary string of digits. A Luhn check can be run in conjunction with a Bank Identification Number (BIN) check, which compares the leading digits against known issuer prefixes, to further reduce the possibility of the technology misrecognizing the number as a valid credit card number, creating a ‘false positive’ and unnecessarily blocking an email from being delivered.
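As a rough illustration of the two checks described above, here is a minimal Python sketch. The Luhn arithmetic is the standard algorithm; the prefix list `EXAMPLE_PREFIXES` and the function names are hypothetical and purely illustrative, and this is not a description of how any particular DLP product implements the checks.

```python
import re

# Hypothetical, deliberately incomplete prefix list for illustration only.
# Real issuer ranges are maintained by the card networks.
EXAMPLE_PREFIXES = ("4", "51", "52", "53", "54", "55")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_like_card(candidate: str) -> bool:
    """Combine length, prefix and Luhn checks on a candidate string."""
    digits = re.sub(r"[ -]", "", candidate)  # drop common separators
    return (
        len(digits) == 16
        and digits.isdigit()
        and digits.startswith(EXAMPLE_PREFIXES)
        and luhn_valid(digits)
    )
```

Because the Luhn check digit can take only ten values, roughly one in ten arbitrary 16-digit numbers will pass it by chance, which is why the prefix check is a useful second filter.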

This is all fine, but what about numbers which don’t have Luhn or BIN checks? Numbers like customer account numbers, patient IDs or passport numbers are generally 6-10 digits long and may or may not have an alpha prefix. Even with a prefix, the number can frequently be recognized by the technology as something other than what it is. With the advent of web applications and very long URLs, even digit sequences inside a URL can be misinterpreted as valid credit card numbers, complete with a passing Luhn check!

False positives have been the bane of traditional DLP technologies since their inception. Once it has detected data, the system blocks the communication until it can be reviewed by the IT department and released.

Clearswift’s Adaptive Redaction functionality (available in all its core email and web solutions) mitigates the false positive issue by removing just the data which breaks policy and leaving the rest to continue to its destination without delay. In most cases this works well and ensures secure and continuous collaboration. Furthermore, where the unmodified data might be required, the original message or file can be quickly reviewed and released. However, there are situations where redacting the information still creates an issue, because the information is actually required and the review/release cycle takes too long. In these instances, there is another piece of functionality which can be used: the Lexical Expression Qualifiers (or LEQ) file.

Leveraging Lexical Expression Qualifiers (LEQs) to mitigate false positives

LEQs can be used as a method to validate information found against an external data source, for example a system database storing sensitive data. At the simplest level, this database could be holding customer or patient data, including ID numbers. To prevent Patient IDs from being shared outside of the organization through company systems such as email, a DLP system would need to verify that, for example, a 10-digit number it has detected within an email matches a Patient ID number from the database. But of course, there is a possibility that the number that has been detected is a false positive.

To prevent a false positive, the Patient ID number can be augmented with another value from the same database record, for example the patient’s surname. So, if the Patient ID number is detected and the surname is also detected, then the chances are that the Patient ID number is indeed an ID number and not just another numerical figure. This additional LEQ checking can be extended to cover the Patient ID number, the first name, the surname and the date of birth. The more information verified through LEQs, the more the system can be sure of an appropriate policy match.
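The qualifier idea can be sketched in a few lines of Python. This is a simplified illustration only: the `RECORDS` dictionary, the field names and the `qualified_matches` function are hypothetical stand-ins, and the sketch compares plain values rather than using a real LEQ file.

```python
import re

# Hypothetical records standing in for rows of the patient database.
RECORDS = {
    "1234567890": {"surname": "okafor", "first_name": "ada", "dob": "1980-02-14"},
    "9876543210": {"surname": "lindqvist", "first_name": "per", "dob": "1975-11-30"},
}


def qualified_matches(text, records, min_qualifiers=1):
    """Return IDs in `text` backed by enough qualifiers from the same record."""
    # Tokenize the message once; hyphens are kept so dates stay intact.
    words = set(re.findall(r"[a-z0-9-]+", text.lower()))
    hits = []
    for candidate in re.findall(r"\b\d{10}\b", text):
        record = records.get(candidate)
        if record is None:
            continue  # not a known ID at all
        # Count how many other fields of this record also appear in the text.
        found = sum(1 for value in record.values() if value in words)
        if found >= min_qualifiers:
            hits.append(candidate)
    return hits
```

Raising `min_qualifiers` corresponds to requiring more fields from the same record before the policy is treated as matched, which trades a lower false positive rate against the risk of missing a genuine disclosure that quotes only the ID.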

LEQ Diagram

Setting up LEQs

The idea behind LEQs is great, but how does this actually work? For system administrators, the thought of an external system constantly sending queries to the database is not acceptable for performance reasons. And the idea of duplicating the data within another system is out of the question.

The answer to this is to take a database extract which contains the appropriate information for the DLP system to use. The extract can be taken as frequently as is required and at a time when normal day-to-day business is not impacted. Typically, this occurs in the early hours each morning and is fully automated. The resulting extract is then transformed into a series of one-way encrypted values for each of the fields, also known as hashes, before being securely transferred and imported by the security gateway.
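The transformation step might look something like the following Python sketch. The use of HMAC-SHA-256 with a secret key, the normalization rules and the function names `hash_field` and `build_leq_rows` are all assumptions made for illustration; the actual hashing scheme and file format used by the product are not described here.

```python
import hashlib
import hmac


def hash_field(value: str, key: bytes) -> str:
    """Return a one-way keyed digest of a single field value."""
    # Normalize so "Okafor " and "okafor" produce the same digest.
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def build_leq_rows(extract_rows, key):
    """Replace every field in every extracted row with its digest."""
    return [[hash_field(field, key) for field in row] for row in extract_rows]
```

For example, `build_leq_rows([["1234567890", "Okafor"]], key)` returns one row of two hexadecimal digests and no readable field values. A keyed hash is chosen in this sketch because short numeric IDs have few possible values, so an unkeyed hash of them could be guessed by trying every number.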

SEG Diagram

Because the values are one-way hashes, even if the LEQ file were to fall into unauthorized hands, the original data cannot be recreated from it, thereby protecting the information. The email or web gateway can then use the information in the LEQ file without impacting the performance of the database or the business.
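To show how a gateway could match content against hashed values without ever holding the originals, here is a hedged Python sketch. Everything in it is hypothetical: the key, the `digest` helper, the two-column `LEQ_ROWS` layout and the `message_breaks_policy` function are illustrative assumptions, not the product's actual matching logic.

```python
import hashlib
import hmac
import re

KEY = b"example-key"  # hypothetical shared secret, for illustration only


def digest(value, key=KEY):
    """Keyed one-way digest of a normalized value."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Stand-in for an imported LEQ file: each row holds only digests (ID, surname).
LEQ_ROWS = [
    (digest("1234567890"), digest("Okafor")),
    (digest("9876543210"), digest("Lindqvist")),
]


def message_breaks_policy(text, leq_rows, key=KEY):
    """True if an ID and its qualifier from the same row both occur in text."""
    # Hash every token in the message, then compare digests only.
    token_digests = {digest(tok, key) for tok in re.findall(r"[A-Za-z0-9]+", text)}
    return any(
        id_digest in token_digests and name_digest in token_digests
        for id_digest, name_digest in leq_rows
    )
```

The comparison happens entirely on digests: the message tokens are hashed in the same way as the extract, so the gateway needs the hashing key but never the plain database values.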

It takes a little time to set up, but once in place the process is fully automated. The end result is fewer false positives, which reduces the operational time spent investigating them and, most importantly, enhances the protection of sensitive information.

With more and more sensitive information being transferred between an ever-increasing number of individuals for business, it is critical to put modern measures in place to keep data secure at all times. So, while traditional DLP policies can create issues which slow collaboration down, advanced features such as Adaptive Redaction and LEQ files are designed to mitigate the false positive while keeping information safe and the business running at full speed. 

At Clearswift, our customers can leverage LEQs from within our core email and web products along with a multitude of advanced threat prevention and data protection features. Contact our team for a discussion or demonstration of our technology today.

More information:

Contact the Clearswift Team

Clearswift Email Security Products

Clearswift Web Security Products 

Adaptive Data Loss Prevention