Debunking Lexical Expressions

Debunking Cybersecurity Jargon Part Five – What Are Lexical Expression Qualifiers?

Although in broad terms Clearswift is undoubtedly a technology firm, we always try to use business language rather than technical talk when discussing cybersecurity and how we can help organizations. Over the last few months, we’ve published a series of blog posts that explain certain cybersecurity terms. We’ve already looked at Adaptive Redaction, Deep Content Inspection, Information Governance Server, and Optical Character Recognition, and now turn our attention to lexical expression qualifiers (LEQs). Put simply, LEQs help us improve detection accuracy when performing keyword scans using lexical expressions on data.

How False Positives Impact DLP

Locating the right data and assessing the potential threat within it is one of the principal tenets of cybersecurity. The type of data that needs protecting will vary between different organizations, but typically it will include bank account numbers, passport numbers, addresses, and other personally identifiable information (PII).

Data Loss Prevention (DLP) solutions have emerged to find sensitive or confidential data and prevent it from being shared without authorization, inside or outside the network. DLP works by recognizing numerical or alphabetical sequences, depending on the type of identifier it is being asked to detect. Some of these identifiers have a very basic format. Take an Arkansas driving license number, for example, which is a number of between 4 and 9 digits (1,000 to 999,999,999). That is a very wide spread of numbers considering the state has a population of just over 3 million. If you were trying to detect Arkansas driving license numbers, then any number between 1,000 and 999,999,999 would be a match and a potential false positive.
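To see how quickly a format this loose generates noise, here is a minimal Python sketch. It is only an illustration, not how any Clearswift product implements detection: the pattern and the function name `find_license_candidates` are our own assumptions based on the 4-to-9-digit format described above.

```python
import re

# Assumed pattern: a standalone number from 1000 to 999999999
# (4 to 9 digits, no leading zero), per the format described above.
LICENSE_PATTERN = re.compile(r"(?<!\d)[1-9]\d{3,8}(?!\d)")


def find_license_candidates(text: str) -> list[str]:
    """Return every number in the text that fits the license format."""
    return LICENSE_PATTERN.findall(text)


sample = "Invoice 20231, order 4471, call ext 5550, license 912345678."
print(find_license_candidates(sample))
# ['20231', '4471', '5550', '912345678']
```

Only the last value is a license number, yet the invoice, order, and extension numbers all match, and each one would raise an event for someone to review.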

Once this data has been ‘detected’, the communication might be automatically blocked until IT can review it. Any event triggered by a DLP solution needs to be investigated, so false positives can be highly time-consuming and costly to an organization, especially if they occur in significant numbers.

The first step in reducing the number of false positives is to refine the query so that it looks for more than one item confirming the true nature of the data. For Arkansas driving license numbers, this could include looking for an accompanying zip code (71601 to 72959). In this case we would add a regular expression (7[1-2][0-9][0-9][0-9]) to the policy rule. Note that this expression matches 71000 to 72999, which is slightly wider than the actual zip code range.
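As a rough sketch of this first refinement, the Python below only reports license candidates when an Arkansas zip code appears in the same text. The patterns, the numeric range check, and the function name `qualified_license_candidates` are illustrative assumptions of ours, not product configuration. We assume a zip pattern of 7, then 1 or 2, then three digits, and tighten it with a range check for 71601 to 72959.

```python
import re

# Assumed patterns for illustration only.
LICENSE_PATTERN = re.compile(r"(?<!\d)[1-9]\d{3,8}(?!\d)")
ZIP_PATTERN = re.compile(r"(?<!\d)7[12]\d{3}(?!\d)")


def qualified_license_candidates(text: str) -> list[str]:
    """Return license candidates only if an in-range zip code is also present."""
    # The regex alone accepts 71000-72999, so narrow it to the quoted range.
    zip_spans = {
        m.span()
        for m in ZIP_PATTERN.finditer(text)
        if 71601 <= int(m.group()) <= 72959
    }
    if not zip_spans:
        return []
    # A zip code also fits the license format, so skip the zip matches themselves.
    return [
        m.group()
        for m in LICENSE_PATTERN.finditer(text)
        if m.span() not in zip_spans
    ]


print(qualified_license_candidates("License 912345678, Little Rock, AR 72201"))
# ['912345678']
print(qualified_license_candidates("Order 4471 shipped to 90210"))
# []
```

Requiring the second item removes the stray order number, but any number that happens to sit near an Arkansas zip code would still match, which is why further qualification is needed.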

Expressions to qualify data

The next stage is to qualify the data even further.

Introducing Lexical Expressions and Lexical Expression Qualifiers

Because of the additional burden on security and IT teams, technology has evolved to help mitigate false positives.

Built into all core Clearswift email and web products are pre-configured, standard lexical expressions that match general lexical patterns such as credit card or passport numbers. When other, more specific values need detecting, LEQs are used to validate ‘true’ information found against an external data source such as a database.

If we look at the healthcare industry as an example, patient data is often shared between doctors and hospitals to facilitate the care being provided. Here, the transmission of the data should always be encrypted, and in the United States, HIPAA requires patient data to be protected in transit, which in practice means encryption. To ensure the right data is encrypted, we can inspect the data for attributes that will identify a patient’s PII.

Ideally we would look for something unique like a patient record number, but if that happens to be a 10-digit number, then telephone numbers or part numbers might also generate false positives. Hence the need to further qualify the data we are looking for.

To do this, we can import a snapshot of the details of the patients served by the district. This LEQ file is then indexed and hashed for security.
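The idea of indexing and hashing a snapshot can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the actual LEQ file format: the CSV layout, the field names, the keyed SHA-256 (HMAC) scheme, and the names `hash_value` and `build_leq_index` are all hypothetical, and the patient data is made up.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical key; a real deployment would manage this secret securely.
SECRET_KEY = b"example-key"


def hash_value(value: str) -> str:
    """Normalize a value and return its keyed SHA-256 digest."""
    normalized = value.strip().lower()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_leq_index(csv_text: str, key_field: str) -> dict[str, set[str]]:
    """Map each hashed key (e.g. record number) to its hashed qualifier values."""
    index: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = hash_value(row[key_field])
        index[key] = {hash_value(v) for f, v in row.items() if f != key_field}
    return index


# Made-up snapshot with an assumed column layout.
snapshot = """record_number,name,zip,ssn
1234567890,Jane Doe,72201,000-12-3456
"""

index = build_leq_index(snapshot, "record_number")
print(hash_value("1234567890") in index)  # True
print("1234567890" in index)              # False
```

Because only digests are stored, the index can confirm that a value seen in a message belongs to a known patient without holding the patient details in readable form.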

LEQ process

When keyword search routines look for a patient record number in the data, we can use the LEQ to confirm whether the 10-digit number detected is actually valid. A policy rule can be configured to require three or more matches with the additional information from the LEQ file, such as the patient’s name, zip code and social security number, before permitting the data to be encrypted and sent securely.
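A rule of this kind could be sketched as follows. Again, this is a hedged illustration rather than the product's logic: the in-memory `LEQ_INDEX`, the plain SHA-256 hashing, the word-pair tokenization, and the names `candidate_hashes` and `confirmed_records` are assumptions we have made to keep the example self-contained, and the patient data is invented.

```python
import hashlib
import re

RECORD_PATTERN = re.compile(r"(?<!\d)\d{10}(?!\d)")  # 10-digit record number
MIN_QUALIFIERS = 3  # "three or more matches" from the policy rule


def h(value: str) -> str:
    """Hash a normalized value (plain SHA-256, for illustration only)."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical LEQ index: hashed record number -> hashed qualifier values.
LEQ_INDEX = {
    h("1234567890"): {h("Jane Doe"), h("72201"), h("000-12-3456")},
}


def candidate_hashes(text: str) -> set[str]:
    """Hash every word and adjacent word pair so full names can match."""
    tokens = re.findall(r"[\w-]+", text)
    phrases = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return {h(p) for p in phrases}


def confirmed_records(text: str) -> list[str]:
    """Return record numbers backed by enough qualifiers from the LEQ index."""
    seen = candidate_hashes(text)
    confirmed = []
    for number in RECORD_PATTERN.findall(text):
        qualifiers = LEQ_INDEX.get(h(number))
        if qualifiers is not None and len(qualifiers & seen) >= MIN_QUALIFIERS:
            confirmed.append(number)
    return confirmed


print(confirmed_records("Patient Jane Doe, record 1234567890, SSN 000-12-3456, ZIP 72201."))
# ['1234567890']
print(confirmed_records("Call 5015550100 about part 1234567890"))
# []
```

In the second message, the telephone number is not in the index at all, and the 10-digit part number has no accompanying patient details, so neither triggers the rule.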

Looking up patient data for qualification

The more information that can be verified through LEQs, the more the system can be sure of a policy match and automatically apply the appropriate action, reducing the amount of manual intervention required.

Minimizing False Positives

With all our products and solutions, we aim not only to make an organization’s defense against data loss watertight, but also to ensure communications and collaboration continue uninterrupted. The use of LEQs continues this tradition, ensuring the data seen really is the data being searched for. As a result, we reduce the number of false positives, free up valuable IT resources, and ultimately keep data safe and compliant.
