Data Loss Prevention in Images

Next Generation Cyber Threats: Images

Cybersecurity is probably the most rapidly changing area of IT, and cyber-criminals are becoming increasingly sophisticated in their bid to breach an organization's security and steal its crown jewels: its information. Traditional Data Loss Prevention (DLP) technology protects against the traditional threat of someone trying to send a file to an unauthorized individual. It took a step change, Adaptive Data Loss Prevention with Deep Content Inspection (DCI), to address threats such as ransomware delivered embedded in innocuous-looking documents.

Clearswift delivered the first version of Adaptive Redaction in 2013 and has continuously improved the technology in every release since. It was developed to modify the content of files on the fly, so that collaboration can continue without the danger of critical information being shared with unauthorized individuals or malicious content being received. However, new threats are emerging that need to be addressed, and image-based threats are at the forefront.

We often don’t give images a second thought. We download them every day and use them in presentations and documents all the time. But in today’s world of digital collaboration, what risks do they pose, and how can those threats be mitigated?

These days the multi-function printer found in most organizations supports remote printing, standard photocopying, and scanning a document to send it as an email attachment. That last feature creates one of the newest risks currently being exploited. When the device scans a document, it typically creates a PDF, but each page of that PDF is actually an image. Most security and traditional DLP solutions do not pick up and analyze these images, so the PDFs become a data loss risk: any sensitive or confidential document can be readily scanned into a PDF and sent out of the organization without being detected.

This issue with images is not restricted to scanned documents. Other techniques such as ‘screenshots’ can also turn critical information into an image such as a JPEG, which can then be shared via email or a Cloud collaboration application without being detected. Optical Character Recognition (OCR) is a technique for analyzing images and extracting the text so that it can be processed in the same way as a normal electronic document using DLP functionality. With OCR, the text in images can be analyzed, and DLP policy can then be applied to prevent data leaks. OCR is available as an option for the Clearswift SECURE Email Gateway today, and for the Clearswift SECURE Web Gateway and Clearswift Endpoint DLP solution later this year.
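To make the OCR-then-DLP idea concrete, here is a minimal sketch of the second half of that pipeline: scanning text that an OCR engine has already extracted from an image. It is not Clearswift's implementation. The OCR step itself is assumed to have happened elsewhere, and the policy (flag payment card numbers that pass the Luhn checksum) and the names CARD_PATTERN, luhn_valid and find_card_numbers are illustrative assumptions.

```python
import re

# Hypothetical policy: flag anything that looks like a payment card number.
# Matches 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(ocr_text: str) -> list[str]:
    """Return candidate card numbers found in text extracted by OCR."""
    hits = []
    for match in CARD_PATTERN.finditer(ocr_text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Once the text is out of the image, it can be checked with the same kind of rules that would be applied to an ordinary electronic document. A real deployment would also need to tolerate OCR misreads, which this sketch does not attempt.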

A further enhancement to OCR analysis enables redaction of text in images, removing only the information that breaks policy by drawing a black box across the words. This is equivalent to the Clearswift Data Redaction option, now extended to cover text in an image. Because our DCI engine recurses to 50 levels deep by default, the image can be embedded in an Excel spreadsheet, which is embedded in a Word document, which is scanned to PDF and then shared via a ZIP archive attached to an email. Clearswift will still detect the image, analyze it and redact any sensitive information, allowing the ‘safe’ file to continue to the recipient.
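The black-box redaction described above can be sketched as follows. This is an illustration under stated assumptions, not the product's code. It assumes an OCR engine has already returned each recognized word with a bounding box, and that the image is a simple grid of grayscale pixel values. The OcrWord type, the redact_words function and the banned-word policy are all hypothetical.

```python
from typing import NamedTuple


class OcrWord(NamedTuple):
    """A recognized word and its bounding box in pixels (hypothetical shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


BLACK = 0


def redact_words(pixels: list[list[int]], words: list[OcrWord],
                 banned: set[str]) -> list[list[int]]:
    """Return a copy of a grayscale image with policy-breaking words blacked out."""
    out = [row[:] for row in pixels]  # leave the original image untouched
    rows = len(out)
    cols = len(out[0]) if out else 0
    for word in words:
        if word.text.lower() not in banned:
            continue
        # Clamp the box to the image so a bad OCR box cannot index out of range.
        for y in range(max(0, word.top), min(rows, word.top + word.height)):
            for x in range(max(0, word.left), min(cols, word.left + word.width)):
                out[y][x] = BLACK
    return out
```

Only the pixels under the offending words change, so the rest of the image still reaches the recipient intact.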

Images can also ‘hide’ information in other ways. Some of it sits in the document properties, for example the geographical coordinates of where the picture was taken. This information can be used to identify locations: there have been a number of incidents of military personnel inadvertently leaking information this way, and of poachers using location data to track big game.

Document properties can also be subverted by a malicious insider to convey sensitive information outside the organization without arousing suspicion. Document Sanitization is a technique that removes document properties to close off that route for data loss. Policy granularity means that only those properties which have been authorized are retained; all others are removed.
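The allow-list approach to document properties can be illustrated with a short sketch. It assumes the properties have already been read out of the file into a dictionary, and the function name sanitize_properties and the example policy are hypothetical. Reading and rewriting the metadata of a real file format is left out.

```python
def sanitize_properties(properties: dict[str, str],
                        allowed: set[str]) -> dict[str, str]:
    """Keep only the properties that policy authorizes and drop everything else."""
    return {name: value for name, value in properties.items() if name in allowed}


# Illustrative metadata for a photo; the values are made up.
photo_properties = {
    "Title": "Site visit",
    "Author": "J. Smith",
    "GPSLatitude": "51.4545",
    "GPSLongitude": "-0.9781",
    "Comment": "ACCT 0042 / PIN 9911",  # free-text fields can carry smuggled data
}

# Hypothetical policy: only the title may leave the organization.
clean = sanitize_properties(photo_properties, allowed={"Title"})
```

Using an allow-list rather than a block-list means that a property nobody anticipated, such as the free-text comment here, is removed by default.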

A technique called steganography can also be used to hide information in images. Tools encode the data and embed it by subtly changing the image, such that, to the naked eye, there is no visible difference. The image can then be sent out, exfiltrating the information. A standard size image can easily hide several thousand customer contacts or account numbers. In this case OCR will not remove the risk, because the hidden data is not a picture of text. What makes steganography hard to counter is that it is virtually impossible to tell whether data has been hidden in an image. However, Clearswift’s anti-steganography functionality disrupts the image so that no data can be extracted, while the image, to the naked eye, remains the same.
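The sketch below illustrates the general idea using least-significant-bit (LSB) embedding, one common and simple steganography scheme. It is chosen here only for illustration. The text does not say which schemes attackers use or how Clearswift's anti-steganography works, so the embed, extract and disrupt functions are assumptions rather than a description of the product. The image is modelled as a flat list of 8-bit channel values.

```python
import random


def embed(pixels: list[int], payload: bytes) -> list[int]:
    """Hide payload in the lowest bit of each channel value, one bit per value."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for this image")
    out = pixels[:]
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # each value changes by at most 1
    return out


def extract(pixels: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (pixels[b * 8 + k] & 1)
        out.append(value)
    return bytes(out)


def disrupt(pixels: list[int], rng: random.Random) -> list[int]:
    """Overwrite every lowest bit with noise, destroying any LSB payload."""
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]
```

In this scheme both embedding and disruption change each channel value by at most 1 out of 255, which is why neither is visible to the naked eye, and why disruption can be applied to every image without first having to prove that something is hidden in it. As a rough capacity check, a 1920 x 1080 RGB image has 6,220,800 channel values, so at one bit per value it could carry about 777,600 bytes.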

Furthermore, botnets also use steganography on the inbound traffic flow, both to communicate and to download malicious payloads to general purpose malware. The same anti-steganography techniques can be used to disrupt that communication channel and keep the organization safe.

Images are often overlooked, but a new generation of threats is emerging which uses them. Clearswift’s Advanced threat protection and DLP functionality can mitigate the threat, helping the organization stay safe.
