Human error is one of the biggest problems businesses face when it comes to data leaks and unwanted data acquisition. A number of systems may be in place to ensure that confidential files cannot be shared with unauthorized individuals. What is often overlooked, however, is the hidden metadata attached to everyday documents and files, which has the potential to cause major data breaches.
People often forget that most digital office documentation contains automatically created sensitive information – such as the author name, revision history, application software and version number. This is known as metadata, which can be compromising when shared outside of an organization. It is important for businesses to understand how metadata can affect them and how it could result in a potential data breach.
Take, for example, a company that creates a proposal in-house by copying a previous document and revising it to suit the new opportunity before submitting it to the prospect. If the document has not been sanitized and the revision history removed before sending, the prospect would have a wealth of sensitive information at their fingertips. Anyone with access to that document would be able to see the names of every person who has worked on the proposal, or view the revision history, including changes to the original scope of works and budgets.
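To make this concrete, the sketch below shows how little effort it takes to read that information from a Word document. It assumes the file is an Office Open XML (.docx) package, which stores core properties in docProps/core.xml and application details in docProps/app.xml. The function names and the sample values are illustrative only and are not taken from any product.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two property parts of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Label -> (package part, element path). Only a handful of fields are read here.
FIELDS = {
    "author": ("docProps/core.xml", "dc:creator"),
    "last_modified_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "revision": ("docProps/core.xml", "cp:revision"),
    "application": ("docProps/app.xml", "ep:Application"),
    "app_version": ("docProps/app.xml", "ep:AppVersion"),
}


def read_metadata(docx_file):
    """Return the metadata fields listed in FIELDS that are present in the file."""
    found = {}
    with zipfile.ZipFile(docx_file) as zf:
        names = set(zf.namelist())
        roots = {}
        for label, (part, path) in FIELDS.items():
            if part not in names:
                continue  # the part is optional, so skip it if absent
            if part not in roots:
                roots[part] = ET.fromstring(zf.read(part))
            element = roots[part].find(path, NS)
            if element is not None and element.text:
                found[label] = element.text
    return found


def build_sample_docx():
    """Build a tiny in-memory package with made-up values, for demonstration only."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:lastModifiedBy>John Example</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    app = (
        '<Properties xmlns="%s">'
        "<Application>Microsoft Office Word</Application>"
        "<AppVersion>16.0000</AppVersion>"
        "</Properties>" % NS["ep"]
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("docProps/core.xml", core)
        zf.writestr("docProps/app.xml", app)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_metadata(build_sample_docx()))
```

Nothing here needs special tooling: the package is an ordinary ZIP archive, so the same few lines can be pointed at any .docx file a recipient happens to hold.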
While this is potentially embarrassing, there is a far more sinister threat when metadata falls into the wrong hands. Metadata is invaluable to cybercriminals. For a hacker, knowing what software version is in use means they can craft an attack around known vulnerabilities in that software. Knowing the author and their email address means they can craft a phishing email with a weaponized attachment. But how might documents fall into the wrong hands? The simplest route is through the Internet, by ‘harvesting’ metadata attached to files published on the company website.
The assurance that sensitive data does not leave an organization, or rather never reaches unauthorized recipients, is even more important with the impending enforcement of the GDPR. Good information governance offers a competitive advantage, but under GDPR individuals can also make requests for which organizations need to be prepared. The most onerous of these is often called ‘the right to be forgotten’ or RTBF.
After 25th May 2018, individuals, including customers, can request their right to be forgotten, which involves the discovery and potential removal of any and all personal information from your network and systems. Information spreads rapidly, so it is important to be able to find it efficiently and then decide whether it can be removed. This does not necessarily end at the organization boundary. It can also apply to third parties with which you have shared the information, or which have shared it with you.
Receiving unwanted (or unauthorized) data can create as many challenges as a data leak. This is especially important when implementing RTBF requests. If you haven’t received the data, then you won’t have to find and delete it. For example, a spreadsheet might have been sent, but unknown to the sender, there were hidden columns that contained sensitive information which should have been removed. This unwanted/inadvertent data acquisition makes it even more challenging for organizations to track down and remove the sensitive data to comply with RTBF requests and GDPR in general.
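The hidden-column case can be checked for mechanically. The following is a minimal sketch, assuming the spreadsheet is an Office Open XML (.xlsx) package in which each worksheet part under xl/worksheets/ marks hidden columns with a hidden attribute on its col elements. The function names and sample data are hypothetical, and the check covers hidden columns only, not hidden rows or hidden sheets.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
MAIN_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def hidden_columns(xlsx_file):
    """Map each worksheet part to a sorted list of hidden 1-based column numbers."""
    result = {}
    with zipfile.ZipFile(xlsx_file) as zf:
        for name in zf.namelist():
            # Worksheet parts live directly under xl/worksheets/ and end in .xml.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            hidden = set()
            for col in root.findall("m:cols/m:col", MAIN_NS):
                # The attribute is an XML boolean, so accept both spellings.
                if col.get("hidden") in ("1", "true"):
                    first, last = int(col.get("min")), int(col.get("max"))
                    hidden.update(range(first, last + 1))
            if hidden:
                result[name] = sorted(hidden)
    return result


def build_sample_xlsx():
    """Build a tiny in-memory workbook part with columns C and D hidden (demo only)."""
    sheet = (
        '<worksheet xmlns="%s">'
        '<cols><col min="3" max="4" hidden="1"/><col min="6" max="6"/></cols>'
        "<sheetData/>"
        "</worksheet>" % MAIN_NS["m"]
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("xl/worksheets/sheet1.xml", sheet)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(hidden_columns(build_sample_xlsx()))
```

A check like this only flags that something is hidden. Deciding whether the hidden content is sensitive, and whether to strip it or block the file, remains a policy decision.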
Our recent report ‘The GDPR Divide: Board Views vs Middle Management’ reveals that almost half of board members believed they had duplicated customer data (for example by copying reports to multiple systems), suggesting it is increasingly difficult for organizations to rely on operator-initiated processes – such as manual inspections and deletion.
Our whitepaper also reveals that only 17% of employees would actually delete an email that was sent to them from another company in error. Even fewer would make the sender aware, even if the email contained sensitive information. This means that customer data is more likely to have been duplicated through unwanted data acquisition without anyone being aware of it. When it comes to GDPR compliance, it is critical to tackle this issue and ensure that customer data is not shared either by mistake or on purpose.
With employees already expressing uncertainty about their ability to handle RTBF requests, and just a third of management respondents believing the business can handle multiple requests concurrently, businesses need to implement a solution now.
There are two areas where technology can help. The first is preventing unwanted data acquisition. Clearswift's sanitization and redaction technology can automatically detect and remove unauthorized information from inbound documents and files via email and the web, before it enters the network. The second area is data discovery. Clearswift's CIP solution will automatically scan a network for unstructured sensitive data and then move, or remove it, based on the policy applied.
The same Clearswift technology which prevents unauthorized inbound data acquisition can also be used to prevent outbound data loss, without compromising continuous collaboration.
To remove the reliance on manual user processes, there is a need for company-wide systems that can automatically detect when sensitive information is about to be sent or received across an organization’s boundary. Such a system should offer assurance and automatic protection across the entire network without hindering the way business is conducted.
Clearswift’s Document Sanitization feature automatically purges common file formats of sensitive data to prevent inadvertent data leaks:
- Removes outstanding revision changes
- Clears history and fast-save data that potentially holds embarrassing critical information
- Completely removes document properties, such as “Author”, “Organization” and “Status”
- Removes data attached to photos, such as coordinates and other metadata
- Granularity ensures that specific properties can be preserved, such as classification information
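As a rough illustration of the last two ideas in the list (removing document properties while preserving selected ones), the sketch below rewrites a .docx package and drops every core property that is not on a preserve list. This is not Clearswift's implementation. It assumes an Office Open XML file, touches only docProps/core.xml, and uses the core "category" property as a hypothetical stand-in for a classification label. Tracked changes, application properties and image metadata are out of scope.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def strip_core_properties(src, preserve=()):
    """Return a new in-memory package with only the preserved core properties kept."""
    keep = set(preserve)
    out = io.BytesIO()
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                # Copy the child list first because we remove while iterating.
                for child in list(root):
                    if local_name(child.tag) not in keep:
                        root.remove(child)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part is copied through unchanged.
            zout.writestr(item.filename, data)
    out.seek(0)
    return out


def build_sample_docx():
    """Build a tiny in-memory package with made-up values, for demonstration only."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:contentStatus>Draft</cp:contentStatus>"
        "<cp:category>Confidential</cp:category>"
        "</cp:coreProperties>" % (CP_NS, DC_NS)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(CORE_PART, core)
        zf.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    cleaned = strip_core_properties(build_sample_docx(), preserve=("category",))
    with zipfile.ZipFile(cleaned) as zf:
        print(zf.read(CORE_PART).decode("utf-8"))
```

An allow list is used rather than a block list so that any property the policy does not explicitly name is removed by default, which is the safer failure mode when new or unexpected properties appear.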
Automation ensures that the policies are consistently applied across an organization. Document sanitization provides assurance that users are sharing documentation and files – either inside or outside of the organization – without posing a data breach threat or a compliance failure.
Document Sanitization is a component of Clearswift’s unique Adaptive Redaction technology, available with Clearswift’s Secure Email products and Web products. It can also be deployed to augment existing (non-Clearswift) email and web products.