The hidden dangers lurking inside public sector documents


Recently, President Obama reiterated a national commitment to securing Americans’ data under the Cybersecurity National Action Plan, acknowledging that while connecting online has given us all enormous opportunities, it has also made our personal data available for downloading and storing, leaving it vulnerable to accidental loss and malicious abuse.

Today, we live in an age where the ability to collaborate and share – with our colleagues, our family and our friends – has become critical to the public’s ability to operate as a whole. There has never been so much access to data and so many ways to share it, facilitating cooperation and collaboration not only between citizens of the U.S., but also with those abroad.

While the private sector has led the way in embracing collaborative tools and services, the public sector is catching up fast, recognizing that by sharing information, results and ideas with citizens, national, state and local organizations can streamline processes, boost efficiency, and build support by being transparent. Communicating with the public has always been an important part of public service, and by keeping the public abreast of opportunities, successes and new information about policies and initiatives, all levels of government can ensure greater efficacy and understanding.

Official public reporting is a key part of engaging with citizens, and has become literally a “popular” process involving the widespread online distribution and availability of national, state and local government documentation. Although this approach has been celebrated by government and public alike, it has also dramatically increased the level of risk to the extent that public sector reporting now represents a major potential source of critical data loss.

This issue may sound complex, but think of all the sensitive information that goes into public reports: addresses and birth dates, Social Security Numbers, criminal records, health and medical data, child protection information, voter records, and so on. Once the reports are complete, however, that data must be scrubbed and anonymized, or included only as examples of broader trends, and must not contain the specific personal details of any individuals.

The idea that personally identifying data would be included in a final report sounds crazy, and the inclusion of such details would not only be embarrassing and dangerous, it would also open any public agency up to a host of legal issues and criticism from the public and the media. A report containing these personal details, whether on purpose or by accident, should never get through the approval process, and yet this has happened by mistake on many occasions in the past and will likely happen again unless agencies take the proper measures to secure data.

Despite our best intentions, sensitive data can sometimes be hidden within documents, in places where most users don’t know to look. This can be as simple as hidden columns, rows or sheets in spreadsheets, or data such as revision history and comments, which are often retained within a document’s metadata. Information in metadata is frequently used in phishing and hacking attempts.
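To make this concrete, here is a minimal sketch of how such hidden content could be detected, assuming the document is a modern Office file (.docx or .xlsx), which is a ZIP archive of XML parts. The function name `find_hidden_content` is hypothetical and the checks are illustrative rather than exhaustive: it looks only for hidden worksheets, reviewer comments, tracked changes and the document properties part, and ignores hidden rows and columns, embedded objects and older binary formats.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by Office Open XML spreadsheet and word-processing parts.
SHEET_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_hidden_content(package_bytes: bytes) -> list[str]:
    """Return human-readable findings for one .docx or .xlsx file (hypothetical helper)."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())

        # Spreadsheets: sheets marked "hidden" or "veryHidden" in the workbook part.
        if "xl/workbook.xml" in names:
            root = ET.fromstring(package.read("xl/workbook.xml"))
            for sheet in root.iter(f"{{{SHEET_NS}}}sheet"):
                state = sheet.get("state", "visible")
                if state != "visible":
                    findings.append(f"hidden sheet: {sheet.get('name')} ({state})")

        # Word documents: reviewer comments live in their own part.
        if "word/comments.xml" in names:
            findings.append("comments part present")

        # Word documents: tracked deletions keep the "deleted" text in delText elements.
        if "word/document.xml" in names:
            root = ET.fromstring(package.read("word/document.xml"))
            deleted = "".join(
                node.text or "" for node in root.iter(f"{{{WORD_NS}}}delText")
            )
            if deleted:
                findings.append(f"tracked deletion still stored: {deleted!r}")
            if next(root.iter(f"{{{WORD_NS}}}ins"), None) is not None:
                findings.append("tracked insertions present")

        # Document properties often carry author and editor names.
        if "docProps/core.xml" in names:
            findings.append("core properties part present")
    return findings
```

Passing the raw bytes of a file to `find_hidden_content` returns a list of findings, and an empty list means only that none of these particular checks fired, not that the document is clean.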

The vast majority of people are unaware of these risks, and therefore they don’t think about how they might be exploited. But for those who are aware of these risks and who have malicious intent, it’s easy to find things that someone thought they'd deleted or didn’t know were present in the first place.

Let’s imagine a worst-case scenario. A state child protective services agency puts out a report about children under its care. The unwitting compiler pastes tables with names and addresses of children into the document for ease of reference, saving as they go. Once they finish, they delete the tables, but those tables still exist in revision history or “fast-save” information. The report is published online, and someone with the right IT skills searches the document and finds the “deleted” tables. They turn around and A) blackmail parents or even someone at the agency; B) sell the data on the dark web; or C) send it to the press in the hope of embarrassing the agency. Or D) all of the above!

Nations learning from Nations

“Hidden” metadata has already caused embarrassments for governments and public agencies around the world. In August 2014, it was discovered that the Australian Federal Police mistakenly published highly sensitive information on criminal investigations. The police provided documents to the Senate, which were then made publicly available online. Years later it transpired that the documents contained information about the subjects of criminal investigations and telecoms interception activities which were “hidden behind electronic redactions within the document” and “could, under certain circumstances, be accessed”.

The information included the address of a target of surveillance, the types of criminal investigations and offenses being investigated, the names of several officers and other identifying information of individuals connected to investigations.

In April 2015, prior to the UK general election, a letter appeared in the media signed by a number of businesses lending their support to the British Conservative Party. This apparently independent endorsement seemed like a coup for the party, but later turned into an embarrassment when the document metadata revealed the letter had actually originated from the Conservative Party Headquarters.

While these incidents have been embarrassing, they have not yet been disastrous. But with the increase in information flow between government and citizens, it is only a matter of time before a more catastrophic incident occurs.

Moreover, in today’s collaborative environment, where the extended enterprise is so important, departments don’t only have their own users to worry about. Let’s say 'Big Agency' works with a few smaller organizations, 'Small Non-Profit' and 'Small Service Provider'. These small organizations may have a less complete approach to protecting critical information, and because they frequently share documents and data with Big Agency, someone with malicious intent could obtain Big Agency’s metadata from them and use it to infiltrate Big Agency without ever attacking it directly.

Prevention is key

Fortunately, there is a solution that easily and rapidly reduces this sort of risk. Generally, the best approach is to sanitize documents, that is, to remove all metadata from them, before they are issued. It’s rare that metadata is useful to people outside the organization, and when it is, it’s more likely to have a damaging effect than a useful one; so unless you really know what you’re doing, the best thing is to strip it out completely.

While this can be done manually, for example using Microsoft Office and other document publishing software, it is only effective if users are aware of the functionality and remember to use it before they send every document. This is not a robust solution – it’s open to human error, and reliant on policies being understood and diligently implemented. The other approach is to implement a technological solution that automatically sanitizes documents, stripping out metadata and revision history information when they leave the organization while leaving the visible content of the document unaffected.
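As a rough illustration of what the automated approach involves, the sketch below removes the author-identifying properties from a modern Office file before it is sent out. It is a simplified example under stated assumptions, not a description of how any particular product works: the function name `strip_author_properties` and the choice of which properties to remove are hypothetical, and a real sanitizer would also need to handle tracked changes, comments, hidden sheets, custom properties and embedded objects.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the conventional prefixes when the part is written back out.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Properties that identify people; an illustrative choice, not a complete list.
IDENTIFYING = {
    f"{{{NAMESPACES['dc']}}}creator",
    f"{{{NAMESPACES['cp']}}}lastModifiedBy",
}


def strip_author_properties(package_bytes: bytes) -> bytes:
    """Return a copy of the package with author-identifying core properties removed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                # Drop only the identifying children; title, dates etc. are kept.
                for child in list(root):
                    if child.tag in IDENTIFYING:
                        root.remove(child)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part is copied through unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()
```

The visible content is untouched because only the properties part is rewritten. In practice a step like this would run at the point where documents leave the organization, such as a mail or web gateway, so that it does not depend on each user remembering to do it.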

At Clearswift, we have recently seen a spike in government interest in this particular issue, and we hope that this reflects a growing recognition of the challenges faced by federal, state and local departments. All too often there is a problem without an effective solution – this time it’s different: the problem can be addressed at source, rather than waiting for a major disaster to drive the need for a solution.

Dr. Guy Bunker, SVP Products at Clearswift
