By Kevin Bailey, Head of Market Strategy.
There is an eerie similarity between water and data. Both harness tremendous power in large volumes, whilst also posing great risk when unleashed into the wrong environment.
In the digital age, every byte of data that we have shared with the world is still available to be found in its original format – much like water. The earth is a closed system, much like a terrarium: it rarely loses or gains matter. The same water that existed on the earth millions of years ago is still present today.
We hear a great deal about Big Data: a buzzword for a volume of structured and unstructured data so massive that it is difficult to process using traditional techniques. This is comparable to a tsunami, where a mass of structured ocean water combines with unstructured run-off draining from the land, carrying a large amount of debris with it. Think of this as ‘big water’.
But where does big data come from? Like water, which starts its life as small droplets that combine to increase in volume, big data starts its life as bits, which combine to form nibbles, bytes, kilobytes, megabytes, gigabytes and so on, up to the terabytes and petabytes we commonly speak of today.
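That ladder of units can be made concrete with a small sketch. It assumes decimal (SI) multiples, where each step is a factor of 1,000 and a byte is 8 bits; binary multiples of 1,024 (kibibytes, mebibytes) are also in common use.

```python
# A minimal sketch of the data-size ladder, assuming SI (decimal) multiples.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]

def bits_in(unit: str) -> int:
    """Number of bits in the named unit: 8 bits per byte, 1,000x per step."""
    power = UNITS.index(unit)
    return 8 * 1000 ** power

print(bits_in("byte"))      # 8
print(bits_in("kilobyte"))  # 8000
print(bits_in("petabyte"))  # 8000000000000000
```

Even droplet-sized bits, combined step by step, quickly reach oceanic scale.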
The U.S. has recently experienced the worst drought in decades, yet each year an estimated 1.7 trillion gallons of water leak from pipes before they can be put to use. And according to the United Nations, 41% of the world’s population currently lives in a “water stressed area”. In January 2013, IBM and Waterfund began creating the first-ever water cost index. Using Big Data analytics, it will estimate the cost of producing water for countries representing 25% of the world’s GDP.
We now have an intersection of water and data, but more importantly, the data used to create the water index has new-found value.
Big Data’s value comes not only from the whole, but from the sum of its parts.
At Clearswift we are all about protecting critical information, which can be likened to a critical water supply. Maintaining a continuous flow of data (water) without disrupting business operations (supply) mitigates intentional and unintentional data leaks, which cost organisations both financially and in reputational value.
Water leaks cause damage. According to one study, water damage ranked second behind power outage as the leading cause of business outages, accounting for 27% of the cases (Contingency Planning Research). Moreover, downtime is costly, ranging from about $1M to $2.8M per hour depending on the industry (Ontrack International).
In comparison, it takes a company on average 2.6 years to recover from a data breach. A recent study into data leaks found an associated cost of $201.18 per lost or stolen customer record – which means a data breach involving 100,000 or more customer records would cost more than $20 million (“Cost of a Data Breach” study, Ponemon Institute, May 2014).
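The $20 million figure is simple arithmetic on the Ponemon per-record cost, which a quick back-of-the-envelope check confirms:

```python
# Back-of-the-envelope breach cost using the Ponemon per-record figure.
COST_PER_RECORD = 201.18  # USD per lost or stolen record (Ponemon, May 2014)

def breach_cost(records: int) -> float:
    """Estimated total cost of a breach of the given number of records."""
    return records * COST_PER_RECORD

print(f"${breach_cost(100_000):,.0f}")  # $20,118,000 – just over $20 million
```

The same per-record arithmetic scales to any breach size, which is why even modest leaks add up quickly.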
If I were able to steal or manipulate the data stores that combine to create big datasets, I could affect the outcome of any big data analysis. This is much like diverting a water source away from a drought area, leaving that area at the mercy of the diverter to have the resource reconnected.
Whatever the size of your datasets, it’s worth approaching critical information protection policies with this powerful analogy in mind.