Beware of the Digital Diogenes Syndrome
Being an avid follower of all things related to Big Data, I must confess that I am a little worried lately. It’s not that the technology may not yet be fully mature, although try convincing a traditional CIO to use something that is at version 0.20.x (and please, please, don’t dare mention that porcupine-named tool to him). Nor is it that “Big Data” is such an elastic “buzzword” that it seems to cover almost anything data-related these days. As buzzwords go, it is no worse than “Cloud” or “Web 2.0” were a few years back. And I am old enough to withstand the deafening hype that’s now everywhere. We’ve been through all that before, and we, customers and providers together, have learned to ride the “adoption waves of technology”, even if this one seems Tsunami-like sometimes.
My concern relates to the psychological impact of all this hype. I’m beginning to see some claims that risk giving us a false sense of unlimited, never-ending power to safely tackle any kind of data-related problem, regardless of size, speed, or type. And that false sense of “data omnipotence” can lead us into a very nasty situation in the long term. Because the reality is that, even though we have new tools and methods that enable us to tackle new challenges, data problems are not easy and never will be, so they demand detailed analysis up front.
To me, the subliminal message hidden in all this hype goes, more or less, this way:
- First, grab all the data. From every source. Formats don’t matter anymore. And don’t worry about data quality, we’ll take care of that later (or not).
- We need to store all of it, because “Data is the new Oil” (very catchy metaphor), so, the more we can grab, the better. Just put it out there, somewhere where we can locate it when needed and ‘refine’ it. Even if instead of Digital Oil, what we have is Digital Tar, as sticky as the real one.
- Then, we provide access to all this data to some very clever (and for sure, well paid) Data Scientists. By the way, we could call them (paraphrasing Arthur C. Clarke) “Data Magicians”, as we don’t understand a single word of the language they speak.
- And then, poooffff, the ‘Big Data Magic’ happens, and we get illuminated with business-changing insights, presented in very nice and colorful Visualizations. So nice and colorful, that some of them could be in a modern art museum.
Although there is a touch of caricature in the previous points, I think that you need to be really aware of the underlying problem. Storing all kinds of data “just in case” is neither useful nor necessary. It only leads you to what we could call a “Digital Diogenes Syndrome”. You may have seen some cases of it on your local TV news: an old, bearded guy whose house is bursting at the seams with what he considers ‘useful, essential things’, but which in reality is mostly garbage. Rotten garbage, unhealthy even for its owner. You can translate it to your digital realm: as the ancient but true computing acronym GIGO (garbage in, garbage out) reminds us, data rots easily, and its smell can contaminate the business.
Don’t get me wrong. I’m fully convinced of the transformational capabilities that many tools and methods under that “Big Data” moniker can bring to business and society, as some kinds of data problems that we weren’t able to tackle before are now within our reach. But that doesn’t mean that the basic, well-known “hygienic practices” around Data don’t matter anymore. Data models matter. Data quality matters. Data lifecycle handling matters. Data interaction with business processes matters. Data context matters. Everything that mattered in the ‘small data’ era is still valid, and even more so, in this brave, new, overhyped world of Big Data.
So, next time you think about adding that terabyte-a-day, obscure, techie log file to some Hadoop-based storage, hold on for a minute, take a deep breath, mentally smell the pungent odor of that garbage-filled house, and think twice about it. You may save yourself from some very disgusting ‘smells’ in the future.