Franz Langmayer | posted on 10/12/2013
Big Data has recently been discovered by big companies as the treasure of the 21st century.
Look a bit closer and you realize that in reality data is a burden.
Information – the message conveyed by data – is the basis of all our decisions.
So, if data is available, the question arises whether it really contains any useful information.
The data we know from daily life – the age of your friends’ cars, their favorite red wine, their annual income – is good for gossip but not a very impressive basis for business analytics. However, imagine having such data from hundreds of thousands of customers and you can see the potential. It is only recently that the two inputs for “big data” have become readily available. First, there is a massive voluntary data transfer from consumers to the computers of dealers and manufacturers via electronic payment and web-based platforms. Second, there are large data networks with sufficient computing power to perform the analytics.

Big data is isolated, heterogeneous, biased and has no known a-priori meaning – simply rubbish, you might guess. Yet a large number of analytical techniques can be applied automatically, e.g. time-series analysis to find auto-correlations and cross-correlations, noise and factor analysis, scatter analysis, etc. For optimizing a dealer’s portfolio it is sufficient to know that a customer who buys A also likes B, and that if he bought C in summer he will look for D in winter. Nobody seems to care whether these correlations actually contain any meaning.
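To make the “A implies B” kind of correlation concrete, here is a minimal sketch of such meaning-free analytics, using a hypothetical transaction log and the classic lift measure (co-occurrence relative to what independence would predict). The product names and data are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction log: each entry is the set of products one customer bought.
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"},
    {"A", "B"}, {"C", "D"}, {"A", "B", "D"}, {"B"},
]

n = len(transactions)
single = Counter()   # how many baskets contain each product
pair = Counter()     # how many baskets contain each product pair
for basket in transactions:
    for item in basket:
        single[item] += 1
    for a, b in combinations(sorted(basket), 2):
        pair[(a, b)] += 1

def lift(a, b):
    """Lift > 1: the pair co-occurs more often than independent purchases would."""
    key = tuple(sorted((a, b)))
    p_ab = pair[key] / n
    return p_ab / ((single[a] / n) * (single[b] / n))

print(f"lift(A, B) = {lift('A', 'B'):.2f}")
```

A lift above 1 is all the dealer needs to place B next to A – no hypothesis about *why* the two go together is ever formed, which is exactly the “diagnostic abstinence” described below.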
This “diagnostic abstinence” is not a new approach. It was – for different reasons – the basis for the paradigmatic change during the fin de siècle a hundred years ago. However, it was never thought to be the final step in interpreting reality. Instead, it gave way to new and much more robust theories in various fields. This progress towards theory-making happened because humans seek to understand the reasons behind the evidence. In contrast to Big Data, our intrinsic approach to understanding could be called “Smart Data”. We are trained in neglecting data: the vast majority of all human sensory input is discarded immediately, either by de-sensitization of the sensors or by sorting, aggregation, re-configuration and connection to content in our processing units.
Nevertheless, while being almost totally ignorant of the data which permanently hammer on all of our sensory channels, we still find our way through complex traffic situations, even in France! How does this work? Basically, it works via theories, i.e. via concepts which quantitatively describe the correlation between evidence and its causes within a wider context. Powerful theories, like thermodynamics, explain a vast variety of effects, from turkey cooking to large-scale weather phenomena. In thermodynamics, local data (e.g. the speeds of individual gas atoms) are replaced by a few simple observables (the temperature) which characterize the system state. With just a few observables – plus the theory – the system’s reaction can be predicted and its deviation from expectation values can be detected. However, theories are not free. They demand fantasy, speculation hedged by all sorts of logical reasoning, of course the full artillery of statistics, clever experimental checks, tailoring of concepts, re-checking for consistency, etc.
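The thermodynamics example can be sketched in a few lines: kinetic theory says the temperature of an ideal gas is T = m·⟨v²⟩ / (3·k_B), so one number replaces a hundred thousand individual velocities. The simulated atom data below (Maxwell–Boltzmann-like Gaussian velocity components for argon, spread chosen to land near room temperature) is an illustrative assumption, not a measurement.

```python
import random

# Illustrative constants: Boltzmann constant (J/K) and the mass of one argon atom (kg).
K_B = 1.380649e-23
M_ARGON = 6.6335e-26

random.seed(42)
sigma = 250.0  # assumed per-component velocity spread, m/s

# "The data": one 3-D velocity vector per atom -- 100,000 atoms' worth of numbers.
velocities = [(random.gauss(0, sigma),
               random.gauss(0, sigma),
               random.gauss(0, sigma)) for _ in range(100_000)]

# "The observable": average squared speed, collapsed into a single temperature.
mean_v2 = sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in velocities) / len(velocities)
temperature = M_ARGON * mean_v2 / (3 * K_B)
print(f"T ≈ {temperature:.1f} K")  # one number summarizes 100,000 atoms
```

With the theory in hand, the 300,000 raw velocity components can be thrown away; the single observable T is what predicts how the gas will behave.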
In essence, theories and Smart Data are what make a human being a scientist or an engineer.
So, when it comes to Big Data, keep it stored in some cloud. Take a proper theory (or make one), tailor it to your problem, find the smart data you need, and have fun with the simple solution that explains the complex reality.