Big Data ≧ Data + Questions + Algorithmic Knowledge + Commitment + Collaboration – Part I
Many big data activities fail to establish the necessary preconditions for success. For companies, the main challenge in applying big data is to ask the right questions. In science, the main challenges are access to data and collaboration across disciplines.
Is big data just hype? Is it more a sales argument than scientific progress? There is a standard story about big data in industry: A company spends enormous amounts of money, creates an impressive big data infrastructure, and then finds out that it has no questions for the big data machinery to answer. This story describes a rather common reality of big data. Depending on whom you talk to, 8 to 9 out of 10 big data projects fail.
There are several reasons for these discouraging statistics. Sometimes failure happens because there is simply no potential for successful big data applications, but this may become apparent only after trying. In many other cases big data fails because the basic requirements for success are not fulfilled. In short, before you start you need good data, useful questions, proper algorithmic knowledge (plus tools), a clear commitment to apply the findings, and possibly also the ability to collaborate across multidisciplinary teams.
Make or buy
First and foremost, you need at a minimum data, questions, and algorithmic knowledge in order to apply big data successfully. As a company, you can in most cases buy algorithmic knowledge. Usually you can even buy the required data (though a lack of affordable high-quality data may limit innovation significantly, as I shall explain below). What is certain, however, is that you must be able to formulate your questions yourself. And to ask the right questions, you need to understand the potential of applied data science (which is a synonym for big data as we understand it, but sounds more scientific) as well as your market and your company.
Of course, external consulting can help with the questions. However, if a consultancy cares about creating value for its customers, it will only stimulate the search for the right questions. It will provide instruments and tools and it will make sure that good practices are considered, but it will not force its customers into prefabricated questions. Rather, it will make sure that the search is not limited to the questions that competitors in the market have asked before. Adopting good practices in your own company is useful, but you really have to adapt them (e.g. to your market position and the capabilities inside your company), and you should be open to creating innovations of your own, ideally innovations that cannot be copied by others and thus create a new “Blue Ocean” in the sense of Kim and Mauborgne.
More (applied) mathematics than science
Applied data science is no magic bullet though. First, it primarily enables those who have a profound understanding of their own business to develop this understanding further. Second, its impact depends on the capability to implement change and innovation. Thus, applied data science is unlikely to help those who do not know their business well, nor is it likely to change companies that are unwilling to change. In addition, big data is more mathematics than science in that, unlike most standard scientific practices, it does not ensure validity ex ante; its findings must be validated ex post. Unfortunately, unlike pure mathematics with its proofs of facts, ex post validation in big data creates at best a weak form of evidence, which then has to be exposed to “falsification” in the sense of Sir Karl Popper through implementation in real life (as with most applied mathematics).
Quite naturally, in real-life business applied data science creates the risk that algorithmic results are not scrutinized sufficiently before they are turned into business innovations. As with every new tool, the application of the tools of applied data science must be learned, and this learning rarely comes for free.
Part of this learning also addresses a very old and embarrassing problem: ever since data have been used for controlling companies and business, there has been strong resistance to using factual observations derived from data. The belief that statistics are lies is deeply ingrained in many contemporary managers. The fundamental lack of evidence in most big data results helps them to justify their disbelief in scientific management. As a consequence, even highly useful results that fit reality rarely create impact, because their usage is blocked. If companies want to apply big data successfully, they have to learn how to let facts guide their management decisions to a much larger extent than in the past.
In search of data
Still, the situation for a company heading for business improvement or innovation is rather simple compared with the standard situation in research and science. There, the application of measured data is rather well accepted, at least as long as the data do not contradict a dominant paradigm of a specific scientific community, but the use of advanced big data is only at the very beginning. By advanced big data I mean making explicit what is hidden implicitly in the data, plus deriving from the data themselves an understanding of how they can be used (e.g. through the application of learning algorithms). Although the “digital turn” of science has already created impressive first results, many of these results popped up in areas where you would not intuitively expect them, such as economic history, while in those areas where applications of big data seem obvious, researchers face hard problems.
For example, it is very clear that applied data science may lead to huge progress in healthcare: It has the potential to dramatically improve prevention, diagnosis, therapy, and monitoring as such. In addition, it has the potential to significantly improve healthcare practices and the running of healthcare units, as well as to contribute to better organization and resource planning of the healthcare system as a whole. However, it is also evident that this will happen if and only if patients’ health data are made available for research. You can’t do big data without data! And you can do big data in healthcare only to a limited extent if you can only use data from abroad. This is because the distributions of genes, lifestyles, and diseases vary between countries, as do the healthcare practices and organizational setups.
Healthcare may look like an application domain with specifically hard challenges, but the problems are similar in many application areas. In many settings we encounter a dramatic lack of accessible data, and this blocks progress and innovation. For example, we would also need much better access to data on economic and societal issues, to name just two further big “construction sites”, in order to perform more in-depth empirical research on and for policy making. This creates high urgency on the national level: If Switzerland does not solve its data access problems in several critical areas in the next few years, it will face severe problems in research, in economic development, and in government.
In search of skills and sharing
Having enough data and proper questions is good, and for many companies it is good enough, but not for all and certainly not for more advanced scientific research….