In a recent Bloomberg opinion piece, economics professor Gary Smith wrote, “Today, the problem is not the scarcity of data, but the opposite. We have too much data, and it is undermining the credibility of science.”

Whether it’s surveillance video, sales records, consumer preferences, or demographic information, businesses are working overtime to leverage their existing data. At the same time, companies are increasingly investing in technologies like IoT to extract further insights from manufacturing environments, agricultural farms, and retail shops, to name just a few. As this happens, it should come as no surprise that the data and analytics market is expected to reach $100 billion by 2027.

But is it possible that too much data can obfuscate or, in some cases, mislead organizations? Smith certainly thinks so, and as evidence he points to a too-low threshold for statistical significance that has resulted in numerous subsequently debunked conclusions.

As he writes in the Bloomberg piece, in the 1920s a 5% significance level became the hurdle for the “statistically significant” certification required for research publication and funding. The issue is that it’s relatively easy to clear this 5% bar with correlations that are actually just coincidence, particularly when many potential patterns are tested against the same data (the short simulation after the list below shows why). Among the findings Smith highlights that have since been discredited are:

  • Women who eat breakfast cereal daily are more likely to have male babies (published in the Proceedings of the Royal Society) 
  • Hurricanes are deadlier if they have female names (published in the Proceedings of the National Academy of Sciences) 
  • Students do better on a recall test if they study for the test after taking it (published in the Journal of Personality and Social Psychology) 
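
To make the point concrete, here is a minimal Python sketch (an illustration, not code from Smith’s piece; the sample size and number of predictors are arbitrary): generate a purely random outcome, correlate it against hundreds of equally random “predictors,” and count how many clear the 5% bar by chance alone.

```python
import numpy as np
from scipy import stats

# Illustrative sketch only: "mine" many unrelated random predictors for a
# relationship with a random outcome and count the spurious "significant" hits.
rng = np.random.default_rng(seed=0)

n_samples = 100      # observations per variable
n_predictors = 200   # unrelated variables tested against the outcome

outcome = rng.normal(size=n_samples)
false_positives = 0

for _ in range(n_predictors):
    predictor = rng.normal(size=n_samples)     # pure noise, unrelated to outcome
    r, p = stats.pearsonr(predictor, outcome)  # correlation and its p-value
    if p < 0.05:                               # the 5% "statistically significant" bar
        false_positives += 1

print(f"{false_positives} of {n_predictors} random predictors cleared the 5% bar")
# Nothing here is genuinely related, yet roughly 5% of the predictors
# (about 10 of 200) will look significant by coincidence alone.
```

Scan enough variables and a handful of “discoveries” is practically guaranteed, which is exactly the trap Smith describes.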

So, what does all this mean from an enterprise perspective? The concern for companies is that as data volumes increase, they may fall into the trap of using data for its own sake. For example, if organizations are overly bullish about spotting patterns, they run the risk of eventually finding correlations that are coincidental, offer little practical value, or are simply erroneous.

To avoid this situation, Smith stresses that statistical significance should not supersede common sense. Never accept a controversial insight at face value without further validation, for example by testing the finding in a pilot or having a different team check the conclusions.
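
One simple, purely illustrative way to build that extra validation into an analytics workflow is a holdout check (not something Smith specifically prescribes): hold back part of the data and only trust a pattern that reappears on observations it was not discovered in.

```python
import numpy as np
from scipy import stats

# Hypothetical holdout check: a pattern found in one half of the data is only
# treated as credible if it also appears in the untouched half.
rng = np.random.default_rng(seed=1)

n = 200
predictor = rng.normal(size=n)   # candidate driver (pure noise in this sketch)
outcome = rng.normal(size=n)     # business metric (also noise)

half = n // 2
r_disc, p_disc = stats.pearsonr(predictor[:half], outcome[:half])
r_hold, p_hold = stats.pearsonr(predictor[half:], outcome[half:])

print(f"discovery half: r={r_disc:+.2f}, p={p_disc:.3f}")
print(f"holdout half:   r={r_hold:+.2f}, p={p_hold:.3f}")

# A coincidental correlation that happens to look strong in the discovery half
# will usually fail to replicate in the holdout half.
```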

It’s also critical that companies have the right combination of data culture, people, and tools to overcome the challenges of our data-deluged environment.

For more on how to achieve this, take a look at this recent APEX of Innovation post.