InformationWeek’s Lisa Morgan recently wrote, “Business professionals have traditionally viewed the world in concrete terms and sometimes even round numbers. That legacy perspective is black and white compared to the shades of gray that data science produces. Instead of producing a single number result, such as 40 percent, the result is probabilistic, combining a level of confidence with a margin of error.” Put more succinctly, data science is not an exact science.
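To make “a level of confidence with a margin of error” concrete, here is a minimal sketch. It assumes a simple proportion estimate (a hypothetical 400 out of 1,000 customers, i.e. the 40 percent from the quote) and uses the textbook normal-approximation (Wald) interval at 95 percent confidence (z = 1.96). The function name `proportion_interval` and the counts are illustrative assumptions, not anything from Morgan’s piece.

```python
import math


def proportion_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (estimate, lower, upper) for a proportion.

    Uses the normal-approximation (Wald) interval; z = 1.96 corresponds to
    roughly 95 percent confidence. This is an illustrative sketch only.
    """
    p_hat = successes / n                              # point estimate, e.g. 0.40
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat, p_hat - margin, p_hat + margin


# Hypothetical example: 400 of 1,000 sampled customers show some behavior.
estimate, low, high = proportion_interval(400, 1000)
print(f"{estimate:.0%} (95% CI {low:.1%} to {high:.1%})")
# prints "40% (95% CI 37.0% to 43.0%)"
```

Rather than “40 percent,” the honest answer is “about 40 percent, plus or minus roughly 3 points, at 95 percent confidence.” A smaller sample or a higher confidence level widens that range.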
There are several reasons this is the case.
A company may not have all the data required to answer a particular question. Even with a complete data set, any data quality issues can bias or skew the analysis. This underscores the importance of investing in a solid foundational platform that can address data quality issues and ensure that both external and internal data are clean, consistent, and ready for analysis.
The way a question is framed also matters. As Morgan puts it, “It’s been said that if one wants better answers, one should ask better questions. Better questions come from data scientists working together with domain experts to frame the problem.” Pre-existing assumptions, available resources, constraints, goals, and success metrics also shape how the question is posed and, in turn, the answer it produces.
Those working in the field understand that data science, machine learning, and artificial intelligence have significant limitations. People further removed from the day-to-day work often expect more than these tools can deliver, and those unrealistic expectations can create problems.
Context plays a big role in the success or failure of an analytics initiative: a model may work well in one scenario and yield disappointing results in another. Morgan states, “Even within the same use case, a prediction model can be inaccurate. For example, a churn model based on historical data might place more weight on recent purchases than older purchases or vice versa.”
Image recognition depends on labeled data, but not all content is easy to label. How an image is classified can vary with cultural differences, social norms, and current events. Morgan elaborates, “Similarly, if a neural network is trained to predict the type of image coming from a mobile phone, if it has been trained on songs and photos from an iOS phone, it won’t be able to predict the same type of content coming from an Android device and vice versa.”
These variables make absolute answers hard to come by, but, as Morgan points out, that’s not necessarily a bad thing. Her full piece in InformationWeek explores these points in more detail.