AI Researcher Warns Data Science Could Face a Reproducibility Crisis
Unlike Machine Learning, Data Science is not an academic discipline, with its own set of algorithms and methods… There is an immense diversity, but also disparities in skill, expertise, and knowledge among Data Scientists… In practice, depending on their backgrounds, data scientists may have large knowledge gaps in computer science, software engineering, theory of computation, and even statistics in the context of machine learning, despite those topics being fundamental to any ML project. But it’s ok, because you can just call the API, and Python is easy to learn. Right…?
Building products using Machine Learning and data is still difficult. The tooling infrastructure is still very immature, and the non-standard combination of data and software creates unforeseen challenges for engineering teams. But in my view, a lot of the failures come from this explosive cocktail of ritualistic Machine Learning:
– Weak software engineering knowledge and practices compounded by the tools themselves;
– Knowledge gaps in mathematical, statistical, and computational methods, encouraged by black-box APIs;
– Ill-defined range of competence for the role of data scientist, reinforced by a pool of candidates with an unusually wide range of backgrounds;
– A tendency to follow the hype rather than the science.
What can you do?
– Hold your data scientists accountable using Science.
– At a minimum, any AI/ML project should include an Exploratory Data Analysis, whose results directly support the design choices for feature engineering and model selection.
– Data scientists should be encouraged to think outside the box of ML, which is a very small box.
– Data scientists should be trained to use eXplainable AI methods to provide context about the algorithm’s performance beyond the traditional performance metrics like accuracy, FPR, or FNR.
– Data scientists should be held to standards similar to those of other software engineering specialties, with code review, code documentation, and architectural designs.
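To make the point about metrics concrete, here is a minimal sketch of why accuracy alone can mislead and why FPR and FNR are worth reporting alongside it. It is an illustration, not code from the article: the helper names (`confusion_counts`, `rates`) and the toy labels are assumptions chosen for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(y_true, y_pred):
    """Return accuracy, false positive rate, and false negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # FPR: share of actual negatives wrongly flagged as positive.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of actual positives that were missed.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A degenerate "model" that always predicts the negative class.
y_pred = [0] * 100

report = rates(y_true, y_pred)
print(report)
```

On this toy data the always-negative model scores 95% accuracy while missing every positive case (an FNR of 1.0), which is the kind of gap that an Exploratory Data Analysis of class balance and a code review would be expected to catch.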
The article concludes, “Until such practices are established as the norm, I’ll remain skeptical of Data Science.”