Anti-Patterns (Bad Smells) in Data Science Organizations

Atul Singh, PhD
4 min read · Oct 25, 2020


Anti-patterns, or bad smells, in data science organizations are management principles and overarching visions that appear appropriate and effective but can have seriously harmful consequences. This article distills the author’s experience of working on data science teams across different organizations to identify prominent anti-patterns that can have a debilitating effect on an organization’s investment in building a data science practice. Please read on and comment on whether you agree or disagree, whether you have encountered the same anti-patterns, and whether you have seen others that I have missed.

Photo by Darryl Brooks on Unsplash

Jugaad is not scalable

It is a human tendency to improvise a solution instead of getting bogged down by a problem. We in India celebrate this quintessential human spirit and have given it a name: Jugaad.

by Radjou, Prabhu and Ahuja

A good product company knows that the cost of repaying the debt accrued through a makeshift solution is often higher than the short-term gain. This is typically because the product is deployed and maintained in varied environments. Companies with service and consultancy backgrounds, which develop their solutions for a small number of often internal clients, are more prone to this anti-pattern.

The anti-pattern manifests as hacked solutions, such as using regular expressions and custom rules instead of machine learning to build the solution. The challenge is that these approaches are not machine learning, will not scale, and waste the resources the organization has spent with the intention of adopting and absorbing the benefits of artificial intelligence and machine learning.
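To make the scaling problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the rule, the function name extract_amount, and the sample strings. It shows a hand-written regular expression that works for the format it was written for and silently fails as soon as a new client formats the same information differently.

```python
import re

# Hypothetical hand-written rule: matches amounts such as "USD 1,200.50".
AMOUNT_RULE = re.compile(r"USD\s+([\d,]+\.\d{2})")


def extract_amount(text):
    """Return the amount as a float, or None when the rule does not match."""
    match = AMOUNT_RULE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Illustrative inputs (assumed, not taken from any real system).
samples = [
    "Total due: USD 1,200.50",   # the format the rule was written for
    "Total due: $1,200.50",      # a new client uses a currency symbol
    "Total due: 1.200,50 USD",   # another client uses European formatting
]

results = [extract_amount(s) for s in samples]
for text, amount in zip(samples, results):
    print(f"{text!r} -> {amount}")
```

Each new format needs another rule, and the rules start to interact with one another. That maintenance cost is the debt this section describes.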

Think outside the bun (Supervised Machine Learning)

This anti-pattern is my favorite. Supervised machine learning models are easier to understand, implement, and adopt, and have consequently become the vanguard of an organization’s artificial intelligence and machine learning strategy. Supervised machine learning is now so widely accepted that I have even heard suggestions to use it for optimization problems, which have their own well-studied and effective set of tools.
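As one illustration of such a tool, here is a small sketch of a classic optimization problem, the 0/1 knapsack, solved exactly with dynamic programming. The problem choice, the function name knapsack, and the numbers are my own assumptions for illustration. The point is only that no labelled data or training step is involved.

```python
def knapsack(items, capacity):
    """Best total value for a 0/1 knapsack, found by dynamic programming.

    items: list of (value, weight) pairs with non-negative integer weights.
    capacity: non-negative integer weight limit.
    """
    best = [0] * (capacity + 1)  # best[c] = best value achievable within capacity c
    for value, weight in items:
        # Walk capacities downwards so that each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical example: (value, weight) of three candidate projects and a budget of 50.
projects = [(60, 10), (100, 20), (120, 30)]
print(knapsack(projects, 50))  # prints 220: the second and third projects together
```

The answer is provably optimal for the stated inputs, which is something a model trained on examples of past decisions cannot guarantee.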

Supervised machine learning is a nifty tool for making sense of patterns in a complex scenario and using what is learned to enable automation and informed decision making. Its limitation is the need for labelled data that contains samples of all the patterns expected in deployment. The learning will not be effective if the patterns change (data drift), which makes it necessary to rebuild the model.
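A deliberately crude sketch of how one might watch for data drift on a single numeric feature is shown below. It is not a standard drift test. It simply measures how far the mean of live data has moved from the mean of the training data, in units of the training standard deviation. The function name mean_shift, the sample values, and the threshold of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def mean_shift(train_values, live_values):
    """Distance of the live mean from the training mean, in training standard deviations."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)


# Hypothetical feature values seen at training time and in two deployment windows.
train = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]
live_same = [11.0, 10.0, 12.0, 11.5]
live_drifted = [19.0, 21.0, 20.0, 22.0]

THRESHOLD = 2.0  # assumed cut-off; a real project would have to choose and validate its own

for name, window in [("live_same", live_same), ("live_drifted", live_drifted)]:
    shift = mean_shift(train, window)
    status = "possible drift" if shift > THRESHOLD else "ok"
    print(f"{name}: shift = {shift:.2f} -> {status}")
```

A check like this only flags that something has changed. Deciding whether new labelled data is needed still takes someone looking at the data.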

Machine learning is not a golden hammer

The law of the hammer (also called the golden hammer) is a cognitive bias that involves over-reliance on a familiar tool. With the euphoria around machine learning, it is natural that people want to use it as a golden hammer, even for problems such as interest rate prediction that have a robust theory behind them.

Simple model is contextual

When starting the machine learning journey as a data scientist, one is extremely likely to encounter the suggestion to start with simple models. Data science teams often interpret this to mean models like regression, which, even though easy to interpret, require diligent effort and a lot of experience to get right. In contrast, models like random forests, boosting, and even neural networks are comparatively easier to use and give good performance with much less effort, but may sound complex, especially to management teams and business stakeholders. My suggestion for data scientists is to focus on the story around the business problem when interacting with stakeholders, rather than on the specifics of the technique.

Data Science is not Software Engineering

This may sound trivial, but increasingly I find that data science organizations overlook this obvious difference. Both data scientists and managers can suffer from this anti-pattern. On a lighter note, the goal of data science is to (not 😀) find meaningful insights such as this

Data scientists who come from a robust software engineering background tend to spend more time automating their workflows than focusing on the data. Data science team managers display this anti-pattern when they fail to account for the time required to analyze the data and results and convert them into a meaningful story. Instead, they account only for the time and effort of writing the code to build the models.


Written by Atul Singh, PhD

Data scientist, with extensive experience of design, development, and industrialization of AI/ML based solutions for finance, telecom, retail and healthcare.
