Data Science from Trenches: Notes on Deploying Machine Learning Models

Atul Singh, PhD
4 min read · Nov 10, 2020
Sunny Deol essaying the role of Maj. Kuldeep Singh Chandpuri in Border (1997), a Bollywood flick which presents a dramatized but thoroughly entertaining portrayal of the decisive Indian victory in the Battle of Longewala. Indian forces decimated an enemy equipped with far superior equipment, thanks to outstanding military acumen and planning.

The real success of machine learning models comes when they move from the safe havens of the Proof of Concept (POC) and the Minimum Viable Prototype (MVP) to the big bad world of the production environment. The most well-intentioned and technically superior models may fail to deliver the desired business outcome in production due to poor planning and infrastructure. This article distills my experience with deploying machine learning models to identify the key pieces of functionality required for a successful machine learning model deployment. Please do mention in the comments if I have missed something.

Create A Production Parallel Environment

Creating a production parallel environment, which helps debug issues with the production deployment and test the stability of code with new features, is a standard software development and deployment practice that holds good for the data science world as well. Typically, the production parallel environment will be loaded with historic data.

SpongeBob from Nickelodeon

Creating a production parallel environment is comparatively easier for a model that is not run frequently than for systems where predictions are made in near real-time. For the latter scenario, the production parallel environment will have to simulate a continuous upload of data to mimic the real-life deployment.
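
As a minimal sketch, assuming the historic data sits in a CSV file and the parallel environment exposes some ingestion hook, a small replay script can push records one at a time to mimic a near real-time feed; the file name, interval, and send_to_parallel_env() function are illustrative placeholders, not part of any particular platform.

```python
# A minimal sketch of replaying historic data into a production parallel
# environment to simulate near real-time ingestion. The file name, interval,
# and send_to_parallel_env() hook are assumptions for illustration.
import csv
import time

def send_to_parallel_env(record: dict) -> None:
    # Placeholder: in a real setup this would POST to the parallel
    # environment's ingestion API or write to its message queue.
    print(f"ingested: {record}")

def replay(historic_csv: str, interval_seconds: float = 1.0) -> None:
    """Stream historic records one by one, pausing between records
    to mimic a continuous, near real-time data feed."""
    with open(historic_csv, newline="") as f:
        for record in csv.DictReader(f):
            send_to_parallel_env(record)
            time.sleep(interval_seconds)

if __name__ == "__main__":
    replay("historic_data.csv", interval_seconds=0.5)
```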

Deploy Multiple Models in Production

Ideally, you want to deploy multiple models in production to take into account the uncertainties associated with a real-life deployment. These models could be built using different feature sets and supervised machine learning algorithms. The model to be used may be statically decided by a human agent based on performance (also referred to as the leader-challenger pattern). Alternatively, the results from the model with the best performance metric over the last n days could be shared with the end user. Also note that you need to capture the actual outcome (for which the models made a prediction) in order to measure this performance.

A scene from the movie Minority Report (2002) by Steven Spielberg, based on the short story of the same name by the legendary, prolific science fiction writer Philip K. Dick. In Minority Report, three precogs blessed with the ability to see the future predict an upcoming crime. The precogs can differ in their visions of the future, in which case the majority opinion is selected. The movie explores the nuances of the free will versus destiny debate.
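
Below is a minimal sketch of the second approach, assuming a hypothetical performance log that records one metric value per model per day; the model ids, dates, and metric values are purely illustrative.

```python
# A minimal sketch of choosing which deployed model to serve, based on each
# model's mean performance metric over the last n days. The log structure
# and values are illustrative assumptions.
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

# Hypothetical daily performance log: (model_id, date, metric value).
performance_log = [
    ("model_a", datetime(2020, 11, 8), 0.81),
    ("model_b", datetime(2020, 11, 8), 0.84),
    ("model_a", datetime(2020, 11, 9), 0.79),
    ("model_b", datetime(2020, 11, 9), 0.86),
]

def best_model(log, n_days: int = 7, now: Optional[datetime] = None) -> str:
    """Return the id of the model with the highest mean metric over the
    last n_days."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=n_days)
    by_model = {}
    for model_id, day, score in log:
        if day >= cutoff:
            by_model.setdefault(model_id, []).append(score)
    return max(by_model, key=lambda m: mean(by_model[m]))

print(best_model(performance_log, now=datetime(2020, 11, 10)))  # -> "model_b"
```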

Model and Prediction Provenance

Ensure that you have a system in place to identify the metadata for the model that was used to serve a prediction. The metadata would include information about the deployed model, the associated code for the model, and the training and test data used for building the model. You may use an existing library like mlflow to achieve this. However, it is also easy to record this information from your model serving code into a database such as Elasticsearch.
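
For example, a minimal sketch of logging such metadata with mlflow might look as follows; the run name, parameter values, and git tag are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of recording model provenance with mlflow, so every
# prediction can later be traced back to the model, code, and data that
# produced it. Parameter names and tag values are illustrative assumptions.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

with mlflow.start_run(run_name="churn_model_v3"):              # hypothetical model name
    mlflow.log_params({"algorithm": "RandomForestClassifier",
                       "feature_set": "v3",                     # assumed feature-set label
                       "training_window": "2020-01-01/2020-10-31"})
    mlflow.set_tag("git_commit", "abc1234")                     # link back to the code
    mlflow.sklearn.log_model(model, artifact_path="model")      # store the serialized model
```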

Sample of attributes to be captured for model life cycle management

You will need to maintain two indexes (tables): a prediction index that captures the feature values used for a prediction, the predicted value, and a model id identifying the model used for the prediction; and a model index that captures the model id, the features used for training the model, the time window of the data used for training, and the algorithm used. The model index, in tandem with the prediction index, can be used to debug the predictions.
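
A minimal sketch of these two indexes, written from the serving code with the official Elasticsearch Python client, could look as follows; the index names, field names, and localhost URL are assumptions, and the document= argument follows the v8 client.

```python
# A minimal sketch of the two provenance indexes described above, written to
# Elasticsearch from the serving code. Index names, field names, and the
# localhost URL are assumptions; document= follows the v8 Python client.
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Model index: one document per trained model.
es.index(index="model_index", id="model_42", document={
    "model_id": "model_42",
    "features": ["age", "tenure", "monthly_spend"],
    "training_data_window": {"from": "2020-01-01", "to": "2020-10-31"},
    "algorithm": "GradientBoostingClassifier",
})

# Prediction index: one document per served prediction, pointing at the model.
es.index(index="prediction_index", document={
    "model_id": "model_42",
    "feature_values": {"age": 42, "tenure": 18, "monthly_spend": 55.0},
    "predicted_value": 1,
    "timestamp": datetime.utcnow().isoformat(),
})
```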

Monitor the Data Pipeline

Data is key to machine learning. Safe transportation and storage of data is crucial to building great ML deployments.

Data pipeline monitoring involves two activities:

  1. Monitor that data is being ingested as expected. Typically, this involves comparing the number of expected records against the number of records received for a given time period.
  2. Check data quality. Ensure that your production data pipeline has checks in place to detect that no garbage data is being ingested. Furthermore, check the distribution of the attributes and flag the outliers.

A good practice is to develop a dashboard, using a tool such as Kibana, that can be used to view and flag missing data and outliers.
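
The two checks listed above can start out very simply; here is a minimal sketch using pandas, where the expected count, tolerance, z-score threshold, and column name are illustrative assumptions.

```python
# A minimal sketch of the two data pipeline checks listed above: comparing
# received record counts against expectations, and flagging outliers by
# attribute distribution. Thresholds and column names are illustrative.
import pandas as pd

def check_record_count(df: pd.DataFrame, expected: int, tolerance: float = 0.05) -> bool:
    """Flag the batch if the received record count deviates from the
    expected count by more than the tolerance fraction."""
    return abs(len(df) - expected) <= tolerance * expected

def flag_outliers(df: pd.DataFrame, column: str, z_threshold: float = 3.0) -> pd.DataFrame:
    """Return rows whose value lies more than z_threshold standard
    deviations from the column mean (a simple z-score check)."""
    z = (df[column] - df[column].mean()) / df[column].std()
    return df[z.abs() > z_threshold]

batch = pd.DataFrame({"monthly_spend": [50, 52, 49, 51, 400]})   # assumed batch of records
print("count ok:", check_record_count(batch, expected=5))
print(flag_outliers(batch, "monthly_spend"))
```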

Monitor the Model Health

Measuring and monitoring model performance continuously, and the ability to slice and dice that performance, is important.

Monitoring all the model predictions, with the ability to slice and dice them, is a must-have capability to ensure that you are on top of things in your ML deployment. Writing the model predictions into a database such as Elasticsearch makes it easy to use Kibana to create dashboards for viewing model performance. Such dashboards should include precision, recall, and accuracy figures for the model.
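
Once the actual outcomes are joined back to the logged predictions, the dashboard figures can be computed with standard scikit-learn metrics; the predicted and actual values below are purely illustrative, and in practice these rows would be pulled from the prediction index.

```python
# A minimal sketch of computing the dashboard metrics mentioned above from
# logged predictions once the actual outcomes are known. The values are
# illustrative; real rows would come from the prediction index.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Predicted labels joined with the ground-truth outcomes.
predicted = [1, 0, 1, 1, 0, 1]
actual    = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(actual, predicted))
print("precision:", precision_score(actual, predicted))
print("recall   :", recall_score(actual, predicted))
```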

Automated Model Building

Scripts to automatically rebuild the model with new data can reduce the load on core data science teams.

Scripts to automatically recreate the models from existing data, especially data that has been collected over time while the ML solution was in deployment, ensure that the operations teams can easily rebuild the model if required. This is extremely helpful for rebuilding the models with new data if the model performance degrades due to data drift.
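
A minimal sketch of such a retraining script, assuming the accumulated production data lives in a CSV file with a label column, might look like this; the file paths, target column, and choice of algorithm are placeholders rather than a recommended setup.

```python
# A minimal sketch of a scripted retraining job that an operations team could
# run (or schedule) to rebuild the model on data accumulated in production.
# File paths, the target column, and the model choice are assumptions.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def rebuild_model(data_path: str = "production_data.csv",
                  target: str = "label",
                  out_path: str = "model_retrained.joblib") -> float:
    """Retrain on the latest production data, report holdout F1, and
    persist the new model artifact for deployment."""
    df = pd.read_csv(data_path)
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    joblib.dump(model, out_path)
    return score

if __name__ == "__main__":
    print("holdout F1:", rebuild_model())
```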


Atul Singh, PhD

Data scientist with extensive experience in the design, development, and industrialization of AI/ML-based solutions for finance, telecom, retail, and healthcare.