Data Science from Trenches: Notes on Deploying Machine Learning Models
The real success of a machine learning model comes when it moves from the safe havens of the Proof of Concept (POC) and Minimum Viable Prototype (MVP) to the big bad world of the production environment. Even the most well-intentioned and technically superior models may fail to deliver the desired business outcome in production due to poor planning and infrastructure. This article distills my experience with deploying machine learning models into the key pieces of functionality required for a successful deployment. Please do mention in the comments if I have missed something.
Create A Production Parallel Environment
Creating a production parallel environment, which helps to debug issues with the production deployment and to test the stability of code with new features, is a standard software development and deployment practice that holds good for the data science world as well. Typically, the production parallel environment will be loaded with historic data.
Creating a production parallel environment is comparatively easier for a model that is not run frequently than for systems where predictions are made in near real time. For the latter scenario, the production parallel environment will have to simulate continuous upload of data to mimic the real-life deployment.
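As a minimal sketch of such a simulation, the replay below re-emits historic, timestamped records with their original time gaps compressed by a speed-up factor. The event list and the idea of posting each record to a parallel ingestion endpoint are illustrative assumptions, not part of any particular stack.

```python
import time
from datetime import datetime, timedelta

def replay_events(events, speedup=60.0):
    """Replay timestamped records into a parallel environment.

    Sleeps between records in proportion to their original gaps,
    compressed by `speedup` (60x: one recorded minute replays in
    one second).
    """
    previous = None
    for ts, payload in events:
        if previous is not None:
            gap = (ts - previous).total_seconds() / speedup
            if gap > 0:
                time.sleep(gap)
        previous = ts
        yield payload  # in practice: POST to the parallel ingestion endpoint

# Hypothetical historic data: one record per minute.
start = datetime(2024, 1, 1)
events = [(start + timedelta(minutes=i), {"value": i}) for i in range(3)]
replayed = list(replay_events(events, speedup=600.0))
```

Driving the parallel environment from the same replay generator as the real pipeline keeps the two code paths identical except for the data source.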
Deploy Multiple Models in Production
Ideally you want to deploy multiple models in production to account for the uncertainties associated with a real-life deployment. These models could be built using different feature sets and supervised machine learning algorithms. The model to be used may be statically decided by a human based on performance (also referred to as the champion-challenger pattern). Alternatively, the results from the model with the best performance metric over the last n days could be shared with the end user. Also note that you need to capture the actual outcome (for which the models made a prediction) to measure performance.
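One way to implement the "best metric over the last n days" selection is sketched below. The score layout and model ids are hypothetical, and the metric itself is whatever you track in production (F1, precision, revenue lift, and so on).

```python
from datetime import date, timedelta

def pick_model(daily_scores, today, n=7):
    """Return the id of the model with the best mean metric over
    the last `n` days.

    daily_scores: {model_id: {date: metric_value}}
    """
    window = {today - timedelta(days=i) for i in range(n)}

    def mean_score(model_id):
        scores = [v for d, v in daily_scores[model_id].items() if d in window]
        return sum(scores) / len(scores) if scores else float("-inf")

    return max(daily_scores, key=mean_score)

# Hypothetical daily scores for two deployed models.
today = date(2024, 6, 10)
scores = {
    "xgb_v2": {today - timedelta(days=i): 0.82 for i in range(7)},
    "rf_v1":  {today - timedelta(days=i): 0.78 for i in range(7)},
}
best = pick_model(scores, today)  # → "xgb_v2"
```

Models with no scores in the window fall to the bottom, so a freshly deployed challenger is never selected before it has accumulated measured outcomes.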
Model and Prediction Provenance
Ensure that you have a system in place to identify the metadata for the model that was used to serve a prediction. The metadata would include information about the deployed model, the associated code for the model, and the training and test data used for building the model. You may use an existing library like MLflow to achieve this. However, it is also very easy to record this information from your model-serving code into a database such as Elasticsearch.
You will need to maintain two indexes (tables):
- A prediction index that captures the feature values used for a prediction, the predicted value, and a model id that identifies the model used for the prediction.
- A model index that records the model id, the features used for training the model, the time frame of the data used for training, and the algorithm used.
The model index, in tandem with the prediction index, can be used to debug the predictions.
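A minimal in-memory sketch of the two indexes might look as follows; in a real deployment the dictionaries would be Elasticsearch indexes (or database tables), and the field names here are illustrative assumptions.

```python
import uuid
from datetime import datetime, timezone

model_index = {}       # model_id -> model metadata
prediction_index = []  # one record per served prediction

def register_model(features, train_start, train_end, algorithm, code_ref):
    """Record a model's provenance and return its id."""
    model_id = str(uuid.uuid4())
    model_index[model_id] = {
        "features": features,
        "training_window": [train_start, train_end],
        "algorithm": algorithm,
        "code_ref": code_ref,  # e.g. git commit hash of the model code
    }
    return model_id

def log_prediction(model_id, feature_values, predicted):
    """Record one served prediction, linked back to its model."""
    prediction_index.append({
        "model_id": model_id,
        "features": feature_values,
        "prediction": predicted,
        "served_at": datetime.now(timezone.utc).isoformat(),
    })

mid = register_model(["age", "balance"], "2024-01-01", "2024-03-31",
                     "xgboost", "abc123")
log_prediction(mid, {"age": 41, "balance": 1200.0}, predicted=1)
```

The model id is the join key: given any logged prediction, you can recover exactly which features, training window, algorithm, and code revision produced it.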
Monitor the Data Pipeline
Monitoring the data pipeline involves two activities:
- Monitor that data is being ingested as expected. Typically, this involves comparing the number of expected records against the records actually received for a given time period.
- Check data quality. Ensure that your production data pipeline has checks in place so that no garbage data is ingested. Furthermore, check the distribution of the attributes and flag outliers.
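The two checks above can be sketched as follows; the 5% volume tolerance and the z-score threshold are illustrative defaults that you would tune for your own pipeline.

```python
def check_volume(received, expected, tolerance=0.05):
    """Flag when the record count for a period deviates by more
    than `tolerance` (fraction) from the expected count."""
    return abs(received - expected) / expected <= tolerance

def flag_outliers(values, z_threshold=3.0):
    """Return values more than `z_threshold` standard deviations
    from the mean of the batch."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > z_threshold]

ok = check_volume(received=980, expected=1000)   # within the 5% tolerance
outliers = flag_outliers([10, 11, 9, 10, 12, 500], z_threshold=2.0)
```

Note that a large outlier inflates the batch standard deviation, which is why the example lowers the threshold to 2.0; robust alternatives such as the median absolute deviation avoid this dilution.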
A good practice is to develop a dashboard, using a tool such as Kibana, that can be used to view and flag missing data and outliers.
Monitor the Model Health
Monitoring all the model predictions, and the ability to slice and dice them, is a must-have capability to ensure that you stay on top of your ML deployment. Writing the model predictions into a database such as Elasticsearch makes it easy to build Kibana dashboards to view model performance. Such dashboards should include precision, recall, and accuracy figures for the model.
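For a binary classifier, the dashboard figures can be computed from the logged predictions once the actual outcomes have been captured. The sketch below is plain Python to stay self-contained; in practice you would typically use scikit-learn's metrics functions.

```python
def classification_metrics(actuals, predictions):
    """Precision, recall and accuracy for binary labels, computed
    by pairing each logged prediction with its actual outcome."""
    tp = sum(1 for a, p in zip(actuals, predictions) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actuals, predictions) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actuals, predictions) if a == 1 and p == 0)
    correct = sum(1 for a, p in zip(actuals, predictions) if a == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(actuals),
    }

# Hypothetical outcomes joined back to logged predictions.
m = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Recomputing these figures per model id (from the provenance indexes described earlier) gives the per-model comparison that the champion-challenger setup needs.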
Automated Model Building
Scripts that automatically recreate the models from the existing data, especially the data collected over time while the ML solution was in deployment, ensure that the operations team can easily rebuild a model if required. This is extremely helpful for rebuilding the models with new data when model performance degrades due to data drift.
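A rebuild script can be as simple as: retrain on the latest data, then write a timestamped artifact plus the metadata needed for provenance. The trainer below is a deliberately trivial stand-in (its "model" is just the historical mean) to keep the sketch self-contained; the directory layout and filenames are assumptions.

```python
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path

def rebuild_model(train_fn, data, artifact_dir="models"):
    """Retrain from the latest data and save a versioned artifact
    alongside the metadata needed for provenance."""
    model = train_fn(data)
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = Path(artifact_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / f"model_{version}.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out / f"model_{version}.json", "w") as f:
        json.dump({"version": version, "n_rows": len(data)}, f)
    return version

# Hypothetical trainer: the "model" is the mean of the outcomes.
def mean_trainer(rows):
    return sum(rows) / len(rows)

version = rebuild_model(mean_trainer, [1.0, 2.0, 3.0],
                        artifact_dir="/tmp/models_demo")
```

Because each artifact carries its own version and metadata file, a rebuilt model can be registered in the model index and promoted (or rolled back) without touching the serving code.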