MLE3: Handling Model Drift ~ Cues for Retraining and Incremental Learning


Models in production need to keep up with real-world conditions and business processes in order to sustain the performance achieved during development, or they risk failing over time to provide predictions and insights that are useful to the business. A good example is the COVID-19 pandemic, during which e-commerce and retail behaviours changed drastically. Models such as propensity-to-buy, built on earlier business assumptions, required retraining to retain their AI value. Introducing new features into a product that are not included in the model will also alter its performance and can cause business outcomes to degrade over time. Therein lies the question: how do we observe and overcome a dip in model performance, i.e. a rising error rate, through drift detection?

In this blog, I will give an overview of drift detection, first by covering the ways in which drift can occur and why it is necessary to detect it. This will be followed by an introduction to some key drift detectors and their principled usage, especially in quantifying drift across data modalities such as tabular, computer vision and NLP data.

One strategy leans on detecting drift by comparing the statistical distribution of feature values used in training against that observed in production. More often than not, after a model is deployed, the prediction input data deviates from the data the model was trained on.
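
To make this concrete, here is a minimal sketch of that comparison for tabular numeric features, using a two-sample Kolmogorov-Smirnov test per column from SciPy. The `detect_feature_drift` helper, the DataFrame names and the `alpha` threshold are my own assumptions for illustration, not part of any particular drift library discussed later in the post.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def detect_feature_drift(train_df: pd.DataFrame, prod_df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Flag numeric features whose production distribution deviates from the
    training distribution, using a two-sample Kolmogorov-Smirnov test per column."""
    drifted = {}
    for col in train_df.select_dtypes(include=np.number).columns:
        if col not in prod_df.columns:
            continue
        stat, p_value = ks_2samp(train_df[col].dropna(), prod_df[col].dropna())
        if p_value < alpha:  # reject "same distribution" at significance level alpha
            drifted[col] = {"ks_statistic": stat, "p_value": p_value}
    return drifted

# Hypothetical usage: compare the training snapshot with a recent window of
# production inputs and inspect which features have drifted.
# train_df = pd.read_parquet("training_features.parquet")
# prod_df = pd.read_parquet("last_week_inference_inputs.parquet")
# print(detect_feature_drift(train_df, prod_df))
```

A per-feature test like this is only a starting point; running many tests inflates false alarms, so in practice the significance level is often corrected or the test is replaced by a distance measure that is tracked over time.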