AI model staleness – A tester's perspective

AI model staleness (meaning the model has lost its freshness) is a common issue in machine learning applications. In this post, we will learn what it means, how it can be identified, and ways to solve it.

Model Freshness and Quality curve (Credits: shorturl.at/BJQZ1)

What is model staleness?

Suppose you have built (okay, not you – the data scientists have) a financial fraud detection model. The model was trained on 3 years of data – they used 80% of the data (around 29 months) to train the model and the remaining 7 months to validate that it works correctly.

Now this model has been in production for the last year.

At this stage, if there is any variation in the data from the last year (new types of fraud, new customer types, locations where the bank opened new branches, new deposit/loan schemes) that the model has not been trained on, the model loses its freshness. It becomes stale.
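One simple staleness signal a tester can compute without touching the model itself is the share of recent live inputs whose categorical values never appeared in the training data. This is a minimal sketch under my own assumptions – the function name and the toy branch data are invented for illustration:

```python
from collections import Counter

def unseen_category_rate(train_values, live_values):
    """Fraction of recent live records whose category never appeared
    in training. A rising rate hints the model is going stale: it is
    scoring inputs (new fraud patterns, new branches) it never learned."""
    known = set(train_values)
    live = Counter(live_values)
    unseen = sum(n for cat, n in live.items() if cat not in known)
    return unseen / max(sum(live.values()), 1)

# Toy data: branch codes present at training time vs. seen last month
train_branches = ["NYC", "LA", "CHI"] * 100
live_branches = ["NYC", "LA", "PUNE", "MUMBAI"] * 25

rate = unseen_category_rate(train_branches, live_branches)
print(f"{rate:.0%} of recent traffic comes from branches unseen in training")
# → 50% of recent traffic comes from branches unseen in training
```

The same idea extends to numeric features by comparing value distributions rather than exact categories.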

Another example.

Imagine a recommendation model for an e-commerce site like Amazon. This model does one thing very efficiently: it tells you which products you are most likely to want, and shows them to you. It does so by grouping similar types of customers together (this is called clustering, or a collaborative recommendation system).

Now, the data scientists built this model on data from US customers – their orders, transaction history and customer profiles. The model has learnt a lot about US customers and what products they might like. Then Amazon goes live with this model in India. Guess what would happen.

The recommendations would go wrong for many reasons – the behaviour of Indian customers, the products they like, culture, buying patterns and pricing structures all differ. Hence this recommendation model would be stale for the new geography.
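To make the "grouping similar customers" idea concrete, here is a toy neighbourhood-based collaborative-filtering sketch – not Amazon's actual system, and all names and the purchase matrix are invented. It finds the most similar customer by cosine similarity of purchase vectors and suggests products that peer bought:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two purchase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Rows: customers; columns: products (1 = purchased).
purchases = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 1, 1],
}

def recommend(target, data):
    # Find the most similar other customer...
    peer = max((c for c in data if c != target),
               key=lambda c: cosine(data[target], data[c]))
    # ...and suggest products they bought that the target has not.
    return [i for i, (mine, theirs) in
            enumerate(zip(data[target], data[peer])) if theirs and not mine]

print(recommend("alice", purchases))
# → [2]  (bob is alice's closest peer, and he also bought product 2)
```

If this model's purchase matrix was built only from US customers, every Indian customer would be matched against US peers – which is exactly the staleness described above.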

You get the idea.

How to identify such staleness in your project?

  1. Time parameter – Check how old the model is, i.e. when the data scientist trained it. Especially look at how recent the training data is. Remember that the model only learns from the training data, not from the test or validation datasets. This is especially important for forecasting models, where we predict future growth, revenue or weather based on time-dependent events.
  2. Accuracy – Compare the accuracy of the training phase with that of the testing phase. If training accuracy is very high but testing/validation accuracy is low, raise a staleness issue. Note that this may also be a case of overfitting; we will talk about that later.
  3. Data engineering pipeline – Even if the model has been trained on new data, it may not be taking in all new categories of data points (say, a new product category suddenly added to the sales dataset). If the model is retrained but not all parameters are explicitly exposed for tuning, and unit tests are not updated for new data points, we would not know why the model is showing signs of staleness.
  4. Deployment options – In a multi-deployment setup, several versions of the model are served side by side to understand the impact of each. The most effective model, after comparison against business metrics, is then deployed everywhere. Verify that this process is completely automated and that we are looking at the newest version of the model.
  5. New feature addition – See where in the data pipeline new features are added, preferably in an automated fashion. If there is no such step, you need to check whether all features are represented in the model. (How do you inspect a model? I will cover that in later posts in this series.)
  6. Exception handling – If a newly trained model throws an error or fails to build, check what happens. Most of the time, the older working model is kept serving instead of the new one in this scenario. Check that logs are available for model training. (Logs are useful for debugging and improve the application's testability.)
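The first two checks above are mechanical enough to automate. A minimal sketch, assuming illustrative thresholds (real limits should come from the business, as discussed below):

```python
from datetime import date

def staleness_flags(trained_on, train_acc, val_acc,
                    max_age_days=365, max_acc_gap=0.10):
    """Raise simple staleness flags from the time and accuracy checks.

    trained_on: last date covered by the training data.
    Thresholds here are placeholders, not recommendations.
    """
    flags = []
    age = (date.today() - trained_on).days
    if age > max_age_days:
        flags.append(f"model is {age} days old (limit {max_age_days})")
    if train_acc - val_acc > max_acc_gap:
        flags.append(f"train/validation gap of {train_acc - val_acc:.2f} "
                     "suggests staleness or overfitting")
    return flags

# A model trained long ago with a large train/validation gap trips both flags.
for flag in staleness_flags(date(2020, 1, 1), train_acc=0.95, val_acc=0.78):
    print("WARN:", flag)
```

A check like this can run in the CI pipeline or as a scheduled job, so staleness is raised as a ticket rather than discovered in production.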

Ways to reduce model staleness –

  1. A/B comparison tests – For each model variant, draw an age-quality curve. You will get a line; use a (business-given) accuracy threshold to determine the minimum accuracy the model must provide on the validation dataset (not on the training dataset), and thereby the maximum tolerable age of the model.
  2. Add a feature pipeline and make sure all new features are added – Alert when new features are not available, and write unit tests to verify that all required features are collected from the dataset before training.
  3. Use cross-validation while training – Cross-validation iterates through the data multiple times, randomly selecting which data points to train on. This rules out the possibility of the model never being trained on certain parts of the data.
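Point 3 boils down to splitting the data so that every record gets a turn in the validation set. Libraries such as scikit-learn provide this out of the box; a hand-rolled sketch of the k-fold splitting idea (names and fold count are my own choices):

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle row indices and split them into k roughly equal folds,
    so every record is used for validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(n_samples=10, k=5)
for i, val_fold in enumerate(folds):
    # Train on everything outside the current fold, validate on the fold.
    train_idx = sorted(j for f in folds if f is not val_fold for j in f)
    print(f"fold {i}: train on {len(train_idx)} rows, "
          f"validate on {sorted(val_fold)}")
```

From a tester's seat, the property worth asserting is that the folds are disjoint and together cover every row – that is what guarantees no slice of the data is permanently excluded from training.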

Model staleness is a common problem and needs to be detected early in the project. I hope you have learnt a few things about identifying and solving this issue. In case of any queries, let me know in the comments or email me at riyajshaikh at gmail dot com.


References –

  1. https://martinfowler.com/articles/cd4ml.html
  2. https://www.youtube.com/user/sentdex
  3. https://medium.com/thelaunchpad/how-to-protect-your-machine-learning-product-from-time-adversaries-and-itself-ff07727d6712
  4. https://anvaka.github.io/rules-of-ml