Re-using machine learning models and the “no free lunch” theorem
Why re-use machine learning models?
You can get a lot of value out of training a machine learning model to solve a single use case, like predicting emotion in your customer chatbot transcripts and putting the angry ones through to real humans. However, you might be able to extract even more value out of your model by using it in more than one use case. You could use an emotion model to prioritise customer chat sessions but also to help monitor incoming email inquiries and social media channels too. A model can often be deployed across multiple channels and use cases with significantly less effort than creating a new, complete training set for each problem your business encounters. However there are some caveats that you should be aware of. In particular, the “No Free Lunch Theorem” which is concerned with the theoretical drawbacks of deploying a model across multiple use cases.
No free lunch? What has food got to do with it?
So No Free Lunch means I can’t reuse my models?
Not exactly. The theorem says that there’s no correlation between your model’s performance in its intended environment and its performance in a completely new environment. However, it doesn’t rule out the possibility of be correlations if we know the nature of the new problems and data. You can think about it in terms of human expertise and specialisation. Humans learn to specialise as they work their way through the education system. A medical doctor and a veterinarian both go through extensive training in order to be able to carry out medical procedures on humans and animals respectively. A veterinarian might be excellent at operating on different species of animals. However, a veterinarian would not be able to operate on an injured human to the same level of professionalism as a medical doctor without some additional training.
How do I know whether my model will work well for a new problem?
What can domain adaptation do for model reuse?
On September 23rd, 1999, a $125 Million Mars lander burned up in orbit around the red planet, much to the dismay of the incredibly talented engineering teams who worked on it. The theory was sound. Diligent checks had been carried out. So why, then, had the mission gone up in flames? A review panel meeting showed that a miscommunication between teams working on the project was the cause. One set of engineers expressed force in pounds, the other team preferred to use newtons. Neither group were wrong but this small miscommunication had everything grinding to a half.
<div>
</div>
<div>
The point here is that the machine learning model that you trained for sentiment in one environment may not be that far off. It might just need “tweaking” to make it work well for the new domain. This exercise, known as “domain adaptation” is about mapping features (words) that help the classifier understand sentiment in one domain onto features (words) that help the it to understand the other domain. For example a model trained on reviews for mobile phones might learn to use negative words like “unresponsive, slow, outdated” but reviews for movies might use negative words like “cliched, dull, slow-moving”. This mapping of features is not an exact science but good mappings can be found using approaches like that of <a href="http://www.icml-2011.org/papers/342_icmlpaper.pdf">Glorot, Bordes and Bengio (2011).</a>
</div>
<h2>
Conclusion
</h2>
<div>
Building machine learning models is a valuable but time consuming activity. It makes sense to build and reuse models where possible. However, it is important to be aware of the fact that models can become ultra-specialised at the task that they’re trained on and that some adaptation may be required to get them working in new environments. We have given some tips for evaluating a model’s performance on new tasks as well as some guidance on when re-using a model in a new environment is appropriate. We have also introduced the concept of domain adaptation, sometimes referred to as “transfer learning” which allows your existing model to learn the “language” of the new problem space.
</div>
<div>
</div>
<div>
No Free Lunch theorem can sound pretty scary when it comes to model reuse. We can’t guarantee that a model will work in a new space given what we know about its performance on an existing problem. However, using some of the skills and techniques discussed above, you can have a certain level of confidence that a model will work on a new problem and you can always “domain adapt” to make it better. Ultimately the proof that these techniques are effective lies with suppliers like IBM, Microsoft and Google. These titans of tech have developed widely used and respected general models for families of problems that are relevant across many sectors for NLP, Voice Recognition and Image Processing. Some of them are static, some trainable (although ‘trainable’ often means domain-adaptation ready) but most of them work very well with little to no modification on a wide range of use cases. The most important thing is to do some due diligence around checking that they work for you and your business.
</div>