brainsteam.co.uk/brainsteam/content/posts/2021-01-02-nlp-model-rationale/index.md at c45c30dc5c36fb46f9076b480fb22664545fbbab


Introduction

The ability to understand and reason about automated decisions is becoming increasingly important as more businesses adopt AI into their core processes, particularly in light of legislation like GDPR, which requires that subjects of automated decisions be given the right to an explanation of why that decision was made. There have been a number of breakthroughs in explainable models in the last few years as academic teams in the machine learning space focus their attention on the why and the how.

Recent Progress in Model Explainability

Significant breakthroughs in model explainability came with techniques like LIME and SHAP, which use local surrogate models to approximate the importance or contribution of each feature to a particular decision. These surrogates are explainable, but only for the small number of data samples under observation. The approaches are powerful when input features are meaningful in their own right (e.g. bag-of-words representations where a feature may be the presence or absence of a specific word) but are less helpful when input features are too abstract or are the output of some other black box (e.g. multi-dimensional word vectors or RGB values from pixels).
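To make the local surrogate idea concrete, here is a minimal sketch in the spirit of LIME. It is not the LIME library's API: the `black_box` classifier, its weights, the four-word vocabulary and the proximity kernel are all assumptions made up for illustration. The sketch perturbs a single bag-of-words instance by switching words off, queries the opaque model, and fits a proximity-weighted linear model whose coefficients act as per-word importance scores.

```python
import numpy as np

VOCABULARY = ["good", "bad", "the", "film"]  # hypothetical toy vocabulary

def black_box(X):
    """Hypothetical opaque classifier over binary bag-of-words rows.

    Returns a positive-sentiment probability for each row of X.
    """
    hidden_weights = np.array([2.0, -3.0, 0.0, 0.5])  # assumed, for illustration
    return 1.0 / (1.0 + np.exp(-(X @ hidden_weights)))

def explain_locally(predict, x, n_samples=2000, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance x."""
    rng = np.random.default_rng(seed)
    # Perturb the instance by randomly switching off words that are present.
    keep = rng.integers(0, 2, size=(n_samples, x.size))
    samples = keep * x
    targets = predict(samples)
    # Perturbations that keep more of the original words get a larger weight.
    distance = (x.sum() - samples.sum(axis=1)) / max(x.sum(), 1)
    sample_weights = np.exp(-(distance ** 2) / 0.25)
    # Weighted least squares with an intercept column appended.
    design = np.hstack([samples, np.ones((n_samples, 1))])
    root_w = np.sqrt(sample_weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], targets * root_w, rcond=None)
    return coef[:-1]  # one local importance score per vocabulary word

scores = explain_locally(black_box, np.ones(len(VOCABULARY)))
for word, score in zip(VOCABULARY, scores):
    print(f"{word}: {score:+.3f}")
```

Because each feature here is a word, the signs of the scores can be read directly (positive for "good", negative for "bad"). The same coefficients would be much harder to interpret if each column were one dimension of a word vector.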

Transformer-based models like BERT, which use neural attention to learn contextual relationships between words, can also be interrogated by visualising attention patterns inside the model. However, these visualisations are still quite complex (especially for transformer-based models, which typically have multiple parallel attention mechanisms to examine) and do not provide a concise or intuitive rationalisation for model behaviour.
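A rough sketch of why these visualisations get unwieldy: the snippet below computes scaled dot-product attention weights for every head in every layer of a model with BERT-base dimensions (12 layers, 12 heads, 64 dimensions per head). The query and key matrices are random stand-ins rather than real learned projections, and the example sentence is an assumption for illustration.

```python
import numpy as np

def attention_weights(queries, keys):
    """Scaled dot-product attention weights for a single head."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
tokens = ["the", "film", "was", "not", "good"]
n_layers, n_heads, head_dim = 12, 12, 64  # BERT-base sizes

# Random stand-ins for the learned query/key projections of each head.
maps = np.stack([
    np.stack([
        attention_weights(
            rng.normal(size=(len(tokens), head_dim)),
            rng.normal(size=(len(tokens), head_dim)),
        )
        for _ in range(n_heads)
    ])
    for _ in range(n_layers)
])

print(maps.shape)  # (12, 12, 5, 5)
```

Even for a five-word sentence that is 144 separate token-by-token matrices to inspect, none of which is a direct statement of why the model produced its output.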

Rationalization of Neural Predictions

In 2016, Lei, Barzilay and Jaakkola wrote about a new architecture for rationale extraction from NLP models. The aim was to build a model that could extract a "short and coherent" justification for why it made a particular prediction.


The idea is actually quite simple. Firstly, let's assume we're starting with a classification problem where we want to take a document X and train a classifier function F(X) to predict a label y based on the text in the document (e.g. X is a movie review and y is positive or negative sentiment).


What Lei, Barzilay and Jaakkola propose is that we add a new step to this process. We're going to introduce G(X), a generator, which aims to generate a rationale R for the document. Then we're going to train our classifier F to predict y not from the full document representation X but from the rationale R, i.e. F(R) where R = G(X). Our new process looks something like this:
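As a rough illustration of that two-stage process, here is a minimal Python sketch. In the paper both components are learned neural networks trained jointly; the versions below are fixed, hand-written stand-ins, and the `WORD_SCORES` lexicon, the threshold and the penalty weights are assumptions invented for this example. The `rationale_cost` function is one simple way to express the "short and coherent" requirement: it penalises the number of selected words and the number of switches between selected and unselected words.

```python
import numpy as np

# Hypothetical toy lexicon standing in for learned parameters.
WORD_SCORES = {"great": 2.0, "loved": 1.5, "boring": -2.0, "awful": -2.5}

def generator(tokens, threshold=1.0):
    """G(X): return a binary mask z marking which tokens form the rationale."""
    return np.array([int(abs(WORD_SCORES.get(t, 0.0)) >= threshold) for t in tokens])

def classifier(rationale_tokens):
    """F(R): predict sentiment from the rationale only, never the full document."""
    total = sum(WORD_SCORES.get(t, 0.0) for t in rationale_tokens)
    return "positive" if total > 0 else "negative"

def rationale_cost(z, sparsity=0.1, coherence=0.2):
    """Penalise long rationales (many 1s) and fragmented ones (many 0/1 switches)."""
    z = np.asarray(z)
    return sparsity * z.sum() + coherence * np.abs(np.diff(z)).sum()

def predict_with_rationale(tokens):
    """Run G(X) to get the rationale R, then F(R) to get the label y."""
    z = generator(tokens)
    rationale = [t for t, keep in zip(tokens, z) if keep]
    return classifier(rationale), rationale

tokens = "the plot was boring and the acting was awful".split()
label, rationale = predict_with_rationale(tokens)
print(label, rationale)  # negative ['boring', 'awful']
```

The key design point is that `classifier` never sees the words the generator discarded, so whatever it predicts can only be justified by the rationale it was handed.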
