diff --git a/brainsteam/content/posts/2021-01-02-nlp-model-rationale/assets/simple_classifier.xml b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/assets/simple_classifier.xml new file mode 100644 index 0000000..9253ca9 --- /dev/null +++ b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/assets/simple_classifier.xml @@ -0,0 +1 @@ +zVdZc9owEP41nmkfYIyFOR45Qo9MaTpkpvRRsYWtibBcWYDpr+9Klm8TkrTTFBiQPmkP7X6rNRZa7NMPAsfhF+4TZjm2n1poaTnOYOg4lvrY/jlDps4wAwJBfbOpBDb0FzGgbdAD9UlS2yg5Z5LGddDjUUQ8WcOwEPxU37bjrG41xgFpARsPszb6nfoyzNCJMy7xj4QGYW55MJpmK3ucbzYnSULs81MFQjcWWgjOZTbapwvCVPDyuGRyqwurhWOCRPI5Ar31dH+0v63RaTW5jZJ1GvPbntFyxOxgDmw5Iwb65uFARZDRINL46OdBOTr3wBoR5RxGgfrd5nIPIodyBFxSyupgnAOfEpXSkAii7EVnGdIogOEDkdoQrOEIftbU4wyrzQuVMseercEt+zMP1eq9wEdgBbbQCmYr7KkdX3c7df5EK1ZfnqQ80kHzHonf7/efdDouMZ0/ec5JIUmqQyT3DIABDBNQqfxGS7ec3fMYgB7kBc1PIZVkEyvH0PIEtQIYPxKxY5oTIfV9ApGeC36IfKJSpqR2lLEFZ1xou2jnqrcyIAV/JJWVkX4pCR7JCp69ijOAQfD9IoEGBS2hngnfEynOsMUIjAyRTSU7QzM/lXVR7AkrNTExGDalGBSaS7bCwBD2BeR1OsjbyBSJ/Jm6BWDmAXsS6tXzlgkQv3UJXA1K5dRux6FzTBCGJT3W1XdFwli44xQMFzFHTiPozWAm/CA8YqSq1d9UNL6iSGIRENlSpBNTHPv1uXJbuVq9275v5aurTCrpwkmcXfI7mqoi0XTfGPFBPs+aCDSav0L7VuQ6eD/5l7Qf/TntSUrltjL+oe6bvmtmy9RcP3pyNpPnlgo4ojl5jQxvVlLTej5d+5UlNZy8bUmNL/Zunx6b/UwVRo0jeQNXC71El8wMNjjDOG1393OlLWaaWt2yw+ZlN7yiQ5WWkG0vFioXTeN3PKE630/70NnDay5c7eL54455yEFzdXFQeBicmYU9tGkl3tnOG627eTP9T628Rd2h03dbt9oAdVTh2H3xtQbT8jE3Y3/5ZwHd/AY= \ No newline at end of file diff --git a/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/figure1.png b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/figure1.png new file mode 100644 index 0000000..71f44ed Binary files /dev/null and b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/figure1.png differ diff --git a/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/simple_classifier.png b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/simple_classifier.png new file mode 100644 index 0000000..8682179 Binary files /dev/null and b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/images/simple_classifier.png differ diff --git a/brainsteam/content/posts/2021-01-02-nlp-model-rationale/index.md b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/index.md index 17730ab..9565bd0 100644 --- a/brainsteam/content/posts/2021-01-02-nlp-model-rationale/index.md +++ b/brainsteam/content/posts/2021-01-02-nlp-model-rationale/index.md @@ -2,7 +2,7 @@ title: Explain Yourself! Self-Rationalizing NLP Models author: James type: post -draft: true +draft: false resources: - name: feature src: images/feature.jpg @@ -22,6 +22,37 @@ tags: --- + +## Introduction + The ability to understand and rationalise automated decisions is becoming particularly important as more and more businesses adopt AI into their core processes, especially in light of legislation like GDPR, which requires that the subjects of automated decisions be given the right to an explanation of why a decision was made. There have been a number of breakthroughs in explainable models in the last few years as academic teams in the machine learning space focus their attention on the why and the how.
-Significant breakthroughs in model explainability were seen in the likes of [LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b) and [SHAP](https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d) where local surrogate models, which are explainable but only for the small number of data samples under observation, are used to approximate the importance/contribution of features to a particular decision. \ No newline at end of file +## Recent Progress in Model Explainability + +Significant breakthroughs in model explainability were seen in the likes of [LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b) and [SHAP](https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d), where local surrogate models, which are explainable but only for the small number of data samples under observation, are used to approximate the importance/contribution of features to a particular decision. These approaches are powerful when input features are meaningful in their own right (e.g. bag-of-words representations where a feature may be the presence or absence of a specific word) but are less helpful when input features are too abstract or are the output of some other black box (e.g. multi-dimensional word vectors or RGB values from pixels). + +[Transformer](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)-based models like [BERT](https://github.com/google-research/bert), which use the concept of neural attention to learn contextual relationships between words, can also be interrogated by [visualising attention patterns inside the model](https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1). However, these visualisations are still quite complex (especially for transformer-based models, which typically have multiple parallel attention mechanisms to examine) and do not provide a concise or intuitive rationalisation of model behaviour. + +## Rationalization of Neural Predictions + +In 2016, [Lei, Barzilay and Jaakkola](https://people.csail.mit.edu/taolei/papers/emnlp16_rationale.pdf) proposed a new architecture for rationale extraction from NLP models. The aim was to produce a model that could extract a "short and coherent" justification for why it made a particular prediction. + +{{<figure src="images/figure1.png">}}
+ + +The idea is actually quite simple. Firstly, let's assume we're starting with a classification problem where we want to take document **X** and train a classifier function **F(X)** to predict label **y** based on the text in the document (e.g. X is a movie review and y is positive or negative sentiment). + + +{{<figure src="images/simple_classifier.png">}}
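To make that setup concrete, here is a minimal sketch of the vanilla classifier in PyTorch. This is purely illustrative code of my own (the class and parameter names are invented, and a real model would use something stronger than mean pooling): it embeds the tokens of X, pools them into a single vector and maps that vector to scores for y.

```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    """A deliberately simple stand-in for F(X): embed the tokens, pool them, predict y."""

    def __init__(self, vocab_size: int, embed_dim: int = 100, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.output = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer ids for the words in document X
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        pooled = embedded.mean(dim=1)          # crude fixed-size document representation
        return self.output(pooled)             # unnormalised scores for label y


# Toy usage: a batch of two "documents", each five tokens long
model = Classifier(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 2])
```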
+ +What Lei, Barzilay and Jaakkola propose is that we add a new step to this process. We're going to introduce **G(X)**, a generator, which aims to generate a rationale **R** for the document. Then we're going to train our classifier **F(X)** to predict **y** not from the document representation **X** but from the rationale **R**. Our new process looks something like this: + + + + + + + + \ No newline at end of file
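In code, a heavily simplified sketch of that generator-plus-classifier wiring might look like the following. Again, this is my own illustrative code rather than the authors' implementation: in the paper the rationale is a hard binary mask that is sampled and trained with a REINFORCE-style gradient estimator, together with regularisers that keep the selection short and contiguous, whereas here I use soft keep-probabilities so that everything stays differentiable and easy to read.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """G(X): assigns each token a probability of being kept in the rationale R."""

    def __init__(self, vocab_size: int, embed_dim: int = 100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.keep_score = nn.Linear(embed_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        scores = self.keep_score(self.embedding(token_ids))  # (batch, seq_len, 1)
        return torch.sigmoid(scores).squeeze(-1)             # keep-probability per token


class RationaleClassifier(nn.Module):
    """F(R): the classifier only sees the tokens that the generator selects."""

    def __init__(self, vocab_size: int, embed_dim: int = 100, num_classes: int = 2):
        super().__init__()
        self.generator = Generator(vocab_size, embed_dim)
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.output = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor):
        mask = self.generator(token_ids)                             # soft rationale selection
        embedded = self.embedding(token_ids) * mask.unsqueeze(-1)    # zero out unselected tokens
        pooled = embedded.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1e-6)
        return self.output(pooled), mask                             # prediction for y plus the rationale mask


model = RationaleClassifier(vocab_size=1000)
logits, rationale_mask = model(torch.randint(0, 1000, (2, 5)))
```

Training the two components jointly pushes the generator towards selecting exactly the words the classifier needs, which is what makes the selected text usable as a rationale.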