<blockquote>Suppose a human is given two sentences: “No weapons of mass destruction found in Iraq yet.” and “Weapons of mass destruction found in Iraq.” They are then asked to respond 0 or 1 and receive a reward if they are correct. In this setup, they would likely need a large number of trials and errors before figuring out what they are really being rewarded to do. This setup is akin to the pretrain-and-fine-tune setup which has dominated NLP in recent years, in which models are asked to classify a sentence representation (e.g., a CLS token) into some</blockquote>This is a really excellent illustration of the difference in paradigm between "normal" text model fine-tuning and prompt-based modelling.