--- date: '2022-11-23T20:55:44' hypothesis-meta: created: '2022-11-23T20:55:44.414977+00:00' document: title: - 2022.naacl-main.167.pdf flagged: false group: __world__ hidden: false id: MrGLumtxEe21b1OADBLmyg links: html: https://hypothes.is/a/MrGLumtxEe21b1OADBLmyg incontext: https://hyp.is/MrGLumtxEe21b1OADBLmyg/aclanthology.org/2022.naacl-main.167.pdf json: https://hypothes.is/api/annotations/MrGLumtxEe21b1OADBLmyg permissions: admin: - acct:ravenscroftj@hypothes.is delete: - acct:ravenscroftj@hypothes.is read: - group:__world__ update: - acct:ravenscroftj@hypothes.is tags: - prompt-models - NLProc target: - selector: - end: 20146 start: 19539 type: TextPositionSelector - exact: Misleading Templates There is no consistent re-lation between the performance of models trainedwith templates that are moderately misleading (e.g.{premise} Can that be paraphrasedas "{hypothesis}"?) vs. templates that areextremely misleading (e.g., {premise} Isthis a sports news? {hypothesis}).T0 (both 3B and 11B) perform better givenmisleading-moderate (Figure 3), ALBERT andT5 3B perform better given misleading-extreme(Appendices E and G.4), whereas T5 11B andGPT-3 perform comparably on both sets (Figure 2;also see Table 2 for a summary of statisticalsignificances.) Despite a lack of pattern between prefix: structiveand misleading-extreme. suffix: 4 8 16 32 64 128 2560.50.550.60. type: TextQuoteSelector source: https://aclanthology.org/2022.naacl-main.167.pdf text: "Their misleading templates really are misleading \n\n{premise} Can that be\ \ paraphrased as \"{hypothesis}\" \n\n{premise} Is this a sports news? {hypothesis}" updated: '2022-11-23T20:55:44.414977+00:00' uri: https://aclanthology.org/2022.naacl-main.167.pdf user: acct:ravenscroftj@hypothes.is user_info: display_name: James Ravenscroft in-reply-to: https://aclanthology.org/2022.naacl-main.167.pdf tags: - prompt-models - NLProc - hypothesis type: annotation url: /annotation/2022/11/23/1669236944 ---
> Misleading Templates. There is no consistent relation between the performance of models trained with templates that are moderately misleading (e.g. {premise} Can that be paraphrased as "{hypothesis}"?) vs. templates that are extremely misleading (e.g., {premise} Is this a sports news? {hypothesis}). T0 (both 3B and 11B) perform better given misleading-moderate (Figure 3), ALBERT and T5 3B perform better given misleading-extreme (Appendices E and G.4), whereas T5 11B and GPT-3 perform comparably on both sets (Figure 2; also see Table 2 for a summary of statistical significances.) Despite a lack of pattern between

Their misleading templates really are misleading:

{premise} Can that be paraphrased as "{hypothesis}"?

{premise} Is this a sports news? {hypothesis}
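
To make the templates concrete, here is a minimal sketch of how they might be filled in for a single example, assuming plain Python string formatting. The `render` helper and the premise/hypothesis pair are made up for illustration and are not taken from the paper.

```python
# The two template strings are the ones quoted from the paper above.
TEMPLATES = {
    "misleading-moderate": '{premise} Can that be paraphrased as "{hypothesis}"?',
    "misleading-extreme": "{premise} Is this a sports news? {hypothesis}",
}


def render(template: str, premise: str, hypothesis: str) -> str:
    """Fill the {premise} and {hypothesis} slots of a prompt template.

    Hypothetical helper for illustration only.
    """
    return template.format(premise=premise, hypothesis=hypothesis)


# Made-up NLI pair, not from the paper's data.
premise = "A man is playing a guitar on stage."
hypothesis = "A person is performing music."

for name, template in TEMPLATES.items():
    print(f"{name}: {render(template, premise, hypothesis)}")
```

Seeing the rendered prompts side by side makes the point: neither one actually asks whether the premise entails the hypothesis.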