---
date: '2023-03-21T06:29:09'
hypothesis-meta:
  created: '2023-03-21T06:29:09.945605+00:00'
  document:
    title:
    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
  flagged: false
  group: __world__
  hidden: false
  id: sFZzLMexEe2M2r_i759OiA
  links:
    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2]
      endOffset: 199
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 7439
      start: 7071
      type: TextPositionSelector
    - exact: "Still, we can look for telltale signs. Another symptom of memorization\
        \ is that GPT is highly sensitive to the phrasing of the question. Melanie\
        \ Mitchell gives an example of an MBA test question where changing some details\
        \ in a way that wouldn\u2019t fool a person is enough to fool ChatGPT (running\
        \ GPT-3.5). A more elaborate experiment along these lines would be valuable."
      prefix: ' how performance varies by date.'
      suffix: "Because of OpenAI\u2019s lack of tran"
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: OpenAI has memorised MBA tests- when these are rephrased or certain details
    are changed, the system fails to answer
  updated: '2023-03-21T06:29:09.945605+00:00'
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679380149
---
<blockquote>Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.</blockquote>OpenAI's GPT models appear to have memorised MBA test questions: when these are rephrased or certain details are changed, the system fails to answer correctly.
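
The follow-up experiment the post calls for (compare a model's answers on the published benchmark questions against reworded variants of the same questions) is easy to sketch. The snippet below is a minimal, hypothetical harness: `ask_model`, `QuestionVariant` and the scoring function are illustrative names I've introduced here, not anything from the post or a particular library, so swap in whichever model API and question set you actually use.

```python
"""Minimal sketch of a phrasing-sensitivity check for memorisation.

Idea from the annotated passage: if a model has memorised benchmark
questions, rewordings that wouldn't fool a person should still flip
its answers. `ask_model` is a placeholder for a real model call.
"""

from dataclasses import dataclass


@dataclass
class QuestionVariant:
    original: str          # the benchmark question as published
    perturbed: str         # same question, reworded or with details changed
    expected_answer: str   # gold answer, identical for both phrasings


def ask_model(question: str) -> str:
    """Placeholder: replace with a call to the model under test."""
    raise NotImplementedError


def normalise(answer: str) -> str:
    """Crude normalisation so trivial formatting differences don't count."""
    return answer.strip().lower()


def phrasing_sensitivity(variants: list[QuestionVariant]) -> float:
    """Fraction of items answered correctly in the original phrasing
    but incorrectly in the perturbed phrasing."""
    flipped = 0
    scored = 0
    for v in variants:
        orig_ok = normalise(ask_model(v.original)) == normalise(v.expected_answer)
        pert_ok = normalise(ask_model(v.perturbed)) == normalise(v.expected_answer)
        if orig_ok:
            scored += 1
            if not pert_ok:
                flipped += 1
    return flipped / scored if scored else 0.0
```

A high flip rate (right on the published phrasing, wrong on the reworded one) would be exactly the memorisation symptom the quoted passage describes.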