---
date: '2023-03-21T06:29:09'
hypothesis-meta:
  created: '2023-03-21T06:29:09.945605+00:00'
  document:
    title:
    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
  flagged: false
  group: __world__
  hidden: false
  id: sFZzLMexEe2M2r_i759OiA
  links:
    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2]
      endOffset: 199
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 7439
      start: 7071
      type: TextPositionSelector
    - exact: "Still, we can look for telltale signs. Another symptom of memorization\
        \ is that GPT is highly sensitive to the phrasing of the question. Melanie\
        \ Mitchell gives an example of an MBA test question where changing some details\
        \ in a way that wouldn\u2019t fool a person is enough to fool ChatGPT (running\
        \ GPT-3.5). A more elaborate experiment along these lines would be valuable."
      prefix: ' how performance varies by date.'
      suffix: "Because of OpenAI\u2019s lack of tran"
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: OpenAI has memorised MBA tests- when these are rephrased or certain details
    are changed, the system fails to answer
  updated: '2023-03-21T06:29:09.945605+00:00'
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679380149
---
<blockquote>Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.</blockquote>OpenAI's models appear to have memorised MBA tests: when these are rephrased, or certain details are changed, the system fails to answer correctly.
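The rephrasing check described above can be sketched as a simple perturbation test. This is a minimal sketch, not the authors' actual experiment: `ask_model` is a hypothetical stub standing in for a real chat-completion API call, and the canned answer table simulates a model that has memorised one surface form of a question.

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for a real model API call.

    Simulates a memorising model: it answers the canonical phrasing
    it has "seen" but fails on a meaning-preserving paraphrase.
    """
    canned = {
        "A firm sells 100 units at $5 each. What is its revenue?": "$500",
    }
    return canned.get(question, "I don't know")


def phrasing_sensitive(original: str, paraphrase: str) -> bool:
    """True if the answer changes under a rewrite that shouldn't matter.

    For a robust model this should be False; flipping answers on a
    paraphrase is the memorisation symptom the article describes.
    """
    return ask_model(original) != ask_model(paraphrase)


original = "A firm sells 100 units at $5 each. What is its revenue?"
paraphrase = "If each of 100 units sells for $5, what revenue does the firm earn?"
print(phrasing_sensitive(original, paraphrase))  # → True for this memoriser
```

A fuller version of this experiment would run many question/paraphrase pairs and compare accuracy on each set, rather than checking a single answer flip.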