brainsteam.co.uk/brainsteam/content/annotations/2023/03/21/1679380149.md

---
date: 2023-03-21T06:29:09
hypothesis-meta:
  created: 2023-03-21T06:29:09.945605+00:00
  document:
    title:
    - "GPT-4 and professional benchmarks: the wrong answer to the wrong question"
  flagged: false
  group: __world__
  hidden: false
  id: sFZzLMexEe2M2r_i759OiA
  links:
    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2]
      endOffset: 199
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 7439
      start: 7071
      type: TextPositionSelector
    - exact: "Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldnt fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable."
      prefix: "how performance varies by date."
      suffix: "Because of OpenAIs lack of tran"
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: OpenAI has memorised MBA tests- when these are rephrased or certain details are changed, the system fails to answer
  updated: 2023-03-21T06:29:09.945605+00:00
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679380149
---

> Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn't fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.

GPT appears to have memorised MBA test questions: when these are rephrased or certain details are changed, the system fails to answer correctly.