---
date: '2023-03-21T06:25:47'
hypothesis-meta:
  created: '2023-03-21T06:25:47.417575+00:00'
  document:
    title:
    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
  flagged: false
  group: __world__
  hidden: false
  id: N6BVsMexEe2Z4X92AfjYDg
  links:
    html: https://hypothes.is/a/N6BVsMexEe2Z4X92AfjYDg
    incontext: https://hyp.is/N6BVsMexEe2Z4X92AfjYDg/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/N6BVsMexEe2Z4X92AfjYDg
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - llm
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[2]
      endOffset: 300
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 5998
      start: 5517
      type: TextPositionSelector
    - exact: "To benchmark GPT-4's coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can't recall."
      prefix: 'm 1: training data contamination'
      suffix: As further evidence for this hyp
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: GPT-4 was only able to pass questions available before September 2021 and failed to answer new questions, strongly suggesting that it has simply memorised the answers as part of its training data.
  updated: '2023-03-21T06:26:57.441600+00:00'
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- llm
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679379947
---

> To benchmark GPT-4's coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can't recall.
GPT-4 was only able to pass questions available before September 2021 and failed to answer new questions, strongly suggesting that it has simply memorised the answers as part of its training data.
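
The check described in the quoted passage is easy to reproduce for any benchmark whose problems carry publication dates: split the problems at the model's training cutoff and compare solve rates on each side. Below is a minimal Python sketch of that idea; the `solve()` callable and the problem data are hypothetical stand-ins, not OpenAI's actual evaluation harness.

```python
from datetime import date

# GPT-4's reported training data cutoff (September 2021).
CUTOFF = date(2021, 9, 1)


def contamination_check(problems, solve):
    """Compare solve rates on problems published before vs. after the cutoff.

    problems: iterable of (published_date, problem_statement) pairs.
    solve: hypothetical callable returning True if the model under test
           solves the given problem statement.
    """
    old = [stmt for published, stmt in problems if published < CUTOFF]
    new = [stmt for published, stmt in problems if published >= CUTOFF]
    old_rate = sum(map(solve, old)) / max(len(old), 1)
    new_rate = sum(map(solve, new)) / max(len(new), 1)
    # A large gap (e.g. 10/10 on pre-cutoff Codeforces "easy" problems
    # vs. 0/10 post-cutoff) points to memorisation, not genuine ability.
    return old_rate, new_rate
```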