created: 2023-03-21T06:25:47.417575+00:00
title: GPT-4 and professional benchmarks: the wrong answer to the wrong question
flagged: false
group: __world__
hidden: false
id: N6BVsMexEe2Z4X92AfjYDg
permissions (admin, delete, read, update): acct:ravenscroftj@hypothes.is
tags: llm, openai, gpt, ModelEvaluation

target selectors:
  RangeSelector:
    startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[1]
    startOffset: 0
    endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[2]
    endOffset: 300
  TextPositionSelector:
    start: 5517
    end: 5998
  TextQuoteSelector:
    prefix: "m 1: training data contamination"
    exact: "To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall."
    suffix: "As further evidence for this hyp"

uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks

text: GPT-4 was only able to solve questions available before September 2021 and failed on newer questions, strongly suggesting that it has simply memorised the answers as part of its training.
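
The check described in the quoted passage is straightforward to reproduce: partition benchmark problems at the model's training-data cutoff and compare solve rates on each side. Below is a minimal sketch in Python; the names `contamination_check`, `problems`, and `solves` are illustrative assumptions, not from the article.

```python
from datetime import date

# Training-data cutoff reported for GPT-4 (September 2021).
CUTOFF = date(2021, 9, 1)

def contamination_check(problems, solves):
    """Compare solve rates on problems published before vs. after the cutoff.

    problems: list of (publication_date, problem) tuples.
    solves:   callable returning True if the model solves the problem.
    A large pre-cutoff vs. post-cutoff gap suggests memorisation of
    training data rather than genuine problem-solving ability.
    """
    pre = [p for d, p in problems if d < CUTOFF]
    post = [p for d, p in problems if d >= CUTOFF]
    pre_rate = sum(solves(p) for p in pre) / max(len(pre), 1)
    post_rate = sum(solves(p) for p in post) / max(len(post), 1)
    return pre_rate, post_rate

# In the Codeforces experiment cited above, this split came out
# 10/10 on pre-cutoff problems and 0/10 on recent ones.
```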
updated: 2023-03-21T06:26:57.441600+00:00
user: acct:ravenscroftj@hypothes.is
display_name: James Ravenscroft