created: 2023-03-21T06:25:47.417575+00:00
title: GPT-4 and professional benchmarks: the wrong answer to the wrong question
flagged: false
group: __world__
hidden: false
id: N6BVsMexEe2Z4X92AfjYDg
permissions (admin, delete, read, update): acct:ravenscroftj@hypothes.is
tags: llm, openai, gpt, ModelEvaluation

target selectors:
  RangeSelector:
    startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[1]
    startOffset: 0
    endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[2]
    endOffset: 300
  TextPositionSelector:
    start: 5517
    end: 5998
  TextQuoteSelector:
    prefix: "m 1: training data contamination"
    exact: "To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall."
    suffix: "As further evidence for this hyp"

uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks

text: GPT-4 was only able to solve questions available before September 2021 and failed on newer questions, strongly suggesting that it has simply memorised the answers as part of its training.
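
The check described in the quoted passage is straightforward to reproduce: partition benchmark problems at the model's training-data cutoff and compare solve rates on each side. Below is a minimal sketch in Python; the names `contamination_check`, `problems`, and `solves` are illustrative assumptions, not from the article.

```python
from datetime import date

# Training-data cutoff reported for GPT-4 (September 2021).
CUTOFF = date(2021, 9, 1)

def contamination_check(problems, solves):
    """Compare solve rates on problems published before vs. after the cutoff.

    problems: list of (publication_date, problem) tuples.
    solves:   callable returning True if the model solves the problem.
    A large pre-cutoff vs. post-cutoff gap suggests memorisation of
    training data rather than genuine problem-solving ability.
    """
    pre = [p for d, p in problems if d < CUTOFF]
    post = [p for d, p in problems if d >= CUTOFF]
    pre_rate = sum(solves(p) for p in pre) / max(len(pre), 1)
    post_rate = sum(solves(p) for p in post) / max(len(post), 1)
    return pre_rate, post_rate

# In the Codeforces experiment cited above, this split came out
# 10/10 on pre-cutoff problems and 0/10 on recent ones.
```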
updated: 2023-03-21T06:26:57.441600+00:00
user: acct:ravenscroftj@hypothes.is
display_name: James Ravenscroft