date: 2023-03-21T06:29:09.945605+00:00
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags: openai, gpt, ModelEvaluation, hypothesis
type: annotation
url: /annotations/2023/03/21/1679380149

hypothesis-meta:
  id: sFZzLMexEe2M2r_i759OiA
  document title: GPT-4 and professional benchmarks: the wrong answer to the wrong question
  user: acct:ravenscroftj@hypothes.is (display_name: James Ravenscroft)
  group: __world__
  flagged: false
  hidden: false
  updated: 2023-03-21T06:29:09.945605+00:00
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  tags: openai, gpt, ModelEvaluation
  links:
    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
  permissions:
    read: group:__world__
    admin, delete, update: acct:ravenscroftj@hypothes.is
  target:
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    selector:
      RangeSelector: startContainer /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1], startOffset 0, endContainer /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2], endOffset 199
      TextPositionSelector: start 7071, end 7439
      TextQuoteSelector: prefix "how performance varies by date.", suffix "Because of OpenAI’s lack of tran" (exact: the passage quoted below)

Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.

OpenAI's GPT models have memorised MBA tests: when the questions are rephrased or certain details are changed, the system fails to answer them correctly.