2023-03-21T06:29:09.945605+00:00 |
title |
GPT-4 and professional benchmarks: the wrong answer to the wrong question |
|
|
false |
__world__ |
false |
sFZzLMexEe2M2r_i759OiA |
|
admin |
delete |
read |
update |
acct:ravenscroftj@hypothes.is |
|
acct:ravenscroftj@hypothes.is |
|
|
acct:ravenscroftj@hypothes.is |
|
|
openai |
gpt |
ModelEvaluation |
|
selector |
source |
endContainer |
endOffset |
startContainer |
startOffset |
type |
/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2] |
199 |
/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1] |
0 |
RangeSelector |
|
end |
start |
type |
7439 |
7071 |
TextPositionSelector |
|
exact |
prefix |
suffix |
type |
Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable. |
how performance varies by date. |
Because of OpenAI’s lack of tran |
TextQuoteSelector |
|
|
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks |
|
|
GPT appears to have memorised MBA test questions: when they are rephrased or certain details are changed, the system fails to answer correctly |
2023-03-21T06:29:09.945605+00:00 |
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks |
acct:ravenscroftj@hypothes.is |
display_name |
James Ravenscroft |
|