---
date: '2023-03-21T06:25:47'
hypothesis-meta:
  created: '2023-03-21T06:25:47.417575+00:00'
  document:
    title:
    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
  flagged: false
  group: __world__
  hidden: false
  id: N6BVsMexEe2Z4X92AfjYDg
  links:
    html: https://hypothes.is/a/N6BVsMexEe2Z4X92AfjYDg
    incontext: https://hyp.is/N6BVsMexEe2Z4X92AfjYDg/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/N6BVsMexEe2Z4X92AfjYDg
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - llm
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[2]
      endOffset: 300
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 5998
      start: 5517
      type: TextPositionSelector
    - exact: "To benchmark GPT-4\u2019s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set \u2014 or at least partly memorize them, enough that it can fill in what it can\u2019t recall."
      prefix: 'm 1: training data contamination'
      suffix: As further evidence for this hyp
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: GPT-4 was only able to solve problems published before September 2021 and
    failed on newer ones, strongly suggesting that it has simply memorised the answers
    as part of its training
  updated: '2023-03-21T06:26:57.441600+00:00'
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- llm
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679379947
---
<blockquote>To benchmark GPT-4's coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can't recall.</blockquote>

GPT-4 was only able to solve problems published before September 2021 and failed on newer ones, strongly suggesting that it has simply memorised the answers as part of its training.
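
The check Horace He ran generalises to any benchmark whose items carry publication dates: split the items at the model's training cutoff and compare pass rates on either side. Here is a minimal sketch of that split in Python; the problem records and the `solved` flags are made-up placeholders that mirror the 10/10 vs 0/10 observation, not real evaluation results.

```python
from datetime import date

CUTOFF = date(2021, 9, 1)  # GPT-4's reported training data cutoff

def pass_rate(problems):
    """Fraction of problems the model solved (NaN if the split is empty)."""
    return sum(p["solved"] for p in problems) / len(problems) if problems else float("nan")

def contamination_split(problems, cutoff=CUTOFF):
    """Compare pass rates on problems published before vs. after the cutoff.

    A large gap between the two rates suggests the model memorised the
    older problems from its training data rather than learning to solve them.
    """
    before = [p for p in problems if p["published"] < cutoff]
    after = [p for p in problems if p["published"] >= cutoff]
    return pass_rate(before), pass_rate(after)

# Made-up data mirroring the Codeforces observation in the quoted post:
problems = (
    [{"published": date(2021, 1, d + 1), "solved": True} for d in range(10)]
    + [{"published": date(2022, 6, d + 1), "solved": False} for d in range(10)]
)
pre, post = contamination_split(problems)
print(f"pass rate before cutoff: {pre:.0%}, after cutoff: {post:.0%}")
# -> pass rate before cutoff: 100%, after cutoff: 0%
```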