---
date: '2023-03-21T06:29:09'
hypothesis-meta:
  created: '2023-03-21T06:29:09.945605+00:00'
  document:
    title:
    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
  flagged: false
  group: __world__
  hidden: false
  id: sFZzLMexEe2M2r_i759OiA
  links:
    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - openai
  - gpt
  - ModelEvaluation
  target:
  - selector:
    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2]
      endOffset: 199
      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1]
      startOffset: 0
      type: RangeSelector
    - end: 7439
      start: 7071
      type: TextPositionSelector
    - exact: "Still, we can look for telltale signs. Another symptom of memorization\
        \ is that GPT is highly sensitive to the phrasing of the question. Melanie\
        \ Mitchell gives an example of an MBA test question where changing some details\
        \ in a way that wouldn\u2019t fool a person is enough to fool ChatGPT (running\
        \ GPT-3.5). A more elaborate experiment along these lines would be valuable."
      prefix: ' how performance varies by date.'
      suffix: "Because of OpenAI\u2019s lack of tran"
      type: TextQuoteSelector
    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  text: OpenAI has memorised MBA tests- when these are rephrased or certain details
    are changed, the system fails to answer
  updated: '2023-03-21T06:29:09.945605+00:00'
  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
tags:
- openai
- gpt
- ModelEvaluation
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679380149
---
<blockquote>Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.</blockquote>OpenAI's models appear to have memorised MBA tests: when these are rephrased, or certain details are changed, the system fails to answer correctly.
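The rephrasing check described above can be sketched as a simple perturbation test. This is a minimal sketch, not the authors' actual experiment: `ask_model` is a hypothetical stub standing in for a real chat-completion API call, and the canned answer table simulates a model that has memorised one surface form of a question.

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for a real model API call.

    Simulates a memorising model: it answers the canonical phrasing
    it has "seen" but fails on a meaning-preserving paraphrase.
    """
    canned = {
        "A firm sells 100 units at $5 each. What is its revenue?": "$500",
    }
    return canned.get(question, "I don't know")


def phrasing_sensitive(original: str, paraphrase: str) -> bool:
    """True if the answer changes under a rewrite that shouldn't matter.

    For a robust model this should be False; flipping answers on a
    paraphrase is the memorisation symptom the article describes.
    """
    return ask_model(original) != ask_model(paraphrase)


original = "A firm sells 100 units at $5 each. What is its revenue?"
paraphrase = "If each of 100 units sells for $5, what revenue does the firm earn?"
print(phrasing_sensitive(original, paraphrase))  # → True for this memoriser
```

A fuller version of this experiment would run many question/paraphrase pairs and compare accuracy on each set, rather than checking a single answer flip.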