---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and4\xD7 more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks.This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatlyfacilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve better performance than with the larger models - this reduces the cost of using the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142
---
We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
By training a smaller language model on more data, the authors were able to achieve better performance than the much larger models; the smaller model is also cheaper to fine-tune and to run at inference time.
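To see why the budgets line up, here is a rough back-of-envelope sketch (not code from the paper) using the common C ≈ 6ND approximation for training FLOPs, together with the token counts reported in the paper (roughly 300B tokens for Gopher and 1.4T for Chinchilla):

```python
# Back-of-envelope check of the "same compute budget" claim, assuming the
# common approximation C ≈ 6 * N * D (training FLOPs ≈ 6 × parameters × tokens)
# and the token counts reported in the paper (~300B for Gopher, ~1.4T for Chinchilla).

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs via C ≈ 6ND."""
    return 6 * params * tokens

gopher = train_flops(params=280e9, tokens=300e9)      # ≈ 5.0e23 FLOPs
chinchilla = train_flops(params=70e9, tokens=1.4e12)  # ≈ 5.9e23 FLOPs

print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")
```

Both models land at roughly the same training budget (a few times 10²³ FLOPs) under this approximation, but because Chinchilla has 4× fewer parameters, every subsequent fine-tuning or inference pass is correspondingly cheaper.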