---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
        \ over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve
    better performance than with the larger models - this reduces the cost of using
    the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142
---
<blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.</blockquote>

By training a smaller language model on more data, the authors were able to achieve better performance than the much larger models, which also reduces the cost of using the model for inference.
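
As a rough sanity check on the "same compute budget" claim, here is a minimal sketch using the standard C ≈ 6ND approximation for dense-transformer training FLOPs (N parameters, D training tokens). The token counts are illustrative, back-derived from the "4× more data" figure in the quote rather than taken from the paper's tables:

```python
# Sanity check of the "same compute budget" claim using the standard
# approximation C ~= 6 * N * D for dense-transformer training FLOPs
# (N = parameter count, D = training tokens). Token counts here are
# illustrative, derived from the "4x more data" figure in the quote.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

gopher = train_flops(280e9, 300e9)         # Gopher: 280B params
chinchilla = train_flops(70e9, 4 * 300e9)  # Chinchilla: 4x fewer params, 4x more tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.04e+23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.04e+23, the same budget
```

Shrinking N by 4× while growing D by 4× leaves the training budget unchanged, but inference cost scales with N alone, so the smaller model is also roughly 4× cheaper to serve, which is exactly the benefit noted above.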