---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
        \ over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve
    better performance than with the larger models - this reduces the cost of using
    the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142
---

<blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher</blockquote>

By training a smaller language model on more data, the authors achieved better performance than the larger models. Because the resulting model is smaller, it is also cheaper to fine-tune and to run for inference.
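
A rough back-of-the-envelope sketch of why the smaller model is cheaper to serve. It assumes the common rule-of-thumb approximations of about 6ND FLOPs to train a model with N parameters on D tokens and about 2N FLOPs per token for a forward pass; these approximations are an assumption here, not figures from the quote. The parameter and token counts (Gopher: 280B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens) are the ones reported in the paper, and the helper function names are purely illustrative.

```python
# Back-of-the-envelope compute comparison (rule-of-thumb approximations, not exact figures):
#   training FLOPs           ~= 6 * N * D   (N = parameters, D = training tokens)
#   inference FLOPs / token  ~= 2 * N


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per parameter per generated token."""
    return 2 * params


# Parameter and token counts as reported in the Chinchilla paper.
MODELS = {
    "Gopher": {"params": 280e9, "tokens": 300e9},
    "Chinchilla": {"params": 70e9, "tokens": 1.4e12},
}

for name, m in MODELS.items():
    train = training_flops(m["params"], m["tokens"])
    infer = inference_flops_per_token(m["params"])
    print(f"{name}: train ~{train:.2e} FLOPs, inference ~{infer:.2e} FLOPs/token")

# How many times more each generated token costs with Gopher than with Chinchilla.
ratio = inference_flops_per_token(280e9) / inference_flops_per_token(70e9)
print(f"Gopher costs ~{ratio:.0f}x more per generated token")
```

Under these approximations the two training budgets land within roughly 20% of each other (the 6ND rule is only approximate; the paper treats them as the same budget), while each generated token costs about 4× less with Chinchilla, which is where the inference saving comes from.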