---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
        \ over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve
    better performance than with the larger models - this reduces the cost of using
    the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142

---

<blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher</blockquote>

By training a smaller language model on more data, the authors achieved better performance than the much larger models. Because the model has fewer parameters, it is also cheaper to fine-tune and to run at inference time.
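
As a rough illustration of why the trade-off works out, here is a back-of-the-envelope sketch. It relies on two common rules of thumb that are assumptions on my part and not something stated in the quoted passage: training costs roughly `6 * parameters * tokens` FLOPs, and inference costs roughly `2 * parameters` FLOPs per token. The parameter counts and the 4× data multiplier come from the quote; the function names and the token unit are made up for the example.

```python
# Back-of-the-envelope comparison of Gopher and Chinchilla.
# Assumed rules of thumb (not from the annotation itself):
#   training FLOPs  ~ 6 * parameters * training tokens
#   inference FLOPs ~ 2 * parameters per token


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute (assumed 6*N*D rule)."""
    return 6.0 * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute per token (assumed 2*N rule)."""
    return 2.0 * params


GOPHER_PARAMS = 280e9      # from the quoted passage
CHINCHILLA_PARAMS = 70e9   # from the quoted passage
DATA_MULTIPLIER = 4.0      # "4x more data" from the quoted passage

# Work in relative units: Gopher's training set is 1 unit of tokens.
gopher_tokens = 1.0
chinchilla_tokens = DATA_MULTIPLIER * gopher_tokens

training_ratio = training_flops(CHINCHILLA_PARAMS, chinchilla_tokens) / training_flops(
    GOPHER_PARAMS, gopher_tokens
)
inference_ratio = inference_flops_per_token(CHINCHILLA_PARAMS) / inference_flops_per_token(
    GOPHER_PARAMS
)

print(f"Training compute, Chinchilla vs Gopher: {training_ratio:.2f}x")
print(f"Inference compute per token, Chinchilla vs Gopher: {inference_ratio:.2f}x")
```

Under these assumptions, a quarter of the parameters with four times the data gives the same training budget, while each token at inference time costs about a quarter as much.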