Add 'brainsteam/content/annotations/2022/12/07/1670414142.md'
continuous-integration/drone/push Build is passing
parent 3d5bc94f64
commit 18ffbb7cd9
@@ -0,0 +1,66 @@
---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
        \ over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve
    better performance than with the larger models - this reduces the cost of using
    the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142

---



<blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher</blockquote>By training a smaller language model on more data, the authors achieved better performance than the larger models; the smaller model also costs less to use for inference.
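
The "same compute budget" trade-off in the quote can be checked with rough arithmetic. The sketch below assumes the common rule of thumb that training costs about 6 FLOPs per parameter per token and a forward pass about 2 FLOPs per parameter per token; those constants, the function names, and the placeholder token unit are assumptions for illustration and are not taken from the annotation. Only the parameter counts (280B, 70B) and the 4× data factor come from the quoted passage.

```python
import math

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute, assuming C ~= 6 * N * D (rule of thumb)."""
    return 6.0 * params * tokens

def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass cost per token, assuming ~2 FLOPs per parameter."""
    return 2.0 * params

# Parameter counts as given in the quoted abstract.
GOPHER_PARAMS = 280e9
CHINCHILLA_PARAMS = 70e9

# Hypothetical unit of training data; only the 4x ratio comes from the quote.
GOPHER_TOKENS = 1.0
CHINCHILLA_TOKENS = 4.0 * GOPHER_TOKENS

gopher_train = training_flops(GOPHER_PARAMS, GOPHER_TOKENS)
chinchilla_train = training_flops(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)

# 4x fewer parameters with 4x more data leaves the training budget unchanged
# under this approximation.
same_budget = math.isclose(gopher_train, chinchilla_train)

# Per-token inference cost scales with parameter count only.
inference_ratio = (
    inference_flops_per_token(CHINCHILLA_PARAMS)
    / inference_flops_per_token(GOPHER_PARAMS)
)

print(f"same training budget: {same_budget}")
print(f"inference cost relative to Gopher: {inference_ratio:.2f}")
```

Under these assumptions the training budgets match while each generated token costs roughly a quarter as much, which is the inference saving the note points at.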