created: 2022-12-07T11:55:42.527155+00:00
flagged: false
group: __world__
hidden: false
id: E3TX9nYmEe2IOgdyjyKG9w
|
permissions:
  admin: acct:ravenscroftj@hypothes.is
  delete: acct:ravenscroftj@hypothes.is
  read:
  update: acct:ravenscroftj@hypothes.is
target:
  selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: >-
        We test this hypothesis by training a predicted compute-optimal model,
        Chinchilla, that uses the same compute budget as Gopher but with 70B
        parameters and 4× more data. Chinchilla uniformly and significantly
        outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and
        Megatron-Turing NLG (530B) on a large range of downstream evaluation
        tasks. This also means that Chinchilla uses substantially less compute
        for fine-tuning and inference, greatly facilitating downstream usage.
        As a highlight, Chinchilla reaches a state-of-the-art average accuracy
        of 67.5% on the MMLU benchmark, greater than a 7% improvement over
        Gopher
      prefix: tokens should also be doubled.
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
  source: https://arxiv.org/pdf/2203.15556.pdf
|
|
text: By using more training data with a smaller language model, the authors were able to achieve better performance than with the larger models, and this also reduces the cost of using the model for inference (see the compute sketch after this record).
updated: 2022-12-07T11:55:42.527155+00:00
uri: https://arxiv.org/pdf/2203.15556.pdf
user: acct:ravenscroftj@hypothes.is
user_info:
  display_name: James Ravenscroft
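
As a rough sketch of the tradeoff noted in the annotation text: under the widely used approximation that training compute scales as roughly 6 × parameters × tokens, quartering the parameter count while quadrupling the number of training tokens leaves the training budget unchanged, but cuts per-token inference cost (which scales roughly with parameter count) by about 4×. Only the 70B/280B parameter counts and the "4× more data" figure come from the quoted abstract; the Gopher token count below is an illustrative assumption.

```python
# Back-of-the-envelope check of the equal-compute tradeoff in the quoted abstract.
# Approximation used: training FLOPs ~ 6 * parameters * tokens.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute as 6 * N * D FLOPs."""
    return 6 * params * tokens

GOPHER_PARAMS = 280e9                   # 280B parameters (from the quote)
CHINCHILLA_PARAMS = 70e9                # 70B parameters (from the quote)
GOPHER_TOKENS = 300e9                   # assumed Gopher token count, for illustration only
CHINCHILLA_TOKENS = 4 * GOPHER_TOKENS   # "4x more data" at 1/4 the parameters

gopher_c = train_flops(GOPHER_PARAMS, GOPHER_TOKENS)
chinchilla_c = train_flops(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)

print(f"Gopher     training compute ~ {gopher_c:.2e} FLOPs")
print(f"Chinchilla training compute ~ {chinchilla_c:.2e} FLOPs")  # same budget

# Per-token inference cost scales roughly with parameter count,
# so the 70B model is about 4x cheaper to serve than the 280B model.
print(f"Inference cost ratio (Chinchilla / Gopher) ~ {CHINCHILLA_PARAMS / GOPHER_PARAMS:.2f}")
```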
|