---
date: 2022-12-07T11:55:42
hypothesis-meta:
  created: 2022-12-07T11:55:42.527155+00:00
  document:
    title:
      - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
      - acct:ravenscroftj@hypothes.is
    delete:
      - acct:ravenscroftj@hypothes.is
    read:
      - group:__world__
    update:
      - acct:ravenscroftj@hypothes.is
  tags:
    - nlproc
    - efficient ml
  target:
    - selector:
        - end: 1689
          start: 1063
          type: TextPositionSelector
        - exact: >-
            We test this hypothesis by training a predicted compute-optimal
            model, Chinchilla, that uses the same compute budget as Gopher but
            with 70B parameters and4× more more data. Chinchilla uniformly and
            significantly outperforms Gopher (280B), GPT-3 (175B),Jurassic-1
            (178B), and Megatron-Turing NLG (530B) on a large range of
            downstream evaluation tasks.This also means that Chinchilla uses
            substantially less compute for fine-tuning and inference,
            greatlyfacilitating downstream usage. As a highlight, Chinchilla
            reaches a state-of-the-art average accuracy of67.5% on the MMLU
            benchmark, greater than a 7% improvement over Gopher
          prefix: 'tokens should also be doubled. '
          suffix: .1. IntroductionRecently a serie
          type: TextQuoteSelector
      source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to
    achieve better performance than with the larger models - this reduces the
    cost of using the model for inference.
  updated: 2022-12-07T11:55:42.527155+00:00
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
  - nlproc
  - efficient ml
  - hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142
---

> We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher

By using more data to train a smaller language model, the authors were able to achieve better performance than the larger models - and the smaller model also reduces the cost of using it for inference.
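
A quick back-of-the-envelope check (my own sketch, not taken from the paper's code) helps make the point concrete. Using the common approximation that training compute scales as C ≈ 6·N·D for N parameters and D training tokens, and approximate published figures for the two models (Gopher: ~280B parameters on ~300B tokens; Chinchilla: ~70B parameters on ~1.4T tokens), both land in the same ballpark of training FLOPs, while Chinchilla is 4× smaller to serve:

```python
# Rough illustration (my own sketch): compare approximate training compute
# for Gopher and Chinchilla with the common C ~= 6 * N * D rule of thumb,
# where N = parameter count and D = number of training tokens.
# Parameter/token counts below are approximate published figures.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs via C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

models = {
    "Gopher":     {"params": 280e9, "tokens": 300e9},   # ~280B params, ~300B tokens
    "Chinchilla": {"params": 70e9,  "tokens": 1.4e12},  # ~70B params, ~1.4T tokens
}

for name, m in models.items():
    c = train_flops(m["params"], m["tokens"])
    print(f"{name:11s} ~{c:.2e} training FLOPs, "
          f"{m['params'] / 1e9:.0f}B params to serve at inference time")
```

The two training budgets come out roughly comparable (on the order of 5-6 × 10²³ FLOPs), so the gain isn't from spending more compute overall - it's from spending the same budget on a smaller model and more tokens, which then makes fine-tuning and inference much cheaper.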