diff --git a/brainsteam/content/annotations/2022/12/07/1670414142.md b/brainsteam/content/annotations/2022/12/07/1670414142.md new file mode 100644 index 0000000..2d5e360 --- /dev/null +++ b/brainsteam/content/annotations/2022/12/07/1670414142.md @@ -0,0 +1,66 @@ +--- +date: '2022-12-07T11:55:42' +hypothesis-meta: + created: '2022-12-07T11:55:42.527155+00:00' + document: + title: + - 2203.15556.pdf + flagged: false + group: __world__ + hidden: false + id: E3TX9nYmEe2IOgdyjyKG9w + links: + html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w + incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf + json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w + permissions: + admin: + - acct:ravenscroftj@hypothes.is + delete: + - acct:ravenscroftj@hypothes.is + read: + - group:__world__ + update: + - acct:ravenscroftj@hypothes.is + tags: + - nlproc + - efficient ml + target: + - selector: + - end: 1689 + start: 1063 + type: TextPositionSelector + - exact: "We test this hypothesis by training a predicted compute-optimal model,\ + \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\ + \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\ + \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\ + \ on a large range of downstream evaluation tasks.This also means that Chinchilla\ + \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\ + \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\ + \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\ + \ over Gopher" + prefix: ' tokens should also be doubled. ' + suffix: .1. IntroductionRecently a serie + type: TextQuoteSelector + source: https://arxiv.org/pdf/2203.15556.pdf + text: By using more data on a smaller language model the authors were able to achieve + better performance than with the larger models - this reduces the cost of using + the model for inference. + updated: '2022-12-07T11:55:42.527155+00:00' + uri: https://arxiv.org/pdf/2203.15556.pdf + user: acct:ravenscroftj@hypothes.is + user_info: + display_name: James Ravenscroft +in-reply-to: https://arxiv.org/pdf/2203.15556.pdf +tags: +- nlproc +- efficient ml +- hypothesis +type: annotation +url: /annotations/2022/12/07/1670414142 + +--- + + + +
We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
By using more data to train a smaller language model, the authors were able to achieve better performance than with the larger models, which also reduces the cost of using the model for inference. \ No newline at end of file
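
A minimal back-of-the-envelope sketch of why this matters for inference cost, assuming the common approximations that training compute is roughly 6·N·D FLOPs and a forward pass costs roughly 2·N FLOPs per token. The parameter and token counts are the approximate figures reported in the paper (Gopher: 280B parameters on ~300B tokens, Chinchilla: 70B parameters on ~1.4T tokens); the helper functions are my own, not code from the paper.

```python
# Rough compute comparison between Gopher and Chinchilla.
# Assumes the standard rules of thumb: training FLOPs ~ 6 * N * D,
# inference FLOPs per token ~ 2 * N (N = parameters, D = training tokens).

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute via the 6*N*D rule of thumb."""
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass compute per generated token (~2*N)."""
    return 2 * n_params

models = {
    "Gopher": {"params": 280e9, "tokens": 300e9},      # 280B params, ~300B tokens
    "Chinchilla": {"params": 70e9, "tokens": 1.4e12},  # 70B params, ~1.4T tokens
}

for name, m in models.items():
    print(
        f"{name}: training ≈ {train_flops(m['params'], m['tokens']):.1e} FLOPs, "
        f"inference ≈ {inference_flops_per_token(m['params']):.1e} FLOPs/token"
    )

# Both models land around ~5e23 training FLOPs (a similar budget), but
# Chinchilla needs roughly 4x fewer FLOPs per generated token, which is the
# cheaper-inference point made in the annotation above.
```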