created: 2022-12-07T11:55:42.527155+00:00
flagged: false
group: __world__
hidden: false
id: E3TX9nYmEe2IOgdyjyKG9w
|
permissions:
  admin: acct:ravenscroftj@hypothes.is
  delete: acct:ravenscroftj@hypothes.is
  read:
  update: acct:ravenscroftj@hypothes.is
target:
  selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: >-
        We test this hypothesis by training a predicted compute-optimal model,
        Chinchilla, that uses the same compute budget as Gopher but with 70B
        parameters and 4× more data. Chinchilla uniformly and significantly
        outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and
        Megatron-Turing NLG (530B) on a large range of downstream evaluation
        tasks. This also means that Chinchilla uses substantially less compute
        for fine-tuning and inference, greatly facilitating downstream usage.
        As a highlight, Chinchilla reaches a state-of-the-art average accuracy
        of 67.5% on the MMLU benchmark, greater than a 7% improvement over
        Gopher
      prefix: tokens should also be doubled.
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
  source: https://arxiv.org/pdf/2203.15556.pdf
|
|
text: By using more training data with a smaller language model, the authors were able to achieve better performance than with the larger models, and this also reduces the cost of using the model for inference (see the compute sketch after this record).
updated: 2022-12-07T11:55:42.527155+00:00
uri: https://arxiv.org/pdf/2203.15556.pdf
user: acct:ravenscroftj@hypothes.is
user_info:
  display_name: James Ravenscroft
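
As a rough sketch of the tradeoff noted in the annotation text: under the widely used approximation that training compute scales as roughly 6 × parameters × tokens, quartering the parameter count while quadrupling the number of training tokens leaves the training budget unchanged, but cuts per-token inference cost (which scales roughly with parameter count) by about 4×. Only the 70B/280B parameter counts and the "4× more data" figure come from the quoted abstract; the Gopher token count below is an illustrative assumption.

```python
# Back-of-the-envelope check of the equal-compute tradeoff in the quoted abstract.
# Approximation used: training FLOPs ~ 6 * parameters * tokens.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute as 6 * N * D FLOPs."""
    return 6 * params * tokens

GOPHER_PARAMS = 280e9                   # 280B parameters (from the quote)
CHINCHILLA_PARAMS = 70e9                # 70B parameters (from the quote)
GOPHER_TOKENS = 300e9                   # assumed Gopher token count, for illustration only
CHINCHILLA_TOKENS = 4 * GOPHER_TOKENS   # "4x more data" at 1/4 the parameters

gopher_c = train_flops(GOPHER_PARAMS, GOPHER_TOKENS)
chinchilla_c = train_flops(CHINCHILLA_PARAMS, CHINCHILLA_TOKENS)

print(f"Gopher     training compute ~ {gopher_c:.2e} FLOPs")
print(f"Chinchilla training compute ~ {chinchilla_c:.2e} FLOPs")  # same budget

# Per-token inference cost scales roughly with parameter count,
# so the 70B model is about 4x cheaper to serve than the 280B model.
print(f"Inference cost ratio (Chinchilla / Gopher) ~ {CHINCHILLA_PARAMS / GOPHER_PARAMS:.2f}")
```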
|