---
date: '2022-12-07T11:55:42'
hypothesis-meta:
  created: '2022-12-07T11:55:42.527155+00:00'
  document:
    title:
    - 2203.15556.pdf
  flagged: false
  group: __world__
  hidden: false
  id: E3TX9nYmEe2IOgdyjyKG9w
  links:
    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - nlproc
  - efficient ml
  target:
  - selector:
    - end: 1689
      start: 1063
      type: TextPositionSelector
    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
        \ over Gopher"
      prefix: ' tokens should also be doubled. '
      suffix: .1. IntroductionRecently a serie
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2203.15556.pdf
  text: By using more data on a smaller language model the authors were able to achieve
    better performance than with the larger models - this reduces the cost of using
    the model for inference.
  updated: '2022-12-07T11:55:42.527155+00:00'
  uri: https://arxiv.org/pdf/2203.15556.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
tags:
- nlproc
- efficient ml
- hypothesis
type: annotation
url: /annotations/2022/12/07/1670414142
---
<blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.</blockquote>

By using more data to train a smaller language model, the authors were able to achieve better performance than with the much larger models - and a smaller model also reduces the cost of using it for inference.
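
The trade-off in the quote can be made concrete with the usual C ≈ 6ND back-of-envelope estimate for transformer training compute. The sketch below is my own illustration rather than anything from the paper: it assumes the standard rough approximations of 6ND FLOPs for training and ~2N FLOPs per token at inference, and plugs in the headline figures reported for Gopher (280B parameters, ~300B tokens) and Chinchilla (70B parameters, ~1.4T tokens).

```python
# Back-of-envelope sketch (my own, not from the post or the paper):
# compare approximate training and inference compute for Gopher vs. Chinchilla
# using C ~ 6*N*D for training and ~2*N FLOPs per generated token for inference.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~ 6 * N * D."""
    return 6 * params * tokens

def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute per token: ~2 * N."""
    return 2 * params

# Parameter and token counts as reported in the Chinchilla paper.
gopher = {"params": 280e9, "tokens": 300e9}
chinchilla = {"params": 70e9, "tokens": 1.4e12}

for name, m in [("Gopher", gopher), ("Chinchilla", chinchilla)]:
    print(
        f"{name:11s} train ~{training_flops(m['params'], m['tokens']):.2e} FLOPs, "
        f"inference ~{inference_flops_per_token(m['params']):.2e} FLOPs/token"
    )

# Both land in the same ballpark for training (~5-6e23 FLOPs), but Chinchilla
# needs roughly 4x fewer FLOPs per token at inference because it has 4x fewer
# parameters - which is the cost saving the annotation points at.
```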