Add 'brainsteam/content/annotations/2022/12/07/1670414142.md'

2022-12-07 12:00:04 +00:00 · 2022-12-07 12:00:04 +00:00 · 18ffbb7cd9
parent 3d5bc94f64
commit 18ffbb7cd9
1 changed file with 66 additions and 0 deletions
--- a/brainsteam/content/annotations/2022/12/07/1670414142.md
+++ b/brainsteam/content/annotations/2022/12/07/1670414142.md
@@ -0,0 +1,66 @@
+---
+date: '2022-12-07T11:55:42'
+hypothesis-meta:
+  created: '2022-12-07T11:55:42.527155+00:00'
+  document:
+    title:
+    - 2203.15556.pdf
+  flagged: false
+  group: __world__
+  hidden: false
+  id: E3TX9nYmEe2IOgdyjyKG9w
+  links:
+    html: https://hypothes.is/a/E3TX9nYmEe2IOgdyjyKG9w
+    incontext: https://hyp.is/E3TX9nYmEe2IOgdyjyKG9w/arxiv.org/pdf/2203.15556.pdf
+    json: https://hypothes.is/api/annotations/E3TX9nYmEe2IOgdyjyKG9w
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - nlproc
+  - efficient ml
+  target:
+  - selector:
+    - end: 1689
+      start: 1063
+      type: TextPositionSelector
+    - exact: "We test this hypothesis by training a predicted compute-optimal model,\
+        \ Chinchilla, that uses the same compute budget as Gopher but with 70B parameters\
+        \ and4\xD7 more more data. Chinchilla uniformly and significantly outperforms\
+        \ Gopher (280B), GPT-3 (175B),Jurassic-1 (178B), and Megatron-Turing NLG (530B)\
+        \ on a large range of downstream evaluation tasks.This also means that Chinchilla\
+        \ uses substantially less compute for fine-tuning and inference, greatlyfacilitating\
+        \ downstream usage. As a highlight, Chinchilla reaches a state-of-the-art\
+        \ average accuracy of67.5% on the MMLU benchmark, greater than a 7% improvement\
+        \ over Gopher"
+      prefix: ' tokens should also be doubled. '
+      suffix: .1. IntroductionRecently a serie
+      type: TextQuoteSelector
+    source: https://arxiv.org/pdf/2203.15556.pdf
+  text: By using more data on a smaller language model the authors were able to achieve
+    better performance than with the larger models - this reduces the cost of using
+    the model for inference.
+  updated: '2022-12-07T11:55:42.527155+00:00'
+  uri: https://arxiv.org/pdf/2203.15556.pdf
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://arxiv.org/pdf/2203.15556.pdf
+tags:
+- nlproc
+- efficient ml
+- hypothesis
+type: annotation
+url: /annotations/2022/12/07/1670414142
+
+---
+
+
+
+ <blockquote>We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher</blockquote>By using more data on a smaller language model the authors were able to achieve better performance than with the larger models - this reduces the cost of using the model for inference.