Add 'brainsteam/content/annotations/2023/03/21/1679428744.md'

2023-03-21 20:00:07 +00:00 · 2023-03-21 20:00:07 +00:00 · 275e24fea8
parent bfddd5bab2
commit 275e24fea8
1 changed files with 66 additions and 0 deletions
--- a/brainsteam/content/annotations/2023/03/21/1679428744.md
+++ b/brainsteam/content/annotations/2023/03/21/1679428744.md
@@ -0,0 +1,66 @@
+---
+date: '2023-03-21T19:59:04'
+hypothesis-meta:
+  created: '2023-03-21T19:59:04.177001+00:00'
+  document:
+    title:
+    - 2303.09752.pdf
+  flagged: false
+  group: __world__
+  hidden: false
+  id: 1MB9BMgiEe27GS99BvTIlA
+  links:
+    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
+    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
+    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - llm
+  - attention
+  - long-documents
+  target:
+  - selector:
+    - end: 1989
+      start: 1515
+      type: TextPositionSelector
+    - exact: "Over the past few years, many \u201Cefficient Trans-former\u201D approaches\
+        \ have been proposed that re-duce the cost of the attention mechanism over\
+        \ longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020;\
+        \ Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022).\
+        \ However,especially for larger models, the feedforward andprojection layers\
+        \ actually make up the majority ofthe computational burden and can render\
+        \ process-ing long inputs intractable"
+      prefix: ' be applied to each input token.'
+      suffix: ".\u2217Author contributions are outli"
+      type: TextQuoteSelector
+    source: https://arxiv.org/pdf/2303.09752.pdf
+  text: Recent improvements in transformers for long documents have focused on efficiencies
+    in the attention mechanism but the feed-forward and projection layers are still
+    expensive for long docs
+  updated: '2023-03-21T19:59:04.177001+00:00'
+  uri: https://arxiv.org/pdf/2303.09752.pdf
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
+tags:
+- llm
+- attention
+- long-documents
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679428744
+
+---
+
+
+
+ <blockquote>Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable</blockquote>Recent improvements in transformers for long documents have focused on efficiencies in the attention mechanism but the feed-forward and projection layers are still expensive for long docs
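
The claim in the quote can be checked with a back-of-the-envelope FLOP count. The sketch below is not from the paper: it assumes a standard Transformer layer with full self-attention, a feed-forward width of `4 * d_model`, and 2 FLOPs per multiply-add, and it ignores softmax, layer norm and biases. The function names and the example sizes are hypothetical, chosen only for illustration.

```python
def layer_flops(n_tokens, d_model, d_ff=None):
    """Rough forward-pass FLOPs for one Transformer layer (1 multiply-add = 2 FLOPs)."""
    if d_ff is None:
        d_ff = 4 * d_model  # assumption: the common 4x feed-forward expansion
    # Q, K, V and output projections: four d_model x d_model matmuls per token.
    projections = 2 * 4 * n_tokens * d_model * d_model
    # Feed-forward block: two matmuls per token (d_model -> d_ff -> d_model).
    feedforward = 2 * 2 * n_tokens * d_model * d_ff
    # Full self-attention: QK^T scores plus the weighted sum over values.
    attention = 2 * 2 * n_tokens * n_tokens * d_model
    return {
        "projections": projections,
        "feedforward": feedforward,
        "attention": attention,
    }


def dense_share(n_tokens, d_model):
    """Fraction of layer FLOPs spent in the projection and feed-forward matmuls."""
    flops = layer_flops(n_tokens, d_model)
    dense = flops["projections"] + flops["feedforward"]
    return dense / (dense + flops["attention"])


if __name__ == "__main__":
    # Illustrative sizes only; none of these come from the paper.
    for n_tokens, d_model in [(4096, 1024), (4096, 4096), (16384, 4096)]:
        share = dense_share(n_tokens, d_model)
        print(f"n={n_tokens:6d} d={d_model:5d} dense share={share:.2f}")
```

Under these assumptions the dense share simplifies to `6 * d_model / (6 * d_model + n_tokens)`, so even full quadratic attention only overtakes the feed-forward and projection layers once the input is longer than about `6 * d_model` tokens. Wider models push that crossover further out, which is consistent with the quote's point about larger models.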