Add 'brainsteam/content/annotations/2023/03/21/1679428744.md'

2023-03-21 20:00:07 +00:00 · 2023-03-21 20:00:07 +00:00 · 275e24fea8
parent bfddd5bab2
commit 275e24fea8
1 changed files with 66 additions and 0 deletions
--- a/brainsteam/content/annotations/2023/03/21/1679428744.md
+++ b/brainsteam/content/annotations/2023/03/21/1679428744.md
@@ -0,0 +1,66 @@
 ---
 date: '2023-03-21T19:59:04'
 hypothesis-meta:
  created: '2023-03-21T19:59:04.177001+00:00'
  document:
    title:
    - 2303.09752.pdf
  flagged: false
  group: __world__
  hidden: false
  id: 1MB9BMgiEe27GS99BvTIlA
  links:
    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - llm
  - attention
  - long-documents
  target:
  - selector:
    - end: 1989
      start: 1515
      type: TextPositionSelector
    - exact: "Over the past few years, many \u201Cefficient Trans-former\u201D approaches\
        \ have been proposed that re-duce the cost of the attention mechanism over\
        \ longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020;\
        \ Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022).\
        \ However,especially for larger models, the feedforward andprojection layers\
        \ actually make up the majority ofthe computational burden and can render\
        \ process-ing long inputs intractable"
      prefix: ' be applied to each input token.'
      suffix: ".\u2217Author contributions are outli"
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2303.09752.pdf
  text: Recent improvements in transformers for long documents have focused on efficiencies
    in the attention mechanism but the feed-forward and projection layers are still
    expensive for long docs
  updated: '2023-03-21T19:59:04.177001+00:00'
  uri: https://arxiv.org/pdf/2303.09752.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
 in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
 tags:
 - llm
 - attention
 - long-documents
 - hypothesis
 type: annotation
 url: /annotations/2023/03/21/1679428744
 ---
 <blockquote>Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable</blockquote>Recent improvements in transformers for long documents have focused on efficiencies in the attention mechanism but the feed-forward and projection layers are still expensive for long docs
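
The quoted claim can be sanity-checked with a back-of-envelope FLOP count. The sketch below is not taken from the paper: it assumes a standard dense Transformer layer with full self-attention, four `d_model x d_model` projections (Q, K, V, output), a two-matrix feed-forward block with `d_ff = 4 * d_model`, and two FLOPs per multiply-add. Softmax, layer norm, biases, and multi-head bookkeeping are ignored, and the function names are hypothetical.

```python
def layer_flops(n_tokens, d_model, d_ff=None):
    """Approximate forward-pass FLOPs for one dense Transformer layer.

    Assumptions (not from the source): full self-attention, four
    d_model x d_model projections, a two-matrix feed-forward block,
    and 2 FLOPs per multiply-add.
    """
    if d_ff is None:
        d_ff = 4 * d_model  # common default, assumed here
    # Q, K, V and output projections: four (n x d) @ (d x d) matmuls.
    projections = 4 * 2 * n_tokens * d_model * d_model
    # Attention scores (Q @ K^T) and weighted sum (A @ V): quadratic in n.
    attention = 2 * 2 * n_tokens * n_tokens * d_model
    # Feed-forward: (n x d) @ (d x d_ff), then (n x d_ff) @ (d_ff x d).
    feedforward = 2 * 2 * n_tokens * d_model * d_ff
    return {
        "projections": projections,
        "attention": attention,
        "feedforward": feedforward,
    }


def dense_share(n_tokens, d_model, d_ff=None):
    """Fraction of layer FLOPs spent in projection + feed-forward layers."""
    flops = layer_flops(n_tokens, d_model, d_ff)
    dense = flops["projections"] + flops["feedforward"]
    return dense / (dense + flops["attention"])


if __name__ == "__main__":
    for n in (512, 4096, 16384, 65536):
        print(n, round(dense_share(n, d_model=4096), 3))
```

Under these assumptions the projection and feed-forward cost is `24 * n * d_model**2` and the attention cost is `4 * n**2 * d_model`, so the two are equal only once the input reaches roughly `6 * d_model` tokens. For wide models that crossover sits at tens of thousands of tokens, which is consistent with the quoted point that the dense layers, not attention, dominate for larger models.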