Add 'brainsteam/content/annotations/2023/03/21/1679428744.md'
continuous-integration/drone/push Build is passing
Parent: bfddd5bab2
Commit: 275e24fea8
@ -0,0 +1,66 @@
---
date: '2023-03-21T19:59:04'
hypothesis-meta:
  created: '2023-03-21T19:59:04.177001+00:00'
  document:
    title:
    - 2303.09752.pdf
  flagged: false
  group: __world__
  hidden: false
  id: 1MB9BMgiEe27GS99BvTIlA
  links:
    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - llm
  - attention
  - long-documents
  target:
  - selector:
    - end: 1989
      start: 1515
      type: TextPositionSelector
    - exact: "Over the past few years, many \u201Cefficient Trans-former\u201D approaches\
        \ have been proposed that re-duce the cost of the attention mechanism over\
        \ longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020;\
        \ Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022).\
        \ However,especially for larger models, the feedforward andprojection layers\
        \ actually make up the majority ofthe computational burden and can render\
        \ process-ing long inputs intractable"
      prefix: ' be applied to each input token.'
      suffix: ".\u2217Author contributions are outli"
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2303.09752.pdf
  text: Recent improvements in transformers for long documents have focused on efficiencies
    in the attention mechanism but the feed-forward and projection layers are still
    expensive for long docs
  updated: '2023-03-21T19:59:04.177001+00:00'
  uri: https://arxiv.org/pdf/2303.09752.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
tags:
- llm
- attention
- long-documents
- hypothesis
type: annotation
url: /annotations/2023/03/21/1679428744

---

<blockquote>Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable</blockquote>
Recent improvements in transformers for long documents have focused on efficiencies in the attention mechanism, but the feed-forward and projection layers are still expensive for long docs.
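A back-of-the-envelope cost model helps show why the quoted claim holds "especially for larger models". The sketch below is an illustration, not the paper's own accounting. It assumes a standard Transformer layer with model width `d`, a feed-forward hidden width of `4 * d` and sequence length `n`, and it counts only matrix-multiply multiply-accumulates (MACs), ignoring softmax, layer norms, biases and activations. The function names are hypothetical helpers written for this note.

```python
# Rough per-layer matmul cost model for a standard Transformer layer.
# Assumptions (illustrative only): width d, FFN hidden width ffn_mult * d,
# sequence length n; counts multiply-accumulates (MACs) of matmuls only.


def attention_macs(n: int, d: int) -> int:
    """MACs for the Q K^T score matrix plus the weighted sum over V (both n x n x d)."""
    return 2 * n * n * d


def projection_macs(n: int, d: int) -> int:
    """MACs for the Q, K, V and output projections: four d x d matmuls per token."""
    return 4 * n * d * d


def feedforward_macs(n: int, d: int, ffn_mult: int = 4) -> int:
    """MACs for the two FFN matmuls, d -> ffn_mult * d -> d, per token."""
    return 2 * n * d * (ffn_mult * d)


def attention_share(n: int, d: int) -> float:
    """Fraction of the layer's matmul cost spent in the quadratic attention term."""
    attn = attention_macs(n, d)
    total = attn + projection_macs(n, d) + feedforward_macs(n, d)
    return attn / total


if __name__ == "__main__":
    # With ffn_mult = 4 the share simplifies to n / (n + 6 * d),
    # so attention only dominates once n exceeds 6 * d.
    for d in (768, 4096):
        for n in (512, 4096, 16384):
            print(f"d={d:5d} n={n:6d} attention share={attention_share(n, d):.2f}")
```

Under these assumptions the attention term equals the combined projection and feed-forward cost only when `n = 6 * d`. With `d = 4096`, that crossover sits at 24,576 tokens, so even at 16,384 tokens the attention term is 40% of the matmul cost and the projection and feed-forward layers account for the remaining 60%, which is the imbalance the highlighted passage describes.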