brainsteam.co.uk/brainsteam/content/annotations/2023/03/21/1679428744.md


---
date: 2023-03-21T19:59:04
hypothesis-meta:
  created: 2023-03-21T19:59:04.177001+00:00
  document:
    title: 2303.09752.pdf
  flagged: false
  group: __world__
  hidden: false
  id: 1MB9BMgiEe27GS99BvTIlA
  links:
    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
  permissions:
    admin: acct:ravenscroftj@hypothes.is
    delete: acct:ravenscroftj@hypothes.is
    read: group:__world__
    update: acct:ravenscroftj@hypothes.is
  tags:
    - llm
    - attention
    - long-documents
  target:
    - selector:
        - type: TextPositionSelector
          start: 1515
          end: 1989
        - type: TextQuoteSelector
          exact: >-
            Over the past few years, many “efficient Transformer” approaches have
            been proposed that reduce the cost of the attention mechanism over long
            inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020;
            Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al.,
            2022). However, especially for larger models, the feedforward and
            projection layers actually make up the majority of the computational
            burden and can render processing long inputs intractable
          prefix: 'be applied to each input token. .'
          suffix: 'Author contributions are outli'
      source: https://arxiv.org/pdf/2303.09752.pdf
  text: >-
    Recent improvements in transformers for long documents have focused on
    efficiencies in the attention mechanism but the feed-forward and projection
    layers are still expensive for long docs
  updated: 2023-03-21T19:59:04.177001+00:00
  uri: https://arxiv.org/pdf/2303.09752.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
tags:
  - llm
  - attention
  - long-documents
  - hypothesis
type: annotation
url: /annotations/2023/03/21/1679428744
---
> Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable
Recent improvements in transformers for long documents have focused on making the attention mechanism more efficient, but the feed-forward and projection layers are still expensive for long documents.
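
To see where the cost actually goes, here is a rough back-of-the-envelope sketch (not from the paper) comparing the multiply-accumulate counts of the main parts of a single dense transformer layer. The formulas are the standard approximations for full attention plus a feed-forward block with `d_ff = 4 * d_model`; the width of 2048 and the token counts are arbitrary illustrative choices.

```python
def layer_flops(n_tokens, d_model, d_ff=None):
    """Rough multiply-accumulate counts for one dense transformer layer."""
    d_ff = d_ff or 4 * d_model
    return {
        # Q, K, V and output projections: four (n x d) @ (d x d) matmuls
        "projections": 4 * n_tokens * d_model ** 2,
        # attention scores (QK^T) plus the weighted sum over V
        "attention": 2 * n_tokens ** 2 * d_model,
        # two dense layers of the feed-forward block
        "feed_forward": 2 * n_tokens * d_model * d_ff,
    }


for n in (512, 4096, 16384):
    flops = layer_flops(n, d_model=2048)
    total = sum(flops.values())
    print(n, {k: f"{v / total:.0%}" for k, v in flops.items()})
```

For a width-2048 model the projections and feed-forward block still account for roughly three quarters of the work at 4k tokens; the quadratic attention term only overtakes them once the input is several times longer than the model width, and for the much wider models the quoted passage is concerned with, that crossover moves even further out.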