Created: 2023-03-21T19:59:04.177001+00:00
Annotation ID: 1MB9BMgiEe27GS99BvTIlA
Tags: llm, attention, long-documents

Target: https://arxiv.org/pdf/2303.09752.pdf (TextQuoteSelector / TextPositionSelector, characters 1515-1989)

Quote:

Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable

Annotation:

Recent improvements in transformers for long documents have focused on efficiencies in the attention mechanism, but the feed-forward and projection layers are still expensive for long documents.
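
To make the annotated claim concrete, here is a rough per-token cost sketch for a single transformer layer. The dimensions used (model width d = 4096, feedforward width d_ff = 16384, a 16k-token input) are illustrative assumptions rather than figures from the paper, and the count ignores softmax, normalisation and other minor terms.

```python
# Back-of-envelope multiply-add counts per token for one transformer layer.
# All sizes below are assumed for illustration, not taken from the paper.

def per_token_madds(n: int, d: int, d_ff: int) -> dict:
    """n = sequence length, d = model width, d_ff = feedforward width."""
    return {
        "qkv/out projections": 4 * d * d,  # four d x d projection matmuls
        "attention mixing": 2 * n * d,     # QK^T scores + weighted value sum
        "feedforward": 2 * d * d_ff,       # two dense matmuls in the FFN block
    }

# Illustrative example: a large model reading a 16k-token document.
costs = per_token_madds(n=16_384, d=4_096, d_ff=16_384)
total = sum(costs.values())
for name, c in costs.items():
    print(f"{name:>20}: {c / 1e6:8.1f}M multiply-adds/token ({100 * c / total:.0f}%)")
```

Under this simple model the dense projections and feedforward account for roughly 60% of the per-token cost at these sizes; attention mixing only overtakes them once the sequence length exceeds 2*d + d_ff (about 24k tokens here), which is why cheaper attention alone does not make long inputs tractable.
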
Updated: 2023-03-21T19:59:04.177001+00:00
User: acct:ravenscroftj@hypothes.is (display name: James Ravenscroft)