brainsteam.co.uk/brainsteam/content/annotations/2023/03/21/1679428744.md


---
date: 2023-03-21T19:59:04
hypothesis-meta:
  created: 2023-03-21T19:59:04.177001+00:00
  document:
    title: 2303.09752.pdf
  flagged: false
  group: __world__
  hidden: false
  id: 1MB9BMgiEe27GS99BvTIlA
  links:
    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
  permissions:
    admin: acct:ravenscroftj@hypothes.is
    delete: acct:ravenscroftj@hypothes.is
    read: group:__world__
    update: acct:ravenscroftj@hypothes.is
  tags:
    - llm
    - attention
    - long-documents
  target:
    - selector:
        - type: TextPositionSelector
          start: 1515
          end: 1989
        - type: TextQuoteSelector
          exact: >-
            Over the past few years, many “efficient Transformer” approaches have
            been proposed that reduce the cost of the attention mechanism over long
            inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020;
            Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al.,
            2022). However, especially for larger models, the feedforward and
            projection layers actually make up the majority of the computational
            burden and can render processing long inputs intractable
          prefix: 'be applied to each input token. .'
          suffix: 'Author contributions are outli'
      source: https://arxiv.org/pdf/2303.09752.pdf
  text: >-
    Recent improvements in transformers for long documents have focused on
    efficiencies in the attention mechanism but the feed-forward and projection
    layers are still expensive for long docs
  updated: 2023-03-21T19:59:04.177001+00:00
  uri: https://arxiv.org/pdf/2303.09752.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
tags:
  - llm
  - attention
  - long-documents
  - hypothesis
type: annotation
url: /annotations/2023/03/21/1679428744
---
> Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable
Recent improvements in transformers for long documents have focused on making the attention mechanism more efficient, but the feed-forward and projection layers are still expensive for long documents.
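
To see where the cost actually goes, here is a rough back-of-the-envelope sketch (not from the paper) comparing the multiply-accumulate counts of the main parts of a single dense transformer layer. The formulas are the standard approximations for full attention plus a feed-forward block with `d_ff = 4 * d_model`; the width of 2048 and the token counts are arbitrary illustrative choices.

```python
def layer_flops(n_tokens, d_model, d_ff=None):
    """Rough multiply-accumulate counts for one dense transformer layer."""
    d_ff = d_ff or 4 * d_model
    return {
        # Q, K, V and output projections: four (n x d) @ (d x d) matmuls
        "projections": 4 * n_tokens * d_model ** 2,
        # attention scores (QK^T) plus the weighted sum over V
        "attention": 2 * n_tokens ** 2 * d_model,
        # two dense layers of the feed-forward block
        "feed_forward": 2 * n_tokens * d_model * d_ff,
    }


for n in (512, 4096, 16384):
    flops = layer_flops(n, d_model=2048)
    total = sum(flops.values())
    print(n, {k: f"{v / total:.0%}" for k, v in flops.items()})
```

For a width-2048 model the projections and feed-forward block still account for roughly three quarters of the work at 4k tokens; the quadratic attention term only overtakes them once the input is several times longer than the model width, and for the much wider models the quoted passage is concerned with, that crossover moves even further out.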