From 275e24fea8a45f3aa4d34d4ceb6ab088e9a31ed9 Mon Sep 17 00:00:00 2001 From: ravenscroftj Date: Tue, 21 Mar 2023 20:00:07 +0000 Subject: [PATCH] Add 'brainsteam/content/annotations/2023/03/21/1679428744.md' --- .../annotations/2023/03/21/1679428744.md | 66 +++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 brainsteam/content/annotations/2023/03/21/1679428744.md diff --git a/brainsteam/content/annotations/2023/03/21/1679428744.md b/brainsteam/content/annotations/2023/03/21/1679428744.md new file mode 100644 index 0000000..333e1f0 --- /dev/null +++ b/brainsteam/content/annotations/2023/03/21/1679428744.md @@ -0,0 +1,66 @@ +--- +date: '2023-03-21T19:59:04' +hypothesis-meta: + created: '2023-03-21T19:59:04.177001+00:00' + document: + title: + - 2303.09752.pdf + flagged: false + group: __world__ + hidden: false + id: 1MB9BMgiEe27GS99BvTIlA + links: + html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA + incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf + json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA + permissions: + admin: + - acct:ravenscroftj@hypothes.is + delete: + - acct:ravenscroftj@hypothes.is + read: + - group:__world__ + update: + - acct:ravenscroftj@hypothes.is + tags: + - llm + - attention + - long-documents + target: + - selector: + - end: 1989 + start: 1515 + type: TextPositionSelector + - exact: "Over the past few years, many \u201Cefficient Trans-former\u201D approaches\ + \ have been proposed that re-duce the cost of the attention mechanism over\ + \ longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020;\ + \ Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022).\ + \ However,especially for larger models, the feedforward andprojection layers\ + \ actually make up the majority ofthe computational burden and can render\ + \ process-ing long inputs intractable" + prefix: ' be applied to each input token.' + suffix: ".\u2217Author contributions are outli" + type: TextQuoteSelector + source: https://arxiv.org/pdf/2303.09752.pdf + text: Recent improvements in transformers for long documents have focused on efficiencies + in the attention mechanism but the feed-forward and projection layers are still + expensive for long docs + updated: '2023-03-21T19:59:04.177001+00:00' + uri: https://arxiv.org/pdf/2303.09752.pdf + user: acct:ravenscroftj@hypothes.is + user_info: + display_name: James Ravenscroft +in-reply-to: https://arxiv.org/pdf/2303.09752.pdf +tags: +- llm +- attention +- long-documents +- hypothesis +type: annotation +url: /annotations/2023/03/21/1679428744 + +--- + + + +
Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractable
Recent improvements in transformers for long documents have focused on making the attention mechanism more efficient, but the feed-forward and projection layers are still expensive for long documents.
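
A rough back-of-envelope FLOP count for a single transformer layer makes the point. This is only a sketch with illustrative numbers I have picked myself (d_model = 4096, d_ff = 4 × d_model, simple multiply-accumulate counting); none of these figures come from the paper.

```python
# Rough per-layer cost sketch (illustrative assumptions, not figures from the paper).
# Counts multiply-accumulates for the Q/K/V/output projections, the attention
# score + weighted-sum step, and the two feed-forward matmuls.

def layer_costs(seq_len: int, d_model: int, d_ff: int) -> dict:
    projections = 4 * seq_len * d_model * d_model   # Q, K, V and output projections
    attention = 2 * seq_len * seq_len * d_model     # QK^T scores + weighted sum over V
    feed_forward = 2 * seq_len * d_model * d_ff     # two dense layers in the FFN block
    return {
        "projections": projections,
        "attention": attention,
        "feed_forward": feed_forward,
    }

for seq_len in (2_048, 16_384, 65_536):
    costs = layer_costs(seq_len, d_model=4096, d_ff=4 * 4096)
    total = sum(costs.values())
    print(seq_len, {name: f"{flops / total:.0%}" for name, flops in costs.items()})
```

Under these assumptions the projections and feed-forward layers account for the large majority of the work at a few thousand tokens, and full quadratic attention only overtakes them around seq_len ≈ 6 × d_model. Once an efficient attention variant removes the quadratic term, the dense layers are what remain, which is the cost the paper is targeting.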