diff --git a/brainsteam/content/annotations/2023/03/21/1679379947.md b/brainsteam/content/annotations/2023/03/21/1679379947.md
new file mode 100644
index 0000000..efae6ed
--- /dev/null
+++ b/brainsteam/content/annotations/2023/03/21/1679379947.md
@@ -0,0 +1,73 @@
+---
+date: '2023-03-21T06:25:47'
+hypothesis-meta:
+  created: '2023-03-21T06:25:47.417575+00:00'
+  document:
+    title:
+    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
+  flagged: false
+  group: __world__
+  hidden: false
+  id: N6BVsMexEe2Z4X92AfjYDg
+  links:
+    html: https://hypothes.is/a/N6BVsMexEe2Z4X92AfjYDg
+    incontext: https://hyp.is/N6BVsMexEe2Z4X92AfjYDg/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+    json: https://hypothes.is/api/annotations/N6BVsMexEe2Z4X92AfjYDg
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - llm
+  - openai
+  - gpt
+  - ModelEvaluation
+  target:
+  - selector:
+    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[2]
+      endOffset: 300
+      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[4]/span[1]
+      startOffset: 0
+      type: RangeSelector
+    - end: 5998
+      start: 5517
+      type: TextPositionSelector
+    - exact: "To benchmark GPT-4\u2019s coding ability, OpenAI evaluated it on problems\
+        \ from Codeforces, a website that hosts coding competitions. Surprisingly,\
+        \ Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10\
+        \ recent problems in the easy category. The training data cutoff for GPT-4\
+        \ is September 2021. This strongly suggests that the model is able to memorize\
+        \ solutions from its training set \u2014 or at least partly memorize them,\
+        \ enough that it can fill in what it can\u2019t recall."
+      prefix: 'm 1: training data contamination'
+      suffix: As further evidence for this hyp
+      type: TextQuoteSelector
+    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  text: GPT-4 was only able to pass questions available before September 2021 and
+    failed to answer newer questions - strongly suggesting that it has simply memorised
+    the answers as part of its training data
+  updated: '2023-03-21T06:26:57.441600+00:00'
+  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+tags:
+- llm
+- openai
+- gpt
+- ModelEvaluation
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679379947
+
+---
+
+
+To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall.GPT-4 was only able to pass questions available before September 2021 and failed to answer newer questions - strongly suggesting that it has simply memorised the answers as part of its training data
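+
+This date-partitioned comparison is easy to sketch in code. Here is a minimal illustration with toy stand-in data - a real check would drive an actual evaluation harness rather than hard-coding results:
+
+```python
+from dataclasses import dataclass
+from datetime import date
+
+@dataclass
+class Result:
+    published: date  # when the problem first appeared on Codeforces
+    solved: bool     # whether the model produced an accepted solution
+
+# Toy stand-in data mirroring the article's 10/10 vs 0/10 observation.
+results = [Result(date(2020, 5, 1), True)] * 10 + \
+          [Result(date(2022, 3, 1), False)] * 10
+
+CUTOFF = date(2021, 9, 1)  # GPT-4's reported training data cutoff
+
+pre = [r.solved for r in results if r.published < CUTOFF]
+post = [r.solved for r in results if r.published >= CUTOFF]
+
+# A large gap between the two pass rates is the contamination signal:
+# the model is recalling solutions it has seen rather than solving problems.
+print(f"pre-cutoff pass rate:  {sum(pre)}/{len(pre)}")    # 10/10
+print(f"post-cutoff pass rate: {sum(post)}/{len(post)}")  # 0/10
+```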
\ No newline at end of file
diff --git a/brainsteam/content/annotations/2023/03/21/1679380079.md b/brainsteam/content/annotations/2023/03/21/1679380079.md
new file mode 100644
index 0000000..24c2751
--- /dev/null
+++ b/brainsteam/content/annotations/2023/03/21/1679380079.md
@@ -0,0 +1,68 @@
+---
+date: '2023-03-21T06:27:59'
+hypothesis-meta:
+  created: '2023-03-21T06:27:59.825632+00:00'
+  document:
+    title:
+    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
+  flagged: false
+  group: __world__
+  hidden: false
+  id: hoqyasexEe2ZnQ_nOVgRxA
+  links:
+    html: https://hypothes.is/a/hoqyasexEe2ZnQ_nOVgRxA
+    incontext: https://hyp.is/hoqyasexEe2ZnQ_nOVgRxA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+    json: https://hypothes.is/api/annotations/hoqyasexEe2ZnQ_nOVgRxA
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - openai
+  - gpt
+  - ModelEvaluation
+  target:
+  - selector:
+    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[6]/span[2]
+      endOffset: 42
+      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[6]/span[1]
+      startOffset: 0
+      type: RangeSelector
+    - end: 6591
+      start: 6238
+      type: TextPositionSelector
+    - exact: 'In fact, we can definitively show that it has memorized problems in
+        its training set: when prompted with the title of a Codeforces problem, GPT-4
+        includes a link to the exact contest where the problem appears (and the round
+        number is almost correct: it is off by one). Note that GPT-4 cannot access
+        the Internet, so memorization is the only explanation.'
+      prefix: the problems after September 12.
+      suffix: GPT-4 memorizes Codeforces probl
+      type: TextQuoteSelector
+    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  text: GPT-4 knows the link to the coding contests that it was evaluated against
+    but doesn't have "internet access", so it appears to have memorised this as well
+  updated: '2023-03-21T06:27:59.825632+00:00'
+  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+tags:
+- openai
+- gpt
+- ModelEvaluation
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679380079
+
+---
+
+
+In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation.GPT-4 knows the link to the coding contests that it was evaluated against but doesn't have "internet access", so it appears to have memorised this as well
\ No newline at end of file
diff --git a/brainsteam/content/annotations/2023/03/21/1679380149.md b/brainsteam/content/annotations/2023/03/21/1679380149.md
new file mode 100644
index 0000000..306dd08
--- /dev/null
+++ b/brainsteam/content/annotations/2023/03/21/1679380149.md
@@ -0,0 +1,68 @@
+---
+date: '2023-03-21T06:29:09'
+hypothesis-meta:
+  created: '2023-03-21T06:29:09.945605+00:00'
+  document:
+    title:
+    - 'GPT-4 and professional benchmarks: the wrong answer to the wrong question'
+  flagged: false
+  group: __world__
+  hidden: false
+  id: sFZzLMexEe2M2r_i759OiA
+  links:
+    html: https://hypothes.is/a/sFZzLMexEe2M2r_i759OiA
+    incontext: https://hyp.is/sFZzLMexEe2M2r_i759OiA/aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+    json: https://hypothes.is/api/annotations/sFZzLMexEe2M2r_i759OiA
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - openai
+  - gpt
+  - ModelEvaluation
+  target:
+  - selector:
+    - endContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[2]
+      endOffset: 199
+      startContainer: /div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/article[1]/div[4]/div[1]/div[1]/p[8]/span[1]
+      startOffset: 0
+      type: RangeSelector
+    - end: 7439
+      start: 7071
+      type: TextPositionSelector
+    - exact: "Still, we can look for telltale signs. Another symptom of memorization\
+        \ is that GPT is highly sensitive to the phrasing of the question. Melanie\
+        \ Mitchell gives an example of an MBA test question where changing some details\
+        \ in a way that wouldn\u2019t fool a person is enough to fool ChatGPT (running\
+        \ GPT-3.5). A more elaborate experiment along these lines would be valuable."
+      prefix: ' how performance varies by date.'
+      suffix: "Because of OpenAI\u2019s lack of tran"
+      type: TextQuoteSelector
+    source: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  text: ChatGPT appears to have memorised MBA test questions - when these are rephrased
+    or certain details are changed, the model fails to answer
+  updated: '2023-03-21T06:29:09.945605+00:00'
+  uri: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
+tags:
+- openai
+- gpt
+- ModelEvaluation
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679380149
+
+---
+
+
+Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.ChatGPT appears to have memorised MBA test questions - when these are rephrased or certain details are changed, the model fails to answer
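+
+Mitchell's phrasing-sensitivity probe can be approximated by scoring the same underlying problem under several paraphrases and measuring how consistent the answers are. A rough sketch - the question text and the simulated model below are made-up placeholders, not Mitchell's actual materials:
+
+```python
+# ask_model() is a hypothetical stand-in for a real chat-completion call;
+# here it simulates a model that only recognises one memorised wording.
+def ask_model(question: str) -> str:
+    memorised = "A firm has fixed costs of $100 and unit costs of $5..."
+    return "correct" if question == memorised else "wrong"
+
+variants = [
+    # the wording as it might have appeared in training data
+    "A firm has fixed costs of $100 and unit costs of $5...",
+    # the same problem lightly reworded - no harder for a person
+    "A company pays $100 in fixed costs and $5 per unit...",
+    "Fixed costs are $100 and each unit costs $5...",
+]
+
+answers = [ask_model(v) for v in variants]
+consistency = answers.count("correct") / len(answers)
+
+# A model that understands the problem should sit near 1.0; a model that
+# memorised one phrasing collapses as soon as the wording moves.
+print(f"consistency across paraphrases: {consistency:.2f}")  # 0.33 here
+```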
\ No newline at end of file
diff --git a/brainsteam/content/annotations/2023/03/21/1679428744.md b/brainsteam/content/annotations/2023/03/21/1679428744.md
new file mode 100644
index 0000000..333e1f0
--- /dev/null
+++ b/brainsteam/content/annotations/2023/03/21/1679428744.md
@@ -0,0 +1,66 @@
+---
+date: '2023-03-21T19:59:04'
+hypothesis-meta:
+  created: '2023-03-21T19:59:04.177001+00:00'
+  document:
+    title:
+    - 2303.09752.pdf
+  flagged: false
+  group: __world__
+  hidden: false
+  id: 1MB9BMgiEe27GS99BvTIlA
+  links:
+    html: https://hypothes.is/a/1MB9BMgiEe27GS99BvTIlA
+    incontext: https://hyp.is/1MB9BMgiEe27GS99BvTIlA/arxiv.org/pdf/2303.09752.pdf
+    json: https://hypothes.is/api/annotations/1MB9BMgiEe27GS99BvTIlA
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - llm
+  - attention
+  - long-documents
+  target:
+  - selector:
+    - end: 1989
+      start: 1515
+      type: TextPositionSelector
+    - exact: "Over the past few years, many \u201Cefficient Trans-former\u201D approaches\
+        \ have been proposed that re-duce the cost of the attention mechanism over\
+        \ longinputs (Child et al., 2019; Ainslie et al., 2020; Belt-agy et al., 2020;\
+        \ Zaheer et al., 2020; Wang et al.,2020; Tay et al., 2021; Guo et al., 2022).\
+        \ However,especially for larger models, the feedforward andprojection layers\
+        \ actually make up the majority ofthe computational burden and can render\
+        \ process-ing long inputs intractable"
+      prefix: ' be applied to each input token.'
+      suffix: ".\u2217Author contributions are outli"
+      type: TextQuoteSelector
+    source: https://arxiv.org/pdf/2303.09752.pdf
+  text: Recent improvements in transformers for long documents have focused on efficiencies
+    in the attention mechanism but the feed-forward and projection layers are still
+    expensive for long docs
+  updated: '2023-03-21T19:59:04.177001+00:00'
+  uri: https://arxiv.org/pdf/2303.09752.pdf
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
+tags:
+- llm
+- attention
+- long-documents
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679428744
+
+---
+
+
+Over the past few years, many “efficient Transformer” approaches have been proposed that reduce the cost of the attention mechanism over long inputs (Child et al., 2019; Ainslie et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Wang et al., 2020; Tay et al., 2021; Guo et al., 2022). However, especially for larger models, the feedforward and projection layers actually make up the majority of the computational burden and can render processing long inputs intractableRecent improvements in transformers for long documents have focused on efficiencies in the attention mechanism but the feed-forward and projection layers are still expensive for long docs
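+
+The claim about feedforward and projection layers is easy to sanity-check with rough per-layer FLOP counts. A back-of-the-envelope sketch using the standard transformer cost formulas with illustrative sizes (these are not figures from the paper):
+
+```python
+def layer_flops(n, d, d_ff=None, efficient_attention=False, window=512):
+    """Very rough multiply-accumulate counts for one transformer layer."""
+    d_ff = d_ff or 4 * d
+    qkvo = 4 * n * d * d           # Q, K, V and output projections
+    if efficient_attention:
+        attn = 2 * n * window * d  # e.g. local/sparse attention
+    else:
+        attn = 2 * n * n * d       # full quadratic attention
+    ffn = 2 * n * d * d_ff         # the two feedforward matmuls
+    return qkvo, attn, ffn
+
+n, d = 16384, 2048                 # a long input and a largish model
+for eff in (False, True):
+    qkvo, attn, ffn = layer_flops(n, d, efficient_attention=eff)
+    total = qkvo + attn + ffn
+    print(f"efficient_attention={eff}: attention={attn / total:.0%}, "
+          f"ffn+projections={(qkvo + ffn) / total:.0%}")
+```
+
+Once attention is made (near-)linear the quadratic term disappears, and the feedforward and projection matmuls, which scale with input length times width squared, come to dominate the budget - which is exactly the cost CoLT5 goes after with conditional computation.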
\ No newline at end of file
diff --git a/brainsteam/content/annotations/2023/03/21/1679428782.md b/brainsteam/content/annotations/2023/03/21/1679428782.md
new file mode 100644
index 0000000..77f118c
--- /dev/null
+++ b/brainsteam/content/annotations/2023/03/21/1679428782.md
@@ -0,0 +1,54 @@
+---
+date: '2023-03-21T19:59:42'
+hypothesis-meta:
+  created: '2023-03-21T19:59:42.317507+00:00'
+  document:
+    title:
+    - 2303.09752.pdf
+  flagged: false
+  group: __world__
+  hidden: false
+  id: 63md-sgiEe2GA2OJo26mSA
+  links:
+    html: https://hypothes.is/a/63md-sgiEe2GA2OJo26mSA
+    incontext: https://hyp.is/63md-sgiEe2GA2OJo26mSA/arxiv.org/pdf/2303.09752.pdf
+    json: https://hypothes.is/api/annotations/63md-sgiEe2GA2OJo26mSA
+  permissions:
+    admin:
+    - acct:ravenscroftj@hypothes.is
+    delete:
+    - acct:ravenscroftj@hypothes.is
+    read:
+    - group:__world__
+    update:
+    - acct:ravenscroftj@hypothes.is
+  tags:
+  - llm
+  target:
+  - selector:
+    - end: 2402
+      start: 2357
+      type: TextPositionSelector
+    - exact: This paper presents COLT5 (ConditionalLongT5)
+      prefix: s are processed by aheavier MLP.
+      suffix: ', a new family of models that, b'
+      type: TextQuoteSelector
+    source: https://arxiv.org/pdf/2303.09752.pdf
+  text: CoLT5 stands for Conditional LongT5
+  updated: '2023-03-21T19:59:42.317507+00:00'
+  uri: https://arxiv.org/pdf/2303.09752.pdf
+  user: acct:ravenscroftj@hypothes.is
+  user_info:
+    display_name: James Ravenscroft
+in-reply-to: https://arxiv.org/pdf/2303.09752.pdf
+tags:
+- llm
+- hypothesis
+type: annotation
+url: /annotations/2023/03/21/1679428782
+
+---
+
+
+This paper presents COLT5 (Conditional LongT5)CoLT5 stands for Conditional LongT5
\ No newline at end of file
diff --git a/brainsteam/content/notes/2023/04/07/1680866081.md b/brainsteam/content/notes/2023/04/07/1680866081.md
new file mode 100644
index 0000000..c3f0f84
--- /dev/null
+++ b/brainsteam/content/notes/2023/04/07/1680866081.md
@@ -0,0 +1,19 @@
+---
+date: '2023-04-07T11:14:41.131905'
+mp-syndicate-to:
+- https://brid.gy/publish/mastodon
+photo:
+- /media/2023/04/07/1680866081_0.jpg
+tags:
+- personal
+type: note
+url: /notes/2023/04/07/1680866081
+
+---
+
+
+
+
+
+ Happy freaking Easter James - from Mother Nature
+ 
\ No newline at end of file
diff --git a/brainsteam/content/posts/2023/03/13/deepthought-hitchhiker-s-guide-llms-and-raspberry-pis1678738115.md b/brainsteam/content/posts/2023/03/13/deepthought-hitchhiker-s-guide-llms-and-raspberry-pis1678738115.md
new file mode 100644
index 0000000..bcfdada
--- /dev/null
+++ b/brainsteam/content/posts/2023/03/13/deepthought-hitchhiker-s-guide-llms-and-raspberry-pis1678738115.md
@@ -0,0 +1,58 @@
+---
+date: '2023-03-13T20:08:35.475110'
+mp-syndicate-to:
+- https://brid.gy/publish/mastodon
+tags:
+- ai
+- nlp
+- humour
+title: Deep Thought, Hitchhiker's Guide, LLMs and Raspberry Pis
+description: Musings on parallels between AI fiction and AI fact
+type: post
+url: /posts/2023/03/13/deepthought-hitchhiker-s-guide-llms-and-raspberry-pis1678738115
+
+---
+
+Today I read via [Simon Willison's blog](https://simonwillison.net/2023/Mar/13/alpaca/) that someone has managed to get LLaMA running on a Raspberry Pi. That's pretty incredible progress, and it made me think of this excerpt from [Hitchhiker's Guide To the Galaxy](https://bookwyrm.social/book/181728/s/hitchhikers-guide-to-the-galaxy-trilogy-collection-5-books-set-by-douglas-adams):
+
+> O Deep Thought computer," he said, "the task we have designed you to perform is this. We want you to tell us...." he paused, "The Answer."
+>
+>"The Answer?" said Deep Thought. "The Answer to what?"
+>
+>"Life!" urged Fook.
+>
+>"The Universe!" said Lunkwill.
+>
+>"Everything!" they said in chorus.
+>
+>Deep Thought paused for a moment's reflection.
+>
+>"Tricky," he said finally.
+>
+>"But can you do it?"
+>
+>Again, a significant pause.
+>
+>"Yes," said Deep Thought, "I can do it."
+>
+>"There is an answer?" said Fook with breathless excitement.
+>
+>"Yes," said Deep Thought. "Life, the Universe, and Everything. There is an answer. But, I'll have to think about it."
+>
+>...
+>
+>Fook glanced impatiently at his watch.
+>
+>“How long?” he said.
+>
+>“Seven and a half million years,” said Deep Thought.
+>
+>Lunkwill and Fook blinked at each other.
+>
+>“Seven and a half million years...!” they cried in chorus.
+>
+>“Yes,” declaimed Deep Thought, “I said I’d have to think about it, didn’t I?"
+
+Maybe Deep Thought was actually just an LLM running on a Raspberry Pi and that's why it took so long to generate the ultimate answer!
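+
+And for completeness, some strictly back-of-the-envelope arithmetic, assuming the roughly ten-seconds-per-token rate anecdotally reported for a 4-bit 7B model on a Pi 4 (an assumed figure, not a benchmark):
+
+```python
+seconds_per_token = 10   # assumed rate for LLaMA 7B on a Raspberry Pi 4
+answer_tokens = 42       # the Ultimate Answer is famously terse
+
+wait = seconds_per_token * answer_tokens
+print(f"{wait / 60:.0f} minutes to the Ultimate Answer")  # ~7 minutes
+
+# Deep Thought quoted seven and a half million years, which makes the Pi
+# roughly half a trillion times faster, give or take a sub-etha margin.
+years = 7.5e6
+speedup = years * 365 * 24 * 3600 / wait
+print(f"speedup over Deep Thought: {speedup:.1e}x")  # ~5.6e+11x
+```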
+
+
\ No newline at end of file
diff --git a/brainsteam/content/posts/2023/03/20/week-11/images/officelights.jpg b/brainsteam/content/posts/2023/03/20/week-11/images/officelights.jpg
new file mode 100644
index 0000000..409a681
Binary files /dev/null and b/brainsteam/content/posts/2023/03/20/week-11/images/officelights.jpg differ
diff --git a/brainsteam/content/posts/2023/03/20/week-11/index.md b/brainsteam/content/posts/2023/03/20/week-11/index.md
new file mode 100644
index 0000000..dd807ba
--- /dev/null
+++ b/brainsteam/content/posts/2023/03/20/week-11/index.md
@@ -0,0 +1,45 @@
+---
+title: "Weeknote 11 2023"
+date: 2023-03-20T19:53:00Z
+description: in which I ate too much, entered gremlin mode and upgraded mkdocs-material
+url: /2023/3/20/week-11
+type: post
+mp-syndicate-to:
+- https://brid.gy/publish/mastodon
+- https://brid.gy/publish/twitter
+resources:
+  - name: feature
+    src: images/officelights.jpg
+tags:
+  - personal
+---
+
+This week (or last week)'s weeknote is a touch late since I was travelling over the weekend. On Sunday it was Mother's Day in the UK, so we visited my mum up in the Midlands and then Mrs R's mum down here in Hampshire, having a sit-down meal with both. It was a bit like [the bit in the Vicar of Dibley where she accidentally signs herself up for multiple Christmas dinners on the same day](https://www.youtube.com/watch?v=2aq3DNSF-jc).
+
+---
+
+On Tuesday we had a problem with the lighting in our office AND the water main near our office complex burst, which meant we were sat in the office like gremlins in the dark and there were no toilet facilities. I decided to work from home for the rest of the week for reasons that were not unrelated.
+
+
+
+{{