---
date: '2023-01-29T12:08:26'
hypothesis-meta:
  created: '2023-01-29T12:08:26.920806+00:00'
  document:
    title:
    - 2301.11305.pdf
  flagged: false
  group: __world__
  hidden: false
  id: ovUwTp_NEe2lC8uCWsE7eg
  links:
    html: https://hypothes.is/a/ovUwTp_NEe2lC8uCWsE7eg
    incontext: https://hyp.is/ovUwTp_NEe2lC8uCWsE7eg/arxiv.org/pdf/2301.11305.pdf
    json: https://hypothes.is/api/annotations/ovUwTp_NEe2lC8uCWsE7eg
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - chatgpt
  - detecting gpt
  target:
  - selector:
    - end: 16098
      start: 15348
      type: TextPositionSelector
    - exact: "Figure 3. The average drop in log probability (perturbation discrep-ancy)\
        \ after rephrasing a passage is consistently higher for model-generated passages\
        \ than for human-written passages. Each plotshows the distribution of the\
        \ perturbation discrepancy d (x, p\u03B8 , q)for human-written news articles\
        \ and machine-generated arti-cles; of equal word length from models GPT-2\
        \ (1.5B), GPT-Neo-2.7B (Black et al., 2021), GPT-J (6B; Wang & Komatsuzaki\
        \ (2021))and GPT-NeoX (20B; Black et al. (2022)). Human-written arti-cles\
        \ are a sample of 500 XSum articles; machine-generated textis generated by\
        \ prompting each model with the first 30 tokens ofeach XSum article, sampling\
        \ from the raw conditional distribution.Discrepancies are estimated with 100\
        \ T5-3B samples."
      prefix: ancy)0.00.20.40.60.81.0Frequency
      suffix: to machine-generated text detect
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2301.11305.pdf
  text: quite striking here is the fact that more powerful/larger models are more
    capable of generating unusual or "human-like" responses - looking at the overlap
    in log likelihoods
  updated: '2023-01-29T12:08:26.920806+00:00'
  uri: https://arxiv.org/pdf/2301.11305.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2301.11305.pdf
tags:
- chatgpt
- detecting gpt
- hypothesis
type: annotation
url: /annotations/2023/01/29/1674994106
---
<blockquote>Figure 3. The average drop in log probability (perturbation discrepancy) after rephrasing a passage is consistently higher for model-generated passages than for human-written passages. Each plot shows the distribution of the perturbation discrepancy d(x, pθ, q) for human-written news articles and machine-generated articles of equal word length from models GPT-2 (1.5B), GPT-Neo-2.7B (Black et al., 2021), GPT-J (6B; Wang & Komatsuzaki (2021)) and GPT-NeoX (20B; Black et al. (2022)). Human-written articles are a sample of 500 XSum articles; machine-generated text is generated by prompting each model with the first 30 tokens of each XSum article, sampling from the raw conditional distribution. Discrepancies are estimated with 100 T5-3B samples.</blockquote>What is quite striking here is that more powerful/larger models are more capable of generating unusual or "human-like" responses - you can see this in the overlap of the log-likelihood distributions.
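
The quantity behind the figure is simple to state: the perturbation discrepancy compares the log probability a model assigns to a passage against the average log probability it assigns to rephrased versions of that passage. Below is a minimal, untested sketch assuming the HuggingFace `transformers` API, with GPT-2 standing in as the scoring model; the `perturb` callable is a hypothetical stand-in for the T5-3B mask-fill rephrasing step the caption mentions, not the paper's actual implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in here for the scoring model p_theta (an assumption
# for illustration; the paper evaluates several larger models).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_log_likelihood(text: str) -> float:
    """Average per-token log probability of `text` under the scoring model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # HF causal LMs return the mean negative log-likelihood as `loss`
    return -out.loss.item()

def perturbation_discrepancy(text: str, perturb, k: int = 100) -> float:
    """d(x, p_theta, q) = log p_theta(x) - (1/k) * sum_i log p_theta(x_i),
    where each x_i = perturb(text) is a rephrasing drawn from q(.|x).
    `perturb` is a hypothetical placeholder for T5-style mask filling."""
    original = mean_log_likelihood(text)
    perturbed = [mean_log_likelihood(perturb(text)) for _ in range(k)]
    return original - sum(perturbed) / len(perturbed)
```

In the paper's framing, machine-generated text tends to sit near a local maximum of the model's log probability, so rephrasing it drops the likelihood and d comes out large and positive, while human-written text shows no such consistent drop.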