brainsteam.co.uk/brainsteam/content/annotations/2023/01/29/1674989184.md

---
date: '2023-01-29T10:46:24'
hypothesis-meta:
  created: '2023-01-29T10:46:24.271948+00:00'
  document:
    title:
    - 2301.11305.pdf
  flagged: false
  group: __world__
  hidden: false
  id: LNKuap_CEe2NNLuZfhdxTA
  links:
    html: https://hypothes.is/a/LNKuap_CEe2NNLuZfhdxTA
    incontext: https://hyp.is/LNKuap_CEe2NNLuZfhdxTA/arxiv.org/pdf/2301.11305.pdf
    json: https://hypothes.is/api/annotations/LNKuap_CEe2NNLuZfhdxTA
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - chatgpt
  - detecting gpt
  target:
  - selector:
    - end: 31791
      start: 31366
      type: TextPositionSelector
    - exact: Figure 5. We simulate human edits to machine-generated text byreplacing
        varying fractions of model samples with T5-3B gener-ated text (masking out
        random five word spans until r% of text ismasked to simulate human edits to
        machine-generated text). Thefour top-performing methods all generally degrade
        in performancewith heavier revision, but DetectGPT is consistently most accurate.Experiment
        is conducted on the XSum dataset
      prefix: etectGPTLogRankLikelihoodEntropy
      suffix: .XSum SQuAD WritingPromptsMethod
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/2301.11305.pdf
  text: DetectGPT shows 95% AUROC for texts that have been modified by about 10% and
    this drops off to about 85% when text is changed up to 24%.
  updated: '2023-01-29T10:46:24.271948+00:00'
  uri: https://arxiv.org/pdf/2301.11305.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/2301.11305.pdf
tags:
- chatgpt
- detecting gpt
- hypothesis
type: annotation
url: /annotations/2023/01/29/1674989184

---


<blockquote>Figure 5. We simulate human edits to machine-generated text by replacing varying fractions of model samples with T5-3B generated text (masking out random five word spans until r% of text is masked to simulate human edits to machine-generated text). The four top-performing methods all generally degrade in performance with heavier revision, but DetectGPT is consistently most accurate. Experiment is conducted on the XSum dataset</blockquote>

DetectGPT achieves roughly 95% AUROC when about 10% of the text has been modified, dropping to about 85% when up to 24% of the text is changed.
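
The masking step described in the caption can be sketched roughly as below. This is only an illustration under my own assumptions, not the paper's implementation: words are split on whitespace, spans may overlap, and the T5-3B step that fills the masked spans is left out entirely. The function name `mask_random_spans` and its parameters are hypothetical.

```python
import random


def mask_random_spans(text, r, span_len=5, mask_token="<mask>", seed=None):
    """Mask random `span_len`-word spans until at least a fraction `r` of words is masked.

    Assumptions (not taken from the paper): whitespace tokenization,
    spans may overlap, and each masked run collapses to one mask token.
    Returns the masked text and the fraction of words actually masked.
    """
    rng = random.Random(seed)
    words = text.split()
    n = len(words)
    masked = [False] * n
    target = int(r * n)  # number of words that should end up masked

    # Visit candidate span start positions in random order.
    starts = list(range(max(n - span_len + 1, 1)))
    rng.shuffle(starts)
    for start in starts:
        if sum(masked) >= target:
            break
        for i in range(start, min(start + span_len, n)):
            masked[i] = True

    # Collapse each run of masked words into a single mask token.
    out = []
    prev_masked = False
    for word, is_masked in zip(words, masked):
        if not is_masked:
            out.append(word)
        elif not prev_masked:
            out.append(mask_token)
        prev_masked = is_masked

    fraction = sum(masked) / n if n else 0.0
    return " ".join(out), fraction
```

Calling `mask_random_spans(sample, 0.24, seed=0)` on a model sample would return text with just under a quarter of its words hidden behind mask tokens, ready to hand to a mask-filling model. Because each span adds up to five words, the masked fraction can overshoot `r` slightly.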