brainsteam.co.uk/brainsteam/content/annotations/2022/12/19/1671461828.md

---
date: '2022-12-19T14:57:08'
hypothesis-meta:
  created: '2022-12-19T14:57:08.575784+00:00'
  document:
    title:
    - My AI Safety Lecture for UT Effective Altruism
  flagged: false
  group: __world__
  hidden: false
  id: aQ51un-tEe29v2MBjEX6Xw
  links:
    html: https://hypothes.is/a/aQ51un-tEe29v2MBjEX6Xw
    incontext: https://hyp.is/aQ51un-tEe29v2MBjEX6Xw/scottaaronson.blog/?p=6823
    json: https://hypothes.is/api/annotations/aQ51un-tEe29v2MBjEX6Xw
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - explainability
  - nlproc
  target:
  - selector:
    - endContainer: /div[2]/div[2]/div[2]/div[1]/p[99]
      endOffset: 386
      startContainer: /div[2]/div[2]/div[2]/div[1]/p[99]
      startOffset: 0
      type: RangeSelector
    - end: 40910
      start: 40524
      type: TextPositionSelector
    - exact: "Anyway, we actually have a working prototype of the watermarking scheme,\
        \ built by OpenAI engineer Hendrik Kirchner.  It seems to work pretty well\u2014\
        empirically, a few hundred tokens seem to be enough to get a reasonable signal\
        \ that yes, this text came from GPT.  In principle, you could even take a\
        \ long text and isolate which parts probably came from GPT and which parts\
        \ probably didn\u2019t."
      prefix: 'irst hundred prime numbers).


        '
      suffix: '


        Now, this can all be defeate'
      type: TextQuoteSelector
    source: https://scottaaronson.blog/?p=6823
  text: Scott's team has already developed a prototype watermarking scheme at OpenAI,
    and it works pretty well
  updated: '2022-12-19T14:57:08.575784+00:00'
  uri: https://scottaaronson.blog/?p=6823
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://scottaaronson.blog/?p=6823
tags:
- explainability
- nlproc
- hypothesis
type: annotation
url: /annotations/2022/12/19/1671461828

---


 <blockquote>Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner.  It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT.  In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.</blockquote>Scott's team has already developed a prototype watermarking scheme at OpenAI, and it works pretty well