---
date: '2022-12-19T14:57:08'
hypothesis-meta:
  created: '2022-12-19T14:57:08.575784+00:00'
  document:
    title:
    - My AI Safety Lecture for UT Effective Altruism
  flagged: false
  group: __world__
  hidden: false
  id: aQ51un-tEe29v2MBjEX6Xw
  links:
    html: https://hypothes.is/a/aQ51un-tEe29v2MBjEX6Xw
    incontext: https://hyp.is/aQ51un-tEe29v2MBjEX6Xw/scottaaronson.blog/?p=6823
    json: https://hypothes.is/api/annotations/aQ51un-tEe29v2MBjEX6Xw
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - explainability
  - nlproc
  target:
  - selector:
    - endContainer: /div[2]/div[2]/div[2]/div[1]/p[99]
      endOffset: 386
      startContainer: /div[2]/div[2]/div[2]/div[1]/p[99]
      startOffset: 0
      type: RangeSelector
    - end: 40910
      start: 40524
      type: TextPositionSelector
    - exact: "Anyway, we actually have a working prototype of the watermarking scheme,\
        \ built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well\u2014\
        empirically, a few hundred tokens seem to be enough to get a reasonable signal\
        \ that yes, this text came from GPT. In principle, you could even take a\
        \ long text and isolate which parts probably came from GPT and which parts\
        \ probably didn\u2019t."
      prefix: 'irst hundred prime numbers).

        '
      suffix: '

        Now, this can all be defeate'
      type: TextQuoteSelector
    source: https://scottaaronson.blog/?p=6823
  text: Scott's team hsas already developed a prototype watermarking scheme at OpenAI
    and it works pretty well
  updated: '2022-12-19T14:57:08.575784+00:00'
  uri: https://scottaaronson.blog/?p=6823
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://scottaaronson.blog/?p=6823
tags:
- explainability
- nlproc
- hypothesis
type: annotation
url: /annotations/2022/12/19/1671461828
---
<blockquote>Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.</blockquote>Scott's team at OpenAI has already developed a prototype watermarking scheme, and it works pretty well.
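Out of curiosity, here is a toy sketch of how detection like this could work. It follows my reading of the scheme Aaronson outlines in the lecture, and the steps are:

1. For each candidate token, compute a keyed pseudorandom value r from the preceding few tokens.
2. The generator picks the token that maximises r^(1/p), which still samples according to the model's probabilities p.
3. A detector holding the key sums ln(1/(1 - r)) over the text.

Without the key, that sum behaves like n independent Exp(1) draws, with mean n and standard deviation sqrt(n). For watermarked text the standardised score keeps climbing with length. That would explain why a few hundred tokens can be enough. Scoring fixed-size windows is one plausible way to isolate the watermarked stretches.

Everything here is a hypothetical illustration, not OpenAI's prototype. That includes the SHA-256 keyed hash, `VOCAB`, `CONTEXT`, `KEY`, and the random `toy_distribution` standing in for a real model. The toy distributions are also far higher-entropy than real model output, so the signal appears much faster here than it would in practice.

```python
import hashlib
import math
import random

# Toy illustration only: every constant and helper here is hypothetical, not OpenAI's prototype.
VOCAB = 50           # hypothetical vocabulary size
CONTEXT = 4          # hypothetical number of preceding tokens fed into the keyed hash
KEY = b"secret-key"  # hypothetical watermark key shared by generator and detector


def prf(context, token):
    """Keyed pseudorandom value r in (0, 1) for a candidate token given its preceding tokens."""
    data = KEY + repr((tuple(context), token)).encode()
    x = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (x + 0.5) / 2**32  # strictly between 0 and 1


def toy_distribution(rng):
    """Hypothetical stand-in for a language model's next-token probabilities."""
    weights = [1.0 - rng.random() for _ in range(VOCAB)]  # each weight in (0, 1]
    total = sum(weights)
    return [w / total for w in weights]


def generate(n, rng, watermark, prefix=()):
    """Append n tokens to prefix, with or without the watermark."""
    tokens = list(prefix)
    for _ in range(n):
        p = toy_distribution(rng)
        ctx = tokens[-CONTEXT:]
        if watermark:
            # argmax of r ** (1 / p), computed as ln(r) / p to avoid underflow; over truly
            # uniform r this still picks token t with probability p[t].
            tok = max(range(VOCAB), key=lambda t: math.log(prf(ctx, t)) / p[t])
        else:
            tok = rng.choices(range(VOCAB), weights=p)[0]
        tokens.append(tok)
    return tokens


def token_scores(tokens):
    """Per-token score ln(1 / (1 - r)); for text generated without the key each is ~Exp(1)."""
    return [
        -math.log(1.0 - prf(tokens[max(0, i - CONTEXT):i], tok))
        for i, tok in enumerate(tokens)
    ]


def z_score(scores):
    """Standardise the summed score: n Exp(1) draws have mean n and std sqrt(n)."""
    n = len(scores)
    return (sum(scores) - n) / math.sqrt(n)


def windowed_z(scores, width=100):
    """z-score of each non-overlapping window; high values flag likely watermarked stretches."""
    return [z_score(scores[i:i + width]) for i in range(0, len(scores) - width + 1, width)]


if __name__ == "__main__":
    rng = random.Random(0)
    text = generate(200, rng, watermark=False)               # unwatermarked
    text = generate(200, rng, watermark=True, prefix=text)   # watermarked stretch
    text = generate(200, rng, watermark=False, prefix=text)  # unwatermarked again
    for start, z in zip(range(0, len(text), 100), windowed_z(token_scores(text))):
        print(f"tokens {start}-{start + 99}: z = {z:.1f}")
```

Running it prints one z-score per 100-token window. The two middle windows, which come from the watermarked stretch, should sit far above the rest, which hover around zero.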