brainsteam.co.uk/brainsteam/content/annotations/2022/12/19/1671461828.md


date hypothesis-meta in-reply-to tags type url
2022-12-19T14:57:08
created document flagged group hidden id links permissions tags target text updated uri user user_info
2022-12-19T14:57:08.575784+00:00
title
My AI Safety Lecture for UT Effective Altruism
false __world__ false aQ51un-tEe29v2MBjEX6Xw
html incontext json
https://hypothes.is/a/aQ51un-tEe29v2MBjEX6Xw https://hyp.is/aQ51un-tEe29v2MBjEX6Xw/scottaaronson.blog/?p=6823 https://hypothes.is/api/annotations/aQ51un-tEe29v2MBjEX6Xw
admin delete read update
acct:ravenscroftj@hypothes.is
acct:ravenscroftj@hypothes.is
group:__world__
acct:ravenscroftj@hypothes.is
explainability
nlproc
selector source
endContainer endOffset startContainer startOffset type
/div[2]/div[2]/div[2]/div[1]/p[99] 386 /div[2]/div[2]/div[2]/div[1]/p[99] 0 RangeSelector
end start type
40910 40524 TextPositionSelector
exact prefix suffix type
Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didnt. irst hundred prime numbers). Now, this can all be defeate TextQuoteSelector
https://scottaaronson.blog/?p=6823
Scott's team has already developed a prototype watermarking scheme at OpenAI and it works pretty well 2022-12-19T14:57:08.575784+00:00 https://scottaaronson.blog/?p=6823 acct:ravenscroftj@hypothes.is
display_name
James Ravenscroft
https://scottaaronson.blog/?p=6823
explainability
nlproc
hypothesis
annotation /annotations/2022/12/19/1671461828
Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn't.
Scott's team has already developed a prototype watermarking scheme at OpenAI, and it works pretty well.