brainsteam.co.uk/brainsteam/content/annotations/2022/12/19/1671461828.md at c5f12bd8ebe06b9bb719286623c58884ac344a06

date

hypothesis-meta

in-reply-to

tags

target

text

updated

uri

user

user_info

2022-12-19T14:57:08.575784+00:00

title

My AI Safety Lecture for UT Effective Altruism

false

__world__

false

aQ51un-tEe29v2MBjEX6Xw

html	incontext	json
https://hypothes.is/a/aQ51un-tEe29v2MBjEX6Xw	https://hyp.is/aQ51un-tEe29v2MBjEX6Xw/scottaaronson.blog/?p=6823	https://hypothes.is/api/annotations/aQ51un-tEe29v2MBjEX6Xw

admin

delete

read

update

acct:ravenscroftj@hypothes.is

group:__world__

acct:ravenscroftj@hypothes.is

explainability

nlproc

selector

source

endContainer	endOffset	startContainer	startOffset	type
/div[2]/div[2]/div[2]/div[1]/p[99]	386	/div[2]/div[2]/div[2]/div[1]/p[99]	0	RangeSelector

end	start	type
40910	40524	TextPositionSelector

exact	prefix	suffix	type
Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.	irst hundred prime numbers).	Now, this can all be defeate	TextQuoteSelector

https://scottaaronson.blog/?p=6823

Scott's team has already developed a prototype watermarking scheme at OpenAI, and it seems to work pretty well.

2022-12-19T14:57:08.575784+00:00

https://scottaaronson.blog/?p=6823

acct:ravenscroftj@hypothes.is

display_name
James Ravenscroft

https://scottaaronson.blog/?p=6823

explainability

nlproc

hypothesis

annotation

/annotations/2022/12/19/1671461828

Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.

Scott's team has already developed a prototype watermarking scheme at OpenAI, and it seems to work pretty well.
