created: 2022-12-19T14:58:05.006973+00:00
document:
  title: My AI Safety Lecture for UT Effective Altruism
flagged: false
group: __world__
hidden: false
id: iqqNRH-tEe2fKTMGgQumvA
permissions:
  admin: acct:ravenscroftj@hypothes.is
  delete: acct:ravenscroftj@hypothes.is
  read:
  update: acct:ravenscroftj@hypothes.is
selector:
  RangeSelector:
    endContainer: /div[2]/div[2]/div[2]/div[1]/p[100]
    endOffset: 429
    startContainer: /div[2]/div[2]/div[2]/div[1]/p[100]
    startOffset: 0
  TextPositionSelector:
    end: 41343
    start: 40914
  TextQuoteSelector:
    exact:
Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.
    prefix: which parts probably didn’t.
    suffix: The hope is that this can be
    type: TextQuoteSelector
source: https://scottaaronson.blog/?p=6823
text: this mechanism can be defeated by paraphrasing the output with another model
updated: 2022-12-19T14:58:05.006973+00:00
uri: https://scottaaronson.blog/?p=6823
user: acct:ravenscroftj@hypothes.is
user_info:
  display_name: James Ravenscroft
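The highlighted passage says the watermark signal depends only on a sum over n-grams, which is why inserting or deleting a few words degrades it only gradually: only the n-grams that span the edit point change, and every other term of the sum survives. A toy Python sketch of that robustness argument (an illustrative keyed n-gram sum — not Aaronson's actual scheme, and all names here are hypothetical):

```python
import hashlib

def ngram_score(tokens, n=4, key=b"secret"):
    """Toy watermark score: a sum of keyed pseudorandom values,
    one term per n-gram. Illustrative only, not the real scheme."""
    total = 0.0
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n]).encode()
        h = hashlib.sha256(key + gram).digest()
        # map the hash to a pseudorandom value in [0, 1)
        total += int.from_bytes(h[:8], "big") / 2**64
    return total

text = ("the quick brown fox jumps over the lazy dog and then "
        "runs far away into the quiet green forest tonight").split()

# insert a single word mid-sentence
edited = text[:5] + ["suddenly"] + text[5:]

original_grams = {tuple(text[i:i + 4]) for i in range(len(text) - 3)}
edited_grams = {tuple(edited[i:i + 4]) for i in range(len(edited) - 3)}

# Only the 4-grams straddling the insertion point are destroyed;
# the rest contribute the same terms to the sum as before.
shared = len(original_grams & edited_grams)
print(shared, "of", len(original_grams), "original 4-grams survive the edit")

s_orig, s_edit = ngram_score(text), ngram_score(edited)
```

Here the one-word insertion destroys only the three 4-grams that cross the insertion point, so the two scores share 14 of 17 terms — which is the sense in which a sum over n-grams is robust to local edits, while wholesale paraphrasing (replacing essentially every n-gram) erases the signal entirely.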
|