Add 'brainsteam/content/annotations/2022/12/19/1671461186.md'

ravenscroftj 2022-12-19 15:00:19 +00:00
parent 507556d60a
commit dfb9af3c2c
1 changed file with 78 additions and 0 deletions


@@ -0,0 +1,78 @@
---
date: '2022-12-19T14:46:26'
hypothesis-meta:
created: '2022-12-19T14:46:26.361697+00:00'
document:
title:
- My AI Safety Lecture for UT Effective Altruism
flagged: false
group: __world__
hidden: false
id: 6k0-pn-rEe20ccNOEgwbaQ
links:
html: https://hypothes.is/a/6k0-pn-rEe20ccNOEgwbaQ
incontext: https://hyp.is/6k0-pn-rEe20ccNOEgwbaQ/scottaaronson.blog/?p=6823
json: https://hypothes.is/api/annotations/6k0-pn-rEe20ccNOEgwbaQ
permissions:
admin:
- acct:ravenscroftj@hypothes.is
delete:
- acct:ravenscroftj@hypothes.is
read:
- group:__world__
update:
- acct:ravenscroftj@hypothes.is
tags:
- nlproc
- explainability
target:
- selector:
- endContainer: /div[2]/div[2]/div[2]/div[1]/p[68]
endOffset: 803
startContainer: /div[2]/div[2]/div[2]/div[1]/p[68]
startOffset: 0
type: RangeSelector
- end: 27975
start: 27172
type: TextPositionSelector
- exact: "(3) A third direction, and I would say maybe the most popular one in\
\ AI alignment research right now, is called interpretability. This is also\
\ a major direction in mainstream machine learning research, so there\u2019\
s a big point of intersection there. The idea of interpretability is, why\
\ don\u2019t we exploit the fact that we actually have complete access to\
\ the code of the AI\u2014or if it\u2019s a neural net, complete access to\
\ its parameters? So we can look inside of it. We can do the AI analogue\
\ of neuroscience. Except, unlike an fMRI machine, which gives you only an\
\ extremely crude snapshot of what a brain is doing, we can see exactly what\
\ every neuron in a neural net is doing at every point in time. If we don\u2019\
t exploit that, then aren\u2019t we trying to make AI safe with our hands\
\ tied behind our backs?"
prefix: ' take over the world, right?
'
suffix: "\n\n\n\nSo we should look inside\u2014but"
type: TextQuoteSelector
source: https://scottaaronson.blog/?p=6823
text: Interesting metaphor - it is a bit like MRI for neural networks but actually
more accurate/powerful
updated: '2022-12-19T14:46:26.361697+00:00'
uri: https://scottaaronson.blog/?p=6823
user: acct:ravenscroftj@hypothes.is
user_info:
display_name: James Ravenscroft
in-reply-to: https://scottaaronson.blog/?p=6823
tags:
- nlproc
- explainability
- hypothesis
type: annotation
url: /annotations/2022/12/19/1671461186
---
<blockquote>(3) A third direction, and I would say maybe the most popular one in AI alignment research right now, is called interpretability. This is also a major direction in mainstream machine learning research, so there's a big point of intersection there. The idea of interpretability is, why don't we exploit the fact that we actually have complete access to the code of the AI—or if it's a neural net, complete access to its parameters? So we can look inside of it. We can do the AI analogue of neuroscience. Except, unlike an fMRI machine, which gives you only an extremely crude snapshot of what a brain is doing, we can see exactly what every neuron in a neural net is doing at every point in time. If we don't exploit that, then aren't we trying to make AI safe with our hands tied behind our backs?</blockquote>Interesting metaphor - it is a bit like MRI for neural networks but actually more accurate/powerful
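
To make the "looking inside" idea concrete, here is a minimal sketch (my own illustration, not something from Aaronson's post) of recording what every neuron in a toy network does on a given input, using PyTorch forward hooks. The tiny model and the layer names are invented purely for demonstration, and it assumes `torch` is installed.

```python
# Hypothetical example: record the exact activation of every neuron
# in a toy network, the "AI analogue of neuroscience" described above.
import torch
import torch.nn as nn

# A made-up toy model, purely for illustration.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

activations = {}

def record(name):
    def hook(module, inputs, output):
        # Store the full activation tensor for this layer on this input.
        activations[name] = output.detach()
    return hook

# Attach a hook to each layer so nothing is hidden from us.
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.ReLU)):
        module.register_forward_hook(record(name))

x = torch.randn(1, 4)
model(x)

for name, act in activations.items():
    print(name, act)
```

Unlike an fMRI voxel, each recorded value here is the exact output of a single neuron at a single point in time, which is what makes the comparison to brain imaging feel "more accurate/powerful" than the metaphor suggests.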