brainsteam.co.uk/brainsteam/content/annotations/2022/11/28/1669635285.md

---
date: '2022-11-28T11:34:45'
hypothesis-meta:
  created: '2022-11-28T11:34:45.963292+00:00'
  document:
    title:
    - 1809.09672.pdf
  flagged: false
  group: __world__
  hidden: false
  id: qMPVfG8QEe2WJWufCDu9ww
  links:
    html: https://hypothes.is/a/qMPVfG8QEe2WJWufCDu9ww
    incontext: https://hyp.is/qMPVfG8QEe2WJWufCDu9ww/arxiv.org/pdf/1809.09672.pdf
    json: https://hypothes.is/api/annotations/qMPVfG8QEe2WJWufCDu9ww
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - rl
  - bandit
  - nlproc
  - summarization
  target:
  - selector:
    - end: 10089
      start: 9945
      type: TextPositionSelector
    - exact: andit is a decision-making formal-ization in which an agent repeatedly chooses oneof several actions, and receives a reward based onthis choice.
      prefix: dient reinforcementlearning. A b
      suffix: The agents goal is to quickly
      type: TextQuoteSelector
    source: https://arxiv.org/pdf/1809.09672.pdf
  text: 'Definition for contextual bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice.'
  updated: '2022-11-28T11:34:45.963292+00:00'
  uri: https://arxiv.org/pdf/1809.09672.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://arxiv.org/pdf/1809.09672.pdf
tags:
- rl
- bandit
- nlproc
- summarization
- hypothesis
type: annotation
url: /annotations/2022/11/28/1669635285
---
[A b]andit is a decision-making formalization in which an agent repeatedly chooses one of several actions, and receives a reward based on this choice.
Definition for contextual bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice.
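
To make the choose-then-receive-reward loop concrete, here is a minimal sketch of a bandit agent. It is not taken from the annotated paper. The epsilon-greedy strategy, the function name `run_bandit`, and the reward probabilities are all assumptions picked for illustration. It also omits the per-trial context that a contextual bandit would observe before choosing.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was chosen
    values = [0.0] * n_actions  # running mean reward per action
    total_reward = 0.0

    for _ in range(steps):
        # Choose an action: explore with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Receive a reward that depends only on the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the running mean estimate for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

    return counts, values, total_reward


# Hypothetical three-action problem; the probabilities are made up.
counts, values, total_reward = run_bandit([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total_reward)
```

Each iteration is one trial: the agent picks an action, observes only that action's reward, and refines its estimate. In a contextual variant the choice would also depend on an observed context, such as the document to be summarized.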