brainsteam.co.uk/brainsteam/content/annotations/2022/11/28/1669635285.md at 8ba49d27127540df99ef72d0c281b022e16fe3c3


date

hypothesis-meta

in-reply-to

tags

target

text

updated

uri

user

user_info

2022-11-28T11:34:45.963292+00:00

title

1809.09672.pdf

false

__world__

false

qMPVfG8QEe2WJWufCDu9ww

html	incontext	json
https://hypothes.is/a/qMPVfG8QEe2WJWufCDu9ww	https://hyp.is/qMPVfG8QEe2WJWufCDu9ww/arxiv.org/pdf/1809.09672.pdf	https://hypothes.is/api/annotations/qMPVfG8QEe2WJWufCDu9ww

admin

delete

read

update

acct:ravenscroftj@hypothes.is

group:__world__

acct:ravenscroftj@hypothes.is

bandit

nlproc

summarization

selector

source

end	start	type
10089	9945	TextPositionSelector

exact	prefix	suffix	type
andit is a decision-making formal-ization in which an agent repeatedly chooses oneof several actions, and receives a reward based onthis choice.	dient reinforcementlearning. A b	The agent’s goal is to quickly	TextQuoteSelector

https://arxiv.org/pdf/1809.09672.pdf

Definition for contextual bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice.

2022-11-28T11:34:45.963292+00:00

https://arxiv.org/pdf/1809.09672.pdf

acct:ravenscroftj@hypothes.is

display_name
James Ravenscroft

https://arxiv.org/pdf/1809.09672.pdf

bandit

nlproc

summarization

hypothesis

annotation

/annotations/2022/11/28/1669635285

A bandit is a decision-making formalization in which an agent repeatedly chooses one of several actions, and receives a reward based on this choice.

Definition for contextual bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice.
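To make the choose-then-receive-reward loop concrete, here is a minimal sketch of a bandit agent. It is not taken from the paper: the epsilon-greedy selection rule, the Bernoulli rewards, and the names `run_bandit`, `reward_probs`, and `epsilon` are all assumptions chosen for illustration, and the sketch ignores any context, so it is a plain multi-armed bandit rather than a contextual one.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Simulate an agent that repeatedly picks one of several actions.

    reward_probs[a] is the (hypothetical) chance that action a pays a
    reward of 1.0; otherwise the reward is 0.0.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    n_actions = len(reward_probs)
    counts = [0] * n_actions           # how often each action was chosen
    values = [0.0] * n_actions         # running mean reward per action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment returns a reward based on the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incrementally update the running mean for that action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = run_bandit([0.2, 0.5, 0.8])
    print("pulls per action:", counts)
    print("estimated values:", [round(v, 2) for v in values])
    print("total reward:", total)
```

The agent only ever sees the reward for the action it chose, which is why some exploration is needed before its estimates become useful.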
