2022-11-28T11:34:45.963292+00:00 |
|
false |
__world__ |
false |
qMPVfG8QEe2WJWufCDu9ww |
|
admin |
delete |
read |
update |
acct:ravenscroftj@hypothes.is |
|
acct:ravenscroftj@hypothes.is |
|
|
acct:ravenscroftj@hypothes.is |
|
|
rl |
bandit |
nlproc |
summarization |
|
selector |
source |
end |
start |
type |
10089 |
9945 |
TextPositionSelector |
|
exact |
prefix |
suffix |
type |
andit is a decision-making formal-ization in which an agent repeatedly chooses oneof several actions, and receives a reward based onthis choice. |
dient reinforcementlearning. A b |
The agent’s goal is to quickly |
TextQuoteSelector |
|
|
https://arxiv.org/pdf/1809.09672.pdf |
|
|
Definition of a contextual bandit: an agent that repeatedly chooses one of several actions and receives a reward based on this choice. |
2022-11-28T11:34:45.963292+00:00 |
https://arxiv.org/pdf/1809.09672.pdf |
acct:ravenscroftj@hypothes.is |
display_name |
James Ravenscroft |
|