On the strategic learning of signal associations. (2nd September 2022)
- Record Type:
- Journal Article
- Title:
- On the strategic learning of signal associations. (2nd September 2022)
- Main Title:
- On the strategic learning of signal associations
- Authors:
- Sherratt, Thomas N
Voll, James
- Editors:
- Naguib, Marc
- Abstract:
- Signal detection theory (SDT) has been widely used to identify the optimal response of a receiver to a stimulus when it could be generated by more than one signaler type. While SDT assumes that the receiver adopts the optimal response at the outset, in reality, receivers often have to learn how to respond. We, therefore, recast a simple signal detection problem as a multi-armed bandit (MAB) in which inexperienced receivers choose between accepting a signaler (gaining information and an uncertain payoff) and rejecting it (gaining no information but a certain payoff). An exact solution to this exploration–exploitation dilemma can be identified by solving the relevant dynamic programming equation (DPE). However, to evaluate how the problem is solved in practice, we conducted an experiment. Here humans (n = 135) were repeatedly presented with four readily discriminable signaler types, some of which were on average profitable, and others unprofitable, to accept in the long term. We then compared the performance of SDT, DPE, and three candidate exploration–exploitation models (Softmax, Thompson, and Greedy) in explaining the observed sequences of acceptance and rejection. All of the models predicted volunteer behavior well when signalers were clearly profitable or clearly unprofitable to accept. Overall, however, the Softmax and Thompson sampling models, which predict the optimal (SDT) response towards signalers with borderline profitability only after extensive learning, explained the responses of volunteers significantly better. By highlighting the relationship between the MAB and SDT models, we encourage others to evaluate how receivers strategically learn about their environments.
Signal detection theory identifies the optimal response of a fully informed decision-maker to a given stimulus. However, what if decision-makers are not fully informed? Here we show that exploration-exploitation models provide a principled way to solve such problems, in which decision-makers strategically balance the need to gain information and exploit it. Indeed, rather than immediately adopting the optimal response, naïve human decision-makers choose options with probabilities that are an S-shaped function of their estimated payoffs. …
- Is Part Of:
- Behavioral ecology. Volume 33:Number 6(2022)
- Journal:
- Behavioral ecology
- Issue:
- Volume 33:Number 6(2022)
- Issue Display:
- Volume 33, Issue 6 (2022)
- Year:
- 2022
- Volume:
- 33
- Issue:
- 6
- Issue Sort Value:
- 2022-0033-0006-0000
- Page Start:
- 1058
- Page End:
- 1069
- Publication Date:
- 2022-09-02
- Subjects:
- Bayesian learning -- decision theory -- dynamic programming -- multi-armed bandit -- signal detection theory -- Softmax -- Thompson sampling
Animal behavior -- Periodicals
Behavior evolution -- Periodicals
Ecology -- Periodicals
Psychology, Comparative -- Periodicals
591.5
- Journal URLs:
- http://beheco.oupjournals.org
http://beheco.oxfordjournals.org
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/beheco/arac027
- Languages:
- English
- ISSNs:
- 1045-2249
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1877.390000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24670.xml
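
The abstract names three candidate exploration–exploitation rules (Greedy, Softmax, Thompson) for a receiver choosing between accepting a signaler and taking a certain payoff for rejecting it. The Python sketch below is only an illustration of how such accept/reject rules are commonly written; it is not the authors' implementation. The function names, the `temperature` parameter, the Bernoulli gain/loss payoff structure, and the uniform Beta(1, 1) prior are all assumptions introduced here and do not come from the record.

```python
import math
import random


def greedy_accept(est_payoff, reject_payoff=0.0):
    """Greedy rule: accept only if the current estimate beats the certain payoff."""
    return est_payoff > reject_payoff


def softmax_accept_prob(est_payoff, reject_payoff=0.0, temperature=1.0):
    """Two-option softmax: acceptance probability is an S-shaped (logistic)
    function of the estimated payoff difference. `temperature` is an assumed
    free parameter controlling how sharply the curve rises."""
    z = (est_payoff - reject_payoff) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


def thompson_accept(n_good, n_bad, gain, loss, reject_payoff=0.0, rng=random):
    """Thompson sampling under an assumed Bernoulli payoff model: each accepted
    signaler yields +gain with unknown probability p, otherwise -loss.
    Draw p from the Beta posterior (uniform prior assumed) and accept if the
    sampled expected payoff exceeds the certain payoff for rejecting."""
    p = rng.betavariate(1 + n_good, 1 + n_bad)
    return p * gain - (1.0 - p) * loss > reject_payoff
```

With few observations the Thompson rule accepts borderline signalers some of the time, because the posterior is wide; as observations accumulate the sampled value concentrates and the rule settles on a consistent response. The softmax rule gives a probability of exactly 0.5 when the estimated payoff equals the certain payoff for rejecting.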