Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model. (17th August 2020)

Record Type:: Journal Article
Title:: Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model. (17th August 2020)
Main Title:: Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model
Authors:: Kinose, Akira
Taniguchi, Tadahiro
Abstract:: The integration of reinforcement learning (RL) and imitation learning (IL) is an important problem that has long been studied in the field of intelligent robotics. RL optimizes policies to maximize the cumulative reward, whereas IL attempts to extract general knowledge about the trajectories demonstrated by experts, i.e., demonstrators. Because each has its own drawbacks, many methods combining them and compensating for each set of drawbacks have been explored thus far. However, many of these methods are heuristic and do not have a solid theoretical basis. This paper presents a new theory for integrating RL and IL by extending the probabilistic graphical model (PGM) framework for RL, control as inference. We develop a new PGM for RL with multiple types of rewards, called probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning method of RL and IL can be formulated as a probabilistic inference of policies on pMDP-MO by considering the discriminator in generative adversarial imitation learning (GAIL) as an additional optimality emission. We adapt the GAIL and task-achievement reward to our proposed framework, achieving significantly better performance than policies trained with baseline methods.
Is Part Of:: Advanced robotics. Volume 34:Number 16(2020)
Journal:: Advanced robotics
Issue:: Volume 34:Number 16(2020)
Issue Display:: Volume 34, Issue 16 (2020)
Year:: 2020
Volume:: 34
Issue:: 16
Issue Sort Value:: 2020-0034-0016-0000
Page Start:: 1055
Page End:: 1067
Publication Date:: 2020-08-17
Subjects:: Imitation learning -- reinforcement learning -- probabilistic inference -- control as inference -- generative adversarial imitation learning
Robotics -- Periodicals
Robotics -- Japan -- Periodicals
Robotics
Japan
Periodicals
629.89205
Journal URLs:: http://www.catchword.com/rpsv/cw/vsp/01691864/contp1.htm
http://catalog.hathitrust.org/api/volumes/oclc/14883000.html
http://www.tandfonline.com/toc/tadr20/current
http://www.tandfonline.com/
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=0169-1864;screen=info;ECOIP
http://www.ingentaselect.com/vl=16659242/cl=11/nw=1/rpsv/cw/vsp/01691864/contp1.htm
DOI:: 10.1080/01691864.2020.1778521
Languages:: English
ISSNs:: 0169-1864
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 0696.926500
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
Ingest File:: 22698.xml