Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network. (10th June 2021)
- Record Type:
- Journal Article
- Title:
- Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network. (10th June 2021)
- Main Title:
- Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network
- Authors:
- Ali, Hamid
Majeed, Hammad
Usman, Imran
Almejalli, Khaled A.
- Other Names:
- Abbas Haider (Academic Editor)
- Abstract:
- Abstract: In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment. In it, the agent tries to maximize entropy along with the expected discounted rewards. In SAC, the agent tries to be as random as possible while moving towards the maximum reward. This randomness allows the agent to explore the environment and keeps it from getting stuck in local optima. We believe that maximizing the entropy causes overestimation of the entropy term, which results in slow policy learning. This is because of the drastic change in action distribution whenever the agent revisits similar states. To overcome this problem, we propose a dual policy optimization framework, in which two independent policies are trained. Both policies try to maximize entropy by choosing actions against the minimum entropy to reduce the overestimation. The use of two policies results in better and faster convergence. We demonstrate our approach on different well-known continuous control simulated environments. Results show that our proposed technique achieves better results against the state-of-the-art SAC algorithm and learns better policies.
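The abstract's core idea resembles clipped double Q-learning, but applied to the entropy bonus: two independent policies each provide an entropy estimate, and the smaller of the two is used so an overestimated entropy term does not inflate the learning target. The record does not give the authors' exact update rule, so the following is only a minimal illustrative sketch under the assumption of diagonal Gaussian policies; the function names (`gaussian_entropy`, `soft_value_target`) and the temperature value are hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_entropy(log_std):
    # Differential entropy of a diagonal Gaussian policy:
    # sum over action dimensions of 0.5 * log(2*pi*e) + log(sigma).
    return np.sum(0.5 * np.log(2 * np.pi * np.e) + log_std)

def soft_value_target(q_value, log_std_pi1, log_std_pi2, alpha=0.2):
    # Clipped entropy bonus (assumed form): take the minimum of the two
    # policies' entropy estimates, analogous to the min over twin critics
    # in clipped double Q-learning, to damp entropy overestimation.
    h1 = gaussian_entropy(log_std_pi1)
    h2 = gaussian_entropy(log_std_pi2)
    return q_value + alpha * min(h1, h2)
```

For example, if one policy's log standard deviations are all zero and the other's are larger, the target uses the smaller (first) entropy estimate, keeping the entropy-regularized value conservative.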
- Is Part Of:
- Wireless communications and mobile computing. Volume 2021(2021)
- Journal:
- Wireless communications and mobile computing
- Issue:
- Volume 2021(2021)
- Issue Display:
- Volume 2021, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 2021
- Issue:
- 2021
- Issue Sort Value:
- 2021-2021-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-06-10
- Subjects:
- Wireless communication systems -- Periodicals
Mobile communication systems -- Periodicals
621.38205
- Journal URLs:
- https://onlinelibrary.wiley.com/journal/15308677
https://www.hindawi.com/journals/wcmc/
http://onlinelibrary.wiley.com/
- DOI:
- 10.1155/2021/9920591
- Languages:
- English
- ISSNs:
- 1530-8669
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 9323.860000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17462.xml