Exploration Exploitation Problem in Policy Based Deep Reinforcement Learning for Episodic and Continuous Environments
Vedang Naik¹, Rohit Sahoo², Sameer Mahajan³, Saurabh Singh⁴, Shaveta Malik⁵

¹Vedang Naik*, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
²Rohit Sahoo, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
³Sameer Mahajan, College of Engineering, Penn State University, Paoli, PA, USA.
⁴Saurabh Singh, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
⁵Dr. Shaveta Malik, Associate Professor, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
Manuscript received on November 16, 2021. | Revised Manuscript received on November 19, 2021. | Manuscript published on December 30, 2021. | PP: 29-34 | Volume-11 Issue-2, December 2021. | Retrieval Number: 100.1/ijeat.B32671211221 | DOI: 10.35940/ijeat.B3267.1211221
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Reinforcement learning (RL) is an artificial intelligence paradigm in which an agent learns to act in an environment so as to maximize the rewards it collects. It addresses sequential decision-making problems in which feedback is limited. RL has roots in cybernetics and draws on research in statistics, psychology, neuroscience, and computer science, and it has attracted growing interest from the machine learning and artificial intelligence communities over the last five to ten years. Its appeal is that agents can be trained with rewards and penalties without being told how the task should be completed. An RL problem can be described as an agent that must make decisions in a given environment to maximize a specified notion of cumulative reward. The learner is not told which actions to take; it must experiment to discover which actions yield the greatest reward. The learner therefore has to choose between exploring its environment and exploiting what it already knows. This exploration-exploitation dilemma is one of the most common issues encountered when working with RL algorithms. Deep reinforcement learning combines RL with deep learning. In this study, we describe how to apply several deep RL algorithms to two problems: controlling a CartPole system, which represents an episodic environment, and stock market trading, which represents a continuous environment. We explain and demonstrate the effects of different deep RL methods, namely Deep Q-Networks (DQN), Double DQN, and Dueling DQN, on learning performance. We also examine the fundamental distinctions between episodic and continuous tasks and how the exploration-exploitation problem is addressed in each.
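The exploration-exploitation trade-off and the three DQN variants named in the abstract can be illustrated with a minimal NumPy sketch. It assumes ε-greedy exploration with an exponentially decaying ε, which the abstract does not specify as the study's strategy; the function names and default parameters (`eps_start`, `eps_end`, `decay`) are hypothetical. The sketch shows only the per-transition action selection, target, and aggregation formulas, not the networks, replay buffer, or training loop used in the study.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon (uniform random action), otherwise exploit (argmax Q)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def decayed_epsilon(step, eps_start=1.0, eps_end=0.01, decay=0.995):
    """Hypothetical schedule: decay epsilon exponentially toward eps_end (explore early, exploit later)."""
    return max(eps_end, eps_start * decay ** step)


def dqn_target(reward, next_q_target, gamma, done):
    """DQN target: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * np.max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN target: the online network selects the action, the target network evaluates it."""
    best_action = int(np.argmax(next_q_online))
    return reward + gamma * (1.0 - done) * next_q_target[best_action]


def dueling_q(value, advantages):
    """Dueling DQN aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()


# Usage with made-up numbers (illustrative only, not results from the study).
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.2, 0.5]), decayed_epsilon(step=1000), rng)
target = double_dqn_target(1.0, np.array([2.0, 1.0]), np.array([1.0, 3.0]), gamma=0.99, done=0)
```

The `done` flag is where episodic and continuous tasks differ in this sketch: in an episodic task such as CartPole, a terminal transition sets `done` to 1 so the target does not bootstrap past the end of the episode, while in a continuing task with no terminal state `done` stays 0 and a discount γ < 1 keeps the return finite.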
Keywords: Reinforcement Learning, Episodic Task, Continuous Task, Exploration-Exploitation Problem.
Scope of the Article: Data Management, Exploration, and Mining