k-Certainty Exploration Method: an action selector to identify the environment in reinforcement learning

https://doi.org/10.1016/S0004-3702(96)00062-8

Abstract

Reinforcement learning aims to adapt an agent to an unknown environment according to rewards. There are two issues: how to handle delayed rewards and how to handle uncertainty. Q-learning is a representative reinforcement learning method. It is used in many works since it can learn an optimum policy. However, Q-learning needs numerous trials to converge to an optimum policy. If the target environment can be described as a Markov decision process, we can identify it from statistics of sensor-action pairs. Once we have built a correct environment model, we can derive an optimum policy with the Policy Iteration Algorithm. Therefore, we can construct an optimum policy efficiently by first identifying the environment.
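
The identification-then-planning idea above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper names estimate_model and policy_iteration, the tabular representation of the Markov decision process, and the discount factor are assumptions made for the example.

```python
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Estimate transition probabilities and expected rewards from
    observed (state, action, next_state, reward) tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))
    for s, a, s_next, r in transitions:
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
    totals = counts.sum(axis=2, keepdims=True)
    # Unvisited (state, action) pairs fall back to a uniform guess.
    P = np.divide(counts, totals,
                  out=np.full_like(counts, 1.0 / n_states), where=totals > 0)
    R = np.divide(reward_sums, totals[:, :, 0],
                  out=np.zeros_like(reward_sums), where=totals[:, :, 0] > 0)
    return P, R

def policy_iteration(P, R, gamma=0.95):
    """Standard Policy Iteration on the estimated model."""
    n_states, n_actions, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(n_states), policy]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * (P @ V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```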

We separate the learning process into two phases: identifying the environment and determining an optimum policy. We propose the k-Certainty Exploration Method for identifying the environment. After that, an optimum policy is determined by the Policy Iteration Algorithm. We call a rule k-certain if and only if it has been selected at least k times. The k-Certainty Exploration Method excludes any loop composed of rules that have already become k-certain. We show its effectiveness by comparing it with Q-learning in two experiments. One is Sutton's maze-like environment; the other is an original environment in which the optimum policy varies according to a parameter.
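
A rough sketch of an action selector in this spirit is given below. It only illustrates the k-certainty counting idea: the class name, the data structures, and the tie-breaking are assumptions for the example, and the loop-exclusion bookkeeping described in the paper is deliberately omitted.

```python
import random
from collections import defaultdict

class KCertaintySelector:
    """Illustrative selector: a (state, action) rule counts as k-certain
    once it has been selected at least k times.  Exploration prefers
    rules that are not yet k-certain, so every rule accumulates enough
    samples to identify the environment.  (The method in the paper also
    excludes loops made only of k-certain rules; omitted here.)"""

    def __init__(self, k, n_actions):
        self.k = k
        self.n_actions = n_actions
        self.counts = defaultdict(int)  # selection count per (state, action)

    def is_k_certain(self, state, action):
        return self.counts[(state, action)] >= self.k

    def select(self, state):
        # Prefer any rule in this state that is not yet k-certain.
        uncertain = [a for a in range(self.n_actions)
                     if not self.is_k_certain(state, a)]
        action = random.choice(uncertain if uncertain else range(self.n_actions))
        self.counts[(state, action)] += 1
        return action
```

Once every reachable rule has become k-certain, exploration can stop and the collected statistics are handed to the Policy Iteration sketch above.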

Keywords

Reinforcement learning
Q-learning
Markov decision processes
Policy Iteration Algorithm
k-Certainty Exploration Method
