Fast and Epsilon-Optimal Discretized Pursuit Learning Automata

Junqi Zhang, Cheng Wang, Mengchu Zhou

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. During an iteration, its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability vector. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration, most of them in phases 1 and 3. A new fast discretized pursuit LA with assured ϵ-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Besides its low computational complexity, it converges faster than the classical scheme in stationary environments. This paper can promote the application of LA to large-scale-action problems that require efficient reinforcement learning tools with assured ϵ-optimality, fast convergence, and low per-iteration computational complexity.
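The three-phase iteration the abstract refers to can be illustrated with the classical discretized pursuit reward-inaction (DP_RI) scheme that this paper accelerates. The sketch below is a minimal illustration under common textbook conventions, not the authors' fast variant; the class and parameter names are invented for this example. Note how phases 1 and 3 each scan every action, which is the O(N)-per-iteration cost the proposed automaton removes.

```python
import random

class DiscretizedPursuitLA:
    """Classical discretized pursuit reward-inaction (DP_RI) automaton.

    Each iteration runs the three phases named in the abstract:
    1) select the next action, 2) find the best estimated action,
    3) update the action-probability vector.  Phases 1 and 3 cost
    O(N) here, which is the bottleneck the paper addresses.
    """

    def __init__(self, num_actions, resolution):
        self.n = num_actions
        # Probabilities move in discrete steps of size delta.
        self.delta = 1.0 / (num_actions * resolution)
        self.p = [1.0 / num_actions] * num_actions  # action probabilities
        self.rewards = [0] * num_actions            # rewards seen per action
        self.pulls = [0] * num_actions              # selections per action

    def select_action(self):
        # Phase 1: roulette-wheel sampling over the probability vector (O(N)).
        r, acc = random.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r < acc:
                return i
        return self.n - 1  # guard against floating-point round-off

    def update(self, action, reward):
        # Refresh the reward estimate for the chosen action.
        self.pulls[action] += 1
        self.rewards[action] += reward
        # Phase 2: find the action with the highest estimated reward (O(N)).
        est = [self.rewards[i] / self.pulls[i] if self.pulls[i] else 0.0
               for i in range(self.n)]
        best = max(range(self.n), key=lambda i: est[i])
        # Phase 3 (reward-inaction): on rewarded iterations, move probability
        # mass toward the best-estimated action in steps of delta (O(N)).
        if reward:
            for i in range(self.n):
                if i != best:
                    self.p[i] = max(self.p[i] - self.delta, 0.0)
            self.p[best] = 1.0 - sum(self.p[i]
                                     for i in range(self.n) if i != best)
```

In a stationary environment each action returns a reward with a fixed probability, and the automaton's probability mass concentrates on the best action; with a large number of actions, both the sampling loop in phase 1 and the update loop in phase 3 dominate the per-iteration cost.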

Original language: English (US)
Article number: 6955789
Pages (from-to): 2089-2099
Number of pages: 11
Journal: IEEE Transactions on Cybernetics
Volume: 45
Issue number: 10
DOI: 10.1109/TCYB.2014.2365463
State: Published - Oct 1 2015


All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications

Keywords

  • Discretized pursuit learning automata (LA)
  • Stationary environments
  • low computational complexity

Cite this

Zhang, Junqi; Wang, Cheng; Zhou, Mengchu. Fast and Epsilon-Optimal Discretized Pursuit Learning Automata. In: IEEE Transactions on Cybernetics. 2015; Vol. 45, No. 10, pp. 2089-2099.
@article{aea4883f506842a5b8f399ac923cf484,
title = "Fast and Epsilon-Optimal Discretized Pursuit Learning Automata",
author = "Junqi Zhang and Cheng Wang and Mengchu Zhou",
year = "2015",
month = "10",
day = "1",
doi = "10.1109/TCYB.2014.2365463",
language = "English (US)",
volume = "45",
pages = "2089--2099",
journal = "IEEE Transactions on Cybernetics",
issn = "2168-2267",
publisher = "IEEE Advancing Technology for Humanity",
number = "10",

}
