Citation:
Zhinan Peng, Jiangping Hu, Rui Luo, Bijoy K. Ghosh. Distributed multi-agent temporal-difference learning with full neighbor information [J]. Control Theory and Technology, 2020, 18(4): 379-389.
Distributed multi-agent temporal-difference learning with full neighbor information
Zhinan Peng, Jiangping Hu, Rui Luo, Bijoy K. Ghosh
(School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China; Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409-1042, USA)
Abstract:
This paper presents a novel distributed multi-agent temporal-difference learning framework for value function approximation, in which each agent uses information from all of its neighbors rather than from only one neighbor. With full neighbor information, the proposed framework (1) converges faster and (2) is more robust than state-of-the-art approaches. Based on this framework, we propose a distributed multi-agent discounted temporal-difference learning algorithm and a distributed multi-agent average-cost temporal-difference learning algorithm, and we provide theoretical convergence proofs for both. Numerical simulation results show that the proposed algorithms outperform the gossip-based algorithm in convergence speed, robustness to noise, and robustness to time-varying network topology.
Keywords: Distributed algorithm · Reinforcement learning · Temporal-difference learning · Multi-agent systems
DOI:https://doi.org/10.1007/s11768-020-00016-w
Funding: This work was partially supported by the Sichuan Science and Technology Program (No. 2020YFSY0012), the National Natural Science Foundation of China (Nos. 61473061, 61104104) and the Program for New Century Excellent Talents in University (No. NCET-13-0091).
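
The abstract contrasts using all neighbors' information with using a single neighbor's, but this page does not give the update rules. The Python sketch below is therefore not the authors' algorithm. It is a generic consensus-plus-TD(0) illustration under stated assumptions: linear value function approximation, an undirected communication graph, Metropolis mixing weights for the full-neighbor step, and a randomized pairwise average for the gossip-style step. All function names and parameters are hypothetical.

```python
import numpy as np

def metropolis_weights(adj):
    """Doubly stochastic mixing matrix for an undirected graph (Metropolis rule).
    Assumption: this weighting is chosen for illustration only."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # remaining weight stays on the agent itself
    return W

def full_neighbor_mix(theta, adj):
    """Each agent combines its own parameters with those of all its neighbors."""
    return metropolis_weights(adj) @ theta

def gossip_mix(theta, adj, rng):
    """Gossip-style step: one randomly chosen edge averages its two endpoints;
    every other agent keeps its parameters unchanged."""
    edges = np.argwhere(np.triu(adj, k=1))
    i, j = edges[rng.integers(len(edges))]
    mixed = theta.copy()
    mixed[i] = mixed[j] = 0.5 * (theta[i] + theta[j])
    return mixed

def td0_increment(theta_i, phi, phi_next, reward, gamma, alpha):
    """Local discounted TD(0) increment for one agent with linear features."""
    delta = reward + gamma * (phi_next @ theta_i) - phi @ theta_i
    return alpha * delta * phi

def disagreement(theta):
    """Distance of the agents' parameters from their network-wide average."""
    return float(np.linalg.norm(theta - theta.mean(axis=0)))

# Hypothetical setup: 4 agents on a ring, one scalar parameter each.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
theta0 = np.array([[0.0], [1.0], [2.0], [3.0]])
theta_full = full_neighbor_mix(theta0, ring)
theta_gossip = gossip_mix(theta0, ring, np.random.default_rng(0))
```

In a full iteration under these assumptions, each agent would apply one of the mixing steps and then add its own `td0_increment` computed from its local reward. Both mixing steps preserve the network-wide average of the parameters; they differ in how many agents exchange information per round. A single round on a toy graph does not reproduce the convergence or robustness comparisons reported in the paper.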