Lecture Abstract / Keywords
Title
2011-06-21 14:15
Analysis and Improvement of Policy Gradient Estimation
Tingting Zhao (presenter), Hirotaka Hachiya, Gang Niu, Masashi Sugiyama (Tokyo Inst. of Tech.)
IBISML2011-12
Abstract
Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from unstable gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that, under a mild assumption, the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method. We then derive the optimal baseline for PGPE, which further reduces the variance. We also show theoretically that PGPE with the optimal baseline is preferable to REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments.
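As a rough illustration of the ingredients named in the abstract, the Python sketch below estimates a PGPE gradient for a Gaussian hyper-parameter distribution θ ~ N(η, τ²) over a single scalar policy parameter, and compares the estimator's variance with no baseline against a baseline of the form b* = E[R ‖∇ log p(θ|ρ)‖²] / E[‖∇ log p(θ|ρ)‖²]. That form is the usual variance-minimising baseline for likelihood-ratio estimators; we assume, without checking it against the paper, that it matches the baseline derived there. The toy return `toy_return` and all helper names are hypothetical, and the sketch does not reproduce the paper's PGPE-versus-REINFORCE analysis or its experiments.

```python
import random

# Hypothetical 1-D setup (not from the paper): a deterministic policy with a
# single scalar parameter theta, and PGPE hyper-parameters rho = (eta, tau)
# defining the Gaussian prior theta ~ N(eta, tau^2).


def toy_return(theta):
    """Hypothetical noise-free return R(h) of the trajectory produced by theta."""
    return -(theta - 2.0) ** 2


def score(theta, eta, tau):
    """Gradient of log N(theta | eta, tau^2) with respect to (eta, tau)."""
    d_eta = (theta - eta) / tau ** 2
    d_tau = ((theta - eta) ** 2 - tau ** 2) / tau ** 3
    return (d_eta, d_tau)


def optimal_baseline(returns, scores):
    """Sample estimate of b = E[R ||s||^2] / E[||s||^2] (assumed optimal form)."""
    sq_norms = [s[0] ** 2 + s[1] ** 2 for s in scores]
    return sum(r * w for r, w in zip(returns, sq_norms)) / sum(sq_norms)


def pgpe_gradient(returns, scores, baseline):
    """Likelihood-ratio PGPE estimate (1/N) * sum_n (R_n - b) * s_n."""
    n = len(returns)
    g_eta = sum((r - baseline) * s[0] for r, s in zip(returns, scores)) / n
    g_tau = sum((r - baseline) * s[1] for r, s in zip(returns, scores)) / n
    return (g_eta, g_tau)


def gradient_variance(eta=0.0, tau=1.0, n_samples=20, n_trials=2000, seed=0):
    """Trace of the empirical covariance of the gradient estimate, b = 0 vs b*."""
    rng = random.Random(seed)
    estimates = {"zero": [], "optimal": []}
    for _ in range(n_trials):
        # Draw policy parameters from the prior; one trajectory per theta.
        thetas = [rng.gauss(eta, tau) for _ in range(n_samples)]
        returns = [toy_return(t) for t in thetas]
        scores = [score(t, eta, tau) for t in thetas]
        # Both estimators use the same samples, so the comparison is paired.
        estimates["zero"].append(pgpe_gradient(returns, scores, 0.0))
        b = optimal_baseline(returns, scores)
        estimates["optimal"].append(pgpe_gradient(returns, scores, b))
    result = {}
    for name, gs in estimates.items():
        total = 0.0
        for k in range(2):  # sum the per-component variances (trace)
            mean = sum(g[k] for g in gs) / len(gs)
            total += sum((g[k] - mean) ** 2 for g in gs) / (len(gs) - 1)
        result[name] = total
    return result


if __name__ == "__main__":
    print(gradient_variance())
```

Running the script prints the trace of the empirical covariance for both baselines. Because b* is estimated from the same batch it is applied to, the estimator picks up a small bias; estimating the baseline from a separate batch avoids this at the cost of extra samples.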
Keywords
Reinforcement Learning / Policy gradients / Optimal Baseline
Bibliographic Information
IEICE Technical Report, vol. 111, no. 87, IBISML2011-12, pp. 83-89, June 2011.
Report Number
IBISML2011-12
Issue Date
2011-06-13 (IBISML)
ISSN
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380
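Copyright
Copyright of papers published in the IEICE Technical Report belongs to the Institute of Electronics, Information and Communication Engineers (IEICE). (Permission numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)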