Paper Abstract and Keywords |
Presentation |
2006-03-16 14:55
Multiobjective Reinforcement Learning based on Multiple Value Function |
Takumi Kamioka (OIST/NAIST), Eiji Uchibe (OIST), Kenji Doya (OIST/ATR) |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
Standard reinforcement learning (RL) is formulated for the optimization of a single objective function. However, in most real-world problems, multiple objective functions need to be considered. We propose an actor-critic architecture to deal with multiple objective functions. Our architecture updates a separate state value function for each objective, and the actor is updated by a scalarized TD error calculated from the multiple value functions in order to acquire a Pareto optimal policy. We compare a number of scalarizing functions, such as Kang and Bien's max-min method, an extended max-min method, and weighted summation. In a computer simulation of a learning period defined by multiple inequalities, the extended max-min method acquires a good policy without being affected by the combination of reward functions. |
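The update described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the function names, the tabular representation, the step sizes `alpha` and `beta`, and the discount `gamma` are all assumptions made for illustration. Only weighted summation is taken from the abstract; `worst_case` is a plain minimum over objectives and is a stand-in, not Kang and Bien's max-min method or its extension, whose exact forms are not given here.

```python
from typing import Callable, List, Sequence


def td_errors(rewards: Sequence[float], v_s: Sequence[float],
              v_next: Sequence[float], gamma: float) -> List[float]:
    """Per-objective TD errors: delta_i = r_i + gamma * V_i(s') - V_i(s)."""
    return [r + gamma * vn - v for r, v, vn in zip(rewards, v_s, v_next)]


def weighted_sum(deltas: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-summation scalarization of the per-objective TD errors."""
    return sum(w * d for w, d in zip(weights, deltas))


def worst_case(deltas: Sequence[float]) -> float:
    """Illustrative stand-in: the smallest TD error across objectives."""
    return min(deltas)


def actor_critic_step(V: List[List[float]], prefs: List[List[float]],
                      s: int, a: int, rewards: Sequence[float], s_next: int,
                      scalarize: Callable[[Sequence[float]], float],
                      alpha: float = 0.1, beta: float = 0.1,
                      gamma: float = 0.9) -> List[float]:
    """One update. V[i][s] is the value of state s under objective i;
    prefs[s][a] is the actor's preference for action a in state s."""
    deltas = td_errors(rewards, [v[s] for v in V],
                       [v[s_next] for v in V], gamma)
    # Each critic is updated with its own TD error.
    for v, d in zip(V, deltas):
        v[s] += alpha * d
    # The actor is updated with a single scalarized TD error.
    prefs[s][a] += beta * scalarize(deltas)
    return deltas


if __name__ == "__main__":
    V = [[0.0] * 3 for _ in range(2)]        # 2 objectives, 3 states
    prefs = [[0.0, 0.0] for _ in range(3)]   # 3 states, 2 actions
    d = actor_critic_step(V, prefs, 0, 1, [1.0, -0.5], 1,
                          lambda x: weighted_sum(x, [0.5, 0.5]))
    print(d, V, prefs)
```

Swapping the `scalarize` argument (for example, passing `worst_case`) changes only how the actor combines the objectives; the critics are unaffected.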
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
multiobjective optimization / reinforcement learning / Pareto optimal solution |
Reference Info. |
IEICE Tech. Rep., vol. 105, no. 658, NC2005-146, pp. 127-132, March 2006. |
Paper # |
NC2005-146 |
Date of Issue |
2006-03-09 (NC) |
ISSN |
Print edition: ISSN 0913-5685 |
Conference Information |
Committee |
NC |
Conference Date |
2006-03-15 - 2006-03-17 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
Tamagawa University |
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
General |
Paper Information |
Registration To |
NC |
Conference Code |
2006-03-NC |
Language |
Japanese |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Multiobjective Reinforcement Learning based on Multiple Value Function |
Keyword(1) |
multiobjective optimization |
Keyword(2) |
reinforcement learning |
Keyword(3) |
Pareto optimal solution |
1st Author's Name |
Takumi Kamioka |
1st Author's Affiliation |
Nara Institute of Science and Technology (OIST/NAIST) |
2nd Author's Name |
Eiji Uchibe |
2nd Author's Affiliation |
Okinawa Institute of Science and Technology (OIST) |
3rd Author's Name |
Kenji Doya |
3rd Author's Affiliation |
Okinawa Institute of Science and Technology (OIST/ATR) |
Speaker |
Author-1 |
Date Time |
2006-03-16 14:55:00 |
Presentation Time |
25 minutes |
Registration for |
NC |
Paper # |
NC2005-146 |
Volume (vol) |
vol.105 |
Number (no) |
no.658 |
Page |
pp.127-132 |
#Pages |
6 |
Date of Issue |
2006-03-09 (NC) |