(英) |
Bayesian optimization is an optimization method that sequentially searches for the optimal value of an unknown continuous reward function. Bayesian optimization can be applied to a wide range of real-world problems, such as designing online games and controlling the quality of streaming videos. In Bayesian optimization, reward functions represent the relationship between the actions taken by a system designer and the profits observed by a user in receiving services from the system, and these characteristics vary with user preferences. Therefore, when applying Bayesian optimization to a large number of users, each with a different reward function, we need to search for the optimal value of each reward function individually, which is computationally intensive. For this problem, it is expected that by utilizing the similarity in user preferences, we can efficiently estimate the reward function and search for the optimal values. In this study, we propose Gaussian process-based clustering bandits (GP-CLUB), which simultaneously performs online clustering of users and Bayesian optimization for multiple users with different reward functions. By assuming a cluster structure representing the similarity in reward functions among users, the proposed method sequentially estimates the clusters while searching for the optimal value of the reward functions and maximizes the cumulative rewards. We theoretically analyze the characteristics of the expected cumulative regret of the proposed method and show its effectiveness compared to methods without clustering. |