Paper Abstract and Keywords |
Presentation |
2016-12-20 16:40
Generative Adversarial Network-based Postfiltering for Statistical Parametric Speech Synthesis
Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru Hiramatsu, Kunio Kashino (NTT)
SP2016-61 |
Abstract |
(in English) |
In the field of speech synthesis, statistical parametric speech synthesis has been widely used due to its flexibility and compactness. However, the quality of its synthesized speech is degraded by over-smoothing, which leaves a large quality gap between natural and synthesized speech. To fill this gap, we propose a novel postfilter based on a generative adversarial network (GAN). There have been several previous attempts to alleviate over-smoothing; however, they are based on empirical findings about acoustic differences between natural and synthesized speech, and therefore cannot cover all the factors causing the differences. In contrast, we examine a learning-based postfilter that learns how to compensate for the differences directly from the data. In particular, we utilize a GAN and optimize a generator (i.e., the postfilter) and a discriminator in an adversarial process. This enables us to obtain a postfilter that fits the true data distribution. Experimental results show that the speech generated by our proposed method is comparable to analyzed-and-synthesized speech. |
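The adversarial process mentioned in the abstract can be illustrated with the standard GAN minimax objective. This is only a sketch under assumptions: the abstract does not state the exact loss used in the report, and the symbols are introduced here for illustration (x for acoustic features of natural speech, y for acoustic features of synthesized speech, G for the postfilter acting as generator, D for the discriminator).

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{nat}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{y \sim p_{\mathrm{syn}}(y)} \bigl[ \log \bigl( 1 - D(G(y)) \bigr) \bigr]
```

Under this formulation, D is trained to tell natural features from postfiltered ones, while G is trained to make its output G(y) indistinguishable from natural features, which is one way to read the claim that the postfilter fits the true data distribution. The objective actually used in the report may differ from this standard form.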
Keyword |
(in English) |
statistical parametric speech synthesis / postfilter / deep neural network / generative adversarial network |
Reference Info. |
IEICE Tech. Rep., vol. 116, no. 378, SP2016-61, pp. 89-94, Dec. 2016. |
Paper # |
SP2016-61 |
Date of Issue |
2016-12-13 (SP) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Notes on Review |
This article is a technical report without peer review, and its polished version will be published elsewhere. |
|