Paper Abstract and Keywords |
Presentation |
2023-02-28 09:30
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling
Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT)
EA2022-76 SIP2022-120 SP2022-40 |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
In recent years, text-to-speech (TTS) synthesis has been required to improve inference speed while maintaining quality.
Multi-stream (MS) iSTFT-HiFiGAN was proposed as a faster variant of HiFi-GAN, a vocoder capable of inferring waveforms on a single CPU.
In a TTS task using VITS, it increased inference speed by about 4 times, at the cost of some deterioration in sound quality.
In this paper, we propose MS-FC-HiFiGAN, in which the inverse short-time Fourier transform (iSTFT) part is replaced with a trainable fully connected layer in order to improve the synthesis quality of MS-iSTFT-HiFiGAN.
As for inference speed, the real-time factor (RTF) was 0.15 on one CPU, the same as that of MS-iSTFT-HiFiGAN.
Synthesis quality was inferior to that of MS-iSTFT-HiFiGAN in the TTS task, but superior in the analysis/synthesis task. |
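To illustrate why the iSTFT stage can be swapped for a fully connected layer, the following minimal sketch shows that the per-frame inverse transform is itself a bias-free linear layer with fixed weights. The abstract does not specify the layer's input size, bias, initialization, or how frames are combined, so everything here is an assumption for illustration: the function names `rdft`, `irdft_weights`, and `linear` are hypothetical, windowing and overlap-add are omitted, and the frame length of 8 is arbitrary.

```python
import math

def rdft(frame):
    """Forward real DFT: returns (real, imag) lists for bins 0..N/2."""
    n = len(frame)
    real, imag = [], []
    for k in range(n // 2 + 1):
        real.append(sum(x * math.cos(2 * math.pi * k * t / n)
                        for t, x in enumerate(frame)))
        imag.append(-sum(x * math.sin(2 * math.pi * k * t / n)
                         for t, x in enumerate(frame)))
    return real, imag

def irdft_weights(n):
    """Fixed weights W (n rows, 2 * (n // 2 + 1) columns) such that
    frame = W applied to concat(real, imag). Assumes n is even."""
    bins = n // 2 + 1
    weights = []
    for t in range(n):
        row_re, row_im = [], []
        for k in range(bins):
            # DC and Nyquist bins appear once; all others stand for a
            # conjugate pair, hence the factor of 2.
            scale = (1.0 if k in (0, n // 2) else 2.0) / n
            row_re.append(scale * math.cos(2 * math.pi * k * t / n))
            row_im.append(-scale * math.sin(2 * math.pi * k * t / n))
        weights.append(row_re + row_im)
    return weights

def linear(weights, inputs):
    """Bias-free fully connected layer: one dot product per output sample."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Illustrative frame (arbitrary values, not data from the paper).
frame = [0.5, -1.0, 0.25, 2.0, -0.75, 0.0, 1.5, -0.25]
real, imag = rdft(frame)
reconstructed = linear(irdft_weights(len(frame)), real + imag)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

With the weights fixed as above, the layer reproduces the inverse transform of one frame up to floating-point error. Making those weights trainable, as the abstract describes, presumably lets the model learn a different projection from the network's output features to waveform samples while keeping a comparable matrix-multiply cost per frame.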
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
Speech Synthesis / Neural Vocoder / HiFi-GAN / Text-to-Speech / Analysis Synthesis |
Reference Info. |
IEICE Tech. Rep., vol. 122, no. 389, SP2022-40, pp. 7-12, Feb. 2023. |
Paper # |
SP2022-40 |
Date of Issue |
2023-02-21 (EA, SIP, SP) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |