Abstract / Keywords |
Presentation title |
2008-03-20 15:15
[Poster Presentation] Prosody Reconstruction by Rescaling Fundamental Frequency Contours in Order to Synthesize Communicative Speech ○Jinfu Ni, Shinsuke Sakai, Satoshi Nakamura (NICT/ATR) SP2007-193 |
Abstract |
(English) |
This paper presents a prosody reconstruction method for synthesizing conversational speech. A conventional text-to-speech engine first generates reading-style prosody for the input text. A frequency modulation technique then rescales the fundamental frequency (F0) contours to add the communicative functions of intonation to the synthesized speech. The technique is based on a functional F0 model, and the transformation scales are modeled by combining simple piecewise-linear patterns according to input tags. We conducted two experiments to evaluate the method: modulating the F0 range of reading-style prosody in synthesized Japanese speech to convey "good news" and "bad news" (Experiment 1), and creating a narrow focus in synthesized Chinese dialog to convey emphasis (Experiment 2). In Experiment 1, listeners judged 94% of the samples modulated with "bad news" F0 ranges as "bad news" and 78% of the samples modulated with "good news" F0 ranges as "good news." These results are comparable to those obtained with style-specified corpora in our previous work. In Experiment 2, 90% of the samples with a narrow focus were identified. The results show that the proposed method can use paralinguistic information to achieve specific communicative purposes. |
Keywords |
(English) |
Conversational speech synthesis / prosody reconstruction / intonation synthesis / fundamental frequency control / text-to-speech synthesis |
Bibliographic information |
IEICE Technical Report, vol. 107, no. 551, SP2007-193, pp. 39-44, March 2008. |
Report number |
SP2007-193 |
Publication date |
2008-03-13 (SP) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
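Copyright |
Copyright of papers published in the Technical Reports belongs to the Institute of Electronics, Information and Communication Engineers (IEICE). (Permission numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |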
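The F0 rescaling step described in the abstract (transformation scales built by combining simple piecewise-linear patterns, then applied to a reading-style F0 contour) can be illustrated with a minimal sketch. This is not the authors' functional F0 model: the semitone units, the additive combination of patterns, the treatment of unvoiced frames as 0 Hz, and the names `piecewise_linear` and `rescale_f0` are all assumptions made for illustration.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints, t):
    """Interpolate (time, value) breakpoints at time t, clamping at both ends.

    `breakpoints` must be sorted by time. Hypothetical helper, not from the paper.
    """
    times = [p[0] for p in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)  # first breakpoint strictly after t
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def rescale_f0(times, f0_hz, patterns):
    """Rescale an F0 contour with a sum of piecewise-linear semitone shifts.

    Assumptions (not from the paper): each pattern gives a shift in semitones,
    patterns combine by addition, and frames with F0 <= 0 are unvoiced.
    """
    rescaled = []
    for t, f0 in zip(times, f0_hz):
        if f0 <= 0.0:
            rescaled.append(0.0)  # keep unvoiced frames unvoiced
            continue
        shift = sum(piecewise_linear(p, t) for p in patterns)
        rescaled.append(f0 * 2.0 ** (shift / 12.0))
    return rescaled


if __name__ == "__main__":
    frame_times = [0.0, 0.5, 1.0]
    reading_f0 = [100.0, 0.0, 100.0]          # toy reading-style contour in Hz
    rising_tag = [(0.0, 0.0), (1.0, 12.0)]    # hypothetical tag pattern
    print(rescale_f0(frame_times, reading_f0, [rising_tag]))
```

Working in semitones keeps the scaling multiplicative in Hz, so several tag patterns can be summed before being applied to the contour. The paper itself may combine its patterns differently.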