Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SP, NLC, IPSJ-SLP, IPSJ-NL [detail] |
2023-12-03 10:00 |
Tokyo |
Kikai-Shinko-Kaikan Bldg. (Primary: On-site, Secondary: Online) |
Improvement of Tacotron2 text-to-speech model based on masking operation and positional attention mechanism Tong Ma, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) NLC2023-17 SP2023-37 |
[more] |
NLC2023-17 SP2023-37 pp.19-24 |
SP, IPSJ-MUS, IPSJ-SLP [detail] |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Fast Neural Waveform Generation Model With Fully Connected Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-15 |
In recent years, in text-to-speech synthesis, it is required to improve the inference speed while keeping the quality.
... [more] |
SP2023-15 pp.73-78 |
SP, IPSJ-MUS, IPSJ-SLP [detail] |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Evaluation of multi-speaker text-to-speech synthesis using a corpus for speech recognition with x-vectors for various speech styles Koki Hida (Wakayama Univ/NICT), Takuma Okamoto (NICT), Ryuichi Nisimura (Wakayama Univ), Yamato Ohtani (NICT), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-25 |
We have implemented multi-speaker end-to-end text-to-speech synthesis based on JETS using x-vectors as speaker embedding... [more] |
SP2023-25 pp.125-130 |
PRMU, IPSJ-CVIM |
2023-05-19 09:50 |
Aichi |
(Primary: On-site, Secondary: Online) |
Incorporating Signed Distance Fields to Improve Text-to-3D Generation Zhuofan Sun, Daichi Horita (Univ. of Tokyo), Satoshi Ikehata (NII), Kiyoharu Aizawa (Univ. of Tokyo) PRMU2023-9 |
Frameworks for generating 3D objects from text description have been proposed in recent years. These frameworks utilize ... [more] |
PRMU2023-9 pp.45-50 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-02-28 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) EA2022-76 SIP2022-120 SP2022-40 |
In recent years, in text-to-speech synthesis, it is required to improve the inference speed while keeping the quality.
... [more] |
EA2022-76 SIP2022-120 SP2022-40 pp.7-12 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-02-28 13:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
[Invited Talk]
Multiple sound spot synthesis meets multilingual speech synthesis
-- Implementation is really all we need -- Takuma Okamoto (NICT) EA2022-87 SIP2022-131 SP2022-51 |
A multilingual multiple sound spot synthesis system is implemented as a user interface for real-time speech translation ... [more] |
EA2022-87 SIP2022-131 SP2022-51 pp.73-76 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-03-01 11:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo) EA2022-108 SIP2022-152 SP2022-72 |
In order to use speech synthesis in a variety of situations such as dialogue systems and emotional expression in audiobo... [more] |
EA2022-108 SIP2022-152 SP2022-72 pp.197-202 |
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] |
2022-11-30 15:30 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura (NTT) NLC2022-14 SP2022-34 |
This paper presents a novel joint training of text to speech (TTS) and automatic speech recognition (ASR) with small amo... [more] |
NLC2022-14 SP2022-34 pp.27-32 |
SP, IPSJ-MUS, IPSJ-SLP [detail] |
2022-06-18 10:50 |
Online |
Online |
[Invited Talk]
Crazy vocoder is unbreakable
-- But let's talk about an informal vision of the future -- Masanori Morise (Meiji Univ.) SP2022-15 |
When current speech synthesis researchers refer to Vocoder in their papers, they are most likely referring to Neural voc... [more] |
SP2022-15 pp.61-66 |
SP, IPSJ-SLP, IPSJ-MUS |
2021-06-19 15:00 |
Online |
Online |
Neural speech synthesis using local phrase dependency structure information Nobuyoshi Kaiki, Sakriani Sakti, Satoshi Nakamura (NAIST) SP2021-23 |
In order to synthesize Japanese speech with natural prosody, we introduce an end-to-end TTS with new prosodic symbol rep... [more] |
SP2021-23 pp.107-112 |
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] |
2020-12-02 13:50 |
Online |
Online |
Multi-Modal Emotion Recognition by Integrating of Acoustic and Linguistic Features Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita (Ritsumeikan Univ.) NLC2020-14 SP2020-17 |
In recent years, the advanced technique of deep learning has improved the performance of Speech Emotional Recognition as ... [more] |
NLC2020-14 SP2020-17 pp.7-12 |
MVE |
2020-09-09 14:00 |
Online |
Online |
Shitsukan Representation Based on Kansei Model Using Neural Style Feature Natsuki Sunda, Iori Tani (Kwansei Gakuin Univ.), Kensuke Tobitani (The Univ. of Nagasaki), Atsushi Takemoto, Yusuke Tani, Noriko Nagata (Kwansei Gakuin Univ.), Nobufumi Morita (Couture Digital) MVE2020-18 |
In this research, we focus on affective texture, which comprises the visual impressions evoked by surface properties, su... [more] |
MVE2020-18 pp.38-43 |
SP, EA, SIP |
2020-03-02 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
The Effectiveness of Additional Context in DNN-based Spontaneous Speech Synthesis Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi (UTokyo), Yusuke Ijima, Ryo Masumura (NTT), Hiroshi Saruwatari (UTokyo) EA2019-112 SIP2019-114 SP2019-61 |
In DNN-based speech synthesis, contexts, which are input features of DNN, can be used not only for the representation of... [more] |
EA2019-112 SIP2019-114 SP2019-61 pp.65-70 |
SP |
2019-08-28 14:40 |
Kyoto |
Kyoto Univ. |
[Poster Presentation]
An investigation on training of WaveNet vocoder in end-to-end text-to-speech Kazuki Yasuhara, Tomoki Hayashi, Tomoki Toda (Nagoya Univ.) SP2019-14 |
In this paper, we investigate the training of WaveNet vocoder in end-to-end text-to-speech. Tacotron 2, which is an end-... [more] |
SP2019-14 pp.31-36 |
EA, SIP, SP |
2019-03-14 13:30 |
Nagasaki |
i+Land nagasaki (Nagasaki-shi) |
[Poster Presentation]
Use and evaluation of Tacotron and context features in rakugo speech synthesis Shuhei Kato (SOKENDAI/NII), Shinji Takaki, Junichi Yamagishi (NII), Yusuke Yasuda (SOKENDAI/NII), Xin Wang (NII) EA2018-126 SIP2018-132 SP2018-88 |
We have been working on constructing rakugo (a traditional Japanese verbal entertainment) speech synthesis toward speech... [more] |
EA2018-126 SIP2018-132 SP2018-88 pp.161-166 |
SP |
2019-01-27 09:00 |
Ishikawa |
Kanazawa-Harmonie |
[Tutorial Invited Lecture]
Software components towards end-to-end speech synthesis at NII
-- Tutorial for Tacotron and WaveNet -- Yusuke Yasuda, Xin Wang (NII) SP2018-56 |
This presentation describes recent advances of end-to-end speech synthesis. We introduce major approaches and our method... [more] |
SP2018-56 p.21 |
NLC, IPSJ-NL, SP, IPSJ-SLP (Joint) [detail] |
2017-12-22 13:00 |
Tokyo |
Waseda Univ. Green Computing Systems Research Organization |
[Invited Talk]
Expressive Speech Synthesis: Approaches to Text-to-Speech with Diverse Voices and Styles Takao Kobayashi (Tokyo Tech.) SP2017-64 |
As the performance of smart devices and information systems becomes higher, more advanced speech interfaces are requeste... [more] |
SP2017-64 pp.85-86 |
SP, IPSJ-SLP (Joint) |
2017-07-27 16:15 |
Miyagi |
Akiu Resort Hotel Crescent |
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities and Evaluation of Dual Learning Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari (Univ. of Tokyo) SP2017-17 |
Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional V... [more] |
SP2017-17 pp.9-14 |
MBE, NC (Joint) |
2017-03-13 13:35 |
Tokyo |
Kikai-Shinko-Kaikan Bldg. |
A Generative Model of Textures Using Hierarchical Probabilistic Principal Component Analysis Aiga Suzuki, Hayaru Shouno (UEC) NC2016-83 |
Modeling of natural textures is an important task for microscopic structure of natural images. Portilla and Simoncell... [more] |
NC2016-83 pp.115-120 |
SP, IPSJ-SLP, NLC, IPSJ-NL (Joint) [detail] |
2016-12-20 15:10 |
Tokyo |
NTT Musashino R&D |
[Poster Presentation]
F0 control by modeling differential features in DNN-based speech synthesis Shuhei Yamada, Takashi Nose, Akinori Ito (Tohoku Univ.) SP2016-55 |
We have been developing ``tailor-made speech synthesis,'' a framework which enables users to modify synthetic speech nat... [more] |
SP2016-55 pp.37-42 |