Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SP, NLC, IPSJ-SLP, IPSJ-NL |
2023-12-03 09:30 |
Tokyo |
Kikai-Shinko-Kaikan Bldg. (Primary: On-site, Secondary: Online) |
Enhancing Recognition of Rare Words in ASR through Error Detection and Context-Aware Error Correction Jiajun He, Zekun Yang, Tomoki Toda (Nagoya Univ.) NLC2023-16 SP2023-36 |
(To be available after the conference date) |
NLC2023-16 SP2023-36 pp.13-18 |
WIT, SP, IPSJ-SLP |
2023-10-14 16:40 |
Fukuoka |
Kyushu Institute of Technology (Primary: On-site, Secondary: Online) |
Sequence-to-sequence Voice Conversion for Electrolaryngeal Speech Enhancement with Multi-stage Pretraining and Fine-tuning Techniques Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda (Nagoya Univ.) SP2023-32 WIT2023-23 |
Sequence-to-sequence (seq2seq) voice conversion (VC) models have great potential for electrolaryngeal (EL) speech to nor... |
SP2023-32 WIT2023-23 pp.27-32 |
WIT, SP, IPSJ-SLP |
2023-10-14 17:05 |
Fukuoka |
Kyushu Institute of Technology (Primary: On-site, Secondary: Online) |
Electrolaryngeal Speech Enhancement through Strong Linguistic Encoding Methods Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda (Nagoya Univ.) SP2023-33 WIT2023-24 |
Although pretraining and fine-tuning approaches have proven to work well in speech intelligibility enhancement, various ... |
SP2023-33 WIT2023-24 pp.33-38 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-23 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
MS-Harmonic-Net++ vs SiFi-GAN: Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) SP2023-5 |
Although Harmonic-Net+ has been proposed as a fundamental frequency (fo) and speech rate (SR) controllable fast neural v... |
SP2023-5 pp.20-25 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Fast Neural Waveform Generation Model With Fully Connected Upsampling Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) SP2023-15 |
In recent years, text-to-speech synthesis has needed to achieve faster inference while maintaining quality. ... |
SP2023-15 pp.73-78 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Evaluation of multi-speaker text-to-speech synthesis using a corpus for speech recognition with x-vectors for various speech styles Koki Hida (Wakayama Univ/NICT), Takuma Okamoto (NICT), Ryuichi Nisimura (Wakayama Univ), Yamato Ohtani (NICT), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-25 |
We have implemented multi-speaker end-to-end text-to-speech synthesis based on JETS using x-vectors as speaker embedding... |
SP2023-25 pp.125-130 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:10 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) EA2022-75 SIP2022-119 SP2022-39 |
Neural vocoders, which reconstruct speech waveforms from acoustic features with deep neural networks, have significantly... |
EA2022-75 SIP2022-119 SP2022-39 pp.1-6 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) EA2022-76 SIP2022-120 SP2022-40 |
In recent years, text-to-speech synthesis has needed to achieve faster inference while maintaining quality. ... |
EA2022-76 SIP2022-120 SP2022-40 pp.7-12 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 16:35 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Generalized warping based on Lie group theory Atsushi Miyashita, Tomoki Toda (Nagoya Univ.) EA2022-91 SIP2022-135 SP2022-55 |
Speech is an ordered sequence of data. In speech processing, some processes, such as spectral frequency warping, speakin... |
EA2022-91 SIP2022-135 SP2022-55 pp.89-94 |
SP, IPSJ-SLP, EA, SIP |
2023-03-01 11:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Analysis of Noisy-target Training for DNN-based speech enhancement and investigation towards its practical use Takuya Fujimura, Tomoki Toda (Nagoya Univ.) EA2022-112 SIP2022-156 SP2022-76 |
Deep neural network (DNN)-based speech enhancement usually uses clean speech as the training target. However, it is hard... |
EA2022-112 SIP2022-156 SP2022-76 pp.221-226 |
SP, IPSJ-MUS, IPSJ-SLP |
2022-06-17 15:00 |
Online |
Online |
Representation and analytical normalization for vocal-tract-length transformation by group theory Atsushi Miyashita, Tomoki Toda (Nagoya Univ) SP2022-11 |
In automatic speech recognition, a recognition result should be invariant with respect to acoustic changes caused by dif... |
SP2022-11 pp.41-46 |
EA |
2022-05-13 14:35 |
Online |
Online |
A serial anomalous sound detection method using outlier exposure based on two types of binary classification Ibuki Kuroyanagi (Nagoya Univ.), Tomoki Hayashi (Nagoya Univ./HDL), Kazuya Takeda, Tomoki Toda (Nagoya Univ.) EA2022-8 |
Anomalous sound detection systems use only normal sound data to detect unknown, atypical sounds. Conventional methods us... |
EA2022-8 pp.35-40 |
EA, SIP, SP, IPSJ-SLP |
2022-03-01 14:45 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Target speaker extraction based on conditional variational autoencoder and directional information in underdetermined condition Rui Wang, Li Li, Tomoki Toda (Nagoya Univ) EA2021-76 SIP2021-103 SP2021-61 |
This paper deals with a dual-channel target speaker extraction problem in underdetermined conditions. A blind source sep... |
EA2021-76 SIP2021-103 SP2021-61 pp.76-81 |
EA, US, SP, SIP, IPSJ-SLP |
2021-03-03 14:05 |
Online |
Online |
[Poster Presentation]
A unified source-filter network for neural vocoder Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda (Nagoya Univ.) EA2020-69 SIP2020-100 SP2020-34 |
In this paper, we propose a method to develop a neural vocoder using a single network based on the source-filter theory. ... |
EA2020-69 SIP2020-100 SP2020-34 pp.57-62 |
EA, US, SP, SIP, IPSJ-SLP |
2021-03-04 09:00 |
Online |
Online |
Anomalous Sound Detection Using a Binary Classification Model Considering Class Centroids Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda (Nagoya Univ.) EA2020-79 SIP2020-110 SP2020-44 |
In an anomalous sound detection system, it is necessary to detect unknown anomalous sounds using only normal sound data. ... |
EA2020-79 SIP2020-110 SP2020-44 pp.114-121 |
SP, EA, SIP |
2020-03-02 09:20 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
Investigation of neural speech rate conversion with multi-speaker WaveNet vocoder Takuma Okamoto (NICT), Keisuke Matsubara (Kobe Univ./NICT), Tomoki Toda (Nagoya Univ./NICT), Yoshinori Shiga, Hisashi Kawai (NICT) EA2019-101 SIP2019-103 SP2019-50 |
Speech rate conversion technology, which can expand or compress speech waveforms without changing the pitch of the sound, is con... |
EA2019-101 SIP2019-103 SP2019-50 pp.1-6 |
SP, EA, SIP |
2020-03-03 09:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
Cross-Lingual Voice Conversion using Cyclic Variational Auto-encoder Hikaru Nakatani, Patrick Lumban Tobing, Kazuya Takeda, Tomoki Toda (Nagoya Univ.) EA2019-139 SIP2019-141 SP2019-88 |
In this report, we present a novel cross-lingual voice conversion (VC) method based on cyclic variational auto-encoder (... |
EA2019-139 SIP2019-141 SP2019-88 pp.219-224 |
SP, EA, SIP |
2020-03-03 09:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
Semi-supervised Self-produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-conducted Signals Using Variational Autoencoder Shogo Seki, Moe Takada, Kazuya Takeda, Tomoki Toda (Nagoya Univ.) EA2019-140 SIP2019-142 SP2019-89 |
This paper proposes a semi-supervised method for enhancing and suppressing self-produced speech, using a variational aut... |
EA2019-140 SIP2019-142 SP2019-89 pp.225-230 |
NLC, IPSJ-NL, SP, IPSJ-SLP (Joint) |
2019-12-06 16:25 |
Tokyo |
NHK Science & Technology Research Labs. |
An evaluation of representation learning using phoneme posteriorgrams and data augmentation in speech emotion recognition Shintaro Okada (Nagoya Univ.), Atsushi Ando (Nagoya Univ./NTT), Tomoki Toda (Nagoya Univ.) SP2019-43 |
This paper presents a new speech emotion recognition method based on representation learning and data augmentation. To ... |
SP2019-43 pp.91-96 |
SP |
2019-08-28 14:40 |
Kyoto |
Kyoto Univ. |
[Poster Presentation]
Intelligibility enhancement based on speech waveform modification using hearing impairment simulator Shu Hikosaka, Kazuhiro Kobayashi, Tomoki Hayashi, Shogo Seki, Kazuya Takeda (Nagoya Univ.), Hideki Banno (Meijo Univ.), Tomoki Toda (Nagoya Univ.) SP2019-13 |
For sensory hearing loss, which is difficult to treat medically, a hearing aid is commonly used to assist the hearing fu... |
SP2019-13 pp.25-29 |