Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 10:10 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Noise-Robust Voice Conversion by Denoising Training Conditioned with Latent Variables of Speech Quality and Recording Environment Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi (UT), Ryuichi Yamamoto, Kentaro Tachibana (LY), Hiroshi Saruwatari (UT) EA2023-63 SIP2023-110 SP2023-45 |
In this paper, we propose noise-robust voice conversion by conditioning latent variables representing speech quality and... [more] |
EA2023-63 SIP2023-110 SP2023-45 pp.13-18 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 10:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Accelerating and stabilizing vectorwise coordinate descent for spatially regularized independent low-rank matrix analysis Yuto Ishikawa, Takuya Okubo, Norihiro Takamune (UTokyo), Tomohiko Nakamura (AIST), Daichi Kitamura (NIT Kagawa), Hiroshi Saruwatari (UTokyo), Yu Takahashi, Kazunobu Kondo (Yamaha) EA2023-68 SIP2023-115 SP2023-50 |
Spatially regularized independent low-rank matrix analysis (SR-ILRMA) is a method that introduces the spatial prior in... [more] |
EA2023-68 SIP2023-115 SP2023-50 pp.43-50 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 15:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Adaptation of End-to-End Japanese Speech Synthesis Using Crowdsourced Dialect Accent Labels Yuki Oda, Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) |
|
|
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 15:35 |
Okinawa |
(Primary: On-site, Secondary: Online) |
SRC4VC: Smartphone-Recorded Corpus for Benchmarking Multi-Speaker Voice Conversion Models Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi (UT), Ryuichi Yamamoto, Kentaro Tachibana (LY), Hiroshi Saruwatari (UT) |
|
|
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Multi-Dialect Speech Synthesis with Interpretable Accent Latent Variable based on VQ-VAE Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) EA2023-98 SIP2023-145 SP2023-80 |
In this paper, we address two tasks: "Intra-dialect Text-to-Speech (TTS)," aiming to synthesize speech in the same diale... [more] |
EA2023-98 SIP2023-145 SP2023-80 pp.220-225 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 16:05 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Evaluating speech generation based on objective measures for text generation Takaaki Saeki (UTokyo), Soumi Maiti (CMU), Shinnosuke Takamichi (UTokyo), Shinji Watanabe (CMU), Hiroshi Saruwatari (UTokyo) EA2023-133 SIP2023-180 SP2023-115 |
In the evaluation of speech generation, while subjective judgments have long been the gold standard, objective metrics s... [more] |
EA2023-133 SIP2023-180 SP2023-115 pp.421-426 |
EA, US (Joint) |
2023-12-22 13:00 |
Fukuoka |
|
[Poster Presentation]
Multichannel Blind Source Separation Using Independent Low-Rank Matrix Analysis with Observed-Signal-Dependent Regularization Based on Spectrogram Consistency Takaaki Kojima, Norihiro Takamune, Sota Misawa (UTokyo), Daichi Kitamura (NIT, Kagawa), Hiroshi Saruwatari (UTokyo) EA2023-51 |
Independent low-rank matrix analysis (ILRMA) is the state-of-the-art technique for blind source separation under the ove... [more] |
EA2023-51 pp.13-20 |
NLC, IPSJ-NL |
2023-03-18 16:40 |
Okinawa |
OIST (Primary: On-site, Secondary: Online) |
Collection of Textual Expressions in the Wild Toward Voice-quality Control from Free Description Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari (UTokyo) NLC2022-29 |
|
NLC2022-29 pp.55-60 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-02-28 16:15 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images Hien Ohnaka (NITTC), Shinnosuke Takamichi (UT), Keisuke Imoto (DU), Yuki Okamoto (Rits), Kazuki Fujii, Hiroshi Saruwatari (UT) EA2022-90 SIP2022-134 SP2022-54 |
(To be available after the conference date) [more] |
EA2022-90 SIP2022-134 SP2022-54 pp.83-88 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-03-01 09:50 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Regularization Term Design Based on Spectrogram Consistency in Independent Low-Rank Matrix Analysis for Multichannel Audio Source Separation Sota Misawa, Norihiro Takamune (UTokyo), Kohei Yatabe (TUAT), Daichi Kitamura (NIT, Kagawa), Hiroshi Saruwatari (UTokyo) EA2022-105 SIP2022-149 SP2022-69 |
It is known that block permutation occurs in the separated signals obtained by independent low-rank matrix analysis. Rec... [more] |
EA2022-105 SIP2022-149 SP2022-69 pp.177-184 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-03-01 11:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo) EA2022-108 SIP2022-152 SP2022-72 |
In order to use speech synthesis in a variety of situations such as dialogue systems and emotional expression in audiobo... [more] |
EA2022-108 SIP2022-152 SP2022-72 pp.197-202 |
SP, IPSJ-SLP, EA, SIP [detail] |
2023-03-01 14:50 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Corpus construction toward multi-domain empathetic dialogue speech synthesis Yuki Saito, Eiji Iimori, Shinnosuke Takamichi (UT), Kentaro Tachibana (LINE), Hiroshi Saruwatari (UT) |
(To be available after the conference date) [more] |
|
SP, IPSJ-MUS, IPSJ-SLP [detail] |
2022-06-17 13:00 |
Online |
Online |
SP2022-6 |
Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for blind speech extraction. In RCSCME, we de... [more] |
SP2022-6 pp.18-23 |
EA |
2022-05-13 12:45 |
Online |
Online |
Directionally-weighted region-to-region kernel interpolation of acoustic transfer function Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari (UTokyo) EA2022-4 |
An interpolation method for the acoustic transfer function (ATF) for variable source and receiver points within regions ... [more] |
EA2022-4 pp.18-19 |
EA, SIP, SP, IPSJ-SLP [detail] |
2022-03-01 12:20 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Training Algorithm for Multispeaker Text-To-Speech Synthesis Considering Adversarial Regularizer Yusuke Nakai, Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari (UTokyo) EA2021-72 SIP2021-99 SP2021-57 |
(To be available after the conference date) [more] |
EA2021-72 SIP2021-99 SP2021-57 pp.50-55 |
EA, SIP, SP, IPSJ-SLP [detail] |
2022-03-02 10:45 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Evaluation of sentence-level generation in Japanese dialect speech synthesis using accent latent variables Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari (UTokyo) EA2021-79 SIP2021-106 SP2021-64 |
Japanese dialect speech synthesis is useful for personalized speech synthesis systems. However, inability to prepare acc... [more] |
EA2021-79 SIP2021-106 SP2021-64 pp.96-101 |
EA, SIP, SP, IPSJ-SLP [detail] |
2022-03-02 13:25 |
Okinawa |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
Filtered-X LMS algorithm based on individual interpolation of primary and secondary sound fields for spatial active noise control Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari (The Univ. of Tokyo) EA2021-84 SIP2021-111 SP2021-69 |
Spatial active noise control (ANC), which aims to reduce noise over a three-dimensional target region, has attracted a... [more] |
EA2021-84 SIP2021-111 SP2021-69 pp.126-131 |
EA, SIP, SP, IPSJ-SLP [detail] |
2022-03-02 13:25 |
Okinawa |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
Sound Field Estimation from Small Number of Observations by Deep Learning with Difference-Approximation-Based Helmholtz-Equation Loss Function Kazuhide Shigemi, Shoichi Koyama, Tomohiko Nakamura, Hiroshi Saruwatari (UTokyo) EA2021-85 SIP2021-112 SP2021-70 |
We propose a single-frequency sound field estimation method from a small number of observations that uses a loss functio... [more] |
EA2021-85 SIP2021-112 SP2021-70 pp.132-139 |
EA, SIP, SP, IPSJ-SLP [detail] |
2022-03-02 15:35 |
Okinawa |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
Interpolation of head-related transfer function from small amount of observation data using deep learning based on spherical wavefunction expansion Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, Hiroshi Saruwatari (UTokyo) EA2021-90 SIP2021-117 SP2021-75 |
In binaural synthesis, listeners' individual head-related transfer functions (HRTFs) are necessary for highly-immersive ... [more] |
EA2021-90 SIP2021-117 SP2021-75 pp.163-170 |
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] |
2021-12-03 11:00 |
Online |
Online |
Multi-speaker Audiobook Speech Synthesis using Discrete Character Acting Styles Acquired by VQVAE Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito (UT), Yusuke Ijima, Ryo Masumura (NTT), Hiroshi Saruwatari (UT) NLC2021-26 SP2021-47 |
In this paper, we propose a method of extracting discrete character acting styles using vector quantized variational aut... [more] |
NLC2021-26 SP2021-47 pp.42-47 |