Online edition: ISSN 2432-6380
SP2022-39
Comparison of fundamental frequency controllable fast neural waveform generative models
Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 1 - 6
SP2022-40
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling
Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 7 - 12
SP2022-41
End-to-End Speech Synthesis Based on Articulatory Movements Captured by Real-time MRI
Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. Sci.)
pp. 13 - 18
SP2022-42
Singing voice synthesis based on a frame-driven attention mechanism considering vocal timing deviation
Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda (NITech)
pp. 19 - 24
SP2022-43
Extension of acoustic system measurement based on signal safeguarding
-- Repetition and orthogonalization for post hoc analysis --
Hideki Kawahara (Wakayama Univ.), Kohei Yatabe (Tokyo Univ. of Agriculture and Technology), Ken-Ichi Sakakibara (Health Sciences Univ. of Hokkaido), Mitsunori Mizumachi (Kyushu Inst. of Tech.)
pp. 25 - 30
SP2022-44
A Study on Designing Hopping Patterns Based on Euler Graphs for Inaudible Sound Communication Systems
Naofumi Aoki, Kosei Ozeki (Hokkaido Univ.), Kenichi Ikeda, Hiroshi Yasuda, Hiroyuki Namba (SST)
pp. 31 - 34
SP2022-45
Influence of Reflections in Small-Scale Anechoic Room Measurements
Tatsuya Higuchi, Yutaka Kaneda, Kenji Suyama (Tokyo Denki Univ.)
pp. 35 - 40
SP2022-46
Generation of the individualized head-related transfer functions in the upper hemisphere using parametric notch-peak model in the median plane
Fuka Nakamura, Kazuhiro Iida (CIT)
pp. 41 - 48
SP2022-47
Image reconstruction with a diffusion model for robust image classification against unknown degradation
Teruaki Akazawa (Tokyo Metro. Univ.), Yuma Kinoshita (Tokai Univ.), Hitoshi Kiya (Tokyo Metro. Univ.)
pp. 49 - 54
SP2022-48
The target detection method through autocovariance matrices and its robust analysis
Yusuke Ono, Linyu Peng (Keio Univ.)
pp. 55 - 60
SP2022-49
Hadamard-coded Supervised Discrete Hashing on Quaternion Domain
Akari Katsuma, Seisuke Kyochi (Kogakuin Univ.), Shunsuke Ono (Tokyo Tech.), Ivan Selesnick (New York Univ.)
pp. 61 - 66
SP2022-50
Acoustic Echo and Noise Canceller Based on Minimization of Shared-Error Signal
Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.)
pp. 67 - 72
SP2022-51
[Invited Talk]
Multiple sound spot synthesis meets multilingual speech synthesis
-- Implementation is really all we need --
Takuma Okamoto (NICT)
pp. 73 - 76
SP2022-52
[Invited Talk]
Multichannel audio source separation based on deep generative model and signal independence
Li Li (CA)
p. 77
SP2022-53
Self-Supervised Learning With Spatial Audio-Visual Recording for Sound Event Localization and Detection
Yoto Fujita (Kyoto Univ.), Yoshiaki Bando (AIST), Keisuke Imoto (Doshisha Univ./AIST), Masaki Onishi (AIST), Kazuyoshi Yoshii (Kyoto Univ.)
pp. 78 - 82
SP2022-54
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka (NITTC), Shinnosuke Takamichi (UT), Keisuke Imoto (DU), Yuki Okamoto (Rits), Kazuki Fujii, Hiroshi Saruwatari (UT)
pp. 83 - 88
SP2022-55
Generalized warping based on Lie group theory
Atsushi Miyashita, Tomoki Toda (Nagoya Univ.)
pp. 89 - 94
SP2022-56
Vocal tract length estimation using fundamental frequency adaptive auditory representation
Toshio Irino, Shintaro Doan (Wakayama Univ.)
pp. 95 - 100
SP2022-57
DNN-based Noise Reduction Using Noise Signal for Target Signal
Ryota Hiromasa, Hien Ohnaka, Ryoichi Miyazaki (NITTC)
pp. 101 - 106
SP2022-58
A new configuration of 1-2-2 multi-channel active noise control system
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT)
pp. 107 - 114
SP2022-59
A method of constantly estimating the feedback path in active noise control systems
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT)
pp. 115 - 122
SP2022-60
Sound Source Localization Method based on Suppression Amount of Complex Weighted Sum Circuit
Tsukasa Hidaka, Kenji Suyama (Tokyo Denki Univ.)
pp. 123 - 128
SP2022-61
Application of Frequency Domain Adaptive Filter to Residual Noise Reduction
Kai Furusawa, Kenji Suyama (Tokyo Denki Univ.)
pp. 129 - 134
SP2022-62
A Study of the Number of Groups for CSD Coefficient FIR Filter Design by Grouped ACO
Marika Morikawa, Kenji Suyama (Tokyo Denki Univ.)
pp. 135 - 140
SP2022-63
Training Dialect Speech Recognition Model using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR
Shogo Miwa, Atsuhiko Kai (Shizuoka Univ.)
pp. 141 - 146
SP2022-64
A Study on Scheduled Sampling for Neural Transducer-based ASR
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT)
pp. 147 - 152
SP2022-65
Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition
Raufun Nahar, Rino Suzuki, Atsuhiko Kai (Shizuoka Univ.)
pp. 153 - 158
SP2022-66
Vocabulary-Set Decomposition and Multi-task Learning for Target Vocabulary Extraction in Japanese Speech Recognition
Aoi Ito (LINE/Hosei Univ.), Tatsuya Komatsu, Yusuke Fujita (LINE)
pp. 159 - 164
SP2022-67
Joint analysis of acoustic scenes and sound events based on semi-supervised learning
Ami Igarashi, Shunsuke Tsubaki, Keisuke Imoto (DU)
pp. 165 - 170
SP2022-68
Texture Reproduction of Ultrasonic Mid-Air Haptics Based on Amplitude Modulation Signal Generation Using Fricative Sounds Feature Extraction and Hand Tracking
Asuto Ueda, Toru Takahashi, Masato Nakayama (Osaka Sangyo Univ.)
pp. 171 - 176
SP2022-69
Regularization Term Design Based on Spectrogram Consistency in Independent Low-Rank Matrix Analysis for Multichannel Audio Source Separation
Sota Misawa, Norihiro Takamune (UTokyo), Kohei Yatabe (TUAT), Daichi Kitamura (NIT, Kagawa), Hiroshi Saruwatari (UTokyo)
pp. 177 - 184
SP2022-70
Anomalous sound detection with complex-valued hybrid neural networks considering phase variations
Shota Nishiyama, Akira Tamamori (AIT)
pp. 185 - 190
SP2022-71
Diffusion-based parallel voice conversion with source-feature condition
Takuya Kishida, Toru Nakashika (UEC)
pp. 191 - 196
SP2022-72
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech
Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo)
pp. 197 - 202
SP2022-73
An Investigation of Text-to-Speech Synthesis Using Voice Conversion and x-vector Embedding Sympathizing Emotion of Input Audio for Spoken Dialogue Systems
Shunichi Kohara, Masanobu Abe, Sunao Hara (Okayama Univ.)
pp. 203 - 208
SP2022-74
Choral Singing Voice Synthesis with Modulation Acoustic Features
Sora Miyazawa, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 209 - 214
SP2022-75
Quasi-real-time estimation of a maximum radiation direction from a loudspeaker surrounded by four microphones based on SPL ratio
Ryusei Tsuda, Daiki Maekawa, Tomoru Awatani, Masato Nakayama, Toru Takahashi (Osaka Sangyo Univ.)
pp. 215 - 220
SP2022-76
Analysis of Noisy-target Training for DNN-based speech enhancement and investigation towards its practical use
Takuya Fujimura, Tomoki Toda (Nagoya Univ.)
pp. 221 - 226
SP2022-77
A Study on Selective Fixed-Filter ANC Using 2D-CNN with Sliding DCT input
Kenya Doi, Yoshinobu Kajikawa (KU)
pp. 227 - 231
SP2022-78
Predominant Instrument Recognition in Polyphonic Music Based on Transfer Learning with Vanilla ResNet-50
Lifan Zhong, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 232 - 237
SP2022-79
[Invited Talk]
What Do Self-Supervised Speech Representation Models Know?
-- A Layer-Wise Analysis --
Karen Livescu, Ankita Pasad, Ju-Chieh Chou, Bowen Shi (TTI-Chicago)
p. 238
SP2022-80
[Invited Talk]
Speech and Language Research in the Google Tokyo Office
Michiel Bacchiani (Google)
pp. 239 - 240
SP2022-81
Personality Recognition on Dyadic Interactions with Representation Learning
Nathania Nah (Tokyo Tech), Takafumi Koshinaka (YCU), Koichi Shinoda (Tokyo Tech)
pp. 241 - 246
SP2022-82
The linguistic influence on speaker verification based on Self-Supervised Learning
Tomoka Wakamatsu (Tokyo Metropolitan Univ.), Atsushi Ando (NTT), Sayaka Shiota (Tokyo Metropolitan Univ.), Ryo Masumura (NTT), Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 247 - 252
SP2022-83
Increasing speech intelligibility for evacuation guidance by mimicking professional announcers' voice
-- Discussion on speech intelligibility and its physical correlates --
KimDung Tran, Masato Akagi, Masashi Unoki (JAIST)
pp. 253 - 258
SP2022-84
Data cleansing using synthetic speech detection for speaker verification
Kenzo Wada, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 259 - 263
SP2022-85
Effects of Voice Artificiality on the Degree of Compatibility between Voice and Appearance of Voice Agents
Kota Iura, Naotake Masuda, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 264 - 269
SP2022-86
Quantification of Voice Register Information including Mixed Voice based on Class Posterior Probabilities
Yu Kitamura, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 270 - 275
SP2022-87
Multiscale Manifold Clustering and Embedding with Multiple Kernels
Kyohei Suzuki, Masahiro Yukawa (Keio Univ.)
pp. 276 - 281
SP2022-88
On Design of Real Filters For Directed Graph Signals
Shogo Muramatsu, Hotaka Kitamura, Hiroyasu Yasuda (Niigata Univ.), Yuichi Tanaka (Osaka Univ.)
pp. 282 - 287
SP2022-89
Low-bit Image Restoration with Loop-unrolled ISTA
Shu Abe, Soushi Takahashi, Shogo Muramatsu (Niigata Univ.)
pp. 288 - 293
SP2022-90
A Study on Virtual Sensing Method for Hybrid Active Noise Control System
Shota Toyooka, Yoshinobu Kajikawa (Kansai Univ.)
pp. 294 - 299
SP2022-91
RGB-D Salient Object Detection Using Saliency and Edge Reverse Attention
Tomoki Ikeda, Masaaki Ikehara (Keio Univ.)
pp. 300 - 305
Note: Each article is a technical report without peer review, and its polished version will be published elsewhere.