IEICE Technical Report

Online edition: ISSN 2432-6380

Volume 122, Number 388

Signal Processing

Workshop Date : 2023-02-28 - 2023-03-01 / Issue Date : 2023-02-21



Table of contents

SIP2022-119
Comparison of fundamental frequency controllable fast neural waveform generative models
Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 1 - 6

SIP2022-120
MS-FC-HiFiGAN : Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling
Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 7 - 12

SIP2022-121
End-to-End Speech Synthesis Based on Articulatory Movements Captured by Real-time MRI
Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. Sci.)
pp. 13 - 18

SIP2022-122
Singing voice synthesis based on a frame-driven attention mechanism considering vocal timing deviation
Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda (NITech)
pp. 19 - 24

SIP2022-123
Extension of acoustic system measurement based on signal safeguarding -- Repetition and orthogonalization for post hoc analysis --
Hideki Kawahara (Wakayama Univ.), Kohei Yatabe (Tokyo Univ. of Agriculture and Technology), Ken-Ichi Sakakibara (Health Sciences Univ. of Hokkaido), Mitsunori Mizumachi (Kyushu Inst. of Tech.)
pp. 25 - 30

SIP2022-124
A Study on Designing Hopping Patterns Based on Euler Graphs for Inaudible Sound Communication Systems
Naofumi Aoki, Kosei Ozeki (Hokkaido Univ.), Kenichi Ikeda, Hiroshi Yasuda, Hiroyuki Namba (SST)
pp. 31 - 34

SIP2022-125
Influence of Reflections in Small-Scale Anechoic Room Measurements
Tatsuya Higuchi, Yutaka Kaneda, Kenji Suyama (Tokyo Denki Univ.)
pp. 35 - 40

SIP2022-126
Generation of the individualized head-related transfer functions in the upper hemisphere using parametric notch-peak model in the median plane
Fuka Nakamura, Kazuhiro Iida (CIT)
pp. 41 - 48

SIP2022-127
Image reconstruction with a diffusion model for robust image classification against unknown degradation
Teruaki Akazawa (Tokyo Metro. Univ.), Yuma Kinoshita (Tokai Univ.), Hitoshi Kiya (Tokyo Metro. Univ.)
pp. 49 - 54

SIP2022-128
The target detection method through autocovariance matrices and its robust analysis
Yusuke Ono, Linyu Peng (Keio Univ.)
pp. 55 - 60

SIP2022-129
Hadamard-coded Supervised Discrete Hashing on Quaternion Domain
Akari Katsuma, Seisuke Kyochi (Kogakuin Univ.), Shunsuke Ono (Tokyo Tech.), Ivan Selesnick (New York Univ.)
pp. 61 - 66

SIP2022-130
Acoustic Echo and Noise Canceller Based on Minimization of Shared-Error Signal
Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.)
pp. 67 - 72

SIP2022-131
[Invited Talk] Multiple sound spot synthesis meets multilingual speech synthesis -- Implementation is really all we need --
Takuma Okamoto (NICT)
pp. 73 - 76

SIP2022-132
[Invited Talk] Multichannel audio source separation based on deep generative model and signal independence
Li Li (CA)
p. 77

SIP2022-133
Self-Supervised Learning With Spatial Audio-Visual Recording for Sound Event Localization and Detection
Yoto Fujita (Kyoto Univ.), Yoshiaki Bando (AIST), Keisuke Imoto (Doshisha Univ./AIST), Masaki Onishi (AIST), Kazuyoshi Yoshii (Kyoto Univ.)
pp. 78 - 82

SIP2022-134
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka (NITTC), Shinnosuke Takamichi (UT), Keisuke Imoto (DU), Yuki Okamoto (Rits), Kazuki Fujii, Hiroshi Saruwatari (UT)
pp. 83 - 88

SIP2022-135
Generalized warping based on Lie group theory
Atsushi Miyashita, Tomoki Toda (Nagoya Univ.)
pp. 89 - 94

SIP2022-136
Vocal tract length estimation using fundamental frequency adaptive auditory representation
Toshio Irino, Shintaro Doan (Wakayama Univ.)
pp. 95 - 100

SIP2022-137
DNN-based Noise Reduction Using Noise Signal for Target Signal
Ryota Hiromasa, Hien Ohnaka, Ryoichi Miyazaki (NITTC)
pp. 101 - 106

SIP2022-138
A new configuration of 1-2-2 multi-channel active noise control system
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT)
pp. 107 - 114

SIP2022-139
A method of constantly estimating the feedback path in active noise control systems
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT)
pp. 115 - 122

SIP2022-140
Sound Source Localization Method based on Suppression Amount of Complex Weighted Sum Circuit
Tsukasa Hidaka, Kenji Suyama (Tokyo Denki Univ.)
pp. 123 - 128

SIP2022-141
Application of Frequency Domain Adaptive Filter to Residual Noise Reduction
Kai Furusawa, Kenji Suyama (Tokyo Denki Univ.)
pp. 129 - 134

SIP2022-142
A Study of the Number of Groups for CSD Coefficient FIR Filter Design by Grouped ACO
Marika Morikawa, Kenji Suyama (Tokyo Denki Univ.)
pp. 135 - 140

SIP2022-143
Training Dialect Speech Recognition Model using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR
Shogo Miwa, Atsuhiko Kai (Shizuoka Univ.)
pp. 141 - 146

SIP2022-144
A Study on Scheduled Sampling for Neural Transducer-based ASR
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT)
pp. 147 - 152

SIP2022-145
Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition
Raufun Nahar, Rino Suzuki, Atsuhiko Kai (Shizuoka Univ.)
pp. 153 - 158

SIP2022-146
Vocabulary-Set Decomposition and Multi-task Learning for Target Vocabulary Extraction in Japanese Speech Recognition
Aoi Ito (LINE/Hosei Univ.), Tatsuya Komatsu, Yusuke Fujita (LINE)
pp. 159 - 164

SIP2022-147
Joint analysis of acoustic scenes and sound events based on semi-supervised learning
Ami Igarashi, Shunsuke Tsubaki, Keisuke Imoto (DU)
pp. 165 - 170

SIP2022-148
Texture Reproduction of Ultrasonic Mid-Air Haptics Based on Amplitude Modulation Signal Generation Using Fricative Sounds Feature Extraction and Hand Tracking
Asuto Ueda, Toru Takahashi, Masato Nakayama (Osaka Sangyo Univ.)
pp. 171 - 176

SIP2022-149
Regularization Term Design Based on Spectrogram Consistency in Independent Low-Rank Matrix Analysis for Multichannel Audio Source Separation
Sota Misawa, Norihiro Takamune (UTokyo), Kohei Yatabe (TUAT), Daichi Kitamura (NIT, Kagawa), Hiroshi Saruwatari (UTokyo)
pp. 177 - 184

SIP2022-150
Anomalous sound detection with complex-valued hybrid neural networks considering phase variations
Shota Nishiyama, Akira Tamamori (AIT)
pp. 185 - 190

SIP2022-151
Diffusion-based parallel voice conversion with source-feature condition
Takuya Kishida, Toru Nakashika (UEC)
pp. 191 - 196

SIP2022-152
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech
Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo)
pp. 197 - 202

SIP2022-153
An Investigation of Text-to-Speech Synthesis Using Voice Conversion and x-vector Embedding Sympathizing Emotion of Input Audio for Spoken Dialogue Systems
Shunichi Kohara, Masanobu Abe, Sunao Hara (Okayama Univ.)
pp. 203 - 208

SIP2022-154
Choral Singing Voice Synthesis with Modulation Acoustic Features
Sora Miyazawa, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 209 - 214

SIP2022-155
Quasi-real-time estimation of a maximum radiation direction from a loudspeaker surrounded by four microphones based on SPL ratio
Ryusei Tsuda, Daiki Maekawa, Tomoru Awatani, Masato Nakayama, Toru Takahashi (Osaka Sangyo Univ.)
pp. 215 - 220

SIP2022-156
Analysis of Noisy-target Training for DNN-based speech enhancement and investigation towards its practical use
Takuya Fujimura, Tomoki Toda (Nagoya Univ.)
pp. 221 - 226

SIP2022-157
A Study on Selective Fixed-Filter ANC Using 2D-CNN with Sliding DCT input
Kenya Doi, Yoshinobu Kajikawa (KU)
pp. 227 - 231

SIP2022-158
Predominant Instrument Recognition in Polyphonic Music Based on Transfer Learning with Vanilla ResNet-50
Lifan Zhong, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 232 - 237

SIP2022-159
[Invited Talk] What Do Self-Supervised Speech Representation Models Know? -- A Layer-Wise Analysis --
Karen Livescu, Ankita Pasad, Ju-Chieh Chou, Bowen Shi (TTI-Chicago)
p. 238

SIP2022-160
[Invited Talk] Speech and Language Research in the Google Tokyo Office
Michiel Bacchiani (Google)
pp. 239 - 240

SIP2022-161
Personality Recognition on Dyadic Interactions with Representation Learning
Nathania Nah (Tokyo Tech), Takafumi Koshinaka (YCU), Koichi Shinoda (Tokyo Tech)
pp. 241 - 246

SIP2022-162
The linguistic influence on speaker verification based on Self-Supervised Learning
Tomoka Wakamatsu (Tokyo Metropolitan Univ.), Atsushi Ando (NTT), Sayaka Shiota (Tokyo Metropolitan Univ.), Ryo Masumura (NTT), Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 247 - 252

SIP2022-163
Increasing speech intelligibility for evacuation guidance by mimicking professional announcers' voice -- Discussion on speech intelligibility and its physical correlates --
KimDung Tran, Masato Akagi, Masashi Unoki (JAIST)
pp. 253 - 258

SIP2022-164
Data cleansing using synthetic speech detection for speaker verification
Kenzo Wada, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 259 - 263

SIP2022-165
Effects of Voice Artificiality on the Degree of Compatibility between Voice and Appearance of Voice Agents
Kota Iura, Naotake Masuda, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 264 - 269

SIP2022-166
Quantification of Voice Register Information including Mixed Voice based on Class Posterior Probabilities
Yu Kitamura, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 270 - 275

SIP2022-167
Multiscale Manifold Clustering and Embedding with Multiple Kernels
Kyohei Suzuki, Masahiro Yukawa (Keio Univ.)
pp. 276 - 281

SIP2022-168
On Design of Real Filters For Directed Graph Signals
Shogo Muramatsu, Hotaka Kitamura, Hiroyasu Yasuda (Niigata Univ.), Yuichi Tanaka (Osaka Univ.)
pp. 282 - 287

SIP2022-169
Low-bit Image Restoration with Loop-unrolled ISTA
Shu Abe, Soushi Takahashi, Shogo Muramatsu (Niigata Univ.)
pp. 288 - 293

SIP2022-170
A Study on Virtual Sensing Method for Hybrid Active Noise Control System
Shota Toyooka, Yoshinobu Kajikawa (Kansai Univ.)
pp. 294 - 299

SIP2022-171
RGB-D Salient Object Detection Using Saliency and Edge Reverse Attention
Tomoki Ikeda, Masaaki Ikehara (Keio Univ.)
pp. 300 - 305

Note: Each article is a technical report without peer review, and its polished version will be published elsewhere.


The Institute of Electronics, Information and Communication Engineers (IEICE), Japan