IEICE Technical Report

Online edition: ISSN 2432-6380

Volume 123, Number 401

Engineering Acoustics

Workshop Date: 2024-02-29 - 2024-03-01 / Issue Date: 2024-02-22


Table of contents

EA2023-61
Vocal tract length perturbation-based pseudo-speaker augmentation for automatic speaker verification
Tomoka Wakamatsu, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 1 - 6

EA2023-62
Pseudo-speaker augmentation based on vocal tract length perturbation considering speaker variability for speaker verification
Fumika Ono, Tomoka Wakamatsu, Sayaka Shiota (TMU)
pp. 7 - 12

EA2023-63
Noise-Robust Voice Conversion by Denoising Training Conditioned with Latent Variables of Speech Quality and Recording Environment
Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi (UT), Ryuichi Yamamoto, Kentaro Tachibana (LY), Hiroshi Saruwatari (UT)
pp. 13 - 18

EA2023-64
Multi-task learning with age information model for highly accurate elderly speech recognition
Shine Takumi, Kinouchi Takahiro, Wakabayashi Yukoh, Kitaoka Norihide (TUT)
pp. 19 - 24

EA2023-65
Simultaneous Estimation of Transfer Coefficients and Signals of Sound-to-Light Conversion Device Blinky Under Saturation Using Non-negative Matrix Factorization
Kosuke Nishida, Natsuki Ueno, Nobutaka Ono (TMU), Daichi Kitamura (Kagawa NCT)
pp. 25 - 30

EA2023-66
Derivation of Direct Update Rule for Back-Projected Separation Matrix
Yui Kuriki, Taishi Nakashima, Nobutaka Ono (TMU)
pp. 31 - 36

EA2023-67
Analysis of Overlapped Utterances in Everyday Conversation and Source Separation by Online Independent Vector Analysis for Asynchronous Distributed Recordings
Haruki Nammoku, Taishi Nakashima, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono (TMU)
pp. 37 - 42

EA2023-68
Accelerating and stabilizing vectorwise coordinate descent for spatially regularized independent low-rank matrix analysis
Yuto Ishikawa, Takuya Okubo, Norihiro Takamune (UTokyo), Tomohiko Nakamura (AIST), Daichi Kitamura (NIT Kagawa), Hiroshi Saruwatari (UTokyo), Yu Takahashi, Kazunobu Kondo (Yamaha)
pp. 43 - 50

EA2023-69
Evaluation of Effect of Scatterer Shape on Incident Sound Field Estimation Based on Kernel Interpolation
Shihori Kozuka (NTT), Shoichi Koyama (NII), Hiroaki Itou, Noriyoshi Kamado (NTT)
pp. 51 - 56

EA2023-70
Study on Virtual Sensing Feedback ANC System with Noise Control Filter Selection
Shota Toyooka, Yoshinobu Kajikawa (Kansai Univ.)
pp. 57 - 60

EA2023-71
(See Japanese page.)
pp. 61 - 64

EA2023-72
On conditions for stably working filtered-x type active noise control systems
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT)
pp. 65 - 72

EA2023-73
Study of Sound Source Localization for Disaster Survivor Search Using Quadcopters -- An Analysis of Factors Related to MUSIC Algorithm through Environmental Modeling with PyRoomAcoustics --
Masachika Kamada (Waseda Univ.), Junji Yamato (Kogakuin Univ.), Yasuhiro Oikawa, Hiroshi G Okuno, Jun Ohya (Waseda Univ.)
pp. 73 - 78

EA2023-74
Development of the mental disorder estimation model using voice
Kaho Kato, Akihiko Takashima, Kei Kikuiri, Takeshi Yoshimura (NTT docomo)
pp. 79 - 84

EA2023-75
Multiple Lag Window Pairs for Estimation of Fundamental Frequency and Periodicity Measure
Michiki Koshimori (UEC), Shigeki Sagayama (UTokyo/UEC), Toru Nakashika (UEC)
pp. 85 - 90

EA2023-76
A Study on Automatic Performance for Emulating the Playing Style of a Specific Pianist using Feature Extraction with LSTM and Score Analysis
Li Senhao, Matsuno Yutaka (Nihon Univ.)
pp. 91 - 96

EA2023-77
(See Japanese page.)
pp. 97 - 102

EA2023-78
(See Japanese page.)
pp. 103 - 108

EA2023-79
Kernel-Induced Sampling Theorem for A Class of Mapping-Prescribed Reproducing Kernel Hilbert Spaces
Akira Tanaka (Hokkaido Univ.)
pp. 109 - 114

EA2023-80
An Enhanced Privacy-Preserving Scheme for Federated Learning of Vision Transformer without Model Performance Degradation
Rei Aso, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.)
pp. 115 - 120

EA2023-81
Privacy preserving deep unrolling ISTA method for sparse representation
Nichika Yuge, Takayuki Nakachi (Univ. of the Ryukyus)
pp. 121 - 126

EA2023-82
Lightweight and Interpretable Deep Learning Model for EEG-Based Sleep Stage Classification
Aozora Ito, Toshihisa Tanaka (TUAT)
pp. 127 - 132

EA2023-83
Element Selection Based on Classifiability Using Nonconvex Sparse Optimization
Taiga Kawamura, Natsuki Ueno, Nobutaka Ono (TMU)
pp. 133 - 138

EA2023-84
Cramér-Rao Lower Bound for Parameter Estimation from Observation with Irreversible Saturation Effects
Natsuki Ueno, Hirokazu Kameoka (NTT)
pp. 139 - 144

EA2023-85
Adaptive subspace clustering for matrix completion
Takuto Wada (Hosei Univ.), Ryohei Sasaki (TUT), Katsumi Konishi (Hosei Univ.)
pp. 145 - 149

EA2023-86
Byzantine attack detection via similarity of local updates in federated learning
Kenta Ohno, Masao Yamagishi (Hosei Univ.)
pp. 150 - 155

EA2023-87
Multiple sound source localization system in a rectangular area based on a distributed microphone array network
Toru Takahashi, Kotaro Fukuda, Taiki Kanbayashi, Hitoshi Ogaki (OSU), Ryo Higashigawa (coroutine), Masato Nakayama (OSU)
pp. 156 - 161

EA2023-88
Comparison of DNN architectures for determined BSS by proximal average of IVA and DNN
Kazuki Matsumoto (Waseda Univ.), Koki Yamada, Kohei Yatabe (TUAT)
pp. 162 - 167

EA2023-89
Role Selection of Microphone Pairs for Omnidirectional Sound Source Tracking
Haruto Sasaki, Kenji Suyama (Tokyo Denki Univ.)
pp. 168 - 173

EA2023-90
Residual Noise Reduction Based on Sound Source Signal Independence
Kai Furusawa, Kenji Suyama (Tokyo Denki Univ.)
pp. 174 - 179

EA2023-91
Effectiveness of Specified Error for Suppression Section in Directivity Design
Tsukasa Hidaka, Kenji Suyama (Tokyo Denki Univ.)
pp. 180 - 184

EA2023-92
Multiple Sound Source Localization using High Spatial Resolution Microphone Pairs
Tomoya Hori, Kenji Suyama (Tokyo Denki Univ.)
pp. 185 - 189

EA2023-93
An experimental survey on speaker embedding spaces for controlling speaker identity in speech synthesis system
Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo)
pp. 190 - 195

EA2023-94
SELECTING N-LOWEST SCORES FOR TRAINING MOS PREDICTION MODELS
Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko (NTT)
pp. 196 - 201

EA2023-95
Improving training recipe of Remixed2Remixed for speech enhancement
Li Li, Shogo Seki (CyberAgent)
pp. 202 - 207

EA2023-96
A Study on Environmental Sound Synthesis in the Case of Pausing in Virtual Walking Applications
Hiroshi Nishijima, Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (UTokyo)
pp. 208 - 213

EA2023-97
Analysis of speech synthesis of text-free audio using a self-supervised learning model -- focusing on multilingual applications --
Joonyong Park, Daisuke Saito, Nobuaki Minematsu (The Univ. of Tokyo)
pp. 214 - 219

EA2023-98
Multi-Dialect Speech Synthesis with Interpretable Accent latent Variable based on VQ-VAE
Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo)
pp. 220 - 225

EA2023-99
Constructing and Evaluating a Batch Voice Input System for Electronic Medical Records Using Large Language Models
Ryo Maejima, Norihide Kitaoka (TUT)
pp. 226 - 231

EA2023-100
Domain adaptation of speech recognition model based on multilingual SSL model with only nonparallel corpus
Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi (TUT), Kengo Ohta (NITA), Norihide Kitaoka (TUT)
pp. 232 - 237

EA2023-101
Improving speech recognition system consisting of multiple speech recognition models
Keigo Hojo, Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT)
pp. 238 - 243

EA2023-102
Evaluation of Automatic Speech Recognition for Deaf and Hard-of-Hearing People by Speaker Adaptation
Kaito Takahashi, Takahiro Kinouchi, Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Akio Kobayashi (Yamato Univ.), Norihide Kitaoka (TUT)
pp. 244 - 249

EA2023-103
Intermediate speaker speech synthesis between two speakers using x-vector speaker space
Sota Hosoi, Takahiro Kinouchi, Yukoh Wakabayashi, Norihide Kitaoka (TUT)
pp. 250 - 255

EA2023-104
Speech representation based on VAE assuming gamma distribution for latent variables and observation
Nanako Imaichi, Toru Nakashika (UEC)
pp. 256 - 261

EA2023-105
An Investigation into Weighting Strategies for Model Averaging in Continual Learning for Automatic Speech Recognition
Kentaro Shinayama, Hiroshi Sato, Tomoharu Iwata, Takeshi Mori, Taichi Asami (NTT)
pp. 262 - 267

EA2023-106
Substitution of Implicit Linguistic Information in Beam Search Decoding Using CTC-based Speech Recognition Models
Tatsunari Takagi, Yukoh Wakabayashi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT)
pp. 268 - 273

EA2023-107
A study on loom operation analysis using acoustic signals for abnormality detection
Shinji Sako (NITech)
pp. 274 - 276

EA2023-108
An Investigation on the Speech Recovery from EEG Signals Using Transformer
Tomoaki Mizuno (The Univ. of Electro-Communications), Takuya Kishida (Aichi Shukutoku Univ.), Natsue Yoshimura (Tokyo Tech), Toru Nakashika (The Univ. of Electro-Communications)
pp. 277 - 282

EA2023-109
Modal-to-falsetto singing voice conversion focused on the shape of glottal sound wave and parameter control of the glottal wave
Shota Okada, Yu Kitamura, Daisuke Saito, Nobuaki Minematsu (Tokyo Univ.)
pp. 283 - 288

EA2023-110
Black-Box Adversarial Attack for Math Formula Recognition Model
Haruto Namura, Masatomo Yoshida (Doshisha Univ.), Nicola Adami (UNIBS), Masahiro Okuda (Doshisha Univ.)
pp. 289 - 293

EA2023-111
Variable step size of shared error NLMS algorithm for acoustic echo and noise canceller
Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.)
pp. 294 - 299

EA2023-112
EEG during music recall: Time-frequency analysis, event-related potential, and directed connectivity
Mayu Goto, Ingon Chanpornpakdi, Kazuki Matsunaga, Shuma Ito, Toshihisa Tanaka (TUAT)
pp. 300 - 305

EA2023-113
Decorrelation-based blind speech separation
Shinya Saito, Kunio Oishi (Tokyo University of Tech.)
pp. 306 - 308

EA2023-114
Cello-like Sound Synthesis from Viola Recordings Using Pitch Shifting and Harmonic Generation
Natsuki Yoshino, Akira Tanaka (Hokudai)
pp. 309 - 314

EA2023-115
Multiple Pitch Estimation Based on Finite-Order Harmonic Constraint Differential Equation
Kenta Yamada, Yoshiki Masuyama, Kouei Yamaoka, Natsuki Ueno, Nobutaka Ono (Metropolitan Univ.)
pp. 315 - 320

EA2023-116
Inverse filter design of Shoulder-mounted Wearable Speaker using H-infinity control theory -- Extension and evaluation to MIMO systems --
Kenji Kita (Daido Univ.)
pp. 321 - 326

EA2023-117
Mixing Method of Remote Choral Sound Source by Component Selection Using Sparse Representation
Haruki Ota, Kota Takahashi (UEC)
pp. 327 - 332

EA2023-118
[Invited Talk] Getting Started With Environmental Sound Analysis and Synthesis
Keisuke Imoto (Doshisha Univ.)
p. 333

EA2023-119
Investigation of objective intelligibility metrics based on speech foundation models for Clarity Prediction Challenge 2
Katsuhiko Yamamoto (CyberAgent)
pp. 334 - 339

EA2023-120
Spatial auditory masking of audio signals with different elevations on the median plane and a sagittal plane
Hiroto Fujishiro, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.)
pp. 340 - 345

EA2023-121
Acoustic morphing based on autoencoder for piano scale and reverberation
Yuma Hakoda, Takao Tsuchiya (Doshisha Univ.)
pp. 346 - 351

EA2023-122
Investigation on factors of beamforming with reduced the number of microphones on sound space synthesis
Ryosuke Oyashiki, Kanji Watanabe, Masayuki Nishiguchi, Koji Abe (Akita Prefectural Univ.)
pp. 352 - 359

EA2023-123
Perceptible delay of moving sound source signals with different azimuth and bandwidth
Yuuki Saito, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.)
pp. 360 - 367

EA2023-124
Creation of representative head-related impulse responses for binaural rendering of audio signals by waveform based acoustic panning
Kazuki Houshito, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.)
pp. 368 - 375

EA2023-125
Discrimination of rotation direction of virtual sound source in binaural synthesis using sound source radiation characteristics
Orie Nishiyama (Chiba Institute of Technology), Toshiharu Horiuchi, Shota Okubo (KDDI Research, Inc.), Yoshifumi Chisaki (Chiba Institute of Technology)
pp. 376 - 381

EA2023-126
Simulation Evaluation of Speech Detection Based on Distributed Sound-to-Light Conversion Device Blinkies
Satoshi Motoyama, Natsuki Ueno, Masahiro Yasuda (TMU), Yuma Kinoshita (Tokai Univ.), Nobutaka Ono (TMU)
pp. 382 - 387

EA2023-127
Evaluations of Multi-channel Blind Source Separation for Speech Recognition in Car Environments
Yutsuki Takeuchi, Natsuki Ueno, Nobutaka Ono (Tokyo Metropolitan Univ.), Takashi Takazawa, Shuhei Shimanoe, Tomoki Tanemura (MIRISE Technologies)
pp. 388 - 393

EA2023-128
Large Scale Pre-training and Dynamic Convolution for Image Restoration Under Bad Weather Conditions
Shugo Yamashita, Masaaki Ikehara (Keio Univ.)
pp. 394 - 399

EA2023-129
Synthesizing perceived melody from stereo electroencephalogram
Yuta Inaba, Yuiko Kumagai, Naoki Yoshimura, Shuji Komeiji (Tokyo Univ. Agri.&Tech.), Takumi Mitsuhashi, Yasushi Iimura, Hiroharu Suzuki, Hidenori Sugano (Juntendo Univ.), Toshihisa Tanaka (Tokyo Univ. Agri.&Tech.)
pp. 400 - 405

EA2023-130
A Design of Denser-Graph-Frequency Graph Fourier Frames for Undirected Graph Signal Analysis
Kaito Nitani, Seisuke Kyochi (Kogakuin Univ.)
pp. 406 - 410

EA2023-131
Generating Japanese-accented English voices of 3 types according to the listening proficiency of Japanese ESL learners
Kiyotada Mori, Yasuo Miyoshi, Ryo Okamoto (Kochi Univ.)
pp. 411 - 414

EA2023-132
Prediction of Voice Processing Intensity Matching the Impression of a Voice Agent
Ren Miyamoto, Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Tokyo Univ.)
pp. 415 - 420

EA2023-133
Evaluating speech generation based on objective measures for text generation
Takaaki Saeki (UTokyo), Soumi Maiti (CMU), Shinnosuke Takamichi (UTokyo), Shinji Watanabe (CMU), Hiroshi Saruwatari (UTokyo)
pp. 421 - 426

Note: Each article is a technical report without peer review, and its polished version will be published elsewhere.


The Institute of Electronics, Information and Communication Engineers (IEICE), Japan