Thu, Feb 29 AM 09:30 - 10:50 |
(1) SP |
09:30-09:50 |
Vocal tract length perturbation-based pseudo-speaker augmentation for automatic speaker verification |
Tomoka Wakamatsu, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.) |
(2) SP |
09:50-10:10 |
Pseudo-speaker augmentation based on vocal tract length perturbation considering speaker variability for speaker verification |
Fumika Ono, Tomoka Wakamatsu, Sayaka Shiota (TMU) |
(3) SP |
10:10-10:30 |
Noise-Robust Voice Conversion by Denoising Training Conditioned with Latent Variables of Speech Quality and Recording Environment |
Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi (UT), Ryuichi Yamamoto, Kentaro Tachibana (LY), Hiroshi Saruwatari (UT) |
(4) SP |
10:30-10:50 |
Multi-task learning with age information model for highly accurate elderly speech recognition |
Takumi Shine, Takahiro Kinouchi, Yukoh Wakabayashi, Norihide Kitaoka (TUT) |
|
10:50-11:00 |
Break ( 10 min. ) |
Thu, Feb 29 AM 09:30 - 10:50 |
(5) EA |
09:30-09:50 |
Simultaneous Estimation of Transfer Coefficients and Signals of Sound-to-Light Conversion Device Blinky Under Saturation Using Non-negative Matrix Factorization |
Kosuke Nishida, Natsuki Ueno, Nobutaka Ono (TMU), Daichi Kitamura (Kagawa NCT) |
(6) EA |
09:50-10:10 |
Derivation of Direct Update Rule for Back-Projected Separation Matrix |
Yui Kuriki, Taishi Nakashima, Nobutaka Ono (TMU) |
(7) EA |
10:10-10:30 |
Analysis of Overlapped Utterances in Everyday Conversation and Source Separation by Online Independent Vector Analysis for Asynchronous Distributed Recordings |
Haruki Nammoku, Taishi Nakashima, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono (TMU) |
(8) EA |
10:30-10:50 |
Accelerating and stabilizing vectorwise coordinate descent for spatially regularized independent low-rank matrix analysis |
Yuto Ishikawa, Takuya Okubo, Norihiro Takamune (UTokyo), Tomohiko Nakamura (AIST), Daichi Kitamura (NIT Kagawa), Hiroshi Saruwatari (UTokyo), Yu Takahashi, Kazunobu Kondo (Yamaha) |
|
10:50-11:00 |
Break ( 10 min. ) |
Thu, Feb 29 AM 11:00 - 12:20 |
(9) |
11:00-11:20 |
SLP |
(10) |
11:20-11:40 |
SLP |
(11) |
11:40-12:00 |
SLP |
(12) |
12:00-12:20 |
SLP |
|
12:20-13:50 |
Break ( 90 min. ) |
Thu, Feb 29 AM 11:00 - 12:20 |
(13) EA |
11:00-11:20 |
Evaluation of Effect of Scatterer Shape on Incident Sound Field Estimation Based on Kernel Interpolation |
Shihori Kozuka (NTT), Shoichi Koyama (NII), Hiroaki Itou, Noriyoshi Kamado (NTT) |
(14) EA |
11:20-11:40 |
Study on Virtual Sensing Feedback ANC System with Noise Control Filter Selection |
Shota Toyooka, Yoshinobu Kajikawa (Kansai Univ.) |
(15) EA |
11:40-12:00 |
|
|
(16) EA |
12:00-12:20 |
On conditions for stably working filtered-x type active noise control systems |
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT) |
|
12:20-13:50 |
Break ( 90 min. ) |
Thu, Feb 29 PM Invited Talk 1 13:50 - 14:35 |
(17) |
13:50-14:35 |
[Invited Talk]
Making the Invisible Visible: Toward High-Quality Deep THz Computational Imaging |
Chia-Wen Lin (National Tsing Hua Univ.) |
|
14:35-14:45 |
Break ( 10 min. ) |
Thu, Feb 29 PM 14:45 - 15:00 |
(18) |
14:45-15:00 |
|
|
15:00-15:10 |
Break ( 10 min. ) |
Thu, Feb 29 PM 15:10 - 15:55 |
(19) |
15:10-15:15 |
Computational Complexity Reduction for Clustering in Speaker Diarization |
Komei Yamashita, Ryota Shimokura, Youji Iiguni (Osaka Univ.) |
(20) |
15:15-15:20 |
Selective Active Noise Control using Cartilage Conduction as a Secondary Source
-- Canceling complex and narrowband noise by Delayed-X Harmonics Synthesizer Algorithm -- |
Miyuki Azuma, Ryota Shimokura, Yoji Iiguni (Osaka Univ.) |
(21) |
15:20-15:25 |
Application of Audio Adversarial Examples to Audio CAPTCHA |
Yusuke Nobukawa, Ryota Shimokura, Yoji Iiguni (Osaka Univ.) |
(22) |
15:25-15:30 |
Evaluation of the validity of CNN-based image quality assessment |
Ririko Harada (Osaka Univ.), Ryo Hayakawa (TUAT), Youji Iiguni (Osaka Univ.) |
(23) |
15:30-15:35 |
Adaptation of End-to-End Japanese Speech Synthesis Using Crowdsourced Dialect Accent Labels |
Yuki Oda, Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) |
(24) |
15:35-15:40 |
SRC4VC: Smartphone-Recorded Corpus for Benchmarking Multi-Speaker Voice Conversion Models |
Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi (UT), Ryuichi Yamamoto, Kentaro Tachibana (LY), Hiroshi Saruwatari (UT) |
(25) |
15:40-15:45 |
Preliminary Evaluation of Japanese Speech Corpus J-SpAW for Speaker Verification and Spoofing Detection |
Kota Kanno (Tokyo Metropolitan Univ.), Shinnosuke Takamichi (UTokyo), Sayaka Shiota (Tokyo Metropolitan Univ.) |
(26) |
15:45-15:50 |
|
|
(27) |
15:50-15:55 |
Estimation of direct sound arrival time at arbitrary position by kriging in real environment |
Mizuki Yamashita, Yosuke Tatekura (Shizuoka Univ.) |
|
15:55-16:05 |
Break ( 10 min. ) |
Thu, Feb 29 PM 16:05 - 17:25 |
(28) SP |
16:05-16:25 |
Study of Sound Source Localization for Disaster Survivor Search Using Quadcopters
-- An Analysis of Factors Related to MUSIC Algorithm through Environmental Modeling with PyRoomAcoustics -- |
Masachika Kamada (Waseda Univ.), Junji Yamato (Kogakuin Univ.), Yasuhiro Oikawa, Hiroshi G. Okuno, Jun Ohya (Waseda Univ.) |
(29) SP |
16:25-16:45 |
Development of the mental disorder estimation model using voice |
Kaho Kato, Akihiko Takashima, Kei Kikuiri, Takeshi Yoshimura (NTT docomo) |
(30) SP |
16:45-17:05 |
Multiple Lag Window Pairs for Estimation of Fundamental Frequency and Periodicity Measure |
Michiki Koshimori (UEC), Shigeki Sagayama (UTokyo/UEC), Toru Nakashika (UEC) |
(31) SP |
17:05-17:25 |
A Study on Automatic Performance for Emulating the Playing Style of a Specific Pianist using Feature Extraction with LSTM and Score Analysis |
Senhao Li, Yutaka Matsuno (Nihon Univ.) |
Thu, Feb 29 PM 16:20 - 17:40 |
(32) SIP |
16:20-16:40 |
|
|
(33) SIP |
16:40-17:00 |
|
|
(34) SIP |
17:00-17:20 |
Kernel-Induced Sampling Theorem for A Class of Mapping-Prescribed Reproducing Kernel Hilbert Spaces |
Akira Tanaka (Hokkaido Univ.) |
(35) SIP |
17:20-17:40 |
An Enhanced Privacy-Preserving Scheme for Federated Learning of Vision Transformer without Model Performance Degradation |
Rei Aso, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.) |
Thu, Feb 29 PM 15:10 - 17:20 |
(36) SIP |
15:10-16:10 |
Privacy preserving deep unrolling ISTA method for sparse representation |
Nichika Yuge, Takayuki Nakachi (Univ. of the Ryukyus) |
(37) SIP |
15:10-16:10 |
Lightweight and Interpretable Deep Learning Model for EEG-Based Sleep Stage Classification |
Aozora Ito, Toshihisa Tanaka (TUAT) |
(38) SIP |
15:10-16:10 |
Element Selection Based on Classifiability Using Nonconvex Sparse Optimization |
Taiga Kawamura, Natsuki Ueno, Nobutaka Ono (TMU) |
(39) SIP |
15:10-16:10 |
Cramér-Rao Lower Bound for Parameter Estimation from Observation with Irreversible Saturation Effects |
Natsuki Ueno, Hirokazu Kameoka (NTT) |
(40) SIP |
15:10-16:10 |
Adaptive subspace clustering for matrix completion |
Takuto Wada (Hosei Univ.), Ryohei Sasaki (TUT), Katsumi Konishi (Hosei Univ.) |
(41) SIP |
15:10-16:10 |
Byzantine attack detection via similarity of local updates in federated learning |
Kenta Ohno, Masao Yamagishi (Hosei Univ.) |
|
16:10-16:20 |
Break ( 10 min. ) |
(42) EA |
16:20-17:20 |
Multiple sound source localization system in a rectangular area based on a distributed microphone array network |
Toru Takahashi, Kotaro Fukuda, Taiki Kanbayashi, Hitoshi Ogaki (OSU), Ryo Higashigawa (coroutine), Masato Nakayama (OSU) |
(43) EA |
16:20-17:20 |
Comparison of DNN architectures for determined BSS by proximal average of IVA and DNN |
Kazuki Matsumoto (Waseda Univ.), Koki Yamada, Kohei Yatabe (TUAT) |
(44) EA |
16:20-17:20 |
Role Selection of Microphone Pairs for Omnidirectional Sound Source Tracking |
Haruto Sasaki, Kenji Suyama (Tokyo Denki Univ.) |
(45) EA |
16:20-17:20 |
Residual Noise Reduction Based on Sound Source Signal Independence |
Kai Furusawa, Kenji Suyama (Tokyo Denki Univ.) |
(46) EA |
16:20-17:20 |
Effectiveness of Specified Error for Suppression Section in Directivity Design |
Tsukasa Hidaka, Kenji Suyama (Tokyo Denki Univ.) |
(47) EA |
16:20-17:20 |
Multiple Sound Source Localization using High Spatial Resolution Microphone Pairs |
Tomoya Hori, Kenji Suyama (Tokyo Denki Univ.) |
Fri, Mar 1 AM 09:30 - 11:40 |
(48) SP |
09:30-10:30 |
An experimental survey on speaker embedding spaces for controlling speaker identity in speech synthesis system |
Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) |
(49) SP |
09:30-10:30 |
Selecting N-Lowest Scores for Training MOS Prediction Models |
Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko (NTT) |
(50) SP |
09:30-10:30 |
Improving training recipe of Remixed2Remixed for speech enhancement |
Li Li, Shogo Seki (CyberAgent) |
(51) SP |
09:30-10:30 |
A Study on Environmental Sound Synthesis in the Case of Pausing in Virtual Walking Applications |
Hiroshi Nishijima, Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (UTokyo) |
(52) SP |
09:30-10:30 |
Analysis of speech synthesis of text-free audio using a self-supervised learning model
-- focusing on multilingual applications -- |
Joonyong Park, Daisuke Saito, Nobuaki Minematsu (The Univ. of Tokyo) |
(53) SP |
09:30-10:30 |
Multi-Dialect Speech Synthesis with Interpretable Accent Latent Variable based on VQ-VAE |
Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) |
(54) SP |
09:30-10:30 |
Constructing and Evaluating a Batch Voice Input System for Electronic Medical Records Using Large Language Models |
Ryo Maejima, Norihide Kitaoka (TUT) |
(55) SP |
09:30-10:30 |
Domain adaptation of speech recognition model based on multilingual SSL model with only nonparallel corpus |
Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Norihide Kitaoka (TUT) |
(56) SP |
09:30-10:30 |
Improving speech recognition system consisting of multiple speech recognition models |
Keigo Hojo, Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT) |
(57) SP |
09:30-10:30 |
Evaluation of Automatic Speech Recognition for Deaf and Hard-of-Hearing People by Speaker Adaptation |
Kaito Takahashi, Takahiro Kinouchi, Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Akio Kobayashi (Yamato Univ.), Norihide Kitaoka (TUT) |
|
10:30-10:40 |
Break ( 10 min. ) |
(58) SP |
10:40-11:40 |
Intermediate speaker speech synthesis between two speakers using x-vector speaker space |
Sota Hosoi, Takahiro Kinouchi, Yukoh Wakabayashi, Norihide Kitaoka (TUT) |
(59) SP |
10:40-11:40 |
Speech representation based on VAE assuming gamma distribution for latent variables and observation |
Nanako Imaichi, Toru Nakashika (UEC) |
(60) SP |
10:40-11:40 |
An Investigation into Weighting Strategies for Model Averaging in Continual Learning for Automatic Speech Recognition |
Kentaro Shinayama, Hiroshi Sato, Tomoharu Iwata, Takeshi Mori, Taichi Asami (NTT) |
(61) SP |
10:40-11:40 |
Substitution of Implicit Linguistic Information in Beam Search Decoding Using CTC-based Speech Recognition Models |
Tatsunari Takagi, Yukoh Wakabayashi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT) |
(62) SP |
10:40-11:40 |
A study on loom operation analysis using acoustic signals for abnormality detection |
Shinji Sako (NITech) |
(63) SP |
10:40-11:40 |
An Investigation on the Speech Recovery from EEG Signals Using Transformer |
Tomoaki Mizuno (The Univ. of Electro-Communications), Takuya Kishida (Aichi Shukutoku Univ.), Natsue Yoshimura (Tokyo Tech), Toru Nakashika (The Univ. of Electro-Communications) |
(64) SP |
10:40-11:40 |
Modal-to-falsetto singing voice conversion focused on the shape of glottal sound wave and parameter control of the glottal wave |
Shota Okada, Yu Kitamura, Daisuke Saito, Nobuaki Minematsu (Tokyo Univ.) |
(65) |
10:40-11:40 |
SLP |
|
11:40-13:30 |
Break ( 110 min. ) |
Fri, Mar 1 AM 09:30 - 10:50 |
(66) SIP |
09:30-09:50 |
Black-Box Adversarial Attack for Math Formula Recognition Model |
Haruto Namura, Masatomo Yoshida (Doshisha Univ.), Nicola Adami (UNIBS), Masahiro Okuda (Doshisha Univ.) |
(67) SIP |
09:50-10:10 |
Variable step size of shared error NLMS algorithm for acoustic echo and noise canceller |
Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.) |
(68) SIP |
10:10-10:30 |
EEG during music recall: Time-frequency analysis, event-related potential, and directed connectivity |
Mayu Goto, Ingon Chanpornpakdi, Kazuki Matsunaga, Shuma Ito, Toshihisa Tanaka (TUAT) |
(69) SIP |
10:30-10:50 |
Decorrelation-based blind speech separation |
Shinya Saito, Kunio Oishi (Tokyo University of Tech.) |
|
10:50-11:00 |
Break ( 10 min. ) |
Fri, Mar 1 AM 11:00 - 12:20 |
(70) EA |
11:00-11:20 |
Cello-like Sound Synthesis from Viola Recordings Using Pitch Shifting and Harmonic Generation |
Natsuki Yoshino, Akira Tanaka (Hokudai) |
(71) EA |
11:20-11:40 |
Multiple Pitch Estimation Based on Finite-Order Harmonic Constraint Differential Equation |
Kenta Yamada, Yoshiki Masuyama, Kouei Yamaoka, Natsuki Ueno, Nobutaka Ono (Tokyo Metropolitan Univ.) |
(72) EA |
11:40-12:00 |
Inverse filter design of Shoulder-mounted Wearable Speaker using H-infinity control theory
-- Extension and evaluation to MIMO systems -- |
Kenji Kita (Daido Univ.) |
(73) EA |
12:00-12:20 |
Mixing Method of Remote Choral Sound Source by Component Selection Using Sparse Representation |
Haruki Ota, Kota Takahashi (UEC) |
|
12:20-13:30 |
Break ( 70 min. ) |
Fri, Mar 1 PM 13:30 - 14:15 |
(74) |
13:30-14:15 |
SLP |
|
14:15-14:25 |
Break ( 10 min. ) |
Fri, Mar 1 PM 14:25 - 15:10 |
(75) |
14:25-15:10 |
[Invited Talk]
Getting Started With Environmental Sound Analysis and Synthesis |
Keisuke Imoto (Doshisha Univ.) |
|
15:10-15:25 |
Break ( 15 min. ) |
Fri, Mar 1 PM 15:25 - 17:35 |
(76) EA |
15:25-16:25 |
Investigation of objective intelligibility metrics based on speech foundation models for Clarity Prediction Challenge 2 |
Katsuhiko Yamamoto (CyberAgent) |
(77) EA |
15:25-16:25 |
Spatial auditory masking of audio signals with different elevations on the median plane and a sagittal plane |
Hiroto Fujishiro, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.) |
(78) EA |
15:25-16:25 |
Acoustic morphing based on autoencoder for piano scale and reverberation |
Yuma Hakoda, Takao Tsuchiya (Doshisha Univ.) |
(79) EA |
15:25-16:25 |
Investigation on factors of beamforming with a reduced number of microphones on sound space synthesis |
Ryosuke Oyashiki, Kanji Watanabe, Masayuki Nishiguchi, Koji Abe (Akita Prefectural Univ.) |
(80) EA |
15:25-16:25 |
Perceptible delay of moving sound source signals with different azimuth and bandwidth |
Yuuki Saito, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.) |
(81) EA |
15:25-16:25 |
Creation of representative head-related impulse responses for binaural rendering of audio signals by waveform based acoustic panning |
Kazuki Houshito, Masayuki Nishiguchi, Kanji Watanabe, Koji Abe (Akita Prefectural Univ.) |
|
16:25-16:35 |
Break ( 10 min. ) |
(82) EA |
16:35-17:35 |
Discrimination of rotation direction of virtual sound source in binaural synthesis using sound source radiation characteristics |
Orie Nishiyama (Chiba Institute of Technology), Toshiharu Horiuchi, Shota Okubo (KDDI Research, Inc.), Yoshifumi Chisaki (Chiba Institute of Technology) |
(83) EA |
16:35-17:35 |
Simulation Evaluation of Speech Detection Based on Distributed Sound-to-Light Conversion Device Blinkies |
Satoshi Motoyama, Natsuki Ueno, Masahiro Yasuda (TMU), Yuma Kinoshita (Tokai Univ.), Nobutaka Ono (TMU) |
(84) EA |
16:35-17:35 |
Evaluations of Multi-channel Blind Source Separation for Speech Recognition in Car Environments |
Yutsuki Takeuchi, Natsuki Ueno, Nobutaka Ono (Tokyo Metropolitan Univ.), Takashi Takazawa, Shuhei Shimanoe, Tomoki Tanemura (MIRISE Technologies) |
(85) SIP |
16:35-17:35 |
Large Scale Pre-training and Dynamic Convolution for Image Restoration Under Bad Weather Conditions |
Shugo Yamashita, Masaaki Ikehara (Keio Univ.) |
(86) SIP |
16:35-17:35 |
Synthesizing perceived melody from stereo electroencephalogram |
Yuta Inaba, Yuiko Kumagai, Naoki Yoshimura, Shuji Komeiji (Tokyo Univ. Agri.&Tech.), Takumi Mitsuhashi, Yasushi Iimura, Hiroharu Suzuki, Hidenori Sugano (Juntendo Univ.), Toshihisa Tanaka (Tokyo Univ. Agri.&Tech.) |
(87) SIP |
16:35-17:35 |
A Design of Denser-Graph-Frequency Graph Fourier Frames for Undirected Graph Signal Analysis |
Kaito Nitani, Seisuke Kyochi (Kogakuin Univ.) |
Fri, Mar 1 PM 15:25 - 16:25 |
(88) SP |
15:25-15:45 |
Generating Japanese-accented English voices of 3 types according to the listening proficiency of Japanese ESL learners |
Kiyotada Mori, Yasuo Miyoshi, Ryo Okamoto (Kochi Univ.) |
(89) SP |
15:45-16:05 |
Prediction of Voice Processing Intensity Matching the Impression of a Voice Agent |
Ren Miyamoto, Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Tokyo Univ.) |
(90) SP |
16:05-16:25 |
Evaluating speech generation based on objective measures for text generation |
Takaaki Saeki (UTokyo), Soumi Maiti (CMU), Shinnosuke Takamichi (UTokyo), Shinji Watanabe (CMU), Hiroshi Saruwatari (UTokyo) |