Tue, Feb 28 AM SP1 09:10 - 10:30 |
(1) SP |
09:10-09:30 |
Comparison of fundamental frequency controllable fast neural waveform generative models EA2022-75 SIP2022-119 SP2022-39 |
Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) |
(2) SP |
09:30-09:50 |
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling EA2022-76 SIP2022-120 SP2022-40 |
Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) |
(3) SP |
09:50-10:10 |
End-to-End Speech Synthesis Based on Articulatory Movements Captured by Real-time MRI EA2022-77 SIP2022-121 SP2022-41 |
Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. Sci.) |
(4) SP |
10:10-10:30 |
Singing voice synthesis based on a frame-driven attention mechanism considering vocal timing deviation EA2022-78 SIP2022-122 SP2022-42 |
Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda (NITech) |
Tue, Feb 28 AM EA1 09:10 - 10:30 |
(5) EA |
09:10-09:30 |
Extension of acoustic system measurement based on signal safeguarding
-- Repetition and orthogonalization for post hoc analysis -- EA2022-79 SIP2022-123 SP2022-43 |
Hideki Kawahara (Wakayama Univ.), Kohei Yatabe (Tokyo Univ. of Agriculture and Technology), Ken-Ichi Sakakibara (Health Sciences Univ. of Hokkaido), Mitsunori Mizumachi (Kyushu Inst. of Tech.) |
(6) EA |
09:30-09:50 |
A Study on Designing Hopping Patterns Based on Euler Graphs for Inaudible Sound Communication Systems EA2022-80 SIP2022-124 SP2022-44 |
Naofumi Aoki, Kosei Ozeki (Hokkaido Univ.), Kenichi Ikeda, Hiroshi Yasuda, Hiroyuki Namba (SST) |
(7) EA |
09:50-10:10 |
Influence of Reflections in Small-Scale Anechoic Room Measurements EA2022-81 SIP2022-125 SP2022-45 |
Tatsuya Higuchi, Yutaka Kaneda, Kenji Suyama (Tokyo Denki Univ.) |
(8) EA |
10:10-10:30 |
Generation of the individualized head-related transfer functions in the upper hemisphere using parametric notch-peak model in the median plane EA2022-82 SIP2022-126 SP2022-46 |
Fuka Nakamura, Kazuhiro Iida (CIT) |
|
10:30-10:40 |
Break ( 10 min. ) |
Tue, Feb 28 AM SLP 10:40 - 12:00 |
(9) |
10:40-11:00 |
SLP |
(10) |
11:00-11:20 |
SLP |
(11) |
11:20-11:40 |
SLP |
(12) |
11:40-12:00 |
SLP |
Tue, Feb 28 AM SIP1 10:40 - 12:00 |
(13) SIP |
10:40-11:00 |
Image reconstruction with a diffusion model for robust image classification against unknown degradation EA2022-83 SIP2022-127 SP2022-47 |
Teruaki Akazawa (Tokyo Metro. Univ.), Yuma Kinoshita (Tokai Univ.), Hitoshi Kiya (Tokyo Metro. Univ.) |
(14) SIP |
11:00-11:20 |
The target detection method through autocovariance matrices and its robust analysis EA2022-84 SIP2022-128 SP2022-48 |
Yusuke Ono, Linyu Peng (Keio Univ.) |
(15) SIP |
11:20-11:40 |
Hadamard-coded Supervised Discrete Hashing on Quaternion Domain EA2022-85 SIP2022-129 SP2022-49 |
Akari Katsuma, Seisuke Kyochi (Kogakuin Univ.), Shunsuke Ono (Tokyo Tech.), Ivan Selesnick (New York Univ.) |
(16) SIP |
11:40-12:00 |
Acoustic Echo and Noise Canceller Based on Minimization of Shared-Error Signal EA2022-86 SIP2022-130 SP2022-50 |
Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.) |
|
12:00-13:00 |
Lunch Break ( 60 min. ) |
Tue, Feb 28 PM Invited Talk 1 13:00 - 13:45 |
(17) SP |
13:00-13:45 |
[Invited Talk]
Multiple sound spot synthesis meets multilingual speech synthesis
-- Implementation is really all we need -- EA2022-87 SIP2022-131 SP2022-51 |
Takuma Okamoto (NICT) |
Tue, Feb 28 PM Invited Talk 2 13:45 - 14:30 |
(18) EA |
13:45-14:30 |
[Invited Talk]
Multichannel audio source separation based on deep generative model and signal independence EA2022-88 SIP2022-132 SP2022-52 |
Li Li (CA) |
|
14:30-14:40 |
Break ( 10 min. ) |
Tue, Feb 28 PM Short Presentation 1 14:40 - 15:45 |
(19) |
14:40-14:45 |
Design of a regularization of blind deblurring for blurred images containing saturated pixels and Gaussian noise |
Tomoya Kobayashi, Ryo Hayakawa, Youji Iiguni (Osaka Univ.) |
(20) |
14:45-14:50 |
Application of Deep Unfolding to Video Reconstruction Algorithms from Compressed Images |
Takashi Matsuda, Ryo Hayakawa, Youji Iiguni (Osaka Univ.) |
(21) |
14:50-14:55 |
Consideration of misalignment in multi-focus image fusion using convolutional sparse representation |
Ryo Tamaki, Ryo Hayakawa, Youji Iiguni (Osaka Univ.) |
(22) |
14:55-15:00 |
Pop Noise Based Speaker Verification with Continuous Phoneme-Pop Data and GBDT |
Kenta Takemae, Ryota Shimokura, Yoji Iiguni (OU) |
(23) |
15:00-15:05 |
A Study on Regularization in Video Super-Resolution Based on LMS Algorithm |
Ryogo Shimizu, Ryo Hayakawa, Youji Iiguni (Osaka Univ.) |
(24) |
15:05-15:10 |
Identification of Seizure Onset Zone from Intracranial EEG Using Source Selection-Based Domain Adaptation |
Keisuke Matsubayashi (TUAT), Yasushi Iimura, Takumi Mitsuhashi, Hidenori Sugano (Juntendo Univ.), Kosuke Fukumori, Toshihisa Tanaka (TUAT) |
(25) |
15:10-15:15 |
Speech synthesis from electrocorticogram using pre-trained neural vocoder |
Kai Shigemi, Shuji Komeiji (TUAT), Takumi Mitsuhashi, Yasushi Iimura, Hiroharu Suzuki, Hidenori Sugano (Juntendo Univ.), Koichi Shinoda (Tokyo Tech), Kohei Yatabe, Toshihisa Tanaka (TUAT) |
(26) |
15:15-15:20 |
Effects of movement on the EEG during rhythmic response |
Hiroki Arai, Ingon Chanpornpakdi, Toshihisa Tanaka (TUAT) |
(27) |
15:20-15:25 |
Single-channel environmental sound classification using distance-based sound separation |
Ryoya Ogura, Sayaka Shiota (Tokyo Metropolitan Univ.), Keisuke Imoto (Doshisha Univ.), Hitoshi Kiya (Tokyo Metropolitan Univ.) |
(28) |
15:25-15:30 |
Comfortable sound design of dental treatment sound based on automatic chord progression generation with modulation conditions using critical bandwidth |
Takuya Hayashi, Toru Takahashi, Masato Nakayama (Osaka Sangyo Univ.) |
(29) |
15:30-15:35 |
Fine-tuning for Speaker Diarization: Measuring Accuracy in Japanese Conversation |
Yurina Machida (Tsukuba Univ.), Taishi Yamaoka (Empath) |
(30) |
15:35-15:40 |
Any-to-Many Voice Conversion with Voice Similarity Comparison and Many-to-Many Model |
Hiroaki Hyodo, Tetsuya Sakai (Waseda Univ.) |
(31) |
15:40-15:45 |
Cross-language Speaker Recognition for Japanese-English Bilinguals |
Ryotaro Sano (Chiba Univ.), Masahumi Nishida (Shizuoka Univ.), Satoru Tsuge (Daido Univ.), Shingo Kuroiwa, Hiroyuki Yoshimura (Chiba Univ.) |
|
15:45-15:55 |
Break ( 10 min. ) |
Tue, Feb 28 PM SP-EA 15:55 - 17:35 |
(32) SP |
15:55-16:15 |
Self-Supervised Learning With Spatial Audio-Visual Recording for Sound Event Localization and Detection EA2022-89 SIP2022-133 SP2022-53 |
Yoto Fujita (Kyoto Univ.), Yoshiaki Bando (AIST), Keisuke Imoto (Doshisha Univ./AIST), Masaki Onishi (AIST), Kazuyoshi Yoshii (Kyoto Univ.) |
(33) SP |
16:15-16:35 |
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images EA2022-90 SIP2022-134 SP2022-54 |
Hien Ohnaka (NITTC), Shinnosuke Takamichi (UT), Keisuke Imoto (DU), Yuki Okamoto (Rits), Kazuki Fujii, Hiroshi Saruwatari (UT) |
(34) SP |
16:35-16:55 |
Generalized warping based on Lie group theory EA2022-91 SIP2022-135 SP2022-55 |
Atsushi Miyashita, Tomoki Toda (Nagoya Univ.) |
(35) SP |
16:55-17:15 |
Vocal tract length estimation using fundamental frequency adaptive auditory representation EA2022-92 SIP2022-136 SP2022-56 |
Toshio Irino, Shintaro Doan (Wakayama Univ.) |
(36) EA |
17:15-17:35 |
DNN-based Noise Reduction Using Noise Signal for Target Signal EA2022-93 SIP2022-137 SP2022-57 |
Ryota Hiromasa, Hien Ohnaka, Ryoichi Miyazaki (NITTC) |
Tue, Feb 28 PM EA-SIP 15:55 - 17:35 |
(37) EA |
15:55-16:15 |
A new configuration of 1-2-2 multi-channel active noise control system EA2022-94 SIP2022-138 SP2022-58 |
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT) |
(38) EA |
16:15-16:35 |
A method of constantly estimating the feedback path in active noise control systems EA2022-95 SIP2022-139 SP2022-59 |
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT) |
(39) SIP |
16:35-16:55 |
Sound Source Localization Method based on Suppression Amount of Complex Weighted Sum Circuit EA2022-96 SIP2022-140 SP2022-60 |
Tsukasa Hidaka, Kenji Suyama (Tokyo Denki Univ.) |
(40) SIP |
16:55-17:15 |
Application of Frequency Domain Adaptive Filter to Residual Noise Reduction EA2022-97 SIP2022-141 SP2022-61 |
Kai Furusawa, Kenji Suyama (Tokyo Denki Univ.) |
(41) SIP |
17:15-17:35 |
A Study of the Number of Groups for CSD Coefficient FIR Filter Design by Grouped ACO EA2022-98 SIP2022-142 SP2022-62 |
Marika Morikawa, Kenji Suyama (Tokyo Denki Univ.) |
Wed, Mar 1 AM SP2 09:10 - 10:30 |
(42) SP |
09:10-09:30 |
Training Dialect Speech Recognition Model using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR EA2022-99 SIP2022-143 SP2022-63 |
Shogo Miwa, Atsuhiko Kai (Shizuoka Univ.) |
(43) SP |
09:30-09:50 |
A Study on Scheduled Sampling for Neural Transducer-based ASR EA2022-100 SIP2022-144 SP2022-64 |
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT) |
(44) SP |
09:50-10:10 |
Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition EA2022-101 SIP2022-145 SP2022-65 |
Raufun Nahar, Rino Suzuki, Atsuhiko Kai (Shizuoka Univ.) |
(45) SP |
10:10-10:30 |
Vocabulary-Set Decomposition and Multi-task Learning for Target Vocabulary Extraction in Japanese Speech Recognition EA2022-102 SIP2022-146 SP2022-66 |
Aoi Ito (LINE/Hosei Univ.), Tatsuya Komatsu, Yusuke Fujita (LINE) |
Wed, Mar 1 AM EA2 09:10 - 10:30 |
(46) EA |
09:10-09:30 |
Joint analysis of acoustic scenes and sound events based on semi-supervised learning EA2022-103 SIP2022-147 SP2022-67 |
Ami Igarashi, Shunsuke Tsubaki, Keisuke Imoto (DU) |
(47) EA |
09:30-09:50 |
Texture Reproduction of Ultrasonic Mid-Air Haptics Based on Amplitude Modulation Signal Generation Using Fricative Sounds Feature Extraction and Hand Tracking EA2022-104 SIP2022-148 SP2022-68 |
Asuto Ueda, Toru Takahashi, Masato Nakayama (Osaka Sangyo Univ.) |
(48) EA |
09:50-10:10 |
Regularization Term Design Based on Spectrogram Consistency in Independent Low-Rank Matrix Analysis for Multichannel Audio Source Separation EA2022-105 SIP2022-149 SP2022-69 |
Sota Misawa, Norihiro Takamune (UTokyo), Kohei Yatabe (TUAT), Daichi Kitamura (NIT, Kagawa), Hiroshi Saruwatari (UTokyo) |
(49) EA |
10:10-10:30 |
Anomalous sound detection with complex-valued hybrid neural networks considering phase variations EA2022-106 SIP2022-150 SP2022-70 |
Shota Nishiyama, Akira Tamamori (AIT) |
|
10:30-10:40 |
Break ( 10 min. ) |
Wed, Mar 1 AM SP3 10:40 - 12:00 |
(50) SP |
10:40-11:00 |
Diffusion-based parallel voice conversion with source-feature condition EA2022-107 SIP2022-151 SP2022-71 |
Takuya Kishida, Toru Nakashika (UEC) |
(51) SP |
11:00-11:20 |
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech EA2022-108 SIP2022-152 SP2022-72 |
Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo) |
(52) SP |
11:20-11:40 |
An Investigation of Text-to-Speech Synthesis Using Voice Conversion and x-vector Embedding Sympathizing Emotion of Input Audio for Spoken Dialogue Systems EA2022-109 SIP2022-153 SP2022-73 |
Shunichi Kohara, Masanobu Abe, Sunao Hara (Okayama Univ.) |
(53) SP |
11:40-12:00 |
Choral Singing Voice Synthesis with Modulation Acoustic Features EA2022-110 SIP2022-154 SP2022-74 |
Sora Miyazawa, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo) |
Wed, Mar 1 AM EA3 10:40 - 12:00 |
(54) EA |
10:40-11:00 |
Quasi-real-time estimation of a maximum radiation direction from a loudspeaker surrounded by four microphones based on SPL ratio EA2022-111 SIP2022-155 SP2022-75 |
Ryusei Tsuda, Daiki Maekawa, Tomoru Awatani, Masato Nakayama, Toru Takahashi (Osaka Sangyo Univ.) |
(55) EA |
11:00-11:20 |
Analysis of Noisy-target Training for DNN-based speech enhancement and investigation towards its practical use EA2022-112 SIP2022-156 SP2022-76 |
Takuya Fujimura, Tomoki Toda (Nagoya Univ.) |
(56) EA |
11:20-11:40 |
A Study on Selective Fixed-Filter ANC Using 2D-CNN with Sliding DCT input EA2022-113 SIP2022-157 SP2022-77 |
Kenya Doi, Yoshinobu Kajikawa (KU) |
(57) EA |
11:40-12:00 |
Predominant Instrument Recognition in Polyphonic Music Based on Transfer Learning with Vanilla ResNet-50 EA2022-114 SIP2022-158 SP2022-78 |
Lifan Zhong, Daisuke Saito, Nobuaki Minematsu (UTokyo) |
|
12:00-13:00 |
Lunch Break ( 60 min. ) |
Wed, Mar 1 PM Invited Talk 3 13:00 - 13:45 |
(58) SP |
13:00-13:45 |
[Invited Talk]
What Do Self-Supervised Speech Representation Models Know?
-- A Layer-Wise Analysis -- EA2022-115 SIP2022-159 SP2022-79 |
Karen Livescu, Ankita Pasad, Ju-Chieh Chou, Bowen Shi (TTI-Chicago) |
Wed, Mar 1 PM Invited Talk 4 13:45 - 14:30 |
(59) SP |
13:45-14:30 |
[Invited Talk]
Speech and Language Research in the Google Tokyo Office EA2022-116 SIP2022-160 SP2022-80 |
Michiel Bacchiani (Google) |
|
14:30-14:40 |
Break ( 10 min. ) |
Wed, Mar 1 PM Short Presentation 2 14:40 - 15:40 |
(60) |
14:40-14:45 |
Anomalous sound detection based on differential features of multi channel acoustic signals considering spatial and temporal variations |
Shota Nishiyama, Akira Tamamori (AIT) |
(61) |
14:45-14:50 |
Personality Recognition on Dyadic Interactions with Representation Learning EA2022-117 SIP2022-161 SP2022-81 |
Nathania Nah (Tokyo Tech), Takafumi Koshinaka (YCU), Koichi Shinoda (Tokyo Tech) |
(62) |
14:50-14:55 |
Corpus construction toward multi-domain empathetic dialogue speech synthesis |
Yuki Saito, Eiji Iimori, Shinnosuke Takamichi (UT), Kentaro Tachibana (LINE), Hiroshi Saruwatari (UT) |
(63) |
14:55-15:00 |
A faster method for blind source separation based on frequency bin selection and linear interpolation |
Yuki Nakamura, Ryoichi Miyazaki (NITTC) |
(64) |
15:00-15:05 |
Self-localization of microphone array in distributed microphone arrays and real environmental experiment using Blinky |
Manami Nakamura, Ryoichi Miyazaki (NITTC) |
(65) |
15:05-15:10 |
Construction of Language Model for Low-resource Domain Speech Recognition Based on Sentence Generation |
Ryo Maejima, Daiki Mori, Yukoh Wakabayashi, Norihide Kitaoka (TUT) |
(66) |
15:10-15:15 |
Automatic Speech Recognition model using data with verbal and non-verbal information tag |
Nagito Shione, Yukoh Wakabayashi, Norihide Kitaoka (TUT) |
(67) |
15:15-15:20 |
Directivity Control of Multichannel One-point Spherical Microphone by Long Short-term Memory Networks |
Shota Naiki, Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.), Yoshiharu Soeta (AIST) |
(68) |
15:20-15:25 |
Study of Frequency Response Analysis of Effect Cymbals by Finite Element Method |
Kohei Izawa, Yuting Geng, Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.) |
(69) |
15:25-15:30 |
Study of Speech Quality Improvement for Interpolated Missing Segments of Extracted Speech Signals from Captured Videos with Dual Rolling-Shutter Cameras |
Hayata Nakano, Yuting Geng, Kenta Iwai, Takanobu Nishiura (Ritsumeikan Univ.) |
(70) |
15:30-15:35 |
|
|
(71) |
15:35-15:40 |
Development of chemical terminology learning materials for learners with foreign roots |
Mayu Tokumoto, Akemi Ishii (SIT) |
|
15:40-15:50 |
Break ( 10 min. ) |
Wed, Mar 1 PM SP4 15:50 - 17:30 |
(72) SP |
15:50-16:10 |
The linguistic influence on speaker verification based on Self-Supervised Learning EA2022-118 SIP2022-162 SP2022-82 |
Tomoka Wakamatsu (Tokyo Metropolitan Univ.), Atsushi Ando (NTT), Sayaka Shiota (Tokyo Metropolitan Univ.), Ryo Masumura (NTT), Hitoshi Kiya (Tokyo Metropolitan Univ.) |
(73) SP |
16:10-16:30 |
Increasing speech intelligibility for evacuation guidance by mimicking professional announcers' voice
-- Discussion on speech intelligibility and its physical correlates -- EA2022-119 SIP2022-163 SP2022-83 |
KimDung Tran, Masato Akagi, Masashi Unoki (JAIST) |
(74) SP |
16:30-16:50 |
Data cleansing using synthetic speech detection for speaker verification EA2022-120 SIP2022-164 SP2022-84 |
Kenzo Wada, Sayaka Shiota, Hitoshi Kiya (Tokyo Metropolitan Univ.) |
(75) SP |
16:50-17:10 |
Effects of Voice Artificiality on the Degree of Compatibility between Voice and Appearance of Voice Agents EA2022-121 SIP2022-165 SP2022-85 |
Kota Iura, Naotake Masuda, Daisuke Saito, Nobuaki Minematsu (UTokyo) |
(76) SP |
17:10-17:30 |
Quantification of Voice Register Information including Mixed Voice based on Class Posterior Probabilities EA2022-122 SIP2022-166 SP2022-86 |
Yu Kitamura, Anan Kikuchi, Daisuke Saito, Nobuaki Minematsu (UTokyo) |
Wed, Mar 1 PM SIP2 15:50 - 17:40 |
(77) SIP |
15:50-16:10 |
Multiscale Manifold Clustering and Embedding with Multiple Kernels EA2022-123 SIP2022-167 SP2022-87 |
Kyohei Suzuki, Masahiro Yukawa (Keio Univ.) |
(78) SIP |
16:10-16:30 |
On Design of Real Filters For Directed Graph Signals EA2022-124 SIP2022-168 SP2022-88 |
Shogo Muramatsu, Hotaka Kitamura, Hiroyasu Yasuda (Niigata Univ.), Yuichi Tanaka (Osaka Univ.) |
(79) SIP |
16:30-16:50 |
Low-bit Image Restoration with Loop-unrolled ISTA EA2022-125 SIP2022-169 SP2022-89 |
Shu Abe, Soushi Takahashi, Shogo Muramatsu (Niigata Univ.) |
(80) SIP |
16:50-17:10 |
A Study on Virtual Sensing Method for Hybrid Active Noise Control System EA2022-126 SIP2022-170 SP2022-90 |
Shota Toyooka, Yoshinobu Kajikawa (Kansai Univ.) |
(81) SIP |
17:10-17:30 |
RGB-D Salient Object Detection Using Saliency and Edge Reverse Attention EA2022-127 SIP2022-171 SP2022-91 |
Tomoki Ikeda, Masaaki Ikehara (Keio Univ.) |
(82) |
17:30-17:40 |
Closing |