SP, IPSJ-SLP, EA, SIP [detail] 2023-03-01
(Primary: On-site, Secondary: Online)
A Study on Scheduled Sampling for Neural Transducer-based ASR
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT)
(To be available after the conference date) [more]
SP, IPSJ-SLP, EA, SIP [detail] 2023-03-01
(Primary: On-site, Secondary: Online)
Vocabulary-Set Decomposition and Multi-task Learning for Target Vocabulary Extraction in Japanese Speech Recognition
Aoi Ito (LINE/Hosei Univ.), Tatsuya Komatsu, Yusuke Fujita (LINE)
(To be available after the conference date) [more]
SP, IPSJ-SLP, EA, SIP [detail] 2023-03-01
(Primary: On-site, Secondary: Online)
[Invited Talk] Speech and Language Research in the Google Tokyo Office
Michiel Bacchiani (Google)
(To be available after the conference date) [more]
Hiroshima Satellite Campus Hiroshima
[Poster Presentation] Data augmentation method for machine learning on speech data
Tsubasa Maruyama (Tokyo Tech), Tsutomu Ikegami (AIST), Toshio Endo (Tokyo Tech), Takahiro Hirofuchi (AIST) EA2022-68
In machine learning, data augmentation is a method to enhance the number and diversity of data by adding transformations... [more] EA2022-68
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2022-11-30
(Primary: On-site, Secondary: Online)
Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura (NTT) NLC2022-14 SP2022-34
This paper presents a novel joint training of text to speech (TTS) and automatic speech recognition (ASR) with small amo... [more] NLC2022-14 SP2022-34
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2022-12-01
(Primary: On-site, Secondary: Online)
A Japanese Automatic Speech Recognition System on the Next-Gen Kaldi Framework
Wen Shen Teo, Yasuhiro Minami (UEC) NLC2022-16 SP2022-36
2021 saw the introduction of the cutting-edge successor to the Kaldi speech processing toolkit, known as Next-Gen Kaldi.... [more] NLC2022-16 SP2022-36
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2022-12-01
(Primary: On-site, Secondary: Online)
Domain and language adaptation of large-scale pretrained model for speech recognition of low-resource language
Kak Soky (Kyoto University), Sheng Li (NICT), Chenhui Chu, Tatsuya Kawahara (Kyoto University) NLC2022-17 SP2022-37
The self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). Due to the huge paramete... [more] NLC2022-17 SP2022-37
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2022-12-01
(Primary: On-site, Secondary: Online)
ASR model adaptation to target domain with large-scale audio data without transcription
Takahiro Kinouchi, Daiki Mori (TUT), Ogawa Atsunori (NTT), Norihide Kitaoka (TUT) NLC2022-18 SP2022-38
Nowadays, speech recognition is used in various services and businesses thanks to the advent of high-performance models ... [more] NLC2022-18 SP2022-38
SP, WIT, IPSJ-SLP [detail] 2022-10-22
Kyoto Kyoto University
(Primary: On-site, Secondary: Online)
Conformer based early fusion model for audio-visual speech recognition
Nobukazu Aoki, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. of Sci.) SP2022-28 WIT2022-3
Previous studies of late fusion models with conformer encoders use independent encoders for both visual and audio inform... [more] SP2022-28 WIT2022-3
SIS, ITE-BCT 2022-10-13
Aomori Hachinohe Institute of Technology
(Primary: On-site, Secondary: Online)
Toward Improving Speech Naturalness Introducing a Capsule Structure for Speech Enhancement Networks
Reito Kasuga, Tetsuya Shimamura, Yosuke Sugiura, Nozomiko Yasui (Saitama Univ.) SIS2022-12
Although the field of speech enhancement has been extensively studied around the world, phase tends to be neglected comp... [more] SIS2022-12
SIP 2022-08-26
Okinawa Nobumoto Ohama Memorial Hall (Ishigaki Island)
(Primary: On-site, Secondary: Online)
Study on Bone-conducted Speech Enhancement Using Vector-quantized Variational Autoencoder and Gammachirp Filterbank Cepstral Coefficients
Quoc-Huy Nguyen, Masashi Unoki (JAIST) SIP2022-71
Bone-conducted (BC) speech potentially avoids the undesired effects on recorded speech due to background noise or reverb... [more] SIP2022-71
SP, IPSJ-MUS, IPSJ-SLP [detail] 2022-06-17
Online Online
Representation and analytical normalization for vocal-tract-length transformation by group theory
Atsushi Miyashita, Tomoki Toda (Nagoya Univ) SP2022-11
In automatic speech recognition, a recognition result should be invariant with respect to acoustic changes caused by dif... [more] SP2022-11
SP, IPSJ-MUS, IPSJ-SLP [detail] 2022-06-18
Online Online
[Poster Presentation] Proposal of Speech Content Conversion and the Initial Trial: Conversion of Linguistic Information Depending on Situations
Kohei Takita, Saizo Aoyagi, Tatsunori Hirai (Komazawa Univ.) SP2022-19
It is important to speak dialects, honorifics, and simple words for listeners and the environment in order to smooth com... [more] SP2022-19
IMQ 2022-05-27
Tokyo
Implementation of subtitling system using AR and study of display position
Suga Masaki, Tetsuya Matsumoto (Nagoya Univ.), Yoshinori Takeuchi (Daido Univ.), Hiroaki Kudo (Nagoya Univ.) IMQ2022-1
Sign language interpretation and captioning are used as substitute information for hearing impaired people. One of the p... [more] IMQ2022-1
SIP, BioX, IE, MI, ITE-IST, ITE-ME [detail] 2022-05-20
Kumamoto Kumamoto University Kurokami Campus
(Primary: On-site, Secondary: Online)
Implementation of a Lightweight Automatic Speech Recognition System at the Edge
Haotian Tan, Junichi Akita (Kanazawa Univ.)
Automatic speech recognition (ASR) on the cloud has been widely adopted and has demonstrated satisfactory performance. W... [more]
EA, SIP, SP, IPSJ-SLP [detail] 2022-03-01
(Primary: On-site, Secondary: Online)
Incorporating Acoustic and Textual Information for Language Modeling in Code-switching Speech Recognition
Roland Hartanto, Kuniaki Uto, Koichi Shinoda (TokyoTech) EA2021-73 SIP2021-100 SP2021-58
People who speak two or more languages tend to alternate the language when they are speaking. This particular phenomenon... [more] EA2021-73 SIP2021-100 SP2021-58
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2021-12-02
Online Online
Improvement of multilingual speech emotion recognition by normalizing features using CRNN
Jinhai Qi, Motoyuki Suzuki (OIT) NLC2021-22 SP2021-43
In this research, a new multilingual emotion recognition method by normalizing features using CRNN has been proposed. We... [more] NLC2021-22 SP2021-43
SP, IPSJ-SLP, IPSJ-MUS 2021-06-18
Online Online
Protection method with audio processing against Audio Adversarial Example
Taisei Yamamoto, Yuya Tarutani, Yukinobu Fukusima, Tokumi Yokohira (Okayama Univ) SP2021-4
Machine learning technology has improved the recognition accuracy of voice recognition, and demand for voice recognition... [more] SP2021-4
SP, IPSJ-SLP, IPSJ-MUS 2021-06-19
Online Online
[Invited Talk] Toward a Unification of Various Speech Processing Tasks Based on End-to-End Neural networks
Shinji Watanabe (CMU) SP2021-8
This presentation will introduce the recent progress of speech processing technologies based on end-to-end neural networ... [more] SP2021-8
SP, IPSJ-SLP, IPSJ-MUS 2021-06-19
Online Online
A Study on Error Correction for Improving the Accuracy of Acoustic Models
Saki Anazawa, Naofumi Aoki, Yoshinori Dobashi (Hokkaido Univ.) SP2021-12
People with ALS (amyotrophic lateral sclerosis) or dysarthria sometimes use their own voice for speech synthesis. In thi... [more] SP2021-12
