ICSS, IPSJ-SPT 2024-03-21
Okinawa OIST
(Primary: On-site, Secondary: Online)
Security Analysis on End-to-End Encryption of Zoom Mail
Shogo Shiraki, Takanori Isobe (Univ. Hyogo) ICSS2023-71
Zoom Mail, an email service offered by Zoom Video Communications, incorporates an end-to-end encryption (E2EE) scheme, t... [more] ICSS2023-71
SIP, SP, EA, IPSJ-SLP [detail] 2024-03-01
(Primary: On-site, Secondary: Online)
Domain adaptation of speech recognition model based on multilingual SSL model with only nonparallel corpus
Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi (TUT), Kengo Ohta (NITA), Norihide Kitaoka (TUT) EA2023-100 SIP2023-147 SP2023-82
Automatic speech recognition (ASR) models are used in various services and businesses, and each domain’s recognition acc... [more] EA2023-100 SIP2023-147 SP2023-82
SIP, SP, EA, IPSJ-SLP [detail] 2024-03-01
(Primary: On-site, Secondary: Online)
Substitution of Implicit Linguistic Information in Beam Search Decoding Using CTC-based Speech Recognition Models
Tatsunari Takagi, Yukoh Wakabayashi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT) EA2023-106 SIP2023-153 SP2023-88
The rise of neural networks in the field of automatic speech recognition has notably improved the accuracy of speech rec... [more] EA2023-106 SIP2023-153 SP2023-88
SP, NLC, IPSJ-SLP, IPSJ-NL [detail] 2023-12-03
Tokyo Kikai-Shinko-Kaikan Bldg.
(Primary: On-site, Secondary: Online)
Improvement of Tacotron2 text-to-speech model based on masking operation and positional attention mechanism
Tong Ma, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) NLC2023-17 SP2023-37
SP, IPSJ-MUS, IPSJ-SLP [detail] 2023-06-23
(Primary: On-site, Secondary: Online)
Streaming End-to-End speech recognition using a CTC decoder with substituted linguistic information
Tatsunari Takagi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka, Yukoh Wakabayashi (TUT) SP2023-12
Speech recognition technology has been employed in various fields due to the enhancement of speech recognition model acc... [more] SP2023-12
SP, IPSJ-MUS, IPSJ-SLP [detail] 2023-06-24
(Primary: On-site, Secondary: Online)
Domain adaptation of speech recognition models based on self-supervised learning using target domain speech
Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi, Norihide Kitaoka (TUT) SP2023-19
In this study, we propose a domain adaptation method using only speech data in the target domain without using transcrib... [more] SP2023-19
SP, IPSJ-MUS, IPSJ-SLP [detail] 2023-06-24
(Primary: On-site, Secondary: Online)
Automatic speech recognition model simultaneously recognizes linguistic information and verbal/non-verbal phenomena
Nagito Shione, Yukoh Wakabayashi, Norihide Kitaoka (TUT) SP2023-22
Although speech recognition technology has advanced in recent years, most systems recognize only linguistic information ... [more] SP2023-22
SP, IPSJ-SLP, EA, SIP [detail] 2023-02-28
(Primary: On-site, Secondary: Online)
End-to-End Speech Synthesis Based on Articulatory Movements Captured by Real-time MRI
Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. Sci.) EA2022-77 SIP2022-121 SP2022-41
We propose an end-to-end deep learning model for speech synthesis based on articulatory movements captured by real-time ... [more] EA2022-77 SIP2022-121 SP2022-41
SP, IPSJ-SLP, EA, SIP [detail] 2023-03-01
(Primary: On-site, Secondary: Online)
A Study on Scheduled Sampling for Neural Transducer-based ASR
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT) EA2022-100 SIP2022-144 SP2022-64
In this paper, we propose scheduled sampling approaches suited for the recurrent neural network-transducer (RNNT) that i... [more] EA2022-100 SIP2022-144 SP2022-64
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2022-12-01
(Primary: On-site, Secondary: Online)
ASR model adaptation to target domain with large-scale audio data without transcription
Takahiro Kinouchi, Daiki Mori (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT) NLC2022-18 SP2022-38
Nowadays, speech recognition is used in various services and businesses thanks to the advent of high-performance models ... [more] NLC2022-18 SP2022-38
R 2022-07-29
(Primary: On-site, Secondary: Online)
A Comparison Study on Image Captioning by VGG and YOLO
Yan Lyu, Qiangfu Zhao, Yong Liu (UoA) R2022-10
Image captioning is a task for generating a descriptive statement automatically for a given image by combining image pro... [more] R2022-10
EA, SIP, SP, IPSJ-SLP [detail] 2022-03-02
(Primary: On-site, Secondary: Online)
A Study on Hybrid RNN-T/Attention-based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix (NTT), Takahiro Shinozaki (Tokyo Tech) EA2021-78 SIP2021-105 SP2021-63
In this paper we propose improvements to our recently proposed hybrid RNN-T/Attention architecture that includes a share... [more] EA2021-78 SIP2021-105 SP2021-63
Hiroshima Higashi-Senda campus, Hiroshima Univ.
(Primary: On-site, Secondary: Online)
[Short Paper] On the Impact of Communication Link Heterogeneity on Content Delivery Delay in Information-Centric Delay/Disruption-Tolerant Networking
Hisashi Sagayama, Michika Ohnishi, Ryotaro Matsuo, Hiroyuki Ohsaki (Kwansei Gakuin Univ.) IA2021-49
In recent years, it is expected that ICDTN (Information-Centric Delay/Disruption-Tolerant Networking) incorporating t... [more]
SP, IPSJ-SLP, IPSJ-MUS 2021-06-19
Online
Neural speech synthesis using local phrase dependency structure information
Nobuyoshi Kaiki, Sakriani Sakti, Satoshi Nakamura (NAIST) SP2021-23
In order to synthesize Japanese speech with natural prosody, we introduce an end-to-end TTS with new prosodic symbol rep... [more] SP2021-23
EA, US, SP, SIP, IPSJ-SLP [detail] 2021-03-03
Online
[Poster Presentation] End-to-end incremental TTS with lookahead generation with large pretrained language model
Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari (UTokyo) EA2020-74 SIP2020-105 SP2020-39
(To be available after the conference date) [more] EA2020-74 SIP2020-105 SP2020-39
EA, US, SP, SIP, IPSJ-SLP [detail] 2021-03-03
Online
[Short Paper] Comparison of End-to-End Models for Joint Speaker and Speech Recognition
Kak Soky (Kyoto Univ.), Sheng Li (NICT), Masato Mimura, Chenhui Chu, Tatsuya Kawahara (Kyoto Univ.) EA2020-78 SIP2020-109 SP2020-43
In this paper, we investigate the effectiveness of using speaker information on the performance of speaker-imbalanced au... [more] EA2020-78 SIP2020-109 SP2020-43
NLC, IPSJ-NL, SP, IPSJ-SLP [detail] 2020-12-02
Online
Fast End-to-End Speech Recognition with CTC and Mask Predict
Yosuke Higuchi (Waseda Univ.), Hirofumi Inaguma (Kyoto Univ.), Shinji Watanabe (JHU), Tetsuji Ogawa, Tetsunori Kobayashi (Waseda Univ.) NLC2020-13 SP2020-16
We present a fast non-autoregressive (NAR) end-to-end automatic speech recognition (E2E-ASR) framework, which generates ... [more] NLC2020-13 SP2020-16
WIT, SP, IPSJ-SLP [detail] 2020-10-22
Online
[Invited Talk] NHK's activities on Japanese end-to-end speech synthesis
Kiyoshi Kurihara (NHK) SP2020-11 WIT2020-12
The main business of NHK (Japan Broadcasting Corporation) is the production and broadcasting of programs. Many programs ... [more] SP2020-11 WIT2020-12
SP, EA, SIP 2020-03-02
Okinawa Okinawa Industry Support Center
(Cancelled but technical report was issued)
Data augmentation for ASR system by using locally time-reversed speech -- Temporal inversion of feature sequence --
Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino (NTT) EA2019-110 SIP2019-112 SP2019-59
Data augmentation is one of the techniques to mitigate overfitting and improve robustness against several acoustic varia... [more] EA2019-110 SIP2019-112 SP2019-59
SP, EA, SIP 2020-03-03
Okinawa Okinawa Industry Support Center
(Cancelled but technical report was issued)
[Poster Presentation] An Educational Study on Prosodic Symbols and Their Acoustic Realization Using Japanese End-to-end Speech Synthesis
Fuki Yoshizawa (UTokyo), Tadashi Kumano (NHK), Nobuaki Minematsu (UTokyo), Kiyoshi Kurihara (NHK) EA2019-137 SIP2019-139 SP2019-86
In order to examine the educational effect of presenting prosodic symbols to learners of Japanese, a method was proposed... [more] EA2019-137 SIP2019-139 SP2019-86
