Online edition: ISSN 2432-6380
SP2023-1
[Poster Presentation]
Research on Fundamental Frequency Estimation of Instrument Sounds using Asynchronous Detection Technique
Nichika Mitsubori, Takuma Yamakawa, Kenichiro Miwa (Salesian Polytechnic)
pp. 1 - 3
SP2023-2
Impression Conversion of Speech for Unknown Speakers Using FaderNet
Saki Kugimoto, Toru Nakashika (UEC)
pp. 4 - 7
SP2023-3
Feature Representation of Japanese Pitch Accent and its Perceptual Adequacy
-- Fundamental Study for Application to Japanese Speech Education --
Ikuyo Masuda-Katsuse (Kindai Univ.)
pp. 8 - 13
SP2023-4
Data Augmentation by Synthesised Voice for Deep Learning-based A Cappella Separation
Kyoka Kazama (TMU), Yuma Kinoshita (Tokai Univ.), Natsuki Ueno, Nobutaka Ono (TMU)
pp. 14 - 19
SP2023-5
[Poster Presentation]
MS-Harmonic-Net++ vs SiFi-GAN: Comparison of fundamental frequency controllable fast neural waveform generative models
Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 20 - 25
SP2023-6
[Poster Presentation]
Examination of the vocal tract control for pitch changes in opera singing using real-time MRI
Natsuki Toda, Hironori Takemoto (CIT), Jun Takahashi (OUA)
pp. 26 - 29
SP2023-7
[Poster Presentation]
Opera-singing voice synthesis using Diff-SVC
Aoto Sugahara (Kobe Univ.), Soma Kishimoto, Yuji Adachi, Kiyoto Tai (MEC Company Ltd.), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.)
pp. 30 - 35
SP2023-8
On phase function design of extended time-stretched pulse based on cascaded all-pass filters
Hideki Kawahara (Wakayama Univ.), Kohei Yatabe (Tokyo Univ. Agri. Tech.)
pp. 36 - 41
SP2023-9
Speech Emotion Recognition based on Emotional Label Sequence Estimation Considering Phoneme Class Attribute
Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita (Ritsumeikan Univ.)
pp. 42 - 47
SP2023-10
[Poster Presentation]
Parody Detection Based on Alignment Collapse Between Lyrics and Singing Voice
Tomoki Ariga, Yosuke Higuchi (Waseda Univ.), Mitsunori Kanno, Rie Shigyo, Takato Mizuguchi, Naoki Okamoto (DAIICHIKOSHO), Tetsuji Ogawa (Waseda Univ.)
pp. 48 - 53
SP2023-11
[Poster Presentation]
Generation of colored subtitle images based on emotional information of speech utterances
Fumiya Nakamura (Kobe Univ.), Ryo Aihara (Mitsubishi Electric), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Yusuke Itani (Mitsubishi Electric)
pp. 54 - 59
SP2023-12
Streaming End-to-End speech recognition using a CTC decoder with substituted linguistic information
Tatsunari Takagi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka, Yukoh Wakabayashi (TUT)
pp. 60 - 64
SP2023-13
[Poster Presentation]
The effect of acoustic and linguistic information on the evaluation of one's own recorded speech
Hidekazu Nagamura, Seita Tomioka, Taichirou Tanaka, Kohta I. Kobayasi (Doshisha Univ.)
pp. 65 - 67
SP2023-14
[Poster Presentation]
Study on a Fundamental Frequency Estimation Method Robust to Noise or Reverberation
Takuma Yamakawa, Kenichiro Miwa (Salesian Polytechnic)
pp. 68 - 72
SP2023-15
Fast Neural Waveform Generation Model With Fully Connected Upsampling
Haruki Yamashita (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 73 - 78
SP2023-16
Dilation of Time-Frequency Mask and Phase Restoration with Phase Difference Constraint for Dichotic Pitch Improvement
Daiki Sugawara, Taishi Nakashima, Natsuki Ueno, Nobutaka Ono (TMU)
pp. 79 - 82
SP2023-17
[Poster Presentation]
Development of gamification-based vocal therapy support system
Taketo Murai, Tatsuya Kitamura (Konan Univ.), Naoko Kawamura (Himeji Dokkyo Univ.)
pp. 83 - 85
SP2023-18
[Short Paper]
SBERT-based Musical Components Estimation from Lyrics Trained with Imbalanced "Orpheus" Data
Mastuti Puspitasari, Takuya Takahashi (UEC), Gen Hori (AU), Shigeki Sagayama, Toru Nakashika (UEC)
pp. 86 - 90
SP2023-19
Domain adaptation of speech recognition models based on self-supervised learning using target domain speech
Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi, Norihide Kitaoka (TUT)
pp. 91 - 96
SP2023-20
Non-chord Tone Data Collection for Music Analysis and Generation
Takuya Takahashi, Toru Nakashika, Shigeki Sagayama (UEC)
pp. 97 - 102
SP2023-21
[Poster Presentation]
Freezing response to distress calls and heart rate variability analysis in Japanese house bats
Kazuki Yoshino-Hashizawa (Doshisha Univ./JSPS), Yuna Nishiuchi, Midori Hiragochi, Motoki Kihara, Kohta I. Kobayasi, Shizuko Hiryu (Doshisha Univ.)
pp. 103 - 108
SP2023-22
Automatic speech recognition model simultaneously recognizes linguistic information and verbal/non-verbal phenomena
Nagito Shione, Yukoh Wakabayashi, Norihide Kitaoka (TUT)
pp. 109 - 113
SP2023-23
Effect of pause length ratio in speech length on the perception of speech rate induced by speech length
Maho Tamakawa, Shuichi Sakamoto (Tohoku Univ.)
pp. 114 - 118
SP2023-24
Environmental Sound Separation Considering Separation Distortion and Remixing Error
Kanta Shimonishi, Takahiro Fukumori, Yoichi Yamashita (Ritsumeikan Univ.)
pp. 119 - 124
SP2023-25
Evaluation of multi-speaker text-to-speech synthesis using a corpus for speech recognition with x-vectors for various speech styles
Koki Hida (Wakayama Univ./NICT), Takuma Okamoto (NICT), Ryuichi Nisimura (Wakayama Univ.), Yamato Ohtani (NICT), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT)
pp. 125 - 130
SP2023-26
Kazuki Tokeshi, Toshie Matsui (Toyohashi UT)
pp. 131 - 136
Note: Each article is a technical report without peer review, and its polished version will be published elsewhere.