Presentation Abstract / Keywords |
Presentation title |
2022-08-26 11:42
Study on Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network ○Li Kai(JAIST)・Xugang Lu(NICT)・Masato Akagi・Jianwu Dang(JAIST)・Sheng Li(NICT)・Unoki Masashi(JAIST) SIP2022-68 |
Abstract |
(Japanese) |
(Not yet registered) |
(English) |
Quantitatively revealing the relationship between speakers' physiological structure and acoustic speech signals, taking the properties of resonance and antiresonance into account, can help extract effective speaker discriminative information (SDI) from speech signals. The conventional quantification method based on the F-ratio considers only the power of the acoustic speech signal in each frequency band independently. We propose a novel frequency-wise attentional neural network to learn the nonlinear combined effect of the frequency components on speaker identity. The learned results indicate that the antiresonance frequency induced by the nasal cavity is another essential factor for speaker discrimination, one that the F-ratio method could not reveal. To further evaluate our findings, we designed a non-uniform subband processing strategy based on the learned results for speaker feature extraction and conducted automatic speaker verification (ASV) experiments. The ASV results confirmed that further emphasizing the spectral structure around the antiresonance frequency region can enhance speaker discrimination. |
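For readers unfamiliar with the F-ratio baseline mentioned in the abstract, the following is a minimal sketch of one common definition: for each frequency band, the variance of the speaker means divided by the average within-speaker variance. The abstract does not give the authors' exact formulation, so the function name `f_ratio`, its arguments, and the `eps` stabilizer are illustrative assumptions. The sketch also shows why the measure treats every band independently: band `k` is scored without looking at any other band.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def f_ratio(frames, speaker_ids, eps=1e-12):
    """Per-band F-ratio (assumed textbook definition, not the paper's code).

    frames:      list of per-frame band-energy lists, all of equal length.
    speaker_ids: one speaker label per frame.
    eps:         small constant to avoid division by zero (an assumption).
    """
    # Group the frames by speaker label.
    by_speaker = defaultdict(list)
    for frame, spk in zip(frames, speaker_ids):
        by_speaker[spk].append(frame)

    n_bands = len(frames[0])
    ratios = []
    for k in range(n_bands):
        # Energies of band k only, grouped by speaker.
        groups = [[f[k] for f in fs] for fs in by_speaker.values()]
        # Between-speaker variance: spread of the per-speaker means.
        between = pvariance([fmean(g) for g in groups])
        # Within-speaker variance, averaged over speakers.
        within = fmean([pvariance(g) for g in groups])
        ratios.append(between / (within + eps))
    return ratios


# Toy data: band 0 separates the two speakers, band 1 does not.
toy_frames = [[0, 0], [2, 2], [10, 0], [12, 2]]
toy_ids = ["a", "a", "b", "b"]
toy_ratios = f_ratio(toy_frames, toy_ids)
```

In the toy data, the first band receives a high score and the second a score of zero. Any discriminative cue that only appears when several bands are considered jointly would be invisible to this per-band measure, which is the limitation the abstract attributes to the F-ratio method.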
Keywords |
(Japanese) |
/ / / / / / / |
(English) |
physiological feature / non-uniform filterbank / frequency-wise attention / data-driven feature |
Bibliographic information |
IEICE Technical Report, vol. 122, no. 165, SIP2022-68, pp. 97-102, Aug. 2022. |
Report number |
SIP2022-68 |
Publication date |
2022-08-18 (SIP) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright |
The copyright of papers published in IEICE Technical Reports belongs to the IEICE. (Permission numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |