Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis

Tamura, Satoshi; Horio, Kento; Endo, Hajime; Hayamizu, Satoru; Toda, Tomoki


IEICE Technical Committee Submission System
Conference Paper's Information


Paper Abstract and Keywords
Presentation		2018-06-28 15:10 Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu (Gifu Univ.), Tomoki Toda (Nagoya Univ.) PRMU2018-24 SP2018-4
Abstract	(in Japanese)	(See Japanese page)
	(in English)	In this paper, we aim to improve speech quality in voice conversion and propose a novel multi-modal voice conversion approach that uses speech waveforms and lip images. We employ deep bottleneck features to improve the visual features used in audio-visual voice conversion. In addition, we apply deep canonical correlation analysis to obtain better audio and visual representations and to build a new cross-modal framework. To assess the usefulness of the proposed method, we conducted subjective and objective evaluations in noisy environments, comparing it with audio-only, visual-only, and conventional audio-visual voice conversion schemes. We found that our method can significantly improve the quality even in heavily noisy conditions.
Keyword	(in Japanese)	(See Japanese page)
	(in English)	Voice conversion / multi-modal / audio-visual / cross-modal / deep learning / bottleneck feature / canonical correlation analysis
Reference Info.		IEICE Tech. Rep., vol. 118, no. 112, SP2018-4, pp. 13-18, June 2018.
Paper #		SP2018-4
Date of Issue		2018-06-21 (PRMU, SP)
ISSN		Online edition: ISSN 2432-6380
Copyright and reproduction		All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)

Conference Information
Committee	PRMU SP
Conference Date	2018-06-28 - 2018-06-29
Place (in Japanese)	(See Japanese page)
Topics (in Japanese)	(See Japanese page)
Paper Information
Registration To	SP
Conference Code	2018-06-PRMU-SP
Language	Japanese
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis
Keyword(1)	Voice conversion
Keyword(2)	multi-modal
Keyword(3)	audio-visual
Keyword(4)	cross-modal
Keyword(5)	deep learning
Keyword(6)	bottleneck feature
Keyword(7)	canonical correlation analysis
1st Author's Name	Satoshi Tamura
1st Author's Affiliation	Gifu University (Gifu Univ.)
2nd Author's Name	Kento Horio
2nd Author's Affiliation	Gifu University (Gifu Univ.)
3rd Author's Name	Hajime Endo
3rd Author's Affiliation	Gifu University (Gifu Univ.)
4th Author's Name	Satoru Hayamizu
4th Author's Affiliation	Gifu University (Gifu Univ.)
5th Author's Name	Tomoki Toda
5th Author's Affiliation	Nagoya University (Nagoya Univ.)
Speaker	Author-1
Date Time	2018-06-28 15:10:00
Presentation Time	30 minutes
Registration for	SP
Paper #	PRMU2018-24, SP2018-4
Volume (vol)	vol.118
Number (no)	no.111(PRMU), no.112(SP)
Page	pp.13-18
#Pages	6
Date of Issue	2018-06-21 (PRMU, SP)


The Institute of Electronics, Information and Communication Engineers (IEICE), Japan