Multimodal Data Augmentation for Visual Speech Recognition using Deep Canonical Correlation Analysis

Shimonishi, Masaki; Tamura, Satoshi; Hayamizu, Satoru

IEICE Technical Committee Submission System
Conference Paper's Information

Paper Abstract and Keywords
Presentation		2019-01-27 11:30 Multimodal Data Augmentation for Visual Speech Recognition using Deep Canonical Correlation Analysis Masaki Shimonishi, Satoshi Tamura, Satoru Hayamizu (Gifu University) SP2018-60
Abstract	(in Japanese)	(See Japanese page)
	(in English)	This paper proposes a new data augmentation strategy for deep learning, in which feature vectors in one modality can be used as an additional training data set in the target modality by means of Deep Canonical Correlation Analysis (DCCA). In particular, this work applies the scheme to visual speech recognition, i.e. lipreading, employing audio training feature vectors. A long short-term memory (LSTM) network is chosen as the recognition model, which is built using visual training data and augmented data from the audio modality. We evaluated our method using an audio-visual corpus, preparing audio data by adding acoustic noise. Experimental results show that applying DCCA enables us to augment training data from other modalities, and it turns out that we can further improve the model by implicitly utilizing information and knowledge in those modalities.
Keyword	(in Japanese)	(See Japanese page)
	(in English)	lip reading / deep learning / deep canonical correlation analysis / multimodal speech recognition
Reference Info.		IEICE Tech. Rep., vol. 118, no. 426, SP2018-60, pp. 41-45, Jan. 2019.
Paper #		SP2018-60
Date of Issue		2019-01-19 (SP)
ISSN		Online edition: ISSN 2432-6380
Copyright and reproduction		All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)

Conference Information
Committee	SP
Conference Date	2019-01-26 - 2019-01-27
Place (in Japanese)	(See Japanese page)
Place (in English)	Kanazawa-Harmonie
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Speech Synthesis, Generation, Prosody, Emergency Broadcast, etc.
Paper Information
Registration To	SP
Conference Code	2019-01-SP
Language	Japanese
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Multimodal Data Augmentation for Visual Speech Recognition using Deep Canonical Correlation Analysis
Keyword(1)	lip reading
Keyword(2)	deep learning
Keyword(3)	deep canonical correlation analysis
Keyword(4)	multimodal speech recognition
1st Author's Name	Masaki Shimonishi
1st Author's Affiliation	Gifu University
2nd Author's Name	Satoshi Tamura
2nd Author's Affiliation	Gifu University
3rd Author's Name	Satoru Hayamizu
3rd Author's Affiliation	Gifu University
Speaker	Author-1
Date Time	2019-01-27 11:30:00
Presentation Time	25 minutes
Registration for	SP
Paper #	SP2018-60
Volume (vol)	vol.118
Number (no)	no.426
Page	pp.41-45
#Pages	5
Date of Issue	2019-01-19 (SP)

The Institute of Electronics, Information and Communication Engineers (IEICE), Japan