Paper Abstract and Keywords
Presentation
2010-05-15 09:30
Detection of Inconsistency between Face and Speaker based on the Co-occurrence of Lip Motion and Audio Features
Shogo Kumagai, Keisuke Doman (Nagoya Univ.), Tomokazu Takahashi (Gifu Shotoku Gakuen Univ.), Daisuke Deguchi (Nagoya Univ.), Ichiro Ide (Nagoya Univ./NII), Hiroshi Murase (Nagoya Univ.)
MVE2010-13
Abstract |
We propose a method for detecting inconsistency between face and speaker in order to extract speech scenes from news videos. In speech scenes where the face matches the speaker, high co-occurrence between lip motion and audio features is observed. Exploiting this, our method detects inconsistency between face and speaker using feature vectors based on correlations between image features computed from lip motion and audio features computed from the speech waveform. We obtained up to 78.3% detection accuracy in our experiments, which shows the effectiveness of our method.
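The keywords below name normalized cross correlation as the co-occurrence measure between the two feature streams. As a minimal sketch (not the paper's implementation; the feature sequences here are hypothetical stand-ins for the lip-motion and audio features), the correlation between two equal-length feature sequences can be computed like this:

```python
import math

def normalized_cross_correlation(x, y):
    """Zero-lag normalized cross correlation of two equal-length
    1-D sequences; returns a value in [-1, 1]."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

# Toy sequences (hypothetical): a lip-opening trajectory vs. audio power.
lip = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7]
audio_matched = [0.2, 0.9, 1.0, 0.3, 0.2, 0.8]      # moves with the lips
audio_mismatched = [0.9, 0.1, 0.2, 0.8, 0.9, 0.1]   # moves against them

print(normalized_cross_correlation(lip, audio_matched))     # near 1: consistent
print(normalized_cross_correlation(lip, audio_mismatched))  # negative: inconsistent
```

A high correlation suggests the visible face is the speaker; a low or negative one flags a face/speaker inconsistency, which is the cue the abstract describes.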
Keyword |
auditory-visual integration / news video / speech scene extraction / normalized cross correlation
Reference Info. |
IEICE Tech. Rep., vol. 110, no. 35, MVE2010-13, pp. 51-52, May 2010. |
Paper # |
MVE2010-13 |
Date of Issue |
2010-05-07 (MVE) |
ISSN |
Print edition: ISSN 0913-5685 / Online edition: ISSN 2432-6380
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
|