Paper Abstract and Keywords |
Presentation |
2023-03-03 09:10
Study on Analysis of Amplitude and Frequency Perturbation in the Voice for Fake Audio Detection
Kai Li, Yao Wang, Minh Le Nguyen, Masato Akagi, Masashi Unoki (JAIST) EMM2022-88 |
Abstract |
(in English) |
Fake audio detection (FAD) aims to detect fake speech generated by advanced voice conversion and text-to-speech technologies. The quality of synthesized speech has recently improved significantly owing to the rapid development of deep neural networks. However, humans can still easily identify fake speech by perceiving pathological prosody in the voice. Pathological prosody is closely related to amplitude and frequency perturbation (AFP) in the voice, which provides essential cues for identifying fake speech. This paper analyzes AFP differences in the voice using jitter and shimmer features. A statistical analysis of the AFP features shows that the continuous-shimmer feature (CS3) can effectively separate genuine and fake speech signals. Moreover, static and dynamic CS3 features were combined with an FAD system based on a light convolutional neural network and bidirectional long short-term memory (LCNN-BLSTM), and experiments were carried out on datasets from the Audio Deep Synthesis Detection Challenge (ADD2022). The experimental results show that both the static and dynamic shimmer features of the voice provide knowledge complementary to that of traditional spectrum-based FAD systems. |
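As a rough illustration of the perturbation measures named in the abstract, the sketch below computes the widely used "local" jitter and "local" shimmer: the mean absolute difference between consecutive glottal-cycle periods (or peak amplitudes), divided by the mean period (or amplitude). The abstract does not define the CS3 feature, so this should not be read as the authors' method. The function names and the input format (per-cycle periods and peak amplitudes that have already been extracted) are assumptions made for illustration only.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    # Absolute cycle-to-cycle differences
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def local_jitter(periods: Sequence[float]) -> float:
    """Frequency perturbation: cycle-to-cycle variation of the pitch period (hypothetical helper)."""
    return local_perturbation(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Amplitude perturbation: cycle-to-cycle variation of the peak amplitude (hypothetical helper)."""
    return local_perturbation(amplitudes)
```

For example, `local_shimmer([1.0, 2.0, 1.0, 2.0])` gives 2/3 (mean difference 1.0 over mean amplitude 1.5), while a perfectly periodic signal such as `local_jitter([0.01, 0.01, 0.01])` gives 0.0.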
Keyword |
(in English) |
fake audio detection / prosodic feature / amplitude and frequency perturbation / jitter and shimmer |
Reference Info. |
IEICE Tech. Rep., vol. 122, no. 412, EMM2022-88, pp. 110-115, March 2023. |
Paper # |
EMM2022-88 |
Date of Issue |
2023-02-23 (EMM) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |