Paper Abstract and Keywords |
Presentation |
2008-03-20 15:15
[Poster Presentation]
Unsupervised Phoneme Segmentation Using Mahalanobis Distance Yu Qiao, Nobuaki Minematsu (Univ. of Tokyo) SP2007-198 |
Abstract |
(in English) |
Phoneme segmentation is one of the fundamental problems in speech engineering. Approaches to it fall into two categories: supervised and unsupervised. This paper takes the unsupervised approach: we attempt phonetic segmentation without any prior knowledge of the linguistic content or of acoustic models. In earlier work, we formulated segmentation as a probabilistic optimization problem using statistical and information-theoretic analysis, and developed an objective function, the summation of square error (SSE), based on the Euclidean distance between cepstral features. However, it is not known whether Euclidean distance is the best metric for estimating the goodness of segments. A popular generalization of Euclidean distance is Mahalanobis distance. In this paper, we study whether and how Mahalanobis distance can improve segmentation performance. The essential problem is how to determine the parameters (the covariance matrix) of the Mahalanobis distance. We address this problem in a learning-based framework and develop two criteria for determining the optimal parameters: MSV and MDV. MSV minimizes the sum of within-phoneme variances, and MDV maximizes the ratio of between-phoneme variance to within-phoneme variance. Both lead to closed-form solutions through matrix calculation. We carried out experiments on the TIMIT database to compare the proposed methods. The results indicate that using a learned Mahalanobis distance can improve segmentation performance. |
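As an illustration of the quantities the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes feature frames are rows of a NumPy array, that a segmentation is given as a list of interior boundary indices, and that a covariance matrix is already available. The function names `mahalanobis_sq` and `segment_sse` are hypothetical, and the learning of the covariance matrix (the MSV and MDV criteria) is not shown, since the abstract does not give their formulas.

```python
import numpy as np


def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T cov^{-1} (x - y)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(cov, d))


def segment_sse(frames, boundaries, cov):
    """Sum, over all segments, of the squared Mahalanobis distance
    between each frame and the mean of its segment.

    frames:     (n_frames, n_dims) array of feature vectors
    boundaries: strictly increasing interior indices in 1..n_frames-1
    cov:        (n_dims, n_dims) positive-definite matrix (assumed given)
    """
    frames = np.asarray(frames, dtype=float)
    cov_inv = np.linalg.inv(cov)
    edges = [0, *boundaries, len(frames)]
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        diff = frames[start:end] - frames[start:end].mean(axis=0)
        # Sum of diff_i^T cov_inv diff_i over the frames in this segment.
        total += float(np.einsum("ij,ij->", diff @ cov_inv, diff))
    return total


if __name__ == "__main__":
    # Toy data: two frames near one value, then two frames near another.
    toy = np.array([[0.0, 0.0], [0.0, 0.0], [4.0, 4.0], [4.0, 4.0]])
    identity = np.eye(2)  # with the identity, this reduces to Euclidean SSE
    print(segment_sse(toy, [2], identity))
    print(segment_sse(toy, [1], identity))
```

With the identity matrix as `cov`, the score reduces to the Euclidean SSE; a boundary placed at the true change point gives a lower score than a misplaced one, which is the sense in which a lower value indicates better segments in this sketch.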
Keyword |
(in English) |
Unsupervised phoneme segmentation / Optimization / Mahalanobis distance / Learning distance metric |
Reference Info. |
IEICE Tech. Rep., vol. 107, no. 551, SP2007-198, pp. 69-74, March 2008. |
Paper # |
SP2007-198 |
Date of Issue |
2008-03-13 (SP) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
SP |
Conference Date |
2008-03-20 - 2008-03-21 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
Univ. of Tokyo |
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
International Workshop (Mar 20), Speech Production, Speech Perception, Hearing and Speech, etc. (Mar 21) |
Paper Information |
Registration To |
SP |
Conference Code |
2008-03-SP |
Language |
English |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Unsupervised Phoneme Segmentation Using Mahalanobis Distance |
Keyword(1) |
Unsupervised phoneme segmentation |
Keyword(2) |
Optimization |
Keyword(3) |
Mahalanobis distance |
Keyword(4) |
Learning distance metric |
1st Author's Name |
Yu Qiao |
1st Author's Affiliation |
the University of Tokyo (Univ. of Tokyo) |
2nd Author's Name |
Nobuaki Minematsu |
2nd Author's Affiliation |
the University of Tokyo (Univ. of Tokyo) |
Speaker |
Author-1 |
Date Time |
2008-03-20 15:15:00 |
Presentation Time |
90 minutes |
Registration for |
SP |
Paper # |
SP2007-198 |
Volume (vol) |
vol.107 |
Number (no) |
no.551 |
Page |
pp.69-74 |
#Pages |
6 |
Date of Issue |
2008-03-13 (SP) |
|