IEICE Technical Committee Submission System
Conference Paper's Information

Paper Abstract and Keywords
Presentation 2014-06-21 16:35
Determining the number of topics for LDA method and evaluating extracted topics -- With an application to Twitter streaming data --
Iwao Fujino, Yuko Hoshino (Tokai Univ.) DE2014-16
Abstract (in Japanese) (See Japanese page) 
(in English) Topic models are an emerging approach to summarizing data, especially text data, in terms of a small set of latent variables. The most useful implementation of a topic model is the LDA method, an unsupervised machine learning technique for identifying latent topic information in a massive document collection. However, the LDA method sometimes yields results that are hard to understand or meaningless. To address this problem, this paper proposes a method for refining the results of LDA and for ranking the topics according to a significance criterion. Our study is based on two assumptions. The first is that, under ideal conditions, the correlation coefficient between any two different topics should be zero. The second is that the quality of a topic can be defined as its deviation from the usual word distribution. Starting from these two assumptions, we provide a concrete method for determining the number of topics when using the LDA method to extract topics from document data, and for ranking the LDA results in order of quality. To confirm the proposed methods, we conducted experiments on Twitter streaming data. The results of these experiments show that our methods work efficiently, as expected.
Keyword (in Japanese) (See Japanese page) 
(in English) Topic model / LDA / Correlation coefficient / JS divergence / Twitter
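The two assumptions in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each topic is a row of a topic-word probability matrix (here called phi), that "correlation coefficient" means the Pearson correlation between two such rows, and that the "usual word distribution" is a corpus-wide word distribution (here called background). All function names are hypothetical.

```python
import numpy as np


def topic_correlations(phi):
    """Pearson correlation between every pair of rows (topics) of phi."""
    return np.corrcoef(np.asarray(phi, dtype=float))


def mean_abs_offdiag_correlation(phi):
    """Average |correlation| over all pairs of distinct topics.

    Under the first assumption, a value close to zero is the ideal case,
    so this score could be compared across candidate numbers of topics.
    """
    corr = topic_correlations(phi)
    k = corr.shape[0]
    off_diagonal = corr[~np.eye(k, dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_topics(phi, background):
    """Order topics by decreasing JS divergence from the background.

    Under the second assumption, topics that deviate more from the
    background word distribution are treated as higher quality.
    """
    scores = np.array([js_divergence(row, background) for row in phi])
    order = np.argsort(-scores)
    return order, scores
```

With a candidate set of topic counts, one plausible use is to fit LDA once per count, compute mean_abs_offdiag_correlation for each fitted phi, prefer the count with the smallest value, and then call rank_topics on the chosen model. The paper itself may define the selection rule and the quality score differently.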
Reference Info. IEICE Tech. Rep., vol. 114, no. 101, DE2014-16, pp. 67-72, June 2014.
Paper # DE2014-16 
Date of Issue 2014-06-14 (DE) 
ISSN Print edition: ISSN 0913-5685    Online edition: ISSN 2432-6380
Copyright and reproduction
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)

Conference Information
Committee DE  
Conference Date 2014-06-21 - 2014-06-21 
Place (in Japanese) (See Japanese page) 
Place (in English) Ricoh IT Solutions 
Topics (in Japanese) (See Japanese page) 
Topics (in English) Social computing 
Paper Information
Registration To DE 
Conference Code 2014-06-DE 
Language Japanese 
Title (in Japanese) (See Japanese page) 
Sub Title (in Japanese) (See Japanese page) 
Title (in English) Determining the number of topics for LDA method and evaluating extracted topics 
Sub Title (in English) With an application to Twitter streaming data 
Keyword(1) Topic model  
Keyword(2) LDA  
Keyword(3) Correlation coefficient  
Keyword(4) JS divergence  
Keyword(5) Twitter  
1st Author's Name Iwao Fujino  
1st Author's Affiliation Tokai University (Tokai Univ.)
2nd Author's Name Yuko Hoshino  
2nd Author's Affiliation Tokai University (Tokai Univ.)
Speaker Author-2 
Date Time 2014-06-21 16:35:00 
Presentation Time 20 minutes 
Registration for DE 
Paper # DE2014-16 
Volume (vol) vol.114 
Number (no) no.101 
Page pp.67-72 
#Pages 6
Date of Issue 2014-06-14 (DE) 




The Institute of Electronics, Information and Communication Engineers (IEICE), Japan