Paper Abstract and Keywords |
Presentation |
2014-06-21 16:35
Determining the number of topics for LDA method and evaluating extracted topics
-- With an application to Twitter streaming data --
Iwao Fujino, Yuko Hoshino (Tokai Univ.) DE2014-16 |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
Topic models are an emerging approach to summarizing data, especially text data, in terms of a small set of latent variables. The most useful implementation of a topic model is the LDA method, an unsupervised machine learning technique for identifying latent topic information in a massive document collection. However, the LDA method sometimes produces results that are hard to understand or meaningless. To address this problem, in this paper we propose a method for refining the results of LDA and for ranking topics according to a significance criterion. Our study is based on two assumptions. The first is that the correlation coefficient between any two different topics should be zero under ideal conditions. The second is that the quality of a topic can be defined as its deviation from the usual word distribution. Starting from these two assumptions, we provide a concrete method for determining the number of topics when using LDA to extract topics from document data, and for ranking the LDA results in order of quality. To confirm the proposed methods, we conducted experiments on Twitter streaming data. The results of these experiments show that our methods work efficiently, as expected. |
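The two assumptions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Pearson correlation over rows of a topic-word matrix `phi`, and the choice of a `background` word distribution as the "usual" distribution are all assumptions made here for illustration. One plausible reading is that a candidate number of topics is preferable when the average correlation between distinct topics is closer to zero, and that a topic ranks higher when its Jensen-Shannon (JS) divergence from the background distribution is larger.

```python
import numpy as np

def mean_abs_topic_correlation(phi):
    """Mean absolute Pearson correlation over all pairs of distinct topics.

    phi is a hypothetical K x V matrix whose rows are topic-word distributions.
    """
    corr = np.corrcoef(phi)                      # K x K correlations between rows
    k = corr.shape[0]
    off_diag = corr[~np.eye(k, dtype=bool)]      # drop the diagonal (self-correlation)
    return float(np.mean(np.abs(off_diag)))

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                             # terms with a == 0 contribute 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

def rank_topics(phi, background):
    """Return topic indices sorted by decreasing JS divergence from background."""
    scores = np.array([js_divergence(row, background) for row in phi])
    order = np.argsort(-scores)
    return order, scores

# Toy example with 3 topics over a 4-word vocabulary (made-up numbers).
phi = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.3, 0.3, 0.2, 0.2],
])
background = np.full(4, 0.25)                    # assumed "usual" word distribution

print("mean |correlation|:", mean_abs_topic_correlation(phi))
order, scores = rank_topics(phi, background)
print("topics ranked by JS divergence:", order, scores)
```

In practice `phi` would come from a fitted LDA model for each candidate number of topics, and `background` might be the corpus-wide word frequency distribution; both choices are assumptions of this sketch rather than details stated in the abstract.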
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
Topic model / LDA / Correlation coefficient / JS divergence / Twitter |
Reference Info. |
IEICE Tech. Rep., vol. 114, no. 101, DE2014-16, pp. 67-72, June 2014. |
Paper # |
DE2014-16 |
Date of Issue |
2014-06-14 (DE) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
DE |
Conference Date |
2014-06-21 - 2014-06-21 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
Ricoh IT Solutions |
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
Social computing |
Paper Information |
Registration To |
DE |
Conference Code |
2014-06-DE |
Language |
Japanese |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Determining the number of topics for LDA method and evaluating extracted topics |
Sub Title (in English) |
With an application to Twitter streaming data |
Keyword(1) |
Topic model |
Keyword(2) |
LDA |
Keyword(3) |
Correlation coefficient |
Keyword(4) |
JS divergence |
Keyword(5) |
Twitter |
1st Author's Name |
Iwao Fujino |
1st Author's Affiliation |
Tokai University (Tokai Univ.) |
2nd Author's Name |
Yuko Hoshino |
2nd Author's Affiliation |
Tokai University (Tokai Univ.) |
Speaker |
Author-2 |
Date Time |
2014-06-21 16:35:00 |
Presentation Time |
20 minutes |
Registration for |
DE |
Paper # |
DE2014-16 |
Volume (vol) |
vol.114 |
Number (no) |
no.101 |
Page |
pp.67-72 |
#Pages |
6 |
Date of Issue |
2014-06-14 (DE) |
|