Paper Abstract and Keywords |
Presentation |
2008-09-22 12:45
[Invited Talk]
Data Stream Processing Research at IMC of East China Normal University Aoying Zhou, Cheqing Jin, Weining Qian (East China Normal Univ.) DE2008-49 |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
Data stream processing has been attracting increasing attention in both the research and industry communities because of its broad potential applications. In this talk, we briefly introduce the research that has been done in our group. Our research interests in data streams include frequent item and itemset mining, clustering, and burst detection. Some work on practical applications and some considerations for future work will be introduced as well.
For the basic problem of mining frequent items over data streams, we propose an algorithm called hCount, which has low space complexity and low per-tuple processing cost while achieving high recall and precision. For mining frequent itemsets, we develop a new false-negative algorithm that obtains a condensed representation of the frequent itemsets in transactional data streams: it discovers a false-negative collection of special itemsets that, with high probability, covers the frequent itemsets with respect to the set-inclusion relationship among itemsets.
Our research on data stream mining has also focused on clustering. SWClustering is the algorithm we proposed to cluster data streams over sliding windows, and EHCF (Exponential Histogram of Cluster Features) is the synopsis that maintains the statistical information of clusters in sliding windows. With SWClustering, both the changing distribution of clusters and the evolving behavior of individual clusters can be captured. CluDistream clusters distributed data streams and can effectively handle a huge volume of data containing noisy, corrupted, or incomplete records generated in a distributed environment. CluDistream uses EM-based (Expectation Maximization) algorithms, in which each data record is assigned to a cluster with a certain degree of membership.
The other important piece of work concerns burst detection, or monitoring, over data streams. The fractal analysis method is adapted to monitor both monotonic and non-monotonic aggregates on time-varying data streams. The monotonicity property of aggregate monitoring is identified, and a monotonic search space is built to reduce the time overhead of detecting bursts from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel piecewise fractal model, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online.
A practical data stream processing system for analyzing telecommunication network flow data will also be introduced in this talk. |
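The abstract does not describe how hCount works internally, so the following is only a minimal sketch of the general idea of hash-based frequent-item counting in small space, written as a generic Count-Min-style table. It is not the authors' algorithm. The class name HashCounterTable, the helper frequent_items, the hash family, and all parameters are assumptions made for illustration, and keys are assumed to be non-negative integers.

```python
import random


class HashCounterTable:
    """Generic hash-based counter table (Count-Min style); illustrative only."""

    PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used by the hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width    # counters per row
        self.depth = depth    # number of independent hash rows
        self.total = 0        # net number of tuples seen so far
        # One (a, b) pair per row for the hash ((a * key + b) mod PRIME) mod width.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, delta=1):
        """Process one tuple: constant work per row (delta < 0 models a deletion)."""
        self.total += delta
        for row in range(self.depth):
            self.table[row][self._cell(row, key)] += delta

    def estimate(self, key):
        """Smallest counter over all rows; never below the true count
        as long as every true count stays non-negative."""
        return min(
            self.table[row][self._cell(row, key)] for row in range(self.depth)
        )


def frequent_items(sketch, candidates, support):
    """Return the candidates whose estimated frequency reaches support * total."""
    threshold = support * sketch.total
    return [key for key in candidates if sketch.estimate(key) >= threshold]


if __name__ == "__main__":
    sketch = HashCounterTable(width=64, depth=4)
    for _ in range(600):
        sketch.update(7)            # one heavy item
    for key in range(100, 500):
        sketch.update(key)          # 400 items seen once each
    print(frequent_items(sketch, range(500), support=0.5))
```

Memory use is width × depth counters regardless of stream length, and each tuple touches exactly depth counters. Because colliding keys can only inflate a counter, this kind of table may report false positives but does not miss an item whose true count reaches the threshold.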
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
Data stream processing / Frequent item / Clustering / Burst Detection |
Reference Info. |
IEICE Tech. Rep., vol. 108, no. 211, DE2008-49, pp. 39-40, Sept. 2008. |
Paper # |
DE2008-49 |
Date of Issue |
2008-09-14 (DE) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
DE |
Conference Date |
2008-09-21 - 2008-09-22 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
|
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
|
Paper Information |
Registration To |
DE |
Conference Code |
2008-09-DE |
Language |
English |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Data Stream Processing Research at IMC of East China Normal University |
Keyword(1) |
Data stream processing |
Keyword(2) |
Frequent item |
Keyword(3) |
Clustering |
Keyword(4) |
Burst Detection |
1st Author's Name |
Aoying Zhou |
1st Author's Affiliation |
East China Normal University (East China Normal Univ.) |
2nd Author's Name |
Cheqing Jin |
2nd Author's Affiliation |
East China Normal University (East China Normal Univ.) |
3rd Author's Name |
Weining Qian |
3rd Author's Affiliation |
East China Normal University (East China Normal Univ.) |
Speaker |
Author-1 |
Date Time |
2008-09-22 12:45:00 |
Presentation Time |
45 minutes |
Registration for |
DE |
Volume (vol) |
vol.108 |
Number (no) |
no.211 |
Page |
pp.39-40 |
#Pages |
2 |
|