Topic Mining over Asynchronous Text Sequences
Time-stamped texts, or text sequences, are ubiquitous in real-world applications. Multiple text
sequences are often related to each other by sharing common topics. The
correlation among these sequences provides more meaningful and comprehensive
clues for topic mining than those from each individual sequence. However, it is
nontrivial to explore the correlation in the presence of asynchronism among
multiple sequences, i.e., documents from different sequences about the same
topic may have different time stamps. In this paper, we formally address this
problem and put forward a novel algorithm based on the generative topic model.
Our algorithm consists of two alternating steps: the first step extracts common
topics from multiple sequences based on the adjusted time stamps provided by
the second step; the second step adjusts the time stamps of the documents
according to the time distribution of the topics discovered by the first step.
We perform these two steps alternately, and monotonic convergence of our
objective function is guaranteed over the iterations. The effectiveness and
advantage of our approach are demonstrated through extensive empirical studies
on two real data sets, consisting of six research paper repositories and two
news article feeds, respectively.
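
The two alternating steps described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a simplified PLSA-style model in which each time stamp has its own topic mixture, it assumes each document may move at most `window` slots from its original time stamp, and every function name, parameter, and the toy data are hypothetical. Step 1 is one EM update of the topics given the current time stamps; step 2 moves each document to the admissible time stamp under which it is most likely. Neither step lowers the log-likelihood (up to a tiny smoothing term), which is the kind of monotonic behaviour the abstract refers to.

```python
import numpy as np

def log_lik_matrix(X, theta, phi):
    # Entry (d, t): log-likelihood of document d under time stamp t's topic mix.
    word_probs = theta @ phi                      # shape (T, V)
    return X @ np.log(word_probs).T               # shape (D, T)

def objective(X, stamps, theta, phi):
    ll = log_lik_matrix(X, theta, phi)
    return ll[np.arange(X.shape[0]), stamps].sum()

def em_step(X, stamps, theta, phi, smooth=1e-12):
    # Step 1 (assumed form): one EM update of the topics, with word counts
    # aggregated by the *current* time stamps.
    T, _ = theta.shape
    counts = np.zeros((T, X.shape[1]))
    np.add.at(counts, stamps, X)                  # counts[t] = sum of docs at t
    joint = theta[:, None, :] * phi.T[None, :, :] # shape (T, V, K)
    resp = joint / joint.sum(axis=2, keepdims=True)
    weighted = counts[:, :, None] * resp          # expected topic counts
    theta_new = weighted.sum(axis=1) + smooth     # shape (T, K)
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    phi_new = weighted.sum(axis=0).T + smooth     # shape (K, V)
    phi_new /= phi_new.sum(axis=1, keepdims=True)
    return theta_new, phi_new

def adjust_stamps(X, orig_stamps, theta, phi, window):
    # Step 2 (assumed form): move each document to its most likely time
    # stamp, staying within `window` slots of the original stamp.
    ll = log_lik_matrix(X, theta, phi)
    T = theta.shape[0]
    offsets = np.abs(np.arange(T)[None, :] - orig_stamps[:, None])
    ll = np.where(offsets <= window, ll, -np.inf)
    return ll.argmax(axis=1)

def mine_topics(X, orig_stamps, T, K, window=1, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.ones(K), size=T)           # topic mix per stamp
    phi = rng.dirichlet(np.ones(X.shape[1]), size=K)    # word dist per topic
    stamps = orig_stamps.copy()
    history = []
    for _ in range(iters):
        theta, phi = em_step(X, stamps, theta, phi)
        stamps = adjust_stamps(X, orig_stamps, theta, phi, window)
        history.append(objective(X, stamps, theta, phi))
    return theta, phi, stamps, history

# Toy data (made up): two sequences over 4 time slots and a 4-word vocabulary.
X = np.array([
    [5, 4, 0, 0],   # sequence A, t=0
    [4, 5, 0, 0],   # sequence A, t=1
    [0, 0, 5, 4],   # sequence A, t=2
    [0, 1, 4, 5],   # sequence A, t=3
    [5, 5, 0, 0],   # sequence B, t=1
    [4, 4, 1, 0],   # sequence B, t=2
    [0, 0, 5, 5],   # sequence B, t=3
], dtype=float)
orig_stamps = np.array([0, 1, 2, 3, 1, 2, 3])

theta, phi, stamps, history = mine_topics(X, orig_stamps, T=4, K=2)
print("adjusted time stamps:", stamps)
print("log-likelihood per iteration:", np.round(history, 3))
```

Printing `history` makes it easy to check that the objective does not decrease from one iteration to the next. The window constraint here is only one possible way to limit how far a document may move; other constraints would need a different step 2.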