Conference Paper

TopicAnalyzer: A system for unsupervised multi-label Arabic topic categorization

By
Ezzat H.
Ezzat S.
El-Beltagy S.
Ghanem M.

The widespread use of social media tools and forums has led to the production of textual data at unprecedented rates. Without summarization, classification, or some other form of analysis, the sheer volume of this data often renders it useless, and human analysis on this scale is next to impossible. The work presented in this paper investigates an approach for classifying large volumes of data when neither training data nor a classification scheme is available. The motivation for this work is a real-life problem, which is described further in the paper. The presented system, TopicAnalyzer, combines different feature extraction, selection, and classification methods to accommodate any textual data. Evaluation of the system shows that its accuracy is comparable to that of existing supervised classification systems. The paper also suggests promising directions for future work that could further improve the presented results. © 2012 IEEE.