Automatic Topics Extraction from Artificial Intelligence Research Papers


This paper presents a method for extracting the topics of scientific papers from their abstracts. Topics are assumed to be represented by categories that form a hierarchy, so the problem is one of multi-class, multi-label text categorization. First, we propose adding a dummy node to each category in the hierarchy to represent its remaining sub-categories. This reduces the problem to learning only the categories at the leaf nodes. Second, we apply the Naive Bayes learning technique to perform multi-label text categorization. Experiments on a corpus of about 500 Artificial Intelligence papers yield a precision-recall breakeven point of around 0.6.
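
As an illustration only, the two steps can be sketched in Python under several assumptions that the abstract does not state. The hierarchy is taken to be a dictionary from each category to its listed sub-categories, the dummy node is named `<category>/other` (a hypothetical convention), a paper labelled with an internal category is mapped to that category's dummy leaf, and multi-label prediction is done with one binary multinomial Naive Bayes classifier per leaf using add-one smoothing. The category names and the four toy documents are invented for the example and are not taken from the corpus.

```python
import math
from collections import Counter

def add_dummy_leaves(children):
    """Give every internal category one extra child '<category>/other' (assumed naming)."""
    return {p: (list(kids) + [p + "/other"]) if kids else []
            for p, kids in children.items()}

def leaf_labels(children):
    """Return all categories that have no sub-categories, in sorted order."""
    nodes = set(children) | {k for kids in children.values() for k in kids}
    return sorted(n for n in nodes if not children.get(n))

def to_leaf(label, children):
    """Map a label from the original hierarchy to a leaf of the expanded one."""
    return label + "/other" if children.get(label) else label

def train_binary_nb(docs, is_positive):
    """Collect word counts and document counts for the positive and negative class."""
    vocab = {t for d in docs for t in d}
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for tokens, y in zip(docs, is_positive):
        counts[y].update(tokens)
        n_docs[y] += 1
    return vocab, counts, n_docs

def log_odds(model, tokens):
    """Log posterior odds of the positive class, with add-one smoothing."""
    vocab, counts, n_docs = model
    score = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for cls, sign in ((True, 1), (False, -1)):
        total = sum(counts[cls].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words unseen in training are ignored
                score += sign * math.log((counts[cls][t] + 1) / total)
    return score

def train(docs, label_sets, leaves):
    """Train one binary classifier per leaf category."""
    return {leaf: train_binary_nb(docs, [leaf in ls for ls in label_sets])
            for leaf in leaves}

def predict(models, tokens, threshold=0.0):
    """Return every leaf whose log odds exceed the threshold."""
    return {leaf for leaf, m in models.items() if log_odds(m, tokens) > threshold}

# Toy hierarchy and data, invented for illustration.
hierarchy = {"AI": ["Learning", "Planning"], "Learning": ["Neural Networks"],
             "Planning": [], "Neural Networks": []}
leaves = leaf_labels(add_dummy_leaves(hierarchy))
docs = [["neural", "network", "training"], ["planning", "search", "heuristic"],
        ["neural", "planning", "search"], ["learning", "theory"]]
raw_labels = [{"Neural Networks"}, {"Planning"},
              {"Neural Networks", "Planning"}, {"Learning"}]
label_sets = [{to_leaf(l, hierarchy) for l in ls} for ls in raw_labels]
models = train(docs, label_sets, leaves)
predicted = predict(models, ["neural", "network"])
```

In this sketch the `threshold` argument of `predict` controls the trade-off between precision and recall: raising it returns fewer categories per paper, lowering it returns more. Sweeping it until precision equals recall is one way to locate a breakeven point of the kind reported above.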