Automatic classification of accounting literature

Vasundhara Chakraborty, Victoria Chiu, Miklos Vasarhelyi

Research output: Contribution to journal > Article > peer-review

13 Scopus citations


This paper explores the possibility of using semantic parsing, information retrieval, and data mining techniques to classify accounting research automatically. Literature taxonomization plays a critical role in understanding a discipline's knowledge attributes and structure. Traditionally, research classification is a manual process that is time-consuming and can yield inconsistent classifications across experts. To address this problem, we conducted three studies to identify the most effective and accurate method for classifying the attributes of accounting publications. The third study produced the strongest results: classification accuracy reached 87.27% when decision trees and rule-based algorithms were applied. The first and second studies also yielded useful implications for automatic literature classification, e.g., abstracts are a better basis for classification than keywords, and balancing under-represented subclasses does not lead to more accurate classifications. Results from all three studies also suggest that expanding the article sample size is key to strengthening automatic classification accuracy. Overall, this line of research appears promising and could have several collateral benefits and applications.

Original language: English (US)
Pages (from-to): 122-148
Number of pages: 27
Journal: International Journal of Accounting Information Systems
Issue number: 2
State: Published - Jun 2014

ASJC Scopus subject areas

  • Management Information Systems
  • Accounting
  • Finance
  • Information Systems and Management


Keywords

  • Accounting literature
  • Attributes
  • Automatic classification
  • Data mining
  • Semantic parsing
  • Taxonomy

