EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks

Siyuan Qiu, Binxia Xu, Jie Zhang, Yafang Wang, Xiaoyu Shen, Gerard De Melo, Chong Long, Xiaolong Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations

Abstract

Imbalanced data is a perennial problem that impedes the learning abilities of current machine learning-based classification models. One approach to address it is to leverage data augmentation to expand the training set. For image data, there are a number of suitable augmentation techniques that have proven effective in previous work. For textual data, however, due to the discrete units inherent in natural language, techniques that randomly perturb the signal may be ineffective. Additionally, due to the substantial discrepancy between different textual datasets (e.g., different domains), an augmentation approach that improves classification on one dataset may be detrimental on another. For practitioners, comparing different data augmentation techniques is non-trivial, as the corresponding methods might need to be incorporated into different system architectures, and the implementation of some approaches, such as generative models, is laborious. To address these challenges, we develop EasyAug, a data augmentation platform that provides several augmentation approaches. Users can conveniently compare the classification results and easily choose the most suitable approach for their own dataset. In addition, the system is extensible and can incorporate further augmentation approaches, such that with minimal effort a new method can be compared comprehensively with the baselines.

Original language: English (US)
Title of host publication: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
Publisher: Association for Computing Machinery
Pages: 249-252
Number of pages: 4
ISBN (Electronic): 9781450370240
State: Published - Apr 20 2020
Externally published: Yes
Event: 29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: Apr 20 2020 to Apr 24 2020

Publication series

Name: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020

Conference

Conference: 29th International World Wide Web Conference, WWW 2020
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 4/20/20 to 4/24/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software

Keywords

  • data augmentation
  • imbalanced data
  • model fusion
  • text classification
  • text generation
