EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Paper Link

Overview

  1. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion.
  2. On average across five text classification datasets, training with EDA on only 50% of the available training set matched the accuracy of normal training on all available data.

Architecture

EDA Architecture
  1. Synonym Replacement (SR): Randomly choose n words from the sentence that are not stop words. Replace each of these words with one of its synonyms chosen at random.
  2. Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times.
  3. Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.
  4. Random Deletion (RD): Randomly remove each word in the sentence with probability p.
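
The four operations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `STOP_WORDS`, `SYNONYMS`, and all four function names are hypothetical, and the tiny hand-written synonym table stands in for a real thesaurus such as WordNet. Words without an entry in the table are simply skipped, and the fallback in `random_deletion` that keeps one word is an assumption made here to avoid returning an empty sentence.

```python
import random

# Hypothetical toy resources (assumptions for illustration only).
STOP_WORDS = {"a", "an", "the", "is", "on", "of", "and", "to", "in"}
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
    "dog": ["hound", "pup"],
    "jumps": ["leaps", "hops"],
}


def _candidates(words):
    """Indices of words that are not stop words and have a known synonym."""
    return [i for i, w in enumerate(words)
            if w not in STOP_WORDS and w in SYNONYMS]


def synonym_replacement(words, n, rng):
    """SR: replace up to n non-stop words with a randomly chosen synonym."""
    words = list(words)
    idxs = _candidates(words)
    for i in rng.sample(idxs, min(n, len(idxs))):
        words[i] = rng.choice(SYNONYMS[words[i]])
    return words


def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random non-stop word at a random position."""
    words = list(words)
    for _ in range(n):
        idxs = _candidates(words)
        if not idxs:
            break
        synonym = rng.choice(SYNONYMS[words[rng.choice(idxs)]])
        words.insert(rng.randint(0, len(words)), synonym)
    return words


def random_swap(words, n, rng):
    """RS: n times, swap the positions of two randomly chosen words."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """RD: drop each word independently with probability p."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    # Assumption: keep one random word rather than return an empty sentence.
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    sentence = "the quick dog jumps on the happy cat".split()
    print(" ".join(synonym_replacement(sentence, 2, rng)))
    print(" ".join(random_insertion(sentence, 2, rng)))
    print(" ".join(random_swap(sentence, 2, rng)))
    print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Each function takes a list of tokens and returns a new list, so the operations can be applied independently to produce several augmented copies of one training sentence.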

Results

EDA Results

Kaushik Rangadurai
