EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion.
- On average across five text classification tasks, training with EDA on only 50% of the available training set matched the accuracy of normal training on all available data.
- Synonym Replacement (SR): Randomly choose n words from the sentence that are not stop words. Replace each of these words with one of its synonyms chosen at random.
- Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times.
- Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.
- Random Deletion (RD): Randomly remove each word in the sentence with probability p.
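
The four operations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `STOP_WORDS` set and `SYNONYMS` dictionary are tiny hypothetical stand-ins for a real stop-word list and thesaurus, the function names are made up for this example, and the rule that random deletion never returns an empty sentence is an assumption added here. Each function takes a list of tokens and a `random.Random` instance so that results can be reproduced.

```python
import random

# Toy stand-ins (assumptions): a real setup would use a full stop-word list
# and a thesaurus rather than these hand-written entries.
STOP_WORDS = {"a", "an", "the", "is", "was", "on", "in", "of", "and", "to"}
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
    "good": ["fine", "great"],
}


def _candidates(words):
    """Indices of words that are not stop words and have a known synonym."""
    return [i for i, w in enumerate(words)
            if w not in STOP_WORDS and w in SYNONYMS]


def synonym_replacement(words, n, rng):
    """SR: replace up to n non-stop words with a randomly chosen synonym."""
    new_words = list(words)
    idxs = _candidates(new_words)
    for i in rng.sample(idxs, min(n, len(idxs))):
        new_words[i] = rng.choice(SYNONYMS[new_words[i]])
    return new_words


def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random non-stop word at a random position."""
    new_words = list(words)
    for _ in range(n):
        idxs = _candidates(new_words)
        if not idxs:  # nothing to draw a synonym from
            break
        source = new_words[rng.choice(idxs)]
        new_words.insert(rng.randint(0, len(new_words)), rng.choice(SYNONYMS[source]))
    return new_words


def random_swap(words, n, rng):
    """RS: n times, pick two distinct positions and swap the words there."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words
    for _ in range(n):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words


def random_deletion(words, p, rng):
    """RD: drop each word independently with probability p."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    # Assumption: keep one random word rather than return an empty sentence.
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the quick movie was good".split()
    print("SR:", " ".join(synonym_replacement(sentence, 2, rng)))
    print("RI:", " ".join(random_insertion(sentence, 2, rng)))
    print("RS:", " ".join(random_swap(sentence, 2, rng)))
    print("RD:", " ".join(random_deletion(sentence, 0.2, rng)))
```

Each call returns a new token list and leaves the input untouched, so one original sentence can be passed through several operations to produce multiple augmented copies.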