Perturbations
cleva_perturbation
CLEVAMildMixPerturbation()
CLEVA robustness perturbation that composes several perturbations.
chinese_typos_perturbation = ChineseTyposPerturbation(0.05)
instance-attribute
synonym_perturbation = ChineseSynonymPerturbation(0.3)
instance-attribute
ChineseGenderPerturbation(mode: str, prob: float, source_class: str, target_class: str)
Individual fairness perturbation for Chinese gender terms and pronouns.
| Parameters: |
|
|---|
ChinesePersonNamePerturbation(prob: float, source_class: Dict[str, str], target_class: Dict[str, str], preserve_gender: bool = True)
Individual fairness perturbation for Chinese person names.
Code adopted from https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/augmentations/person_name_perturbation.py
| Parameters: |
|
|---|
ChineseSynonymPerturbation(prob: float, trial_num: int = 10)
Chinese synonyms. For implementation details, see https://github.com/GEM-benchmark/NL-Augmenter/blob/main/nlaugmenter/transformations/chinese_antonym_synonym_substitution
This perturbation adds noise to a text source by randomly inserting synonyms of randomly selected words excluding punctuations and stopwords.
Perturbation example:
Input: 裸婚,这里的“裸”,指物质财富匮乏的情况下结婚,例如:无房无车无存款,有时候用于强调现实的无奈,也有时候用于强调人对情感的关注。
Output: 裸婚,这里底“裸”,指物质财富匮乏的情况下结婚,譬如说:无房无车无储蓄,有时候用于强调现实的无奈,亦有时候用来强调人士对情感的关注。
ChineseTyposPerturbation(prob: float, rare_char_prob: float = 0.05, consider_tone: bool = False, word_level_perturb: bool = True)
Chinese typos. For implementation details, see https://github.com/GEM-benchmark/NL-Augmenter/tree/main/nlaugmenter/transformations/chinese_butter_fingers_perturbation
This perturbation adds noise to a text source by randomly replacing Chinese characters or words by other characters or words that share a similar Pinyin.
Perturbation example:
Input: 我想买一部新手机。
Output: 我想买一部新收集。
MandarinToCantonesePerturbation()
Individual fairness perturbation for Mandarin to Cantonese translation. The implementation is inspired by https://justyy.com/tools/chinese-converter/
Note that this is a rule-based translation system and there are limitations.
SimplifiedToTraditionalPerturbation()
Individual fairness perturbation for Chinese simplified to Chinese traditional.
contraction_expansion_perturbation
ContractionPerturbation()
Contractions. Replaces each expansion with its contracted version.
Perturbation example:
Input: She is a doctor, and I am a student
Output: She's a doctor, and I'm a student
ExpansionPerturbation()
Expansions. Replaces each contraction with its expanded version.
Perturbation example:
Input: She's a doctor, and I'm a student
Output: She is a doctor, and I am a student
contrast_sets_perturbation
ContrastSetsPerturbation()
Contrast Sets are from this paper (currently supported for the BoolQ and IMDB scenarios): https://arxiv.org/abs/2004.02709
Original repository can be found at: https://github.com/allenai/contrast-sets
An example instance of a perturbation for the BoolQ dataset:
The Fate of the Furious premiered in Berlin on April 4, 2017, and was theatrically released in the United States on
April 14, 2017, playing in 3D, IMAX 3D and 4DX internationally. . . A spinoff film starring Johnson and Statham’s
characters is scheduled for release in August 2019, while the ninth and tenth films are scheduled for releases on
the years 2020 and 2021.
question: is “Fate and the Furious” the last movie?
answer: no
perturbed question: is “Fate and the Furious” the first of multiple movies?
perturbed answer: yes
perturbation strategy: adjective change.
An example instance of a perturbation for the IMDB dataset (from the original paper):
Original instance: Hardly one to be faulted for his ambition or his vision, it is genuinely unexpected, then, to see
all Park’s effort add up to so very little. . . . The premise is promising, gags are copious and offbeat humour
abounds but it all fails miserably to create any meaningful connection with the audience.
Sentiment: negative
Perturbed instance: Hardly one to be faulted for his ambition or his vision, here we
see all Park’s effort come to fruition. . . . The premise is perfect, gags are
hilarious and offbeat humour abounds, and it creates a deep connection with the
audience.
Sentiment: positive
dialect_perturbation
DialectPerturbation(prob: float, source_class: str, target_class: str, mapping_file_path: Optional[str] = None)
Individual fairness perturbation for dialect.
If mapping_file_path is not provided, (source_class, target_class) should be ("SAE", "AAVE").
| Parameters: |
|
|---|
extra_space_perturbation
ExtraSpacePerturbation(num_spaces: int)
A toy perturbation that replaces existing spaces in the text with
num_spaces number of spaces.
filler_words_perturbation
FillerWordsPerturbation(insert_prob = 0.333, max_num_insert = None, speaker_ph = True, uncertain_ph = True, fill_ph = True)
Randomly inserts filler words and phrases in the sentence. Perturbation example:
Input: The quick brown fox jumps over the lazy dog.
Output: The quick brown fox jumps over probably the lazy dog.
gender_perturbation
GenderPerturbation(mode: str, prob: float, source_class: str, target_class: str, mapping_file_path: Optional[str] = None, mapping_file_genders: Optional[List[str]] = None, bidirectional: bool = False)
Individual fairness perturbation for gender terms and pronouns.
| Parameters: |
|
|---|
lowercase_perturbation
LowerCasePerturbation
Simple perturbation turning input and references into lowercase.
mild_mix_perturbation
MildMixPerturbation()
Canonical robustness perturbation that composes several perturbations. These perturbations are chosen to be reasonable.
contraction_perturbation = ContractionPerturbation()
instance-attribute
lowercase_perturbation = LowerCasePerturbation()
instance-attribute
misspelling_perturbation = MisspellingPerturbation(prob=0.1)
instance-attribute
space_perturbation = SpacePerturbation(max_spaces=3)
instance-attribute
misspelling_perturbation
MisspellingPerturbation(prob: float)
Replaces words randomly with common misspellings, from a list of common misspellings.
Perturbation example:
Input: Already, the new product is not available.
Output: Aready, the new product is not availible.
| Parameters: |
|
|---|
person_name_perturbation
PersonNamePerturbation(prob: float, source_class: Dict[str, str], target_class: Dict[str, str], name_file_path: Optional[str] = None, person_name_type: str = FIRST_NAME, preserve_gender: bool = True)
Individual fairness perturbation for person names.
If name_file_path isn't provided, we use our default name mapping file, which can be found at:
https://storage.googleapis.com/crfm-helm-public/source_datasets/augmentations/person_name_perturbation/person_names.txt
The available categories in our default file and their values are as follows:
If person_name_type == "last_name":
(1) "race" => "asian", "chinese", "hispanic", "russian", "white"
If person_name_type == "first_name":
(1) "race" => "white_american", "black_american"
(2) "gender" => "female", "male"
The first names in our default file come from Caliskan et al. (2017), which derives its list from Greenwald (1998). The former removed some names from the latter because the corresponding tokens infrequently occurred in Common Crawl, which was used as the training corpus for GloVe. We include the full list from the latter in our default file.
The last names in our default file and their associated categories come from Garg et. al. (2017), which derives its list from Chalabi and Flowers (2014).
| Parameters: |
|
|---|
space_perturbation
SpacePerturbation(max_spaces: int)
A simple perturbation that replaces existing spaces with 0-max_spaces spaces (thus potentially merging words).
suffix_perturbation
SuffixPerturbation(suffix: str)
Appends a suffix to the end of the text. Example:
A picture of a dog -> A picture of a dog, picasso
synonym_perturbation
SynonymPerturbation(prob: float)
Synonyms. For implementation details, see https://github.com/GEM-benchmark/NL-Augmenter/blob/main/nlaugmenter/transformations/synonym_substitution/transformation.py
This perturbation adds noise to a text source by randomly inserting synonyms of randomly selected words excluding punctuations and stopwords. The space of synonyms depends on WordNet and could be limited. The transformation might introduce non-grammatical segments.
Perturbation example:
Input: This was a good movie, would watch again.
Output: This was a dependable movie, would determine again.
translate_perturbation
TranslatePerturbation(language_code: str)
Translates to different languages.
typos_perturbation
TyposPerturbation(prob: float)
Typos. For implementation details, see https://github.com/GEM-benchmark/NL-Augmenter/tree/main/transformations/butter_fingers_perturbation
Replaces each random letters with nearby keys on a querty keyboard. We modified the keyboard mapping compared to the NL-augmenter augmentations so that: a) only distance-1 keys are used for replacement, b) the original letter is no longer an option, c) removed special characters (e.g., commas).
Perturbation example:
Input: After their marriage, she started a close collaboration with Karvelas.
Output: Aftrr theif marriage, she started a close collaboration with Karcelas.