Adversarial language imitation with constrained exemplars
Abstract:
Generally discussed herein are devices, systems, and methods for generating a phrase that is confusing to a language classifier (LC). A method can include: determining, by the LC, a first classification score (CS) of a prompt, the first CS indicating whether the prompt is of a first class or a second class; predicting, by a pre-trained language model (PLM) and based on the prompt, likely next words and a corresponding probability for each of the likely next words; determining, by the LC, a second CS for each of the likely next words; determining, by an adversarial classifier, a respective score for each of the likely next words based on the first CS of the prompt, the second CS of that likely next word, and the probability of that likely next word; and selecting, by the adversarial classifier, a next word from the likely next words based on the respective scores.
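The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three inputs are combined, so the scoring rule below (log-probability plus a weighted classifier term that pushes away from the class of the prompt) is purely an assumption, as are the names `select_next_word`, `toy_classifier`, `TOY_LEXICON`, and the `weight` parameter. The toy classifier and the fixed candidate probabilities stand in for a real LC and PLM.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical word-level scores standing in for a real language classifier (LC).
# Values near 1.0 lean toward the "first class", values near 0.0 toward the "second class".
TOY_LEXICON: Dict[str, float] = {
    "awful": 0.9,
    "terrible": 0.95,
    "kind": 0.1,
    "people": 0.5,
}


def toy_classifier(text: str) -> float:
    """Return a classification score (CS) in [0, 1] as the mean lexicon score of the words."""
    words = text.lower().split()
    if not words:
        return 0.5  # neutral score for empty input
    return sum(TOY_LEXICON.get(w, 0.5) for w in words) / len(words)


def select_next_word(
    prompt: str,
    next_word_probs: Dict[str, float],
    classify: Callable[[str], float],
    weight: float = 1.0,
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate next word and return the best one with all scores.

    Assumed scoring rule (not taken from the source):
        score = log(probability) + weight * direction * second_cs
    where direction is -1 if the prompt leans toward the first class and +1 otherwise,
    so the chosen word pulls the classifier away from the class of the prompt.
    """
    first_cs = classify(prompt)  # first CS: score of the prompt
    direction = -1.0 if first_cs >= 0.5 else 1.0

    scores: Dict[str, float] = {}
    for word, prob in next_word_probs.items():
        if prob <= 0.0:
            continue  # skip words the language model considers impossible
        second_cs = classify(word)  # second CS: score of the candidate word
        scores[word] = math.log(prob) + weight * direction * second_cs

    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Fixed probabilities standing in for the next-word prediction of a PLM.
    candidates = {"terrible": 0.5, "kind": 0.3, "people": 0.2}
    word, all_scores = select_next_word("awful awful", candidates, toy_classifier)
    print(word, all_scores)
```

In this sketch a larger `weight` lets the classifier term dominate the language-model probability, while a `weight` of zero reduces the selection to ordinary greedy decoding.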