Systems and methods for formatting informal utterances
Abstract:
Methods and systems are presented for translating informal utterances into formal texts. Informal utterances may include abbreviated words or words containing typographical errors. The informal utterances may be processed by mapping each word in an utterance to a well-defined token. The mapping from words to tokens may be based on a context associated with the utterance, derived by analyzing the utterance on a character-by-character basis. The token mapped for each word can be one of: a vocabulary token that corresponds to a formal word in a pre-defined word corpus, an unknown token that corresponds to an unknown word, or a masked token. Formal text may then be generated based on the mapped tokens. By processing informal utterances with the techniques disclosed herein, the utterances are both normalized and sanitized.
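The word-to-token mapping described above can be illustrated with a minimal sketch. It is not the disclosed method: the abstract derives context from a character-by-character analysis of the utterance, whereas this sketch substitutes a simple character-level similarity match from Python's `difflib`. The word corpus, the abbreviation table, the token spellings `<UNK>` and `<MASK>`, and the rule that long digit strings are masked are all hypothetical choices made for illustration.

```python
import difflib
import re

# Hypothetical toy corpus and abbreviation table; the abstract specifies neither.
VOCAB = {"please", "send", "money", "to", "my", "account", "thanks"}
ABBREVIATIONS = {"pls": "please", "thx": "thanks", "acct": "account"}
UNK, MASK = "<UNK>", "<MASK>"  # assumed spellings for the unknown and masked tokens


def map_word(word: str) -> str:
    """Map one informal word to a vocabulary, unknown, or masked token."""
    lower = word.lower()
    # Assumption: digit runs of length >= 4 are treated as sensitive and masked.
    if re.fullmatch(r"\d{4,}", lower):
        return MASK
    if lower in VOCAB:
        return lower
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]
    # Stand-in for the character-level context model: pick the closest
    # corpus word by character similarity, if it is close enough.
    close = difflib.get_close_matches(lower, sorted(VOCAB), n=1, cutoff=0.8)
    return close[0] if close else UNK


def format_utterance(utterance: str) -> str:
    """Generate formal text from the tokens mapped for each word."""
    return " ".join(map_word(w) for w in utterance.split())


if __name__ == "__main__":
    print(format_utterance("pls snd mony to my acct 12345678 asap"))
```

In this sketch, abbreviations and near-miss spellings resolve to corpus words (normalization), while the digit string is replaced by the masked token (sanitization) and a word with no close corpus match becomes the unknown token.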