- Patent Title: Punctuation and capitalization of speech recognition transcripts
- Application No.: US17135283
- Application Date: 2020-12-28
- Publication No.: US11645460B2
- Publication Date: 2023-05-09
- Inventor: Avraham Faizakof, Arnon Mazza, Lev Haikin, Eyal Orbach
- Applicant: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
- Applicant Address: US CA Daly City
- Assignee: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
- Current Assignee: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
- Current Assignee Address: US CA Daly City
- Main IPC: G06F40/232
- IPC: G06F40/232; G06N20/00; G06F40/279; G06F40/169; G10L15/04; G10L15/06; G10L15/197; G10L15/22

Abstract:
A first text corpus comprising punctuated and capitalized text is received. The words in the first text corpus are then annotated with a set of labels indicating a punctuation and a capitalization of each word. At an initial training stage, a machine learning model is trained on a first training set using the annotated words from the first text corpus and the labels. A second text corpus is received representing conversational speech. The words in the second text corpus are then annotated with the set of labels. In a re-training stage, the machine learning model is re-trained on a second training set comprising the annotated words from the second text corpus, and the labels. At an inference stage, the trained machine learning model is applied to a target set of words representing conversational speech to predict a punctuation and capitalization of each word in the target set.
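The annotation step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the label names (`COMMA`, `PERIOD`, `QUESTION`, `NONE`, `CAP`, `LOWER`), the function names `annotate` and `restore`, and the restriction to three punctuation marks are all assumptions made for illustration. Each word from punctuated, capitalized text is stored in lowercase with one punctuation label and one capitalization label, which is the form a sequence-labeling model could be trained on in both the initial and re-training stages.

```python
import re

# Hypothetical label set; the abstract does not name the actual labels.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def annotate(text):
    """Convert punctuated, capitalized text into (word, punct, cap) triples."""
    examples = []
    for token in text.split():
        # Split off at most one trailing punctuation mark from the token.
        match = re.match(r"^(.*?)([,.?]?)$", token)
        word, mark = match.group(1), match.group(2)
        if not word:
            continue  # skip tokens that are punctuation only
        punct = PUNCT_LABELS.get(mark, "NONE")
        cap = "CAP" if word[0].isupper() else "LOWER"
        # The model input is the bare lowercase word, as in raw ASR output.
        examples.append((word.lower(), punct, cap))
    return examples


def restore(examples):
    """Apply predicted labels to bare words to rebuild readable text."""
    marks = {label: mark for mark, label in PUNCT_LABELS.items()}
    out = []
    for word, punct, cap in examples:
        if cap == "CAP":
            word = word[0].upper() + word[1:]
        out.append(word + marks.get(punct, ""))
    return " ".join(out)
```

Under these assumptions, `annotate("Hello, how are you?")` yields `("hello", "COMMA", "CAP")` for the first word, and `restore` mirrors what the inference stage would do once a model has predicted the labels for an unpunctuated transcript.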
Public/Granted literature:
- US20220208176A1 PUNCTUATION AND CAPITALIZATION OF SPEECH RECOGNITION TRANSCRIPTS Public/Granted day: 2022-06-30