Real time machine learning-based indication of whether audio quality is suitable for transcription

Invention Grant

US10665231B1 Real time machine learning-based indication of whether audio quality is suitable for transcription (In force)


Patent Title: Real time machine learning-based indication of whether audio quality is suitable for transcription
Application No.: US16595129

Application Date: 2019-10-07
Publication No.: US10665231B1

Publication Date: 2020-05-26
Inventor: Eric Ariel Shellef, Yaakov Kobi Ben Tsvi, Iris Getz, Tom Livne, Roman Himmelreich, Elisha Yehuda Rosensweig
Applicant: Verbit Software Ltd.
Applicant Address: IL Tel Aviv
Assignee: Verbit Software Ltd.
Current Assignee: Verbit Software Ltd.
Current Assignee Address: IL Tel Aviv
Agency: Active Knowledge Ltd.
Main IPC: G10L15/19
IPC: G10L15/19 ; G10L15/22 ; G10L25/60 ; G10L15/18 ; G10L15/30 ; G10L15/04 ; G10L15/26 ; G10L15/20 ; G10L15/01 ; G10L15/02 ; G10L15/06 ; G06F3/0484

Abstract:

Maintaining adequate audio quality is very important for creating fast and accurate transcriptions, especially in a hybrid transcription setting, in which human transcribers review transcriptions generated by automatic speech recognition (ASR) systems. Some embodiments described herein involve detecting low-quality audio intended for transcription. In one embodiment, a server receives an audio recording that includes speech. The server generates feature values based on a segment of the audio recording and utilizes a model to calculate, based on the feature values, a certain value indicative of expected hybrid transcription quality of the segment. The model is generated based on training data that includes feature values generated based on previously recorded segments of audio, and values of transcription-quality metrics generated based on transcriptions of the previously recorded segments, which were generated at least in part by human transcribers. Optionally, an alert is provided responsive to the certain value being below a threshold.
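The pipeline in the abstract (feature values computed per segment, a model trained on earlier segments paired with transcription-quality metrics, and an alert when the predicted value falls below a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the three features (RMS level, clipped-sample fraction, zero-crossing rate), the k-nearest-neighbour regressor, the 0.8 threshold, the synthetic training signals, and their quality scores are all invented for the example, and every function and class name is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical feature vector: (RMS level in dB, fraction of clipped samples, zero-crossing rate).
Features = Tuple[float, float, float]


def segment_features(samples: Sequence[float], clip_level: float = 0.99) -> Features:
    """Assumed feature set for one segment of samples scaled to [-1, 1]."""
    n = len(samples)
    if n < 2:
        raise ValueError("segment too short")
    rms = math.sqrt(sum(s * s for s in samples) / n)
    rms_db = 20 * math.log10(max(rms, 1e-10))  # loudness; the floor avoids log10(0)
    clip_frac = sum(1 for s in samples if abs(s) >= clip_level) / n  # share of clipped samples
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)  # zero-crossing rate, a crude noisiness cue
    return (rms_db, clip_frac, zcr)


@dataclass
class QualityModel:
    """k-nearest-neighbour regressor: segment features -> expected quality score.

    A stand-in for whatever model the patent trains; it is fitted on features of
    previously recorded segments paired with quality scores derived from their
    (partly human-made) transcriptions.
    """
    X: List[Features]
    y: List[float]
    k: int = 2
    mean: List[float] = field(init=False)
    std: List[float] = field(init=False)

    def __post_init__(self) -> None:
        cols = list(zip(*self.X))
        self.mean = [sum(c) / len(c) for c in cols]
        # Standardise each feature so no single unit (dB vs. fraction) dominates distances;
        # the 1e-9 floor guards against constant columns.
        self.std = [max(math.sqrt(sum((v - m) ** 2 for v in c) / len(c)), 1e-9)
                    for c, m in zip(cols, self.mean)]

    def _scale(self, x: Sequence[float]) -> List[float]:
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def predict(self, x: Features) -> float:
        q = self._scale(x)
        ranked = sorted((math.dist(q, self._scale(xi)), yi) for xi, yi in zip(self.X, self.y))
        nearest = ranked[: self.k]
        return sum(yi for _, yi in nearest) / len(nearest)  # mean score of the k nearest


def check_segment(model: QualityModel, samples: Sequence[float],
                  threshold: float = 0.8) -> Tuple[float, bool]:
    """Return (expected quality, alert flag); alert when the score is below threshold."""
    score = model.predict(segment_features(samples))
    return score, score < threshold


def tone(amp: float, n: int = 800, freq: float = 0.05) -> List[float]:
    """Synthetic test signal: a sine wave with the given amplitude."""
    return [amp * math.sin(2 * math.pi * freq * i) for i in range(n)]


def clipped(amp: float, n: int = 800) -> List[float]:
    """An over-driven sine, hard-clipped to [-1, 1]."""
    return [max(-1.0, min(1.0, s)) for s in tone(amp, n)]


# Invented training pairs: (segment, quality score such as 1 - WER of its final transcript).
training = [(tone(0.3), 0.95), (tone(0.5), 0.93),        # clean
            (tone(0.01), 0.55), (tone(0.005), 0.50),     # too quiet
            (clipped(3.0), 0.40), (clipped(5.0), 0.35)]  # clipped
model = QualityModel([segment_features(s) for s, _ in training], [q for _, q in training])

for name, seg in [("clean", tone(0.4)), ("quiet", tone(0.008)), ("clipped", clipped(4.0))]:
    score, alert = check_segment(model, seg)
    print(f"{name}: expected quality {score:.2f}, alert={alert}")
```

In this toy setup the clean tone lands near the clean training segments and raises no alert, while the quiet and clipped segments inherit low scores and trigger one. A nearest-neighbour regressor was chosen only because it fits in a few lines without external libraries; any regressor trained on the same (features, quality metric) pairs would fill the same role.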

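IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/18	.. using natural language modelling
G10L15/183	... using context dependencies, e.g. language models
G10L15/19	.... Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules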