SYSTEMS AND METHODS FOR PROCESSING BI-MODE DUAL-CHANNEL SOUND DATA FOR AUTOMATIC SPEECH RECOGNITION MODELS

Invention Publication

US20240087592A1 SYSTEMS AND METHODS FOR PROCESSING BI-MODE DUAL-CHANNEL SOUND DATA FOR AUTOMATIC SPEECH RECOGNITION MODELS Pending - Published

Patent Title: SYSTEMS AND METHODS FOR PROCESSING BI-MODE DUAL-CHANNEL SOUND DATA FOR AUTOMATIC SPEECH RECOGNITION MODELS
Application No.: US17930567

Application Date: 2022-09-08
Publication No.: US20240087592A1

Publication Date: 2024-03-14
Inventor: James J. Mou, Jun Li, Julie Zhu
Applicant: Optum, Inc.
Applicant Address: US MN Minnetonka
Assignee: Optum, Inc.
Current Assignee: Optum, Inc.
Current Assignee Address: US MN Minnetonka
Main IPC: G10L25/21
IPC: G10L25/21; G10L15/187; G10L15/197; G10L15/26; G10L25/18; G10L25/45

Abstract:

Various embodiments of the present disclosure provide methods, apparatus, systems, computing devices, computing entities, and/or the like for pre-processing dual-channel voice data for an automatic speech recognition model. The method comprises creating one or more spectrograms for each channel of the dual-channel voice data by applying a fast Fourier transform and generating a power spectral density. One or more balanced power spectrograms are then created by merging the spectrograms of the channels and are provided as input for acoustic and language processing by an automatic speech recognition machine learning model.
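
The pipeline summarized in the abstract (per-channel FFT, power spectral density, then a merge into a balanced power spectrogram) can be illustrated with a minimal sketch. The abstract does not state how the channels are merged or which framing parameters are used, so everything below is an assumption made for illustration: the function names `power_spectrogram` and `merge_balanced`, the Hann window, the frame and hop sizes, and the balancing rule (scale each channel to the same mean power, then sum) are hypothetical and are not taken from the patent.

```python
import numpy as np

def power_spectrogram(signal, frame_len=256, hop=128):
    """Short-time power spectrum of one channel (frames x frequency bins).

    Assumed framing: Hann window, 50% overlap. Not specified by the source.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # FFT of each windowed frame
    # Squared magnitude, normalized by the window energy
    return (np.abs(spectrum) ** 2) / np.sum(window ** 2)

def merge_balanced(psd_a, psd_b, eps=1e-12):
    """Hypothetical merge rule: equalize mean power per channel, then sum."""
    mean_a = psd_a.mean() + eps
    mean_b = psd_b.mean() + eps
    target = 0.5 * (mean_a + mean_b)  # common power level for both channels
    return psd_a * (target / mean_a) + psd_b * (target / mean_b)
```

As a usage example, two channels of equal length (for instance, a loud speaker on one channel and a quiet speaker on the other) would each be passed through `power_spectrogram`, and the two results through `merge_balanced`. Under this assumed rule the quieter channel contributes as much total power to the merged spectrogram as the louder one, which is one plausible reading of "balanced".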

Public/Granted literature

US12154589B2 Systems and methods for processing bi-mode dual-channel sound data for automatic speech recognition models Publication/Grant date: 2024-11-26

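IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, H03G3/34)
G10L25/03	.characterised by the type of extracted parameters
G10L25/21	..the extracted parameters being power information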