Systems and methods for separating and identifying audio in an audio file using machine learning

Invention Grant

US12062375B2 Systems and methods for separating and identifying audio in an audio file using machine learning (Active)

Patent Title: Systems and methods for separating and identifying audio in an audio file using machine learning
Application No.: US17545710

Application Date: 2021-12-08
Publication No.: US12062375B2

Publication Date: 2024-08-13
Inventor: Yuan-Jun Wei
Applicant: The MITRE Corporation
Applicant Address: US VA McLean
Assignee: The MITRE Corporation
Current Assignee: The MITRE Corporation
Current Assignee Address: US VA McLean
Agency: Morrison & Foerster LLP
Main IPC: G10L17/18
IPC: G10L17/18 ; G10L15/06 ; G10L15/18

Abstract:

Disclosed herein are systems and methods for processing an audio file to perform audio Segmentation and Speaker Role Identification (SRID). Low-level classifier and high-level clustering components are trained to separate and identify audio from different sources in an audio file, unifying audio separation and automatic speech recognition (ASR) techniques in a single system. Segmentation and SRID can include separating audio in an audio file into one or more segments based on a determination of the identity of the speaker, the category of the speaker, or the source of the audio in the segment. In one or more examples, the disclosed systems and methods use machine learning and artificial intelligence technology to determine the source of segments of audio using a combination of acoustic and language information. In some examples, the acoustic and language information is used to classify audio in each frame and cluster the audio into segments.
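
The final step mentioned in the abstract, classifying audio per frame and then grouping frames into segments, can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not specify the classifier or the clustering algorithm, so the sketch assumes per-frame source labels are already available and substitutes a simple majority-vote smoothing followed by run-length grouping. The function names, the window size, and the labels "A" and "B" are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple


def smooth_labels(labels: List[str], window: int = 3) -> List[str]:
    """Replace each frame label with the majority label in a centered window.

    Assumption: a stand-in for whatever high-level clustering the real
    system uses; it only suppresses isolated frame-level misclassifications.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


def frames_to_segments(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of identical labels into (start_frame, end_frame, label).

    end_frame is exclusive, so a segment covers frames start..end-1.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


# Hypothetical per-frame output of a low-level classifier, with one
# spurious "B" frame inside a stretch of source "A".
frame_labels = ["A"] * 4 + ["B"] + ["A"] * 3 + ["B"] * 5
segments = frames_to_segments(smooth_labels(frame_labels))
print(segments)
```

Frame indices can be converted to timestamps by multiplying by the frame hop duration, which depends on the front-end feature extraction and is not stated in the abstract.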

Public/Granted literature

US20230178082A1 SYSTEMS AND METHODS FOR SEPARATING AND IDENTIFYING AUDIO IN AN AUDIO FILE USING MACHINE LEARNING Public/Granted day: 2023-06-08

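IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification
G10L17/18	. Artificial neural networks; connectionist approaches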