Invention Grant
- Patent Title: Using long short-term memory recurrent neural network for speaker diarization segmentation
- Application No.: US15379010
- Application Date: 2016-12-14
- Publication No.: US10249292B2
- Publication Date: 2019-04-02
- Inventor: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Feb Cabrasawan
- Main IPC: G10L21/00
- IPC: G10L21/00; G10L15/00; G10L15/04; G10L25/30; G10L25/78; G10L17/18

Abstract:
Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
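The relationship between labels, change points, and segments described in the abstract can be illustrated with a minimal Python sketch. It does not implement the LSTM RNN. It assumes, purely for illustration, that some model has already produced one label per audio frame; the abstract does not state that the network works this way. The label strings `"spk1"`, `"spk2"`, and `"silence"` and all function names are hypothetical.

```python
from itertools import groupby


def segments_from_frame_labels(frame_labels):
    """Collapse per-frame labels into (start, end, label) segments.

    `end` is exclusive. The per-frame labels are an assumed input
    standing in for whatever the LSTM RNN would produce.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


def change_points(segments):
    """A change point is the frame index where the label switches."""
    return [start for start, _, _ in segments[1:]]


def speech_segments(segments, silence_label="silence"):
    """Keep only segments attributed to a speaker (candidates for speech recognition)."""
    return [seg for seg in segments if seg[2] != silence_label]


# Hypothetical frame-level output for a short recording.
frames = ["silence", "silence", "spk1", "spk1", "spk1", "silence", "spk2", "spk2"]
segs = segments_from_frame_labels(frames)
points = change_points(segs)
speech = speech_segments(segs)
```

In this toy input, every change point is a transition between two different members of the label group, and only the speaker-labeled segments would be passed on to speech recognition.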
Public/Granted literature
- US20180166066A1 USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK FOR SPEAKER DIARIZATION SEGMENTATION (Public/Granted day: 2018-06-14)