Speaker separation based on real-time latent speaker state characterization

Invention Grant

US12315516B2 Speaker separation based on real-time latent speaker state characterization (Active)

Patent Title: Speaker separation based on real-time latent speaker state characterization
Application No.: US18368459

Application Date: 2023-09-14
Publication No.: US12315516B2

Publication Date: 2025-05-27
Inventor: Valentin Alain Jean Perret, Nándor Kedves, Nicolas Lucien Perony
Applicant: Unity Technologies SF
Applicant Address: US CA San Francisco
Assignee: Unity Technologies SF
Current Assignee: Unity Technologies SF
Current Assignee Address: US CA San Francisco
Agency: Schwegman Lundberg & Woessner, P.A.
Main IPC: G10L17/18
IPC: G10L17/18; G06N3/04; G06N3/045; G06N3/049; G06N3/08; G10L17/02; G10L17/04; G10L17/06; G10L17/08; G10L21/0272

Abstract:

Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided.
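
The flow described in the abstract (chunk the incoming stream, associate each chunk with an identity embedding, then group chunks into per-speaker segments) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `iter_chunks`, `segment_stream`, `toy_embed`, and `Segment`, the fixed chunk size, the single embedding per chunk, and the cosine-similarity threshold used to decide whether a chunk belongs to an already-seen speaker. A real system would replace `toy_embed` with a learned embedding model.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Sequence


@dataclass
class Segment:
    speaker: int  # speaker index assigned by this sketch (hypothetical labeling)
    start: int    # first sample index, inclusive
    end: int      # last sample index, exclusive


def iter_chunks(stream: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Group samples into fixed-size chunks as they arrive."""
    buf: List[float] = []
    for sample in stream:
        buf.append(sample)
        if len(buf) == chunk_size:
            yield buf
            buf = []
    if buf:  # emit the trailing partial chunk, if any
        yield buf


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_stream(
    stream: Iterable[float],
    embed: Callable[[List[float]], Sequence[float]],
    chunk_size: int,
    threshold: float = 0.8,  # assumed similarity cutoff, not from the source
) -> List[Segment]:
    """Assign each chunk to a speaker and merge adjacent same-speaker chunks."""
    references: List[Sequence[float]] = []  # one reference embedding per speaker
    segments: List[Segment] = []
    pos = 0
    for chunk in iter_chunks(stream, chunk_size):
        emb = embed(chunk)
        scores = [cosine(emb, ref) for ref in references]
        if scores and max(scores) >= threshold:
            speaker = scores.index(max(scores))  # closest known speaker
        else:
            references.append(emb)               # treat as a new speaker
            speaker = len(references) - 1
        end = pos + len(chunk)
        if segments and segments[-1].speaker == speaker:
            segments[-1].end = end               # extend the open segment
        else:
            segments.append(Segment(speaker, pos, end))
        pos = end
    return segments


def toy_embed(chunk: List[float]) -> Sequence[float]:
    """Hypothetical stand-in for a learned identity-embedding model."""
    mean = sum(chunk) / len(chunk)
    return (1.0, 0.0) if mean >= 0 else (0.0, 1.0)


# Synthetic "two-speaker" stream: positive samples, negative samples, positive again.
demo = [0.5] * 8 + [-0.5] * 8 + [0.5] * 4
for seg in segment_stream(demo, toy_embed, chunk_size=4):
    print(seg)
```

On this synthetic input the sketch yields three segments: speaker 0 for samples 0 to 8, speaker 1 for samples 8 to 16, and speaker 0 again for samples 16 to 20. Because chunks are processed one at a time as they arrive, the segmentation can be updated incrementally rather than after the whole recording is available.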

Public/Granted literature

US20240153509A1 SPEAKER SEPARATION BASED ON REAL-TIME LATENT SPEAKER STATE CHARACTERIZATION Public/Granted day: 2024-05-09

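IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification
G10L17/18	. Artificial neural networks; connectionist approaches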