Transformer-based automatic speech recognition system incorporating time-reduction layer

Invention Grant

US11715461B2 Transformer-based automatic speech recognition system incorporating time-reduction layer (in force)


Patent Title: Transformer-based automatic speech recognition system incorporating time-reduction layer
Application No.: US17076794

Application Date: 2020-10-21
Publication No.: US11715461B2

Publication Date: 2023-08-01
Inventor: Md Akmal Haidar, Chao Xing
Applicant: Md Akmal Haidar, Chao Xing
Applicant Address: CA Montreal
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee Address: CN Shenzhen
Main IPC: G10L15/16
IPC: G10L15/16; G10L15/06

Abstract:

Computer-implemented method and system for automatic speech recognition. A first speech sequence comprising a first set of speech frame feature vectors is processed, using a time-reduction operation of an encoder neural network (NN), into a second speech sequence that includes fewer speech frame feature vectors than the first speech sequence. Each vector in the resulting second set of speech frame feature vectors concatenates information from a respective plurality of speech frame feature vectors in the first set. The second speech sequence is transformed, using a self-attention operation of the encoder NN, into a third speech sequence comprising a third set of speech frame feature vectors. The third speech sequence is then processed both by a probability operation of the encoder NN, which predicts a sequence of first labels corresponding to the third set of speech frame feature vectors, and by a decoder NN, which predicts a sequence of second labels corresponding to the third set of speech frame feature vectors.
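The first two encoder steps in the abstract (time reduction by concatenating frames, then self-attention) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: it assumes a fixed reduction factor `k` of consecutive frames, drops trailing frames that do not fill a complete group, and uses a single-head scaled dot-product self-attention with no learned query/key/value projections. The names `time_reduce` and `self_attention` are hypothetical.

```python
import math
from typing import List

Frame = List[float]  # one speech frame feature vector


def time_reduce(frames: List[Frame], k: int) -> List[Frame]:
    """Concatenate each group of k consecutive frames into one longer vector.

    Assumption (not from the patent): trailing frames that do not fill a
    complete group of k are dropped, so the output has len(frames) // k frames.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    reduced: List[Frame] = []
    for start in range(0, len(frames) - k + 1, k):
        merged: Frame = []
        for frame in frames[start:start + k]:
            merged.extend(frame)  # concatenation keeps every original feature
        reduced.append(merged)
    return reduced


def self_attention(seq: List[Frame]) -> List[Frame]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys and values are the input
    vectors themselves, with no learned projection matrices.
    """
    if not seq:
        return []
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Frame] = []
    for query in seq:
        # Attention scores of this query against every frame in the sequence.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in seq]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output frame is the attention-weighted average of the value vectors.
        out.append([sum(w * value[j] for w, value in zip(weights, seq)) for j in range(d)])
    return out


# Usage: 7 two-dimensional input frames, reduction factor 3.
first = [[float(t), float(t) + 0.5] for t in range(7)]
second = time_reduce(first, k=3)  # 2 frames of 6 features each
third = self_attention(second)    # same length and width as `second`
print(len(first), len(second), len(second[0]), len(third))  # 7 2 6 2
```

Concatenation, rather than averaging or striding, is used here because the abstract says each reduced vector concatenates information from several input frames; the trade-off is that the feature dimension grows by the factor `k` while the sequence the attention layer must process shrinks by the same factor.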

Published/granted literature

US20220122590A1 TRANSFORMER-BASED AUTOMATIC SPEECH RECOGNITION SYSTEM INCORPORATING TIME-REDUCTION LAYER Publication date: 2022-04-21

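IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/16	.. using artificial neural networks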