Patent search ap:("Google LLC") AND inv:"Jiahui Yu" Page 2

11.

Invention application
Optimizing Inference Performance for Conformer (In force)

Publication (announcement) number: US20230130634A1

Publication (announcement) date: 2023-04-27

Application number: US17936547

Application date: 2022-09-29

Applicant: Google LLC

Inventor: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

IPC: G10L15/16, G10L15/22, G10L15/06

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
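
The linear attention named in this abstract can be illustrated with a minimal sketch. It assumes a simple positive feature map (elu(x) + 1) instead of Performer-style random features, and all function names are hypothetical; it is not the claimed implementation. Because the attention weights factor through the feature map, the causal output at each frame can be computed from running sums, which is what allows an RNN-style update with constant state per step.

```python
import math


def feature_map(x):
    # Assumed positive feature map, elu(x) + 1; the abstract does not specify one.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Causal linear attention computed as a recurrence over frames."""
    d, dv = len(queries[0]), len(values[0])
    state = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) * v^T
    norm = [0.0] * d                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d):
            norm[i] += fk[i]
            for j in range(dv):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = sum(fq[i] * norm[i] for i in range(d))
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d)) / denom for j in range(dv)]
        )
    return outputs


def causal_quadratic_attention(queries, keys, values):
    """Reference: the same kernel evaluated with an explicit causal mask."""
    dv = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(keys[s]))) for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total for j in range(dv)]
        )
    return outputs
```

The recurrent form touches each frame once, so its cost grows linearly with sequence length, while the reference form grows quadratically; both produce the same outputs for this kernel.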

12.

Invention application
Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization (In force)

Publication (announcement) number: US20220122586A1

Publication (announcement) date: 2022-04-21

Application number: US17447285

Application date: 2021-09-09

Applicant: Google LLC

Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang

IPC: G10L15/06, G10L15/22, G10L15/30, G10L15/16

Abstract: A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
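
One plausible reading of this abstract can be sketched on a small transducer lattice. The sketch assumes that the tuning parameter (`lam` below) upweights the label-emission branch of each lattice node by a factor of (1 + lam) while leaving the blank branch unchanged; that weighting, the function names, and the toy probabilities are assumptions for illustration, not the claimed method.

```python
def forward_backward(blank, label):
    """Forward and backward variables on a transducer lattice.

    blank[t][u]: probability of emitting blank at frame t after u labels.
    label[t][u]: probability of emitting the next label at that node.
    """
    T, U1 = len(blank), len(blank[0])  # U1 = number of labels + 1
    alpha = [[0.0] * U1 for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U1):
            if t > 0:
                alpha[t][u] += alpha[t - 1][u] * blank[t - 1][u]
            if u > 0:
                alpha[t][u] += alpha[t][u - 1] * label[t][u - 1]
    # beta is padded by one row and one column so the recurrence needs no edge cases.
    beta = [[0.0] * (U1 + 1) for _ in range(T + 1)]
    beta[T][U1 - 1] = 1.0  # virtual terminal reached by the final blank
    for t in reversed(range(T)):
        for u in reversed(range(U1)):
            beta[t][u] = blank[t][u] * beta[t + 1][u] + label[t][u] * beta[t][u + 1]
    return alpha, beta


def node_mass(alpha, beta, blank, label, t, u, lam=0.0):
    """Sequence-level alignment mass through node (t, u).

    With lam = 0 this is the usual alpha * beta; with lam > 0 the
    label-emission branch is upweighted (assumed regularization).
    """
    via_blank = blank[t][u] * beta[t + 1][u]
    via_label = label[t][u] * beta[t][u + 1]
    return alpha[t][u] * (via_blank + (1.0 + lam) * via_label)


# Toy lattice: 2 frames, 1 label.
blank = [[0.6, 0.7], [0.5, 0.8]]
label = [[0.4, 0.0], [0.5, 0.0]]
alpha, beta = forward_backward(blank, label)
plain = node_mass(alpha, beta, blank, label, 0, 0)
boosted = node_mass(alpha, beta, blank, label, 0, 0, lam=0.5)
```

Under this reading, a larger `lam` rewards alignments that emit labels earlier rather than delaying them with blanks, which is the latency effect the title refers to.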

13.

Granted invention patent
Relative margin for contrastive learning (In force)

Publication (announcement) number: US12282857B1

Publication (announcement) date: 2025-04-22

Application number: US18900506

Application date: 2024-09-27

Applicant: Google LLC

Inventor: Siyuan Qiao, Chenxi Liu, Jiahui Yu, Yonghui Wu

IPC: G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training neural networks through contrastive learning. In particular, the contrastive learning is modified to use a relative margin to adjust a training pair's contribution to optimization.

14.

Invention publication
Systems and Methods for Pretraining Image Processing Models (Under examination, published)

Publication (announcement) number: US20230281400A1

Publication (announcement) date: 2023-09-07

Application number: US17685774

Application date: 2022-03-03

Applicant: Google LLC

Inventor: Zirui Wang, Jiahui Yu, Yuan Cao, Wei Yu, Zihang Dai

IPC: G06F40/58, G06F40/284, G06V30/10, G06V10/766

CPC classification number: G06F40/58, G06F40/284, G06V10/766, G06V30/10

Abstract: Example embodiments of the present disclosure relate to systems and methods for pretraining image-processing models on weakly-supervised image-text pairs. The pretraining can include receiving a training sequence for the machine-learned image-processing model. The training sequence can include text tokens and image tokens. A prefix sequence can contain the image tokens. A remainder sequence can include a remainder set of the text tokens. The pretraining can include determining, using the prefix sequence as an input to the machine-learned image-processing model, an objective based on recovery of the remainder sequence. The pretraining can include updating one or more learnable parameters of the machine-learned image-processing model based on the objective.
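
The prefix/remainder split described in this abstract can be sketched as follows. The sketch assumes the prefix may also carry a leading slice of the text tokens (controlled by the hypothetical `prefix_text_len`; set it to 0 for an image-only prefix) and that the objective is a mean negative log-likelihood over the remainder tokens. The model itself is left out; `token_probs` stands in for the probabilities a model would assign to the true remainder tokens.

```python
import math


def split_prefix_remainder(image_tokens, text_tokens, prefix_text_len=0):
    """Build the prefix (image tokens plus optional leading text) and the remainder."""
    prefix = list(image_tokens) + list(text_tokens[:prefix_text_len])
    remainder = list(text_tokens[prefix_text_len:])
    return prefix, remainder


def remainder_nll(token_probs):
    """Mean negative log-likelihood of the remainder tokens given the prefix.

    token_probs: probability assigned to each true remainder token by a
    (hypothetical) model conditioned on the prefix and earlier remainder tokens.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


prefix, remainder = split_prefix_remainder(
    ["<img0>", "<img1>"], ["a", "dog", "runs"], prefix_text_len=1
)
loss = remainder_nll([0.5, 0.25])  # toy probabilities for "dog" and "runs"
```

Only the remainder tokens contribute to the loss, so the prefix acts purely as conditioning context; parameter updates would then follow the gradient of this objective.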

15.

Invention application
VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS (In force)

Publication (announcement) number: US20250124708A1

Publication (announcement) date: 2025-04-17

Application number: US18694604

Application date: 2023-12-08

Applicant: Google LLC

Inventor: Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Jiahui Yu

IPC: G06V20/40, G06F16/583

Abstract: Provided is an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with little or minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
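
The "flattened frame embeddings" idea can be sketched in a few lines. The sketch assumes each frame has already been encoded into a list of token embeddings, and it uses a single-query attention pooler without learned projections as a stand-in for the attentional pooling layers; both function names are hypothetical.

```python
import math


def flatten_frame_embeddings(frames):
    """Concatenate per-frame token embeddings into one token sequence.

    frames: list of T frames, each a list of N token embeddings of size D.
    Returns T * N token embeddings in frame order.
    """
    return [token for frame in frames for token in frame]


def attentional_pool(query, tokens):
    """Single-query attention pooling over a token sequence (no learned projections)."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) / total for j in range(d)]


frames = [
    [[1.0, 0.0], [0.0, 1.0]],  # frame 0: two token embeddings
    [[2.0, 0.0], [0.0, 2.0]],  # frame 1: two token embeddings
]
tokens = flatten_frame_embeddings(frames)
video_embedding = attentional_pool([0.0, 0.0], tokens)
```

Because the pooler accepts a token sequence of any length, the same pooling weights trained on single images can be applied to the longer flattened video sequence without adding a cross-frame fusion module.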

16.

Invention application
MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS (In force)

Publication (announcement) number: US20250111671A1

Publication (announcement) date: 2025-04-03

Application number: US18900457

Application date: 2024-09-27

Applicant: Google LLC

Inventor: Tao Zhu, Jiahui Yu, Jingchen Feng, Kai Chen, Pooya Abolghasemi, Gagan Bansal, Jieren Xu, Hui Miao, Yaping Zhang, Shuchao Bi, Yonghui Wu, Claire Cui, Rohan Anil

IPC: G06V20/40, G06F40/284, G10L25/57

Abstract: Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.

17.

Invention application
RELATIVE MARGIN FOR CONTRASTIVE LEARNING (In force)

Publication (announcement) number: US20250111235A1

Publication (announcement) date: 2025-04-03

Application number: US18900506

Application date: 2024-09-27

Applicant: Google LLC

Inventor: Siyuan Qiao, Chenxi Liu, Jiahui Yu, Yonghui Wu

IPC: G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training neural networks through contrastive learning. In particular, the contrastive learning is modified to use a relative margin to adjust a training pair's contribution to optimization.

18.

Invention application
Attribute Recognition with Image-Conditioned Prefix Language Modeling (In force)

Publication (announcement) number: US20250054322A1

Publication (announcement) date: 2025-02-13

Application number: US18787616

Application date: 2024-07-29

Applicant: Google LLC

Inventor: Keren Ye, Yicheng Zhu, Junjie Ke, Jiahui Yu, Leonidas John Guibas, Peyman Milanfar, Feng Yang

IPC: G06V20/70, G06F40/279

Abstract: Systems and methods for attribute recognition can include obtaining an image and a text string. The text string can be processed with a language model to generate a set of candidate attributes based on sequence-based prediction. The image and the candidate attributes can be processed with an image-text model to determine a likelihood that the respective candidate attribute is depicted in the image. The likelihood determination can then be utilized to determine a predicted attribute for the object of interest.
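
The pipeline in this abstract (propose candidates from text, score each against the image, pick the most likely) can be sketched with stand-in callables. `propose_candidates` and `score_image_text` are hypothetical placeholders for the language model and the image-text model, and selecting the highest-scoring candidate is an assumed decision rule.

```python
def recognize_attribute(image, text, propose_candidates, score_image_text):
    """Return the predicted attribute and the per-candidate likelihoods."""
    candidates = propose_candidates(text)
    likelihoods = {c: score_image_text(image, c) for c in candidates}
    predicted = max(likelihoods, key=likelihoods.get)
    return predicted, likelihoods


# Toy stand-ins so the sketch runs end to end.
def toy_propose(text):
    # A real language model would generate candidates conditioned on the text.
    return ["red", "blue", "striped"]


def toy_score(image, candidate):
    # A real image-text model would score how well the candidate fits the image.
    return image.get(candidate, 0.0)


toy_image = {"red": 0.1, "blue": 0.7, "striped": 0.2}
predicted, likelihoods = recognize_attribute(
    toy_image, "the color of the shirt is", toy_propose, toy_score
)
```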

19.

Invention publication
Convolution-Augmented Transformer Models (Under examination, published)

Publication (announcement) number: US20240362453A1

Publication (announcement) date: 2024-10-31

Application number: US18766038

Application date: 2024-07-08

Applicant: Google LLC

Inventor: Anmol Gulati, Weikeng Qin, Zhengdong Zhang, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
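
The block structure named in this abstract can be sketched as a composition of residual sub-modules. The ordering (feed-forward, self-attention, convolution, feed-forward) and the half-weighted feed-forward residuals follow the publicly described Conformer design and are assumptions here, as are the toy stand-ins for attention and convolution; the abstract itself only lists the block types.

```python
def add(x, y, scale=1.0):
    """Residual connection: x + scale * y, applied element-wise over frames."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def conformer_block(x, ffn1, self_attention, convolution, ffn2, layer_norm):
    """Compose the sub-modules with residual connections (assumed ordering)."""
    x = add(x, ffn1(x), scale=0.5)   # first feed-forward block, half-step residual
    x = add(x, self_attention(x))    # global interactions
    x = add(x, convolution(x))       # local correlations
    x = add(x, ffn2(x), scale=0.5)   # second feed-forward block, half-step residual
    return layer_norm(x)


def uniform_attention(x):
    # Stand-in for self-attention: every frame attends equally to all frames.
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    return [list(mean) for _ in x]


def local_average(x, width=3):
    # Stand-in for the convolution module: average over a small window of frames.
    half = width // 2
    out = []
    for t in range(len(x)):
        window = x[max(0, t - half): t + half + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def zeros(x):
    return [[0.0] * len(row) for row in x]


def identity(x):
    return x


frames = [[1.0], [2.0], [3.0]]
mixed = conformer_block(frames, zeros, uniform_attention, local_average, zeros, identity)
```

The toy attention mixes information across all frames while the toy convolution mixes only neighbouring frames, mirroring the global and local roles the abstract assigns to the two modules.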

20.

Granted invention patent
Convolution-augmented transformer models (In force)

Publication (announcement) number: US12079703B2

Publication (announcement) date: 2024-09-03

Application number: US17139525

Application date: 2020-12-31

Applicant: Google LLC

Inventor: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang

IPC: G06N3/04, G06N20/00, G10L15/16

CPC classification number: G06N3/04, G06N20/00, G10L15/16

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
