-
Publication No.: US20210089316A1
Publication Date: 2021-03-25
Application No.: US16582433
Filing Date: 2019-09-25
Applicant: Intel Corporation
Inventor: William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Bret L. TOLL, Rajesh SANKARAN, Robert S. CHAPPELL, Supratim PAL, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Gang CHEN
Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.
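The fused behavior described in this abstract can be illustrated with a small software model. The sketch below is not the patented instruction or its hardware: it assumes ReLU as the activation, supplies one hypothetical kernel per source matrix (the abstract does not say where filter weights come from), and uses an invented `CountingMemory` class purely to show that each source matrix is loaded once while the convolution result is handed to the activation without a reload.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D 'valid' cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def relu(feature_map):
    """Activation layer (ReLU is an assumption; the abstract names none)."""
    return [[max(0, v) for v in row] for row in feature_map]


class CountingMemory:
    """Hypothetical stand-in for memory that counts loads per source matrix."""

    def __init__(self, matrices):
        self._matrices = matrices
        self.loads = [0] * len(matrices)

    def load(self, index):
        self.loads[index] += 1
        return self._matrices[index]


def fused_conv_activation(memory, n, kernels):
    """Model of the fused operation: one load, convolve, activate, per source."""
    outputs = []
    for k in range(n):
        source = memory.load(k)                  # the only load of source k
        fmap = conv2d_valid(source, kernels[k])  # held locally, like a register
        outputs.append(relu(fmap))               # activation reads it directly
    return outputs


if __name__ == "__main__":
    src = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    mem = CountingMemory([src, src])
    kernels = [[[1, 0], [0, -1]], [[1, 0], [0, 1]]]
    print(fused_conv_activation(mem, 2, kernels))
    print(mem.loads)
```

An unfused version would store each feature map back to memory and load it again for the activation; the load counter makes that difference easy to observe.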
-
Publication No.: US20220261949A1
Publication Date: 2022-08-18
Application No.: US17734983
Filing Date: 2022-05-02
Applicant: Intel Corporation
Inventor: Chandra S. GURRAM, Gang Y. CHEN, Subramaniam MAIYURAN, Supratim PAL, Ashutosh GARG, Jorge E. PARRA, Darin M. STARKEY, Guei-Yuan LUEH, Wei-Yu CHEN
Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to the hardware about the partial writes. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
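The grouping-and-merging idea in this abstract can be sketched in software. The model below is illustrative only and is not the patented compiler or execution unit: the names `PartialWrite`, `group_partial_writes`, `RegisterFile`, and `execute_grouped` are hypothetical, the grouping rule (adjacent instructions with the same destination) is an assumption, and the "hint" is represented simply by membership in a group.

```python
from dataclasses import dataclass


@dataclass
class PartialWrite:
    reg: str      # destination register name
    offset: int   # first lane written
    data: list    # lane values written (fewer than the register width)


def group_partial_writes(instrs):
    """Compiler-side sketch: group adjacent writes to the same register."""
    groups = []
    for ins in instrs:
        if groups and groups[-1][0].reg == ins.reg:
            groups[-1].append(ins)
        else:
            groups.append([ins])
    return groups


class RegisterFile:
    """Toy register file that counts how many register writes occur."""

    def __init__(self, names, width):
        self.regs = {name: [0] * width for name in names}
        self.write_count = 0

    def write(self, reg, lanes):
        self.regs[reg] = list(lanes)
        self.write_count += 1


def execute_grouped(regfile, groups):
    """Execution-unit sketch: merge each group's outputs, then write once."""
    for group in groups:
        reg = group[0].reg
        merged = list(regfile.regs[reg])  # lanes not written keep their value
        for ins in group:
            merged[ins.offset:ins.offset + len(ins.data)] = ins.data
        regfile.write(reg, merged)        # single write for the whole group


if __name__ == "__main__":
    rf = RegisterFile(["r0", "r1"], width=4)
    program = [
        PartialWrite("r0", 0, [1, 2]),
        PartialWrite("r0", 2, [3, 4]),
        PartialWrite("r1", 1, [9]),
    ]
    execute_grouped(rf, group_partial_writes(program))
    print(rf.regs, rf.write_count)
```

In this toy program three partial-write instructions result in two register writes, one per destination register, rather than three.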
-
Publication No.: US20210192673A1
Publication Date: 2021-06-24
Application No.: US16726659
Filing Date: 2019-12-24
Applicant: Intel Corporation
Inventor: Chandra S. GURRAM, Gang Y. CHEN, Subramaniam MAIYURAN, Supratim PAL, Ashutosh GARG, Jorge E. PARRA, Darin M. STARKEY, Guei-Yuan LUEH, Wei-Yu CHEN
Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to the hardware about the partial writes. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
-
-